Abstract:
In edge-assisted visual Simultaneous Localization and Mapping (SLAM), mobile devices offload computationally intensive tasks to an edge cloud. The mobile device should transfer its input data to the edge cloud in a resource-efficient way, which is especially challenging in visual SLAM systems with their very large image frame input data. Most state-of-the-art systems use a conventional Analyze-Then-Compress (ATC) approach that pre-analyzes the captured frames on the mobile device (incurring substantial processing latencies) and transmits only the (several times smaller) pre-analysis results, namely the feature representation of so-called key frames (and the corresponding tracking results), to the edge. We examine two novel transmission methods (ATC workflows) for “functional-split” edge-assisted visual SLAM: Feature-Representation-Every-Frame (FREF) transmits the feature representation of every captured frame to the edge, and the edge performs the key frame creation processing (including the tracking); Feature-Representation-Only-Key-Frames (FROKF) transmits only the feature representation of key frames, without the tracking results, to the edge, and the edge performs the tracking. We evaluate these ATC methods via testbed measurements on various computing hardware platforms in terms of the required network throughput (bandwidth), the transmission and processing latencies, and the resulting end-to-end latency (from frame capture at the mobile device to key frame and tracking results becoming available at the edge cloud). Compared to the existing conventional approach, our newly introduced FREF method can reduce the end-to-end latency to one-quarter, while our FROKF method can reduce the required throughput to one-fifth. Additionally, feature compression can further reduce the required throughput: within the FROKF method, to one-twentieth of that of the conventional method without feature compression, albeit at the expense of significantly increased processing latency for the compression.