Abstract:
Video compression is a critical task in real-time aerial surveillance, where limited communication bandwidth and on-board storage severely restrict air-to-ground and air-to-air communications. In these scenarios, efficient handling of video data is needed to ensure optimal storage, smooth video transmission, and fast, reliable video analysis. Conventional video compression schemes were typically designed for human visual perception rather than for automated video analytics. Information loss and artifacts introduced during image and video compression seriously limit the performance of automated video analytics tasks. These limitations are exacerbated in aerial imagery by complex backgrounds and small object sizes. In this paper, we describe and evaluate a salient region estimation pipeline for aerial imagery that enables adaptive bit-rate allocation during video compression. The salient regions are estimated using a multi-cue moving vehicle detection pipeline, which fuses complementary appearance and motion cues by combining deep learning-based object detection with flux tensor-based spatio-temporal filtering. Adaptive compression results obtained with the described multi-cue saliency estimation pipeline are compared against conventional MPEG and JPEG encoding in terms of compression ratio, image quality, and impact on automated video analytics operations. Experimental results on the ABQ urban aerial video dataset [1] show that incorporating contextual information enables high semantic compression ratios of over 2000:1 while preserving image quality in the regions of interest. The proposed pipeline enables better utilization of the limited bandwidth of air-to-ground and air-to-air network links.
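As a rough illustration of the multi-cue idea, the following minimal sketch fuses an appearance cue (detection boxes) with a motion cue (a binary motion mask) into a saliency mask, then derives a per-pixel quality map for adaptive compression. It is a sketch under stated assumptions, not the pipeline evaluated in this paper: the detector output and the motion mask are taken as given inputs rather than computed by a deep network or a flux tensor, the fusion rule (keep a box only if enough of its pixels are moving) is an assumption, and all function names, thresholds, and quality values are hypothetical.

```python
import numpy as np


def boxes_to_mask(boxes, shape):
    """Rasterize (x0, y0, x1, y1) boxes into a boolean mask of the given shape."""
    mask = np.zeros(shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = True
    return mask


def fuse_cues(boxes, motion_mask, min_motion_fraction=0.2):
    """Assumed fusion rule: keep a detection only if at least
    min_motion_fraction of the pixels inside its box are marked as moving."""
    kept = []
    for x0, y0, x1, y1 in boxes:
        patch = motion_mask[y0:y1, x0:x1]
        if patch.size and patch.mean() >= min_motion_fraction:
            kept.append((x0, y0, x1, y1))
    return boxes_to_mask(kept, motion_mask.shape)


def quality_map(saliency, roi_quality=90, background_quality=10):
    """Assign a high quality level inside salient regions and a low one elsewhere
    (hypothetical values; a real encoder would map these to quantization settings)."""
    return np.where(saliency, roi_quality, background_quality).astype(np.uint8)


# Toy frame: one moving vehicle and one stationary detection.
motion = np.zeros((100, 100), dtype=bool)
motion[10:20, 10:20] = True                      # motion cue covers the first box only
boxes = [(10, 10, 20, 20), (50, 50, 60, 60)]     # appearance cue: two detections

saliency = fuse_cues(boxes, motion)
qmap = quality_map(saliency)
```

In this toy frame only the first detection coincides with motion, so only its box is assigned the higher quality level; the stationary detection and the background fall back to the lower level.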