Towards Task-Generic Image Compression: A Study of Semantics-Oriented Metrics

Abstract:

Instead of being observed by humans, multimedia data are now increasingly fed into machines to perform various kinds of semantic analysis. One image may be analyzed multiple times by different machine vision algorithms for different purposes. While machine vision-oriented image compression has been studied, existing methods are usually driven by a specific machine vision task and may not be applicable to other tasks. We address task-generic image compression, in the hope that an image is compressed once but used multiple times for different tasks, all with satisfactory performance. Our study is based on end-to-end learned image compression. We focus on the distortion metric, i.e., finding a task-agnostic metric to estimate the quality of reconstructed images. On the one hand, we study deep feature distance as the metric, which transforms images into a latent space by a pretrained convolutional network (the latent space is believed to be better aligned with semantics) and calculates distance in that latent space. On the other hand, inspired by the saliency mechanism, we study an importance-weighted pixel distance as the metric, where the weights are generated to reflect the importance of the pixels to semantics. Moreover, we combine the two distances into one metric to investigate their complementarity. An extensive set of experiments is performed to evaluate these metrics. Experimental results show that the combined metric performs the best, leading to 20.79%-42.69% bit savings at the same semantic analysis performance, compared to signal fidelity metrics. Interestingly, we observe that using the combined metric also improves the visual quality of the reconstructed images.
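The two distortion terms described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: fixed random convolution kernels stand in for a pretrained network's feature extractor, local gradient magnitude stands in for a learned saliency map, and the trade-off weight `lam` is a hypothetical hyperparameter.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution of a single-channel image x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def feature_distance(x, x_hat, kernels):
    """Distance between filter responses of the original and the
    reconstruction; the fixed kernels are a stand-in for a pretrained
    convolutional network's latent space."""
    return sum(np.mean((conv2d(x, k) - conv2d(x_hat, k)) ** 2)
               for k in kernels)

def importance_weights(x):
    """Saliency proxy: local gradient magnitude, normalized to sum to 1.
    In the paper the weights reflect semantic importance of pixels."""
    gx = np.abs(np.diff(x, axis=1, append=x[:, -1:]))
    gy = np.abs(np.diff(x, axis=0, append=x[-1:, :]))
    w = gx + gy + 1e-8
    return w / w.sum()

def weighted_pixel_distance(x, x_hat):
    """Pixel-wise squared error weighted by per-pixel importance."""
    return np.sum(importance_weights(x) * (x - x_hat) ** 2)

def combined_metric(x, x_hat, kernels, lam=0.5):
    """Hypothetical combination of the two distances; lam balances the
    semantic (feature) term against the importance-weighted pixel term."""
    return (lam * feature_distance(x, x_hat, kernels)
            + (1 - lam) * weighted_pixel_distance(x, x_hat))
```

In a learned codec, a differentiable version of `combined_metric` would replace MSE as the distortion term in the rate-distortion training loss.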