Abstract:
Detecting semantic regions of interest (SRoIs) is one of the most challenging problems in computer vision and image processing. In this article, we propose a method that uses OpenAI's contrastive language-image pretraining (CLIP) model to find SRoIs by performing a semantic search over the objects detected in an image by an object detection model, the generic RoI extractor (GRoIE). Finding the SRoI is useful in image-processing tasks such as image and video compression, enhancement, and reformatting. Knowing the SRoIs within an image, we can improve perceived visual quality by compressing the more important parts at higher quality and the less important parts, such as the background, at lower quality, without changing the overall compression ratio or the peak signal-to-noise ratio (PSNR) quality metric. Finding the SRoI can also make image enhancement and color correction more accurate by focusing only on the important parts. Moreover, during image reformatting, the important parts of an image may be lost; by using the SRoI, we can reformat the image so that the regions that draw the greatest audience attention are kept within the frame.
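The core of the proposed pipeline is a semantic search: CLIP embeds both the text query and each detected region into a shared space, and regions are ranked by similarity to the query. The following is a minimal sketch of that ranking step, assuming the embeddings have already been produced; the function names (`rank_regions`, `cosine_similarity`) and the stand-in random embeddings are illustrative, not part of the original method.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_regions(region_embeddings, text_embedding):
    """Rank detected regions by similarity to a text query embedding.

    region_embeddings: dict mapping a region id to its image embedding.
    text_embedding: embedding of the semantic query string.
    Returns region ids sorted from most to least relevant.
    """
    scores = {rid: cosine_similarity(emb, text_embedding)
              for rid, emb in region_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in embeddings; in practice these would come from CLIP's image
    # and text encoders applied to GRoIE region crops and the query string.
    query = rng.normal(size=512)
    regions = {
        "region_0": query + 0.1 * rng.normal(size=512),  # close to the query
        "region_1": rng.normal(size=512),                # unrelated
        "region_2": rng.normal(size=512),                # unrelated
    }
    print(rank_regions(regions, query))  # "region_0" ranks first
```

The highest-ranked regions would then serve as the SRoIs passed on to the downstream compression, enhancement, or reformatting stage.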