A Two Stream Boundary Aware Network for Generic Image Manipulation Localization
Abstract:

Finding tampered regions in images is a common research topic in machine learning and computer vision. Although many image manipulation localization algorithms have been proposed, most of them focus only on RGB images in different color spaces, while the frequency information that contains potential tampering clues is often ignored. Moreover, splicing and copy-move are two frequently used manipulation operations, but because their characteristics differ considerably, specific methods have been designed to detect either splicing or copy-move individually, and such methods are difficult to apply widely in practice. To address these issues, this work proposes a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) for generic image manipulation localization, in which the RGB stream, the frequency stream, and boundary artifact localization are explored in a unified framework. Specifically, we first design an adaptive frequency selection module (AFS) to adaptively select the appropriate frequencies, mining inconsistent statistics while eliminating the interference of redundant statistics. Then, an adaptive cross-attention fusion module (ACF) is proposed to adaptively fuse the RGB features and the frequency features. Finally, a boundary artifact localization network (BAL) is designed to locate boundary artifacts; its parameters are jointly updated by the outputs of the ACF, and its results are further fed into the decoder. Thus, the parameters of the RGB stream, the frequency stream, and the boundary artifact localization network are jointly optimized, and their latent complementary relationships are fully exploited.
The results of extensive experiments on six public image manipulation localization benchmarks, namely, CASIA1.0, COVER, Carvalho, In-The-Wild, NIST-16, and IMD-2020, demonstrate that the proposed TBNet substantially outperforms state-of-the-art generic image manipulation localization methods in terms of MCC, F1, and AUC while remaining robust to various attacks. Compared with DeepLabV3+ on the CASIA1.0, COVER, Carvalho, In-The-Wild, and NIST-16 datasets, the improvements in MCC/F1 reach 11%/11.1%, 8.2%/10.3%, 10.2%/11.6%, 8.9%/6.2%, and 13.3%/16.0%, respectively. Moreover, on IMD-2020, the AUC improvement reaches 14.7%.
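The core idea behind cross-attention fusion of the two streams can be illustrated with a minimal NumPy sketch. Note that the abstract does not specify the ACF module's exact formulation, so the bidirectional scaled-dot-product attention below (each stream attending to the other, then summing the attended maps) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(rgb_feat, freq_feat):
    """Hypothetical sketch of two-stream cross-attention fusion.

    rgb_feat, freq_feat: (N, D) arrays of N token features with
    dimension D, one per stream. Each stream queries the other with
    scaled dot-product attention; the two attended maps are summed.
    """
    d = rgb_feat.shape[1]
    # RGB tokens query the frequency tokens, and vice versa
    attn_rf = softmax(rgb_feat @ freq_feat.T / np.sqrt(d))
    attn_fr = softmax(freq_feat @ rgb_feat.T / np.sqrt(d))
    rgb_attended = attn_rf @ freq_feat    # frequency info gathered per RGB token
    freq_attended = attn_fr @ rgb_feat    # RGB info gathered per frequency token
    return rgb_attended + freq_attended   # fused (N, D) feature map
```

In a real network, the attention weights would be computed from learned query/key/value projections and the fusion would feed the downstream decoder; this sketch only shows the information flow between the two streams.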