Abstract:
Multimodal fusion is an essential research area in computer vision applications. However, many obstacles remain in the image fusion domain that cause the loss of key content, owing to methodological limitations or inefficiency. To address these problems, a novel pyramid feature attention fusion network (PFAF-Net) is proposed, built on multiscale features with the core idea of applying different fusion strategies at different feature levels. First, multiscale high-level features with different receptive fields are extracted by a pyramid feature extraction module. Second, high-level and low-level features are fused with different fusion techniques: attention-based fusion strategies are adopted to efficiently fuse global and local features separately. Finally, the fused image is reconstructed from the enhanced fused features by the decoder. Thus, the proposed method not only integrates multimodal data but also preserves rich content details, yielding superior image fusion. In addition, the proposed PFAF-Net has better generalization ability than existing methods on four multimodal benchmark datasets (multimodal medical, multiexposure, multifocus, and visible-infrared). Experimental comparisons with other state-of-the-art fusion methods show that the proposed method achieves comparable or better fusion performance in both subjective and objective evaluation.
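To make the three-stage pipeline described above concrete, the following is a minimal PyTorch sketch of that structure: a pyramid feature extractor with parallel receptive fields, an attention-based fusion of features from two source images, and a decoder that reconstructs the fused image. All module names, channel widths, and the specific attention formulation here are illustrative assumptions, not the authors' PFAF-Net implementation.

```python
# Minimal sketch of a pyramid-feature + attention-fusion pipeline.
# All module names, channel widths, and the attention formulation are
# illustrative assumptions; this is NOT the authors' PFAF-Net code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFeatures(nn.Module):
    """Extracts multiscale features with different receptive fields,
    here via parallel dilated convolutions (one plausible choice)."""
    def __init__(self, in_ch=1, ch=32, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, ch, 3, padding=d, dilation=d) for d in dilations
        )

    def forward(self, x):
        # Concatenate the branch outputs along the channel axis.
        return torch.cat([F.relu(b(x)) for b in self.branches], dim=1)

class AttentionFusion(nn.Module):
    """Fuses features from two source images with a learned per-pixel
    attention map (a simple spatial-attention stand-in)."""
    def __init__(self, ch):
        super().__init__()
        self.score = nn.Conv2d(ch, 1, 1)

    def forward(self, fa, fb):
        # Softmax over the two sources yields per-pixel fusion weights.
        w = torch.softmax(torch.cat([self.score(fa), self.score(fb)], dim=1), dim=1)
        return w[:, :1] * fa + w[:, 1:] * fb

class Decoder(nn.Module):
    """Reconstructs the fused image from the fused features."""
    def __init__(self, ch, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, ch // 2, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch // 2, out_ch, 3, padding=1),
        )

    def forward(self, f):
        return torch.sigmoid(self.net(f))

# Usage on a pair of single-channel source images (e.g., two modalities).
enc, fuse, dec = PyramidFeatures(), AttentionFusion(96), Decoder(96)
a, b = torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128)
fused = dec(fuse(enc(a), enc(b)))  # -> tensor of shape (1, 1, 128, 128)
```

Note that this sketch fuses at a single feature level; the paper's point is that high-level and low-level features receive different attention-based strategies, which would correspond to instantiating separate fusion modules per level.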