Abstract:
Infrared and visible image fusion, which combines the radiometric information of infrared images with the fine texture details of visible images to describe a scene completely and accurately, is a long-standing and well-studied task in computer vision. Approaches based on end-to-end convolutional neural networks have made significant progress on this task. However, most of them extract features with limited receptive fields in the encoder and rely on a coarse fusion strategy. Unlike these algorithms, this study proposes a multiscale receptive field amplification fusion network (MRANet) to effectively extract both local and global features from images. Specifically, the encoder captures long-range information using a convolutional residual structure as the main backbone and a simplified UniFormer as an auxiliary backbone, both inspired by ResNet. Additionally, we propose an effective attention-based multiscale fusion strategy to integrate the two modalities. Extensive experiments demonstrate that MRANet performs effectively on image fusion datasets.
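To make the architecture described above concrete, the following is a minimal PyTorch sketch of a dual-branch encoder (a convolutional residual main backbone plus a simplified attention-based auxiliary backbone for long-range features) combined with an attention-weighted fusion of the two modalities at one scale. All class names, channel sizes, and the exact attention design are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of MRANet-style components; names and shapes are assumptions.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Convolutional residual block: the local-feature main backbone."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class GlobalBlock(nn.Module):
    """Simplified UniFormer-style block: global self-attention over spatial tokens."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        t = self.norm(t)
        t, _ = self.attn(t, t, t)             # long-range interactions
        return x + t.transpose(1, 2).reshape(b, c, h, w)

class AttentionFusion(nn.Module):
    """Channel-attention gate that weighs infrared vs. visible features."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_ir, f_vis):
        w = self.gate(torch.cat([f_ir, f_vis], dim=1))  # per-channel weights in (0, 1)
        return w * f_ir + (1 - w) * f_vis

# Usage: fuse one scale of infrared/visible feature maps.
local, global_, fuse = ResidualBlock(32), GlobalBlock(32), AttentionFusion(32)
ir = torch.randn(1, 32, 64, 64)
vis = torch.randn(1, 32, 64, 64)
fused = fuse(local(ir) + global_(ir), local(vis) + global_(vis))
print(fused.shape)  # torch.Size([1, 32, 64, 64])
```

In a multiscale version of this sketch, the same fusion gate would be applied at each encoder resolution before decoding, which is one plausible reading of the fusion strategy summarized in the abstract.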