Abstract:
Visible (VIS) and infrared (IR) image fusion (VIIF) is a technique for synthesizing a fused image with high visual quality. Existing fusion methods typically work by discovering the commonalities underlying the two modalities and fusing them in a common space. However, these methods often ignore modality differences, such as the fuzzy details in the IR image, and their elaborate architectures also lead to slow fusion speed. To address these issues, we propose a real-time end-to-end VIIF model based on layer decomposition and re-parameterization (LDRepFM). The model is composed of a layer decomposition guidance network (LDGNet) and a re-parameterization fusion network (RepFNet). First, the LDGNet alleviates the visual quality degradation of the fused image by decomposing the IR image into a structural layer and a fuzzy layer. Second, to achieve a favorable trade-off between fusion speed and evaluation metrics, the RepFNet decouples the training-time multibranch architecture from the inference-time plain architecture. Third, the structural layer decomposed by LDGNet is used to construct the guidance fusion loss, which optimizes RepFNet. Finally, experiments on the publicly available TNO, RoadScene, M3FD, and RegDB datasets demonstrate that the proposed method performs comparably to the state of the art (SOTA) in terms of both visual effect and quantitative metrics.
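The decoupling of a training-time multibranch block from an inference-time plain block can be illustrated with a minimal sketch of generic structural re-parameterization. This is not the RepFNet architecture itself: the branch layout (a 3x3 branch, a 1x1 branch, and an identity branch), the single-channel setting, the absence of bias and normalization, and all function names are assumptions made for illustration. Because convolution is linear, the three branches can be folded into one 3x3 kernel that produces the same output.

```python
# Illustrative single-channel sketch of structural re-parameterization.
# Assumed branch layout (3x3 + 1x1 + identity); not the paper's RepFNet.

def conv2d_same(image, kernel):
    """Apply a 3x3 kernel with zero padding so the output keeps the input size."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * image[y][x]
            out[i][j] = acc
    return out


def multibranch_forward(image, k3, k1):
    """Training-time form: sum of a 3x3 branch, a 1x1 branch, and an identity branch."""
    branch3 = conv2d_same(image, k3)
    h, w = len(image), len(image[0])
    return [
        [branch3[i][j] + k1 * image[i][j] + image[i][j] for j in range(w)]
        for i in range(h)
    ]


def reparameterize(k3, k1):
    """Fold the 1x1 weight and the identity into the centre tap of a copy of k3."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0
    return merged


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
    k1 = 0.25
    train_out = multibranch_forward(image, k3, k1)
    infer_out = conv2d_same(image, reparameterize(k3, k1))
    max_diff = max(
        abs(a - b) for ra, rb in zip(train_out, infer_out) for a, b in zip(ra, rb)
    )
    print("max difference between the two forms:", max_diff)
```

In this sketch the inference-time form needs a single convolution per block, which is the source of the speed gain, while the training-time form keeps the extra branches. In a multi-channel network with normalization layers, the same folding would also have to absorb the normalization statistics into the kernel and bias.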