Global Transformer and Dual Local Attention Network via Deep Shallow Hierarchical Feature Fusion for

Global Transformer and Dual Local Attention Network via Deep Shallow Hierarchical Feature Fusion for

Abstract:

Clinically, retinal vessel segmentation is a significant step in the diagnosis of fundus diseases. However, recent methods generally neglect the difference of semantic information between deep and shallow features, which fail to capture the global and local characterizations in fundus images simultaneously, resulting in the limited segmentation performance for fine vessels. In this article, a global transformer (GT) and dual local attention (DLA) network via deep-shallow hierarchical feature fusion (GT-DLA-dsHFF) are investigated to solve the above limitations. First, the GT is developed to integrate the global information in the retinal image, which effectively captures the long-distance dependence between pixels, alleviating the discontinuity of blood vessels in the segmentation results. Second, DLA, which is constructed using dilated convolutions with varied dilation rates, unsupervised edge detection, and squeeze-excitation block, is proposed to extract local vessel information, consolidating the edge details in the segmentation result. Finally, a novel deep-shallow hierarchical feature fusion (dsHFF) algorithm is studied to fuse the features in different scales in the deep learning framework, respectively, which can mitigate the attenuation of valid information in the process of feature fusion. We verified the GT-DLA-dsHFF on four typical fundus image datasets. The experimental results demonstrate our GT-DLA-dsHFF achieves superior performance against the current methods and detailed discussions verify the efficacy of the proposed three modules. Segmentation results of diseased images show the robustness of our proposed GT-DLA-dsHFF. Implementation codes will be available on https://github.com/YangLibuaa/GT-DLA-dsHFF .