Abstract:
Recent advances in deep learning have enabled state-of-the-art performance in detecting medium- and large-sized objects. However, small object detection remains challenging, primarily due to the scarcity of information available for such objects. This paper proposes an end-to-end fusion network that integrates deep and hand-crafted features to address this limitation. A fusion module based on semantic context information is designed to enhance the discriminative ability of the features. Additionally, we introduce a feature-contrast loss that, following contrastive learning, incorporates prior knowledge into the learning of deep features. Experiments on the MS COCO (34.4% $\mathrm{AP}_{S}$) and PASCAL VOC (85.9% mAP) datasets demonstrate that our approach achieves improved detection accuracy over previous methods, especially for small objects.