Abstract:
Person search aims to localize and recognize query persons in raw video frames, combining two sub-tasks: pedestrian detection and person re-identification. The dominant paradigm, termed one-step person search, jointly optimizes detection and identification in a unified network, exhibiting higher efficiency. However, major challenges remain: (i) conflicting objectives of the sub-tasks in a shared feature space, (ii) an inconsistent memory bank caused by the limited batch size, and (iii) underutilized unlabeled identities during identification learning. To address these issues, we develop an enhanced decoupled and memory-reinforced network (DMRNet++). First, we simplify the standard tightly coupled pipelines and establish a task-decoupled framework (TDF). Second, we build a memory-reinforced mechanism (MRM) that uses a slow-moving average of the network to better encode the consistency of the memorized features. Third, to exploit the potential of unlabeled samples, we model the recognition process as semi-supervised learning and develop an unlabeled-aided contrastive loss (UCL) that boosts identification feature learning by exploiting the aggregation of unlabeled identities. Experimentally, DMRNet++ obtains mAPs of 94.5% and 52.1% on the CUHK-SYSU and PRW datasets, respectively, outperforming most existing methods.
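The abstract does not say how the memory-reinforced mechanism is implemented. The sketch below illustrates only the general idea of a "slow-moving average of the network". It assumes an exponential moving average (momentum update) of the parameters, slow ← m·slow + (1 − m)·fast, and a memory bank filled with features from this slow copy. The toy `encode` function, the `MemoryBank` class, and the `momentum` value are hypothetical illustrations, not the DMRNet++ implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def ema_update(slow: Vector, fast: Vector, momentum: float = 0.999) -> Vector:
    """Slow-moving (exponential moving) average: slow <- m * slow + (1 - m) * fast.

    Assumption: MRM's "slow-moving average" is modeled here as a plain EMA
    over a flat parameter vector.
    """
    if len(slow) != len(fast):
        raise ValueError("parameter vectors must have the same length")
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    return [momentum * s + (1.0 - momentum) * f for s, f in zip(slow, fast)]


def encode(params: Vector, x: Vector) -> Vector:
    """Hypothetical toy encoder: elementwise scaling followed by L2 normalization."""
    feat = [p * xi for p, xi in zip(params, x)]
    norm = math.sqrt(sum(v * v for v in feat))
    return [v / norm for v in feat] if norm > 0 else feat


class MemoryBank:
    """Hypothetical per-identity memory storing features from the slow encoder."""

    def __init__(self) -> None:
        self.features: Dict[int, Vector] = {}

    def write(self, identity: int, feature: Vector) -> None:
        # Store a copy so later updates to `feature` do not leak into memory.
        self.features[identity] = list(feature)

    def read(self, identity: int) -> Vector:
        return self.features[identity]
```

In a training loop of this kind, the fast parameters would be updated by gradient descent every step. The slow copy would be refreshed with `ema_update`, and the memory bank would be written with `encode(slow_params, x)`. Because the slow parameters drift gradually, features written at different steps stay more mutually consistent than features from the rapidly changing fast network.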