Stack-YOLO: A Hardware-Friendly Real-Time Object Detection Algorithm

Abstract:

Efficient real-time object detection across edge devices, CPUs, GPU clusters, and cloud GPUs remains challenging because of constraints on computational power, storage capacity, and image resolution. To address these issues, this paper proposes Stack-YOLO, a fast and efficient one-stage object detection algorithm. Stack-YOLO uses a backbone network built from CSS units, which improve the gradient propagation path and enrich gradient combinations through three stages: Cross, Split, and Shuffle. The CSS backbone also greatly reduces computation, memory usage, and memory access cost; improves detection accuracy; speeds up convergence during training; and, at inference time, applies model reparameterization to fuse the multi-branch structure, saving memory and reducing resource requirements. In addition, to improve the computational efficiency of spatial pyramid pooling and to remedy existing pooling strategies that tend to lose detail information, this paper proposes the ASPPF pooling method. ASPPF replaces time-consuming parallel pooling with serial pooling operations, automatically learns a feature weight matrix from contextual information during downsampling, adaptively preserves important feature information, and maintains gradients better than max pooling during backpropagation. To improve on the computational efficiency of OTA and on SimOTA's reliance on local prior knowledge, this paper proposes Fast-OTA, a fast dynamic label matching method based on a global matching cost. Finally, a composite-factor model scaling method is proposed to systematically balance resource supply and demand. In comparison experiments on MS COCO and PASCAL VOC, Stack-YOLO outperforms leading existing algorithms, including Faster R-CNN, DETR, YOLOX, and YOLOv6, in both speed and accuracy.
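The efficiency gain behind replacing parallel pooling with serial pooling can be illustrated with a small sketch. This is a hypothetical 1-D pure-Python illustration, not the paper's ASPPF implementation: with stride 1 and edge-clipped windows, two serial max pools of kernel k are equivalent to one max pool of kernel 2k-1, so a chain of small pools reproduces the large-kernel parallel pools of classic SPP while reusing intermediate results.

```python
def max_pool_1d(x, k):
    """Stride-1 max pool over a list, same-length output, window clipped at edges."""
    r = k // 2  # window radius
    n = len(x)
    return [max(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# Two 5-wide pools in series cover the same receptive field as one 9-wide pool,
# but the second pool reuses the first pool's output instead of rescanning x.
serial = max_pool_1d(max_pool_1d(x, 5), 5)
parallel = max_pool_1d(x, 9)
assert serial == parallel
```

The same composition argument extends to 2-D feature maps, which is why serial pooling chains (as in SPPF-style designs) match the outputs of parallel large-kernel pooling at lower cost.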