Visual Object Detection and Tracking for Internet of Things Devices Based on Spatial Attention Powered Multidomain Network

Abstract:

The Internet of Things (IoT) has brought changes to many fields by joining physical space with cyberspace, and IoT devices are becoming increasingly complex. With the rapid deployment of cameras, IoT tasks that process visual information are becoming more important, but IoT devices have limited computing resources, including power, computing ability, and storage. Some tasks that are routine on a computer can be challenging on an IoT device. Therefore, maintaining acceptable performance while minimizing resource use is becoming an increasingly important concern in IoT. In this article, we aim to solve the problem of object detection and tracking in IoT while minimizing resources. Traditional algorithms use a convolutional neural network (CNN) to identify the different objects in each frame and then determine the tracking target from the identified objects, which typically requires substantial computing resources. By incorporating spatial attention into a multidomain network, we propose a novel algorithm named spatial attention powered multidomain network (SA-MDNet). By adding a spatial attention mechanism to the original MDNet model and using a multiclass cross-entropy loss, we are able to distinguish the background and the target in different video sequences effectively and efficiently. The algorithm achieves performance similar to several state-of-the-art models on the OTB 50/100/2013 data sets, while using only a fraction of the memory required by MDNet.
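
The two ingredients named in the abstract, a spatial attention mask over a feature map and a multiclass cross-entropy loss, can be illustrated with a minimal pure-Python sketch. This is not the SA-MDNet implementation: the abstract does not specify the attention layer or the class layout, so the function names, the mixing weights `w_avg`, `w_max`, and `bias`, and the convention that class index 0 is the background are all assumptions made for illustration.

```python
import math


def sigmoid(value):
    """Numerically stable logistic function."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    e = math.exp(value)
    return e / (1.0 + e)


def spatial_attention(features, w_avg=1.0, w_max=1.0, bias=0.0):
    """Reweight a C x H x W feature map (nested lists) per spatial location.

    Assumption: a fixed mix of the channel-average and channel-max maps
    stands in for whatever learned attention layer the real model uses.
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    mask = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            column = [features[c][y][x] for c in range(channels)]
            score = w_avg * sum(column) / channels + w_max * max(column) + bias
            mask[y][x] = sigmoid(score)  # attention weight in (0, 1)
    # Every channel is scaled by the same per-location weight.
    attended = [
        [[features[c][y][x] * mask[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]
    return attended, mask


def softmax(logits):
    """Convert raw scores to probabilities, shifted for stability."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


# Toy feature map with 2 channels, 1 row, 2 columns (hypothetical values).
features = [[[1.0, -1.0]], [[3.0, -3.0]]]
attended, mask = spatial_attention(features)

# Hypothetical class layout: index 0 = background, 1..K = targets.
uniform_loss = multiclass_cross_entropy([0.0, 0.0, 0.0], target_index=1)
confident_loss = multiclass_cross_entropy([0.0, 5.0, 0.0], target_index=1)
```

In this toy input the first location has strong activations and keeps most of its signal, while the second is suppressed; the loss drops as the score for the correct class grows relative to the others.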