Abstract:
Automatic text spotting can help unmanned vehicles read human-readable text, thus improving the safety and reliability of autonomous driving. Some existing text spotting models rely on inefficient region proposal networks or on modules based on recurrent neural networks. Region proposal networks produce redundant anchors, and recurrent neural networks are difficult to parallelize. These inefficient modules prevent such models from being deployed in autonomous unmanned vehicles. In this paper, we propose an end-to-end two-stage text spotting model named Center TextSpotter, a convolutional model that involves neither region proposal networks nor recurrent neural networks. Moreover, we develop a weakly supervised training method and a feature fusion module. The weakly supervised training method adjusts the loss by weighting the predicted labels to improve text recognition performance. The feature fusion module fuses the features of each proposal with the features of its surrounding context to enhance overall performance. Our model follows the modular design principle, so it can be easily extended and modified. As an example, this paper presents an extension scheme based on a graph neural network: adding the graph neural network before the fully connected layer allows the model to learn better features. Experiments demonstrate that Center TextSpotter performs text spotting effectively for autonomous unmanned vehicles.
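As a rough illustration of the loss-weighting idea in the weakly supervised training method, the following minimal sketch computes a cross-entropy loss in which each character's term is scaled by a weight. It is not the paper's formulation: the function name `weighted_cross_entropy`, the per-character weights, and the normalization by the total weight are all assumptions made for illustration.

```python
import math


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-character cross-entropy terms (illustrative only).

    probs   : list of per-character probability distributions (lists of floats)
    labels  : list of class indices (ground truth or predicted labels)
    weights : list of non-negative weights, one per character (assumed given)
    """
    total_weight = sum(weights)
    if total_weight == 0:
        # No character contributes, so the loss is defined as zero here.
        return 0.0
    loss = 0.0
    for p, y, w in zip(probs, labels, weights):
        # Clamp the probability to avoid log(0).
        loss += -w * math.log(max(p[y], 1e-12))
    return loss / total_weight
```

In this sketch, a predicted label with weight 0 has no influence on the loss, while a label with a larger weight contributes proportionally more.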