Abstract:
In this work, we present a benchmark and a hybrid deep neural network for Urdu text recognition in natural scene images. Recognizing text in natural scene images is a challenging task that has attracted the attention of the computer vision and pattern recognition communities. In recent years, scene text recognition has been widely studied, and state-of-the-art results have been achieved using deep neural network models. However, most of this research addresses English text, and less attention has been given to other languages. In this paper, we investigate the problem of Urdu text recognition in natural scene images. Urdu is a cursive script written from right to left, in which two or more characters are joined to form a word. Recognizing cursive text in natural images is considered an open problem because of variations in its representation. We propose a hybrid deep neural network architecture with skip connections, combining convolutional and recurrent neural networks, to recognize Urdu scene text. We introduce a new dataset of 11,500 manually cropped Urdu word images from natural scenes and report baseline results on it. The network is trained on whole word images, avoiding traditional character-based classification. Data augmentation with contrast stretching and histogram equalization is used to further increase the size of the dataset. The experimental results on the original and augmented word images show state-of-the-art performance of the network.
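The two augmentation operations named above can be illustrated with a minimal sketch. It assumes 8-bit grayscale word images stored as NumPy arrays. The function names, the 2nd/98th percentile limits for contrast stretching, and the use of global (rather than adaptive) histogram equalization are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def contrast_stretch(img, low_pct=2.0, high_pct=98.0):
    """Linearly rescale intensities between two percentiles to [0, 255].

    The percentile limits are illustrative assumptions.
    """
    values = img.astype(np.float64)
    lo, hi = np.percentile(values, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: there is no intensity range to stretch.
        return img.copy()
    stretched = (values - lo) / (hi - lo) * 255.0
    return np.clip(stretched, 0, 255).astype(np.uint8)


def equalize_histogram(img):
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest intensity present
    total = cdf[-1]
    if total == cdf_min:
        # Only one intensity level is present: nothing to equalize.
        return img.copy()
    # Build a lookup table from the normalized cumulative distribution.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]


def augment(img):
    """Return the original image plus its two augmented variants."""
    return [img, contrast_stretch(img), equalize_histogram(img)]
```

Under these assumptions, each cropped word image yields three training samples, so applying `augment` to every image would triple the dataset size. The actual augmentation factor used in the paper may differ.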