A Deep Learning Approach for Identifying and Discriminating Spoken Arabic among Other Languages

Abstract:

Spoken Language Identification (SLID) is an important step in speech-to-speech translation systems and multilingual automatic speech recognition. In recent research, deep learning methods have been the prevailing approach to spoken language identification. This paper studies and analyzes spoken languages that resemble Arabic in the pronunciation of certain words, and then proposes a deep learning-based architecture, specifically the Bidirectional Long Short-Term Memory (BLSTM), for identifying spoken Arabic and discriminating it from these similar languages, namely German, Spanish, French, and Russian. The speech data for all languages are taken from the Mozilla speech corpus. Our work also includes a linguistic study of the considered languages. A total of ten thousand speakers are chosen across the five languages, and the BLSTM architecture is designed and implemented using acoustic signal features and applied in five experiments. The results show a precision of 98.97%, 98.73%, 98.47%, and 99.75% for identifying spoken Arabic against German, Spanish, French, and Russian, respectively, in separate experiments. Additionally, we achieve an average accuracy of 95.15% for discriminating among all five languages in terms of the pronunciation of words. Our findings confirm that a BLSTM architecture is able to distinguish between observably similar pronunciations of words in the considered languages.
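To make the two reported metrics concrete, the following minimal sketch computes per-language precision and overall accuracy from a confusion matrix. The function names and the counts in TOY_CM are hypothetical and chosen only for illustration; they are not the paper's data, and the paper's exact evaluation protocol may differ from this sketch.

```python
from typing import List

# Language order used for the rows and columns of the confusion matrix.
LANGS = ["Arabic", "German", "Spanish", "French", "Russian"]


def precision_per_class(cm: List[List[int]]) -> List[float]:
    """Precision for each class j: correct predictions of j / all predictions of j.

    cm[i][j] is the number of utterances whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    result = []
    for j in range(n):
        predicted_j = sum(cm[i][j] for i in range(n))  # column sum
        result.append(cm[j][j] / predicted_j if predicted_j else 0.0)
    return result


def accuracy(cm: List[List[int]]) -> float:
    """Overall accuracy: correctly classified utterances / all utterances."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


# Hypothetical counts (100 utterances per language), NOT results from the paper.
TOY_CM = [
    [95, 1, 2, 1, 1],   # true Arabic
    [2, 94, 1, 2, 1],   # true German
    [1, 2, 93, 3, 1],   # true Spanish
    [1, 1, 3, 94, 1],   # true French
    [1, 1, 1, 1, 96],   # true Russian
]

if __name__ == "__main__":
    for lang, p in zip(LANGS, precision_per_class(TOY_CM)):
        print(f"{lang}: precision = {p:.4f}")
    print(f"overall accuracy = {accuracy(TOY_CM):.4f}")
```

Precision is computed per predicted class (a column of the matrix), whereas accuracy pools all classes. This is why the abstract can report a separate precision for each language pairing and a single average accuracy for the five-language task.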