Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

Uploaded: 2024-01-22T13:19:38+0530

admin

Jan 22, 2024 - 13:19

Abstract:

This study proposes a cross-domain multi-objective speech assessment model, called MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and distortion assessment scores of an input speech signal. MOSA-Net comprises a convolutional neural network and bidirectional long short-term memory architecture for representation extraction, followed by a multiplicative attention layer and a fully connected layer for predicting each assessment metric. Additionally, cross-domain features (spectral and time-domain features) and latent representations from self-supervised learned (SSL) models are used as inputs, combining rich acoustic information to obtain more accurate assessments. Experimental results show that, in both seen and unseen noise environments, MOSA-Net improves the linear correlation coefficient (LCC) scores in perceptual evaluation of speech quality (PESQ) prediction compared to Quality-Net, an existing single-task model for PESQ prediction, and improves LCC scores in short-time objective intelligibility (STOI) prediction compared to STOI-Net, an existing single-task model for STOI prediction. Moreover, MOSA-Net can be used as a pre-trained model and effectively adapted, with a limited amount of training data, into an assessment model for predicting subjective quality and intelligibility scores. Experimental results show that MOSA-Net improves LCC scores in mean opinion score (MOS) prediction compared to MOS-SSL, a strong single-task model for MOS prediction. We further adopt the latent representations of MOSA-Net to guide the speech enhancement (SE) process and derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach. Experimental results show that QIA-SE achieves higher PESQ scores than a baseline SE model in both seen and unseen noise environments.
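The abstract reports its results as linear correlation coefficient (LCC) scores between predicted and reference assessment scores. The sketch below shows one way such a score could be computed, assuming LCC here means the standard Pearson correlation. The function name `linear_correlation_coefficient` and the sample score lists are hypothetical and are not taken from the paper.

```python
import math


def linear_correlation_coefficient(predicted, reference):
    """Pearson linear correlation between two equal-length lists of scores.

    Assumption: "LCC" in the abstract is the standard Pearson coefficient.
    """
    if len(predicted) != len(reference):
        raise ValueError("score lists must have the same length")
    if len(predicted) < 2:
        raise ValueError("need at least two utterances")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n

    # Deviations of each score from its list mean.
    dev_p = [p - mean_p for p in predicted]
    dev_r = [r - mean_r for r in reference]

    covariance = sum(a * b for a, b in zip(dev_p, dev_r))
    var_p = sum(a * a for a in dev_p)
    var_r = sum(b * b for b in dev_r)
    if var_p == 0 or var_r == 0:
        raise ValueError("LCC is undefined when either list is constant")

    return covariance / math.sqrt(var_p * var_r)


# Hypothetical example: model-predicted scores vs. reference scores
# for four utterances (values invented for illustration only).
predicted_scores = [1.8, 2.4, 3.1, 3.9]
reference_scores = [1.6, 2.5, 3.0, 4.1]
lcc = linear_correlation_coefficient(predicted_scores, reference_scores)
print(f"LCC = {lcc:.3f}")
```

An LCC close to 1 means the predicted scores rise and fall with the reference scores almost linearly, which is the sense in which a higher LCC indicates a better assessment model.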
