Pairwise Algorithm Using the Deep Stacking in Dotnet

Pairwise Algorithm Using the Deep Stacking in Dotnet

Abstract:

Speech separation and pitch estimation in noisy conditions are considered to be a “chicken-and-egg” problem. On one hand, pitch information is an important cue for speech separation. On the other hand, speech separation makes pitch estimation easier when background noise is removed. In this paper, we propose a supervised learning architecture to solve these two problems iteratively. The proposed algorithm is based on the deep stacking network (DSN), which provides a method for stacking simple processing modules to build deep architectures. Each module is a classifier whose target is the ideal binary mask (IBM), and the input vector includes spectral features, pitch-based features and the output from the previous module. During the testing stage, we estimate the pitch using the separation results and update the pitch-based features to the next module. When embedded into the DSN, pitch estimation and speech separation each run several times. We obtain the final results from the last module. Systematic evaluations show that the proposed system results in both a high quality estimated binary mask and accurate pitch estimation and outperforms recent systems in its generalization ability.