Discrete wavelet denoising into MFCC for noise suppressive in automatic speech recognition system

The Intelligent Networks and Systems Society
Publication Type: Journal Article
International Journal of Intelligent Engineering and Systems, 2020, 13, (2), pp. 74-82
Automatic Speech Recognition (ASR) is a challenging task, and its most problematic issues are the presence of background noise and substantial variability in speech. Extracting noise-robust features that compensate for speech degradation caused by noise has remained a popular research topic in recent years. This paper presents a framework for a wavelet denoising scheme and analyses different wavelet families and thresholding rules incorporated into feature extraction to enhance the performance of an ASR system. A Gaussian Mixture Model-based Hidden Markov Model (GMM-HMM) and a Deep Neural Network (DNN)-HMM are used as the speech recognizers. The recognition results on the Aurora2 database show that noise-robust features are obtained when wavelet transform denoising is combined with Mel Frequency Cepstral Coefficients (MFCC). The best accuracy is achieved by cross-entropy DNN-HMM training using denoising with a Coiflet wavelet and the Rigrsure threshold, which gives 97.54% at 10 dB, 93.13% at 5 dB, 75.63% at 0 dB and 37.29% at -5 dB.
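The denoising step that the abstract places in front of MFCC extraction can be illustrated with a minimal sketch. It is not the paper's implementation. For brevity it assumes a Haar wavelet instead of the Coiflet family, three decomposition levels, soft thresholding, and a noise level estimated from the finest detail coefficients by the median absolute deviation. The threshold follows the commonly used SURE ("rigrsure") selection rule. All function names are hypothetical, and the signal length is assumed to be divisible by 2**levels.

```python
import math
import statistics


def haar_dwt(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def sure_threshold(coeffs):
    """SURE-based threshold for coefficients with unit noise variance."""
    n = len(coeffs)
    squares = sorted(c * c for c in coeffs)
    best_risk, best_thr, cumulative = float("inf"), 0.0, 0.0
    for k, v in enumerate(squares):
        cumulative += v
        # Risk estimate if the threshold is set at the k-th smallest magnitude.
        risk = (n - 2 * (k + 1) + cumulative + (n - 1 - k) * v) / n
        if risk < best_risk:
            best_risk, best_thr = risk, math.sqrt(v)
    return best_thr


def soft(c, thr):
    """Soft thresholding: shrink the magnitude by thr, clipping at zero."""
    return math.copysign(max(abs(c) - thr, 0.0), c)


def denoise(signal, levels=3):
    """Decompose, threshold each detail band, and reconstruct."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    # Assumption: noise sigma estimated from the finest detail band (MAD / 0.6745).
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    if sigma == 0.0:
        return list(signal)  # nothing that looks like noise to remove
    for detail in reversed(details):  # coarsest band first
        thr = sigma * sure_threshold([c / sigma for c in detail])
        approx = haar_idwt(approx, [soft(c, thr) for c in detail])
    return approx
```

In a pipeline of the kind the abstract describes, the output of a function such as `denoise` would be passed to the MFCC front end in place of the raw waveform. Substituting a Coiflet filter bank would require a convolution-based DWT rather than the pairwise Haar steps used here.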