Psychoacoustical masking effect-based feature extraction for robust speech recognition

Publication Type:
Journal Article
Citation:
International Journal of Innovative Computing, Information and Control, 2019, 15, (5), pp. 1641-1654
Issue Date:
2019-10-01
File:
ijicic-150503.pdf (Published version, Adobe PDF, 489.9 kB)
This paper proposes a new approach to speech feature extraction for automatic speech recognition (ASR), based on the human auditory system. Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in ASR systems, but one of their main drawbacks is their sensitivity to background noise, which degrades recognition results. This paper proposes noise-robust speech features that improve upon the MFCC. A psychoacoustic model-based feature extraction that simulates the perception of sound in the human auditory system is investigated and integrated into the MFCC. Applying the masking effect during feature extraction reduces the complexity of the signal, minimizing the feature components without any significant loss in perceived sound quality. It also reduces the effect of noise on the speech signal. A hidden Markov model is employed to recognize isolated English digits. The experiments verify that the proposed method effectively improves recognition accuracy under adverse conditions. With the perceptual masking effect-based cepstral features, recognition accuracy reached up to 97.16% at a signal-to-noise ratio (SNR) of 10 dB, 95.02% at 5 dB, 90.34% at 0 dB, 77.08% at −5 dB and 62.76% at −10 dB.
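The SNR conditions quoted above (10, 5, 0, −5 and −10 dB) can be illustrated with a minimal sketch of how noise might be scaled to a target SNR before it is added to clean speech. This record does not state how the noisy test data were generated or which noise types were used, so everything below is an assumption for illustration only: the function names `scale_noise_to_snr` and `measured_snr_db` are hypothetical, and a sine tone stands in for real speech.

```python
import math
import random


def scale_noise_to_snr(speech, noise, snr_db):
    """Return a copy of `noise` scaled so the speech-to-noise power ratio is `snr_db`."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("signals must have nonzero power")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the required noise power.
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [gain * n for n in noise]


def measured_snr_db(speech, noise):
    """Compute the SNR in dB between two equal-length signals."""
    p_speech = sum(s * s for s in speech)
    p_noise = sum(n * n for n in noise)
    return 10 * math.log10(p_speech / p_noise)


rng = random.Random(0)
# Stand-in for speech (assumption): a 440 Hz tone sampled at 8 kHz for one second.
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
# Stand-in for background noise (assumption): white Gaussian noise.
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for snr_db in (10, 5, 0, -5, -10):
    scaled = scale_noise_to_snr(speech, noise, snr_db)
    noisy = [s + n for s, n in zip(speech, scaled)]  # signal a recognizer would see
    print(snr_db, round(measured_snr_db(speech, scaled), 2))
```

At 0 dB the noise carries as much power as the speech, and at −10 dB ten times as much, which gives a sense of how adverse the lower-SNR conditions are.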