Psychoacoustical masking effect-based feature extraction for robust speech recognition

Naing, HMS; Hidayat, R; Winduratna, B; Miyanaga, Y

Publication Type: Journal Article
Citation: International Journal of Innovative Computing, Information and Control, 2019, 15, (5), pp. 1641-1654
Issue Date: 2019-10-01

Closed Access

Filename	Description	Size	Format
ijicic-150503.pdf	Published version	489.9 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Naing, HMS
dc.contributor.author	Hidayat, R
dc.contributor.author	Winduratna, B
dc.contributor.author	Miyanaga, Y https://orcid.org/0000-0002-2795-2234
dc.date.accessioned	2022-09-01T03:17:17Z
dc.date.available	2022-09-01T03:17:17Z
dc.date.issued	2019-10-01
dc.identifier.citation	International Journal of Innovative Computing, Information and Control, 2019, 15, (5), pp. 1641-1654
dc.identifier.issn	1349-4198
dc.identifier.uri	http://hdl.handle.net/10453/161169
dc.description.abstract	This paper proposes a new approach to speech feature extraction for automatic speech recognition (ASR), based on the human auditory system. Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in ASR systems, but one of their main drawbacks is sensitivity to background noise, which degrades recognition results. The paper proposes noise-robust speech features that improve upon the MFCC. A psychoacoustic model-based feature extraction that simulates the perception of sound in the human auditory system is investigated and integrated into the MFCC. Applying a masking effect during feature extraction reduces the complexity of the signal, minimizing the feature components without significant loss in perceived sound quality, and it also reduces the effect of noise on the speech signal. A hidden Markov model is employed to recognize isolated English digits. The experiments verify that the proposed method improves recognition under adverse conditions. With the perceptual masking effect-based cepstral features, accuracy reached up to 97.16% at a signal-to-noise ratio of 10 dB, 95.02% at 5 dB, 90.34% at 0 dB, 77.08% at −5 dB and 62.76% at −10 dB.
dc.language	en
dc.relation	Japan Society for the Promotion of Science 18KK0277
dc.relation	Japan Society for the Promotion of Science 18H03212
dc.relation.ispartof	International Journal of Innovative Computing, Information and Control
dc.relation.isbasedon	10.24507/ijicic.15.05.1641
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	01 Mathematical Sciences, 08 Information and Computing Sciences, 09 Engineering
dc.subject.classification	Industrial Engineering & Automation
dc.title	Psychoacoustical masking effect-based feature extraction for robust speech recognition
dc.type	Journal Article
utslib.citation.volume	15
utslib.for	01 Mathematical Sciences
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
dc.date.updated	2022-09-01T03:17:14Z
pubs.issue	5
pubs.publication-status	Published
pubs.volume	15
utslib.citation.issue	5

Abstract:

This paper proposes a new approach to speech feature extraction for automatic speech recognition (ASR), based on the human auditory system. Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in ASR systems, but one of their main drawbacks is sensitivity to background noise, which degrades recognition results. The paper proposes noise-robust speech features that improve upon the MFCC. A psychoacoustic model-based feature extraction that simulates the perception of sound in the human auditory system is investigated and integrated into the MFCC. Applying a masking effect during feature extraction reduces the complexity of the signal, minimizing the feature components without significant loss in perceived sound quality, and it also reduces the effect of noise on the speech signal. A hidden Markov model is employed to recognize isolated English digits. The experiments verify that the proposed method improves recognition under adverse conditions. With the perceptual masking effect-based cepstral features, accuracy reached up to 97.16% at a signal-to-noise ratio of 10 dB, 95.02% at 5 dB, 90.34% at 0 dB, 77.08% at −5 dB and 62.76% at −10 dB.
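
The abstract reports accuracy at five signal-to-noise ratios. As an illustration only, the sketch below shows one common way to build a noisy test signal at a target SNR, using the standard definition SNR(dB) = 10 log10(P_speech / P_noise). It is not taken from the paper: the function names, the synthetic sine "speech" signal, and the Gaussian noise are all assumptions made for this example, and the authors' actual noise sources and mixing procedure are not described in this record.

```python
import math
import random


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale `noise` so that P_speech / P_noise matches `snr_db` (hypothetical helper)."""
    # From SNR_dB = 10 * log10(P_speech / P_noise), solve for the target noise power.
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * n for n in noise]


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech at the requested SNR; inputs must have equal length."""
    scaled = scale_noise_to_snr(speech, noise, snr_db)
    return [s + n for s, n in zip(speech, scaled)]


def measured_snr_db(speech, noise):
    """SNR in dB between a clean signal and an additive noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in signals (assumptions): a 440 Hz tone at 8 kHz and white Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    for snr_db in (10, 5, 0, -5, -10):  # the SNR levels listed in the abstract
        scaled = scale_noise_to_snr(speech, noise, snr_db)
        print(snr_db, round(measured_snr_db(speech, scaled), 3))
```

Scaling the noise rather than the speech keeps the clean signal's level fixed across conditions, so only the noise level changes from one SNR setting to the next.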

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/161169