Using Double-Density Dual Tree Wavelet Transform into MFCC for Noisy Speech Recognition

Soe Naing, HM; Hidayat, R; Hartanto, R; Miyanaga, Y

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE), 2020, 00, pp. 302-306
Issue Date: 2020-12-01

Closed Access

Filename	Description	Size	Format
09271737.pdf		796.3 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Soe Naing, HM
dc.contributor.author	Hidayat, R
dc.contributor.author	Hartanto, R
dc.contributor.author	Miyanaga, Y https://orcid.org/0000-0002-2795-2234
dc.date	2020-10-06
dc.date.accessioned	2021-04-30T04:05:05Z
dc.date.available	2021-04-30T04:05:05Z
dc.date.issued	2020-12-01
dc.identifier.citation	2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE), 2020, 00, pp. 302-306
dc.identifier.isbn	978-1-7281-1097-4
dc.identifier.uri	http://hdl.handle.net/10453/148551
dc.description.abstract	Automatic speech recognition has made significant progress, both in the underlying technology and in its many applications. However, speech fluctuations caused by noise significantly reduce recognition accuracy, and generating correct word sequences is more difficult on noisy channels than in a clean environment. Extracting meaningful acoustic information from noisy speech utterances remains a challenging task. Therefore, we combine Mel-frequency cepstral coefficients (MFCC) with a double-density dual-tree wavelet transform denoising algorithm to recognize noisy speech utterances. A hybrid frame-level cross-entropy deep neural network-hidden Markov model (DNN-HMM) is used as the acoustic model. According to a suite of experiments, the proposed denoising method improves performance without degrading accuracy at higher sound intensity levels. Experimental results show that recognition accuracy reaches up to 96.6% at 10 dB, 91.84% at 5 dB, 78.05% at 0 dB, and 49.37% at -5 dB.
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE)
dc.relation.ispartof	2020 12th International Conference on Information Technology and Electrical Engineering
dc.relation.isbasedon	10.1109/icitee49829.2020.9271737
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Using Double-Density Dual Tree Wavelet Transform into MFCC for Noisy Speech Recognition
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	Yogyakarta, Indonesia
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2021-04-30T04:04:33Z
pubs.finish-date	2020-10-08
pubs.publication-status	Published
pubs.start-date	2020-10-06
pubs.volume	00

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/148551