Separation of speech sources using an acoustic vector sensor

Shujau, M; Ritz, CH; Burnett, IS

Publication Type: Conference Proceeding
Citation: MMSP 2011 - IEEE International Workshop on Multimedia Signal Processing, 2011
Issue Date: 2011-12-26

Closed Access

Filename	Description	Size	Format
06093797.pdf	Published version	339.12 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Shujau, M	en_US
dc.contributor.author	Ritz, CH	en_US
dc.contributor.author	Burnett, IS https://orcid.org/0000-0003-3795-7722	en_US
dc.date.issued	2011-12-26	en_US
dc.identifier.citation	MMSP 2011 - IEEE International Workshop on Multimedia Signal Processing, 2011	en_US
dc.identifier.isbn	9781457714337	en_US
dc.identifier.uri	http://hdl.handle.net/10453/119466
dc.description.abstract	This paper investigates how the directional characteristics of an Acoustic Vector Sensor (AVS) can be used to separate speech sources. The technique takes advantage of frequency-domain direction of arrival estimates to identify the location, relative to the AVS array, of each individual speaker in a group of speakers and to separate them accordingly into individual speech signals. Results show that the technique can be used for real-time separation of speech sources using a single 20 ms frame of speech. Compared with the unprocessed recordings, the proposed algorithm gives an average improvement of 15.1 dB in Signal to Interference Ratio (SIR) and an average improvement of 5.4 dB in Signal to Distortion Ratio (SDR). In addition to the SIR and SDR results, Perceptual Evaluation of Speech Quality (PESQ) and listening tests both show an improvement in perceptual quality of 1 Mean Opinion Score (MOS) over unprocessed recordings. © 2011 IEEE.	en_US
dc.relation.ispartof	MMSP 2011 - IEEE International Workshop on Multimedia Signal Processing	en_US
dc.relation.isbasedon	10.1109/MMSP.2011.6093797	en_US
dc.title	Separation of speech sources using an acoustic vector sensor	en_US
dc.type	Conference Proceeding
utslib.for	0913 Mechanical Engineering	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
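
Illustrative sketch (not part of the repository record): the abstract describes separating speakers from per-frequency direction of arrival estimates on a single 20 ms frame, but it does not spell out the algorithm and the full text is closed access. The Python below is therefore only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Assumed here: an idealised 2-D AVS with one omnidirectional channel and two orthogonal velocity channels, an intensity-style DOA estimate per FFT bin, known speaker azimuths, and a binary mask that assigns each bin to the nearest speaker. All function names are hypothetical.

```python
import numpy as np

def avs_mix(sources, azimuths_deg):
    """Assumed idealised 2-D AVS model: omni channel o plus x/y velocity channels."""
    az = np.deg2rad(np.asarray(azimuths_deg, dtype=float))
    s = np.asarray(sources, dtype=float)
    o = s.sum(axis=0)                           # omnidirectional pressure channel
    x = (np.cos(az)[:, None] * s).sum(axis=0)   # velocity channel along x
    y = (np.sin(az)[:, None] * s).sum(axis=0)   # velocity channel along y
    return o, x, y

def bin_doa(o, x, y):
    """Per-bin azimuth estimate (radians) for one frame from intensity-like terms."""
    O, X, Y = np.fft.rfft(o), np.fft.rfft(x), np.fft.rfft(y)
    ix = np.real(np.conj(O) * X)
    iy = np.real(np.conj(O) * Y)
    return O, np.arctan2(iy, ix)

def separate_frame(o, x, y, target_az_deg):
    """Assign each frequency bin of the omni channel to the nearest target azimuth."""
    O, doa = bin_doa(o, x, y)
    targets = np.deg2rad(np.asarray(target_az_deg, dtype=float))
    # Wrapped angular distance between each bin's DOA and each target direction.
    diff = np.angle(np.exp(1j * (doa[None, :] - targets[:, None])))
    owner = np.argmin(np.abs(diff), axis=0)
    return [np.fft.irfft(np.where(owner == k, O, 0), n=len(o))
            for k in range(len(targets))]

# Toy check on one 20 ms frame at 16 kHz (320 samples). The two "speakers" are
# bin-aligned tones, so they do not overlap in frequency (a strong simplification).
fs, n = 16000, 320
t = np.arange(n) / fs
s1 = np.sin(2 * np.pi * 500 * t)    # hypothetical speaker at 30 degrees
s2 = np.sin(2 * np.pi * 1200 * t)   # hypothetical speaker at 120 degrees
o, x, y = avs_mix([s1, s2], [30, 120])
out = separate_frame(o, x, y, [30, 120])
```

In this toy case each output frame matches its source because the tones occupy different bins. Real speech overlaps in frequency and the speaker directions would have to be estimated first (for example from a histogram of the per-bin DOA values), so this sketch says nothing about the SIR, SDR or PESQ figures reported in the abstract.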

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/119466