Backward Compatible Spatialised Teleconferencing based on Squeezed Recordings

Ritz, CH; Shujau, M; Zheng, X; Cheng, B; Cheng, E; Burnett, IS

Publisher: In-Tech
Publication Type: Chapter
Citation: Advances in Sound Localization, 2011, pp. 363 - 384
Issue Date: 2011

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Ritz, CH	en_US
dc.contributor.author	Shujau, M	en_US
dc.contributor.author	Zheng, X	en_US
dc.contributor.author	Cheng, B	en_US
dc.contributor.author	Cheng, E	en_US
dc.contributor.author	Burnett, IS https://orcid.org/0000-0003-3795-7722	en_US
dc.contributor.editor	Strumillo, P	en_US
dc.date.issued	2011	en_US
dc.identifier.citation	Advances in Sound Localization, 2011, pp. 363 - 384	en_US
dc.identifier.isbn	978-953-307-224-1	en_US
dc.identifier.uri	http://hdl.handle.net/10453/117596
dc.publisher	In-Tech	en_US
dc.relation.ispartof	Advances in Sound Localization	en_US
dc.relation.isbasedon	10.5772/14413	en_US
dc.title	Backward Compatible Spatialised Teleconferencing based on Squeezed Recordings	en_US
dc.type	Chapter
utslib.location	UK	en_US
utslib.for	0913 Mechanical Engineering	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	open_access
pubs.consider-herdc	false	en_US
pubs.place-of-publication	UK	en_US
pubs.publication-status	Published	en_US

Abstract:

Commercial teleconferencing systems currently available, although offering sophisticated video stimulus of the remote participants, commonly employ only mono or stereo audio playback for the user. However, in teleconferencing applications where there are multiple participants at multiple sites, spatializing the audio reproduced at each site (using headphones or loudspeakers) to help listeners distinguish between participating speakers can significantly improve the meeting experience (Baldis, 2001; Evans et al., 2000; Ward & Elko, 1999; Kilgore et al., 2003; Wrigley et al., 2009; James & Hawksford, 2008). An example is Vocal Village (Kilgore et al., 2003), which uses online avatars to co-locate remote participants over the Internet in virtual space with audio spatialized over headphones. This system adds speaker location cues to monaural speech to create a user-manipulable soundfield that matches the avatar’s position in the virtual space. Giving participants the freedom to manipulate the acoustic location of other participants in the rendered sound scene has been shown to improve multitasking performance (Wrigley et al., 2009). A system for multiparty teleconferencing first requires a stage for recording speech from multiple participants at each site. These signals then need to be compressed to allow for efficient transmission of the spatial speech. One approach is to use close-talking microphones (e.g. lapel microphones) to record each participant and then encode each speech signal separately prior to transmission (James & Hawksford, 2008).
Alternatively, for increased flexibility, a microphone array located at a central point on, say, a meeting table can be used to generate a multichannel recording of the meeting speech. A microphone array approach is adopted in this work; it allows the recordings to be processed to identify the relative spatial locations of the sources, and it supports multichannel speech enhancement techniques that improve the quality of recordings in noisy environments. For efficient transmission of the recorded signals, the approach also requires a multichannel compression technique suited to spatially recorded speech signals.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/117596