Time-Frequency Filter Bank: A Simple Approach for Audio and Music Separation

Yang, N; Usman, M; He, X; Jan, MA; Zhang, L

Publication Type: Journal Article
Citation: IEEE Access, 2017, vol. 5, pp. 27114-27125
Issue Date: 2017-10-09

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Yang, N	en_US
dc.contributor.author	Usman, M https://orcid.org/0000-0003-2165-4575	en_US
dc.contributor.author	He, X https://orcid.org/0000-0001-8962-540X	en_US
dc.contributor.author	Jan, MA	en_US
dc.contributor.author	Zhang, L	en_US
dc.date.issued	2017-10-09	en_US
dc.identifier.citation	IEEE Access, 2017, 5 pp. 27114 - 27125	en_US
dc.identifier.uri	http://hdl.handle.net/10453/119928
dc.description.abstract	© 2017 IEEE. Blind Source Separation techniques have long been used in wireless communication to extract signals of interest from a set of multiple signals without training data. In this paper, we investigate the separation of the human voice from a mixture of human voice and sounds from different musical instruments. The human voice may be a singing voice in a song or part of a news broadcast with background music. This paper proposes a generalized Short-Time Fourier Transform (STFT)-based technique, combined with a filter bank, to extract vocals from background music. The main purpose is to design a filter bank that eliminates background aliasing errors under the best reconstruction conditions, with approximated scaling factors. Stereo signals in the time-frequency domain are used in the experiments. The input stereo signals are processed as frames and passed through the proposed STFT-based technique. The output of the STFT-based technique is passed through the filter bank to minimize the background aliasing errors. For reconstruction, an inverse STFT is applied first, and the signals are then reconstructed by the overlap-add method to obtain the final output, which contains vocals only. The experiments show that the proposed approach performs better than other state-of-the-art approaches in terms of Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR).	en_US
dc.relation.ispartof	IEEE Access	en_US
dc.relation.isbasedon	10.1109/ACCESS.2017.2761741	en_US
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Time-Frequency Filter Bank: A Simple Approach for Audio and Music Separation	en_US
dc.type	Journal Article
utslib.citation.volume	5	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
utslib.for	10 Technology	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CRIN - Realtime Information Networks
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
utslib.copyright.status	open_access	*
pubs.publication-status	Published	en_US
pubs.volume	5	en_US

Abstract:

© 2017 IEEE. Blind Source Separation techniques have long been used in wireless communication to extract signals of interest from a set of multiple signals without training data. In this paper, we investigate the separation of the human voice from a mixture of human voice and sounds from different musical instruments. The human voice may be a singing voice in a song or part of a news broadcast with background music. This paper proposes a generalized Short-Time Fourier Transform (STFT)-based technique, combined with a filter bank, to extract vocals from background music. The main purpose is to design a filter bank that eliminates background aliasing errors under the best reconstruction conditions, with approximated scaling factors. Stereo signals in the time-frequency domain are used in the experiments. The input stereo signals are processed as frames and passed through the proposed STFT-based technique. The output of the STFT-based technique is passed through the filter bank to minimize the background aliasing errors. For reconstruction, an inverse STFT is applied first, and the signals are then reconstructed by the overlap-add method to obtain the final output, which contains vocals only. The experiments show that the proposed approach performs better than other state-of-the-art approaches in terms of Signal-to-Interference Ratio (SIR) and Signal-to-Distortion Ratio (SDR).
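
The processing chain named in the abstract (framing, STFT, a filter-bank stage, inverse STFT, overlap-add) can be illustrated with a minimal NumPy sketch. This is not the authors' method: the record does not give their filter-bank design or scaling factors, so the filter-bank stage below is a hypothetical placeholder that simply applies per-band gains to the STFT bins. The function names, the frame length of 1024, the hop of 256, and the 100-4000 Hz pass band are all assumptions chosen for illustration.

```python
import numpy as np

FRAME_LEN = 1024  # assumed frame length (samples)
HOP = 256         # assumed hop size (75% overlap)

def stft(x, frame_len=FRAME_LEN, hop=HOP):
    """Split x into overlapping Hann-windowed frames and FFT each frame."""
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann window
    # Zero-pad both ends so every input sample is covered by full frames.
    x = np.concatenate([np.zeros(frame_len), x, np.zeros(frame_len)])
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length, frame_len=FRAME_LEN, hop=HOP):
    """Inverse FFT each frame, then rebuild the signal by weighted overlap-add."""
    window = np.hanning(frame_len + 1)[:-1]
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros((spec.shape[0] - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame * window
        norm[i * hop:i * hop + frame_len] += window ** 2
    out /= np.maximum(norm, 1e-8)  # undo the analysis/synthesis window weighting
    return out[frame_len:frame_len + length]  # drop the padding

def apply_band_gains(spec, sample_rate, bands, frame_len=FRAME_LEN):
    """Placeholder filter-bank stage: scale STFT bins by per-band gains.

    bands is a list of (low_hz, high_hz, gain); bins outside all bands get gain 0.
    """
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    gains = np.zeros_like(freqs)
    for low, high, gain in bands:
        gains[(freqs >= low) & (freqs < high)] = gain
    return spec * gains  # gains broadcast across all frames

def process_stereo(stereo, sample_rate, bands):
    """Run each channel of an (n_samples, 2) array through the chain."""
    out = np.empty(stereo.shape, dtype=float)
    for ch in range(stereo.shape[1]):
        spec = stft(stereo[:, ch])
        out[:, ch] = istft(apply_band_gains(spec, sample_rate, bands),
                           stereo.shape[0])
    return out

# Synthetic stereo input: one tone inside the assumed pass band, one outside.
sr = 16000
t = np.arange(sr) / sr
tone_in = np.sin(2 * np.pi * 440 * t)
tone_out = np.sin(2 * np.pi * 6000 * t)
stereo = np.stack([tone_in + tone_out, tone_in + tone_out], axis=1)
filtered = process_stereo(stereo, sr, bands=[(100.0, 4000.0, 1.0)])
```

With a single all-pass band the chain returns the input unchanged up to floating-point error, which is a convenient check that the framing and overlap-add steps are consistent before any real separation logic is substituted for the placeholder stage.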

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/119928