A general framework for learning prosodic-enhanced representation of rap lyrics

Liang, H; Wang, H; Li, Q; Wang, J; Xu, G; Chen, J; Wei, JM; Yang, Z

Publication Type: Journal Article
Citation: World Wide Web, 2019, 22 (6), pp. 2267 - 2289
Issue Date: 2019-11-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Liang, H	en_US
dc.contributor.author	Wang, H	en_US
dc.contributor.author	Li, Q	en_US
dc.contributor.author	Wang, J	en_US
dc.contributor.author	Xu, G https://orcid.org/0000-0003-4493-6663	en_US
dc.contributor.author	Chen, J	en_US
dc.contributor.author	Wei, JM	en_US
dc.contributor.author	Yang, Z	en_US
dc.date.issued	2019-11-01	en_US
dc.identifier.citation	World Wide Web, 2019, 22 (6), pp. 2267 - 2289	en_US
dc.identifier.issn	1386-145X	en_US
dc.identifier.uri	http://hdl.handle.net/10453/130742
dc.description.abstract	© 2019, Springer Science+Business Media, LLC, part of Springer Nature. Learning and analyzing rap lyrics is a significant basis for many Web applications, such as music recommendation, automatic music categorization, and music information retrieval, due to the abundant source of digital music in the World Wide Web. Although numerous studies have explored the topic, knowledge in this field is far from satisfactory, because critical issues, such as prosodic information and its effective representation, as well as appropriate integration of various features, are usually ignored. In this paper, we propose a hierarchical attention variational autoencoder framework (HAVAE), which simultaneously considers semantic and prosodic features for rap lyrics representation learning. Specifically, the representation of the prosodic features is encoded by phonetic transcriptions with a novel and effective strategy (i.e., rhyme2vec). Moreover, a feature aggregation strategy is proposed to appropriately integrate various features and generate prosodic-enhanced representation. A comprehensive empirical evaluation demonstrates that the proposed framework outperforms the state-of-the-art approaches under various metrics in different rap lyrics learning tasks.	en_US
dc.relation.ispartof	World Wide Web	en_US
dc.relation.isbasedon	10.1007/s11280-019-00672-2	en_US
dc.subject.classification	Information Systems	en_US
dc.title	A general framework for learning prosodic-enhanced representation of rap lyrics	en_US
dc.type	Journal Article
utslib.citation.volume	6	en_US
utslib.citation.volume	22	en_US
utslib.for	0805 Distributed Computing	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0804 Data Format	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	open_access
pubs.issue	6	en_US
pubs.publication-status	Published	en_US
pubs.volume	22	en_US

Abstract:

© 2019, Springer Science+Business Media, LLC, part of Springer Nature. Learning and analyzing rap lyrics is a significant basis for many Web applications, such as music recommendation, automatic music categorization, and music information retrieval, due to the abundant source of digital music in the World Wide Web. Although numerous studies have explored the topic, knowledge in this field is far from satisfactory, because critical issues, such as prosodic information and its effective representation, as well as appropriate integration of various features, are usually ignored. In this paper, we propose a hierarchical attention variational autoencoder framework (HAVAE), which simultaneously considers semantic and prosodic features for rap lyrics representation learning. Specifically, the representation of the prosodic features is encoded by phonetic transcriptions with a novel and effective strategy (i.e., rhyme2vec). Moreover, a feature aggregation strategy is proposed to appropriately integrate various features and generate prosodic-enhanced representation. A comprehensive empirical evaluation demonstrates that the proposed framework outperforms the state-of-the-art approaches under various metrics in different rap lyrics learning tasks.
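
The abstract says that prosodic features are derived from phonetic transcriptions of the lyrics, but this record does not describe how rhyme2vec does so. Purely as an illustration of what a phonetic view of lyrics can look like, the sketch below maps words to ARPAbet-style phonemes, keeps the vowel sounds, and applies a crude end-rhyme test. It is not the authors' method: the pronunciation table, the function names, and the rhyme rule are all assumptions made for this example.

```python
# Toy pronunciation table in ARPAbet-style notation. It is hand-written and
# hypothetical; a real system would use a full pronouncing dictionary.
TOY_PRONUNCIATIONS = {
    "i": ["AY1"],
    "let": ["L", "EH1", "T"],
    "it": ["IH1", "T"],
    "flow": ["F", "L", "OW1"],
    "then": ["DH", "EH1", "N"],
    "go": ["G", "OW1"],
}


def transcribe(line):
    """Map a lyric line to a flat list of phonemes, skipping unknown words."""
    phonemes = []
    for word in line.lower().split():
        phonemes.extend(TOY_PRONUNCIATIONS.get(word.strip(".,!?"), []))
    return phonemes


def vowel_sequence(phonemes):
    """Keep only vowels (phonemes ending in a stress digit), dropping the digit."""
    return [p[:-1] for p in phonemes if p[-1].isdigit()]


def end_rhyme(line_a, line_b):
    """Crude test: do both lines end on the same vowel sound?"""
    vowels_a = vowel_sequence(transcribe(line_a))
    vowels_b = vowel_sequence(transcribe(line_b))
    return bool(vowels_a) and bool(vowels_b) and vowels_a[-1] == vowels_b[-1]


line_1 = "I let it flow"
line_2 = "then I go"
vowels_1 = vowel_sequence(transcribe(line_1))
vowels_2 = vowel_sequence(transcribe(line_2))
rhymes = end_rhyme(line_1, line_2)
```

Vowel sequences like these are one plausible input from which a learned prosodic representation could be built. Comparing only the final vowel ignores trailing consonants and stress, so the test detects assonance rather than strict rhyme.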

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/130742