Neural Networks for Music Emotion Recognition and Social Tags Emotion Representation

Publication Type: Thesis
Issue Date: 2023
Abstract:
Music Emotion Recognition (MER), an important branch of Music Information Retrieval (MIR), has become a very active research area, driven by the need to detect emotion in music automatically, and a great deal of research has contributed to it. With the emergence of neural networks, MER research has evolved from traditional machine learning methods combined with acoustic features to neural network methods combined with multi-source features. However, research gaps remain in the following aspects. First, most existing research uses pre-processed audio features as model inputs, which requires domain knowledge and manual effort for feature selection; few attempts have been made to use raw audio directly as model input. Second, few researchers have partitioned music clips into shorter segments as model inputs, because segment-level target labels for supervised learning are lacking. Third, social tags are a useful source of annotations for music emotion recognition, but tags are usually drawn from a limited emotion vocabulary and treated as discrete labels; little research focuses on large-scale tag analysis that quantifies tags in a dimensional emotion space.

I propose neural-network-based solutions to fill these gaps. For the first gap, I adopt a novel end-to-end deep learning architecture in which multi-view convolutional neural networks serve as feature extractors, followed by a Bidirectional Long Short-Term Memory (BiLSTM) network that captures temporal context and predicts dynamic music emotion. For the second, I design a two-stage learning framework that uses music segments as model inputs without requiring segment-level labels: an unsupervised learning method generates segment-level feature representations, and these time-series segment features are then assembled and fed into a BiLSTM model to produce the final music emotion classification. For the third, I contribute a social tag analysis for music emotion based on neural word embedding approaches, which maps social tags onto the dimensional emotion plane for further quantitative use.

In summary, my research aims to improve the performance of music emotion recognition with neural network methods and to study the representation of social tags for emotion annotation using word embedding techniques. This thesis presents the details of these solutions, together with background on music emotion, related work, and research plans.
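To make the end-to-end architecture concrete, the following is a minimal sketch (not the thesis implementation) of multi-view convolutional feature extraction over raw audio frames followed by a BiLSTM that outputs a dynamic valence/arousal trajectory. It is written in PyTorch purely for illustration; the class name, kernel sizes, filter counts, and frame length are all assumptions.

import torch
import torch.nn as nn

class MultiViewCNNBiLSTM(nn.Module):
    """Illustrative end-to-end model: parallel 1-D conv views over raw audio
    frames, then a BiLSTM over the frame sequence for dynamic emotion."""
    def __init__(self, kernel_sizes=(9, 27, 81), n_filters=32, hidden=64):
        super().__init__()
        # One convolutional view per kernel size (sizes are assumptions).
        self.views = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, n_filters, k, stride=4, padding=k // 2),
                nn.BatchNorm1d(n_filters),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # one feature vector per frame and view
            )
            for k in kernel_sizes
        ])
        self.bilstm = nn.LSTM(n_filters * len(kernel_sizes), hidden,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # per-frame valence and arousal

    def forward(self, frames):
        # frames: (batch, n_frames, samples_per_frame) of raw audio
        b, t, s = frames.shape
        x = frames.reshape(b * t, 1, s)
        feats = torch.cat([view(x).squeeze(-1) for view in self.views], dim=-1)
        out, _ = self.bilstm(feats.reshape(b, t, -1))
        return self.head(out)  # (batch, n_frames, 2) emotion trajectory

For example, 60 half-second frames of 44.1 kHz audio would enter as a tensor of shape (batch, 60, 22050), and the model returns one valence/arousal pair per frame, which is the dynamic-emotion output format assumed in this sketch.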
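The two-stage framework can be illustrated in a similarly hedged way. The sketch below uses a simple autoencoder as one plausible unsupervised model for stage one (the abstract only specifies that an unsupervised method produces segment-level representations) and a BiLSTM over the resulting segment-embedding sequence for stage two, supervised only by the clip-level label. The use of mel-spectrogram segments and all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class SegmentAutoencoder(nn.Module):
    """Stage one: learn segment embeddings without segment-level labels."""
    def __init__(self, n_mels=128, frames=64, latent=32):
        super().__init__()
        d = n_mels * frames
        self.encoder = nn.Sequential(nn.Flatten(),
                                     nn.Linear(d, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, d))

    def forward(self, seg):              # seg: (batch, n_mels, frames)
        z = self.encoder(seg)
        recon = self.decoder(z).reshape(seg.shape)
        return recon, z                  # train with MSE(recon, seg); keep z

class ClipEmotionClassifier(nn.Module):
    """Stage two: BiLSTM over the segment-embedding sequence, trained with
    the clip-level emotion label only."""
    def __init__(self, latent=32, hidden=64, n_classes=4):
        super().__init__()
        self.bilstm = nn.LSTM(latent, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, segment_embeddings):   # (batch, n_segments, latent)
        out, _ = self.bilstm(segment_embeddings)
        return self.head(out.mean(dim=1))    # clip-level class logits

The point of the split is that the autoencoder never needs segment labels; only the classifier consumes the clip-level annotation, which is what allows segments to be used as inputs without segment-level ground truth.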
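For the tag-representation contribution, one way to realize a mapping from social tags to the dimensional (valence/arousal) emotion plane with pre-trained word embeddings is similarity-weighted averaging over a small set of seed emotion words whose coordinates come from an affective lexicon. The sketch below assumes such embeddings and seeds are available; the function names, seed words, and coordinates are hypothetical and are not taken from the thesis.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def map_tag_to_plane(tag, embeddings, seeds):
    """embeddings: dict word -> vector; seeds: dict word -> (valence, arousal)."""
    if tag not in embeddings:
        return None
    # Similarity of the tag to each seed word, clipped at zero.
    sims = {w: max(cosine(embeddings[tag], embeddings[w]), 0.0)
            for w in seeds if w in embeddings}
    total = sum(sims.values())
    if total == 0:
        return None
    valence = sum(s * seeds[w][0] for w, s in sims.items()) / total
    arousal = sum(s * seeds[w][1] for w, s in sims.items()) / total
    return valence, arousal

# Hypothetical seed coordinates on a [-1, 1] valence/arousal scale:
# seeds = {"happy": (0.9, 0.6), "sad": (-0.8, -0.4),
#          "angry": (-0.6, 0.8), "calm": (0.4, -0.7)}
# map_tag_to_plane("melancholic", pretrained_vectors, seeds)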