Neural Networks for Music Emotion Recognition and Social Tags Emotion Representation

Publication Type: Thesis
Issue Date: 2023
Abstract:
Music Emotion Recognition (MER), an important branch of Music Information Retrieval (MIR), has become a very active research area, driven by the need to detect emotion in music automatically, and a great deal of research has contributed to it. With the emergence of neural networks, MER research has evolved from traditional machine learning methods combined with acoustic features to neural network methods combined with multi-source features. However, research gaps remain in the following aspects. First, most existing research uses pre-processed audio features as model inputs, which requires domain knowledge and manual effort for feature selection; few attempts have been made to use raw audio directly as model input. Second, few researchers have partitioned music clips into shorter segments as model inputs, because segment-level target labels for supervised learning are lacking. Third, social tags are a useful source of annotations for music emotion recognition, but tags are usually drawn from a limited emotion vocabulary and treated as discrete labels; little research focuses on large-scale tag analysis that quantifies tags in a dimensional emotion space.

I propose neural-network-based solutions to fill these gaps. For the first gap, I adopt a novel end-to-end deep learning architecture in which multi-view convolutional neural networks serve as feature extractors, followed by a Bidirectional Long Short-Term Memory (BiLSTM) network that captures temporal context and predicts dynamic music emotion. For the second, I design a two-stage learning framework that uses music segments as model inputs without requiring segment-level labels: an unsupervised learning method generates segment-level feature representations, and these time-series segment features are then assembled and fed into a BiLSTM model to produce the final music emotion classification. For the third, I contribute a social tag analysis for music emotion based on neural word embedding approaches, which maps social tags onto the dimensional emotion plane for further quantitative use.

In summary, my research aims to improve the performance of music emotion recognition with neural network methods and to study the representation of social tags for emotion annotation using word embedding techniques. This thesis presents the details of these solutions, together with background on music emotion, related work, and research plans.
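To make the end-to-end architecture concrete, the following is a minimal sketch (not the thesis implementation) of multi-view convolutional feature extraction over raw audio frames followed by a BiLSTM that outputs a dynamic valence/arousal trajectory. It is written in PyTorch purely for illustration; the class name, kernel sizes, filter counts, and frame length are all assumptions.

import torch
import torch.nn as nn

class MultiViewCNNBiLSTM(nn.Module):
    """Illustrative end-to-end model: parallel 1-D conv views over raw audio
    frames, then a BiLSTM over the frame sequence for dynamic emotion."""
    def __init__(self, kernel_sizes=(9, 27, 81), n_filters=32, hidden=64):
        super().__init__()
        # One convolutional view per kernel size (sizes are assumptions).
        self.views = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, n_filters, k, stride=4, padding=k // 2),
                nn.BatchNorm1d(n_filters),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),  # one feature vector per frame and view
            )
            for k in kernel_sizes
        ])
        self.bilstm = nn.LSTM(n_filters * len(kernel_sizes), hidden,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # per-frame valence and arousal

    def forward(self, frames):
        # frames: (batch, n_frames, samples_per_frame) of raw audio
        b, t, s = frames.shape
        x = frames.reshape(b * t, 1, s)
        feats = torch.cat([view(x).squeeze(-1) for view in self.views], dim=-1)
        out, _ = self.bilstm(feats.reshape(b, t, -1))
        return self.head(out)  # (batch, n_frames, 2) emotion trajectory

For example, 60 half-second frames of 44.1 kHz audio would enter as a tensor of shape (batch, 60, 22050), and the model returns one valence/arousal pair per frame, which is the dynamic-emotion output format assumed in this sketch.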
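The two-stage framework can be illustrated in a similarly hedged way. The sketch below uses a simple autoencoder as one plausible unsupervised model for stage one (the abstract only specifies that an unsupervised method produces segment-level representations) and a BiLSTM over the resulting segment-embedding sequence for stage two, supervised only by the clip-level label. The use of mel-spectrogram segments and all layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class SegmentAutoencoder(nn.Module):
    """Stage one: learn segment embeddings without segment-level labels."""
    def __init__(self, n_mels=128, frames=64, latent=32):
        super().__init__()
        d = n_mels * frames
        self.encoder = nn.Sequential(nn.Flatten(),
                                     nn.Linear(d, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                     nn.Linear(256, d))

    def forward(self, seg):              # seg: (batch, n_mels, frames)
        z = self.encoder(seg)
        recon = self.decoder(z).reshape(seg.shape)
        return recon, z                  # train with MSE(recon, seg); keep z

class ClipEmotionClassifier(nn.Module):
    """Stage two: BiLSTM over the segment-embedding sequence, trained with
    the clip-level emotion label only."""
    def __init__(self, latent=32, hidden=64, n_classes=4):
        super().__init__()
        self.bilstm = nn.LSTM(latent, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, segment_embeddings):   # (batch, n_segments, latent)
        out, _ = self.bilstm(segment_embeddings)
        return self.head(out.mean(dim=1))    # clip-level class logits

The point of the split is that the autoencoder never needs segment labels; only the classifier consumes the clip-level annotation, which is what allows segments to be used as inputs without segment-level ground truth.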
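For the tag-representation contribution, one way to realize a mapping from social tags to the dimensional (valence/arousal) emotion plane with pre-trained word embeddings is similarity-weighted averaging over a small set of seed emotion words whose coordinates come from an affective lexicon. The sketch below assumes such embeddings and seeds are available; the function names, seed words, and coordinates are hypothetical and are not taken from the thesis.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def map_tag_to_plane(tag, embeddings, seeds):
    """embeddings: dict word -> vector; seeds: dict word -> (valence, arousal)."""
    if tag not in embeddings:
        return None
    # Similarity of the tag to each seed word, clipped at zero.
    sims = {w: max(cosine(embeddings[tag], embeddings[w]), 0.0)
            for w in seeds if w in embeddings}
    total = sum(sims.values())
    if total == 0:
        return None
    valence = sum(s * seeds[w][0] for w, s in sims.items()) / total
    arousal = sum(s * seeds[w][1] for w, s in sims.items()) / total
    return valence, arousal

# Hypothetical seed coordinates on a [-1, 1] valence/arousal scale:
# seeds = {"happy": (0.9, 0.6), "sad": (-0.8, -0.4),
#          "angry": (-0.6, 0.8), "calm": (0.4, -0.7)}
# map_tag_to_plane("melancholic", pretrained_vectors, seeds)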