Neural Topic Modelling with Deep Generative Models

Kumar, Amit

Publication Type: Thesis
Issue Date: 2023

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Kumar, Amit
dc.date.accessioned	2023-10-16T01:29:06Z
dc.date.available	2023-10-16T01:29:06Z
dc.date.issued	2023
dc.identifier.uri	http://hdl.handle.net/10453/172680
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.description.abstract	Topic modelling is a popular natural language processing (NLP) task that aims to automatically discover the main, shared topics of a given collection of documents. In addition, topic modelling can determine the topic proportions of each individual document in the collection, which helps with categorizing and organizing the documents. Over the years, topic models have proved useful in a broad variety of fields, including business, finance, healthcare, education, the media industry, social media, digital agriculture and many others. Like many other applications of NLP and machine learning, topic models have recently become substantially more effective thanks to their integration with deep learning (and deep generative models in particular), which has earned them the collective name of neural topic models. However, many improvements are still possible and needed, and this thesis aims to make significant contributions in this direction. As a first contribution, we have explored the use of reinforcement learning to refine the training of the models. To this end, we have proposed novel training objectives based on the policy gradient theorem and contemporary gradient estimators such as REINFORCE with baseline, the Gumbel-Softmax and REBAR. The experimental results over several topic modelling datasets have invariably shown improved model performance. As a second contribution, we have explored how to integrate powerful, contextualized document representations (i.e., Transformer-based embeddings) into the training objective of the model. This, too, has led to marked performance improvements over probing datasets. Finally, we have extended the investigation to dynamic topic models, which are capable of analyzing time-stamped document collections and extracting sets of topics that adapt over time. For these models, we have proposed a modification of the topic distributions that allows their sparsity to be controlled, thus adjusting to the characteristics of the collection being analyzed. Once more, the experimental results have provided evidence of the effectiveness of the proposed approach.	en_US.UTF-8
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/172680/1/thesis.pdf
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	© 2023 Amit Kumar
dc.rights	au.edu.uts.lib/cph
dc.title	Neural Topic Modelling with Deep Generative Models	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/172680