Beyond Pre-training: Learning for Knowledge Updates in Language Models
- Publication Type: Thesis
- Issue Date: 2024
This item is open access.
Modern language models (LMs) acquire an enormous amount of knowledge during pre-training, making them versatile across a wide range of downstream natural language processing (NLP) tasks. These pre-trained models capture rich semantic patterns in large-scale text corpora and learn high-quality representations of text. At inference time, LMs draw on the knowledge stored in their parameters to address various NLP tasks, outperforming traditional NLP approaches. However, a significant challenge remains unsolved: LMs are static after pre-training, with no built-in mechanism for updating themselves or adapting to a changing environment. Yet our world is dynamic and constantly evolving. The static nature of trained LMs causes their memorized knowledge to become quickly obsolete, which often leads to hallucinations and renders them unreliable and impractical for evolving downstream applications.
In this thesis, we aim to address a central question: how can new knowledge be incorporated efficiently into LMs beyond the pre-training stage? Specifically, we introduce novel approaches along three directions. First, we propose an efficient data annotation method for training new LMs: it significantly reduces the amount of annotated data required while improving performance in a weakly-supervised setting, thereby integrating new knowledge into LMs efficiently beyond pre-training. Second, we introduce a continual adaptation approach that allows LMs to keep pace with emerging knowledge: we formulate the problem of continual instruction tuning (CIT), which enables LMs to continuously learn from emerging tasks, and establish a benchmark suite with both learning and evaluation protocols. Lastly, we propose an adaptive retrieval augmentation approach for LMs at inference, which incorporates new knowledge efficiently without altering the original parameters.
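As a rough illustration of the third direction, the sketch below shows one way adaptive retrieval augmentation at inference time can work: retrieval is triggered only when a confidence signal suggests the model's parametric knowledge is insufficient, and the retrieved text is injected into the prompt rather than into the model's weights. All components here (the toy corpus and the `confidence_score`, `retrieve`, and `build_prompt` functions) are hypothetical stand-ins for exposition, not the methods developed in the thesis.

```python
# Minimal sketch of adaptive retrieval augmentation at inference.
# Every name below is an illustrative assumption, not the thesis's implementation.

from collections import Counter

# Toy in-memory corpus standing in for an external, updatable knowledge store.
tiny_corpus = [
    "The 2024 Summer Olympics were held in Paris, France.",
    "Python 3.12 removed the distutils module from the standard library.",
    "The James Webb Space Telescope launched on 25 December 2021.",
]

def confidence_score(question: str) -> float:
    """Stand-in for a model-confidence signal (e.g., max token probability).
    Here, questions mentioning recent years are treated as low confidence."""
    return 0.2 if any(t.isdigit() and int(t) >= 2020 for t in question.split()) else 0.9

def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Keyword-overlap retriever: rank passages by shared word counts."""
    q_tokens = Counter(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: sum((q_tokens & Counter(doc.lower().split())).values()),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, threshold: float = 0.5) -> str:
    """Augment the prompt with retrieved context only when confidence is low;
    the LM's parameters are never modified."""
    if confidence_score(question) < threshold:
        context = "\n".join(retrieve(question, tiny_corpus))
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return f"Question: {question}\nAnswer:"

if __name__ == "__main__":
    print(build_prompt("Where were the 2024 Summer Olympics held?"))  # retrieval triggered
    print(build_prompt("What is the capital of France?"))             # parametric knowledge only
```

In practice, the confidence signal and retriever would be replaced by the model's own uncertainty estimates and a learned dense retriever; the point of the sketch is only that new knowledge enters through the prompt, leaving the pre-trained parameters untouched.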
Experiments across a variety of NLP tasks demonstrate the effectiveness of our approaches in incorporating new knowledge into LMs beyond the pre-training stage. Overall, this research addresses the central question above by presenting novel methods and analyses for enhancing LMs in diverse NLP applications.