Beyond Pre-training: Learning for Knowledge Updates in Language Models
- Publication Type: Thesis
- Issue Date: 2024
This item is open access.
Modern language models (LMs) acquire an enormous amount of knowledge during pre-training, making them versatile across a wide range of downstream natural language processing (NLP) tasks. These pre-trained models capture rich semantic patterns in large-scale text corpora and learn high-quality representations of text. At inference time, LMs draw on the knowledge stored in their parameters to address various NLP tasks, outperforming traditional NLP approaches. However, a significant challenge remains unsolved: LMs are static after pre-training, with no built-in mechanism for updating themselves or adapting to a changing environment. Yet our world is dynamic and constantly evolving. The static nature of trained LMs causes their memorized knowledge to become quickly obsolete, which often leads to hallucinations and renders them unreliable and impractical for evolving downstream applications.
In this thesis, we aim to address a central question: how can new knowledge be incorporated efficiently into LMs beyond the pre-training stage? Specifically, we introduce novel approaches along three directions. First, we propose an efficient data annotation method for training new LMs: it significantly reduces the amount of annotated data required while improving performance in a weakly-supervised setting, thereby integrating new knowledge into LMs efficiently beyond pre-training. Second, we introduce a continual adaptation approach that allows LMs to keep pace with emerging knowledge: we formulate the problem of continual instruction tuning (CIT), which enables LMs to continuously learn from emerging tasks, and establish a benchmark suite with both learning and evaluation protocols. Lastly, we propose an adaptive retrieval augmentation approach for LMs at inference, which incorporates new knowledge efficiently without altering the original parameters.
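As a rough illustration of the third direction, the sketch below shows one way adaptive retrieval augmentation at inference time can work: retrieval is triggered only when a confidence signal suggests the model's parametric knowledge is insufficient, and the retrieved text is injected into the prompt rather than into the model's weights. All components here (the toy corpus and the `confidence_score`, `retrieve`, and `build_prompt` functions) are hypothetical stand-ins for exposition, not the methods developed in the thesis.

```python
# Minimal sketch of adaptive retrieval augmentation at inference.
# Every name below is an illustrative assumption, not the thesis's implementation.

from collections import Counter

# Toy in-memory corpus standing in for an external, updatable knowledge store.
tiny_corpus = [
    "The 2024 Summer Olympics were held in Paris, France.",
    "Python 3.12 removed the distutils module from the standard library.",
    "The James Webb Space Telescope launched on 25 December 2021.",
]

def confidence_score(question: str) -> float:
    """Stand-in for a model-confidence signal (e.g., max token probability).
    Here, questions mentioning recent years are treated as low confidence."""
    return 0.2 if any(t.isdigit() and int(t) >= 2020 for t in question.split()) else 0.9

def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Keyword-overlap retriever: rank passages by shared word counts."""
    q_tokens = Counter(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: sum((q_tokens & Counter(doc.lower().split())).values()),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, threshold: float = 0.5) -> str:
    """Augment the prompt with retrieved context only when confidence is low;
    the LM's parameters are never modified."""
    if confidence_score(question) < threshold:
        context = "\n".join(retrieve(question, tiny_corpus))
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return f"Question: {question}\nAnswer:"

if __name__ == "__main__":
    print(build_prompt("Where were the 2024 Summer Olympics held?"))  # retrieval triggered
    print(build_prompt("What is the capital of France?"))             # parametric knowledge only
```

In practice, the confidence signal and retriever would be replaced by the model's own uncertainty estimates and a learned dense retriever; the point of the sketch is only that new knowledge enters through the prompt, leaving the pre-trained parameters untouched.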
Experiments across a variety of NLP tasks demonstrate the effectiveness of our approaches in incorporating new knowledge into LMs beyond the pre-training stage. Overall, this research addresses the central question above by presenting novel methods and analyses for enhancing LMs in diverse NLP applications.