A Study on Neural-based Code Summarization in Low-resource Settings

Automated software engineering with deep learning techniques has been explored extensively, driven by breakthroughs in code representation learning. In recent years, many code intelligence approaches have been proposed for the downstream tasks of this field, yielding significant performance gains. Among these tasks, code summarization has been a central research topic because of its practical value in, e.g., software development and maintenance. Representing code snippets and generating accurate descriptions that summarize the functionality and semantics of programs remains challenging. Existing code summarization methods have been devised to tackle real-world problems and have proven effective. However, little attention has been paid to applying them to novel programming languages, where only a few well-documented programs are available for training. According to our observations, existing approaches perform poorly in such low-resource settings, and we attribute the problem to the data-hungry nature of these models and to the gaps between programming languages. Inspired by recent pre-training methods, we propose MetaSum, a meta-learning-based code summarization model that extracts prior and shared knowledge from high-resource programming languages, where high-quality code snippets are easily accessible, and then adapts it to low-resource settings. The key contributions of this dissertation are that we (1) give a comprehensive account of the development of machine-learning-based code summarization, (2) identify the new problem of low-resource code summarization and propose a meta-learning-based model that improves over state-of-the-art pre-trained models by 3.18 and 1.79 BLEU points on the Nix and Ruby datasets, respectively, and (3) introduce NaturalCC, a machine-learning-based toolkit for fair comparison of models, for the automated software engineering community.