Static Analysis-Guided Automatic Source Code Summarization via Deep Learning
- Publication Type: Thesis
- Issue Date: 2021
This item is open access.
Code summarization provides a high-level natural language description of a function, which benefits software maintenance, code categorization and code retrieval. To the best of our knowledge, existing research falls mainly into three categories: template-based approaches, information-retrieval-based approaches and deep-learning-based approaches. Recently, with the development and wide adoption of deep learning, neural machine translation (NMT) architectures have been introduced into code summarization research. Based on our study, most state-of-the-art deep-learning-based approaches follow an encoder-decoder framework, which encodes the code into a hidden space and then decodes it into the natural language space. However, due to the special grammar and syntactic structure of programming languages and the shortcomings of different deep neural networks, the accuracy of existing code summarization approaches is not high enough. These approaches mainly suffer from three drawbacks: (a) they consider only the sequential content of code and ignore its structure, which is also critical for code comprehension; (b) they generate only the intent of the code and ignore information such as parameters, which is also important for understanding and using the source code; (c) the CNN/RNN models they adopt usually suffer from long-distance dependency problems and excessive computational cost.
In view of this, the main research work of this thesis is as follows. (1) The first work presents a code summarization approach that uses a hierarchical attention network to incorporate multiple code features, which are fed into a deep reinforcement learning (DRL) framework (e.g., an actor-critic network) for comment generation. (2) While many existing approaches make inadequate use of statement-wise semantic contributions to improve their performance, the second work proposes a transformer-based generative adversarial network framework for universal code summarization, which constructs a cross-language universal hierarchical semantic (UHS) model to classify statements by positioning them in the source code. (3) Considering that almost all approaches generate only the general intent of a method without documenting its parameters, the third work proposes to generate both the method comment and the parameter comments, providing complete Java documentation for code snippets. Specifically, it designs a program-analysis-based component that extracts the UseSet of each parameter and the KeySet of the code snippet to retain the main semantic information and discard noise, and it uses a copy-attention-integrated, transformer-based NMT framework. Across this thesis, a set of experimental studies is conducted, and the results suggest that the proposed approaches outperform multiple state-of-the-art approaches.
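The parameter-focused extraction in the third work can be illustrated with a rough sketch. The abstract does not define UseSet precisely, so this example assumes it means the statements in a method body that reference a given parameter. The function name `use_set`, the one-statement-per-line splitting, and the sample Java body are all hypothetical; the thesis's actual component is described as program-analysis-based and would not rely on regular expressions.

```python
import re

# Hypothetical Java method body, one statement per line (an assumption of this sketch).
BODY = """
int total = 0;
int limit = maxItems * 2;
for (int i = 0; i < limit; i++) { total += prices[i]; }
return total;
"""


def split_statements(body: str) -> list[str]:
    """Naively treat each non-empty line as one statement."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def use_set(body: str, name: str) -> list[str]:
    """Return the statements that mention `name` as a whole identifier.

    This is an assumed, simplified reading of "UseSet": it does not
    follow data flow through intermediate variables.
    """
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [stmt for stmt in split_statements(body) if pattern.search(stmt)]


if __name__ == "__main__":
    for param in ("maxItems", "prices"):
        print(param, "->", use_set(BODY, param))
```

Statements selected this way could serve as the reduced input for generating a parameter comment, while unrelated statements are dropped as noise. Tracking indirect uses (for example, `limit` being derived from `maxItems`) would need real data-flow analysis, which this sketch deliberately leaves out.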
