Enhancing Traceability Link Recovery with Unlabeled Data

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Conference Proceeding
Citation:
Proceedings - International Symposium on Software Reliability Engineering (ISSRE), October 2022, pp. 446-457
Issue Date:
2022-01-01
Abstract:
Traceability link recovery (TLR) is an important software engineering task for developing trustworthy and reliable software systems. Recently proposed deep learning (DL) models have shown their effectiveness compared to traditional information retrieval-based methods, but DL typically relies on sufficient labeled data to train the model. Manually labeling traceability links, however, is time-consuming, labor-intensive, and requires specific knowledge from domain experts. As a result, real-world projects typically contain only a small portion of labeled data alongside a large amount of unlabeled data. Our hypothesis is that artifacts are semantically similar if they share the same linked artifact(s). This paper presents TRACEFUN, a new approach that enhances traceability link recovery with unlabeled data. TRACEFUN first measures the similarities between unlabeled and labeled artifacts using two similarity prediction methods (i.e., the vector space model and contrastive learning). Based on these similarities, newly labeled links are then generated between the unlabeled artifacts and the linked objects of the labeled artifacts, and the generated links are used for TLR model training. We have evaluated TRACEFUN on three GitHub projects with two state-of-the-art DL models (i.e., Trace BERT and TraceNN). The results show that TRACEFUN is effective, improving F1-score by up to 21% for Trace BERT and up to 1,088% for TraceNN.
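To make the described pipeline concrete, below is a minimal sketch (not the authors' implementation) of the vector space model step: unlabeled artifacts are compared to labeled artifacts via TF-IDF cosine similarity, and each sufficiently similar unlabeled artifact inherits the links of its nearest labeled neighbor. The function name generate_candidate_links, the threshold value, and the use of scikit-learn are illustrative assumptions, not details taken from the paper.

```python
# Sketch of VSM-based label propagation, assuming artifacts are plain strings
# and `labeled_links` maps a labeled artifact's index to its set of linked
# target artifacts. Names and threshold are illustrative, not from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def generate_candidate_links(labeled_artifacts, unlabeled_artifacts,
                             labeled_links, threshold=0.7):
    """Propagate links from labeled to unlabeled artifacts via VSM similarity."""
    vectorizer = TfidfVectorizer()
    # Build one vocabulary over all artifacts, then vectorize each group.
    vectorizer.fit(labeled_artifacts + unlabeled_artifacts)
    labeled_vecs = vectorizer.transform(labeled_artifacts)
    unlabeled_vecs = vectorizer.transform(unlabeled_artifacts)

    # Cosine similarity between every unlabeled and every labeled artifact.
    sims = cosine_similarity(unlabeled_vecs, labeled_vecs)

    new_links = []
    for u_idx, row in enumerate(sims):
        best = int(row.argmax())  # most similar labeled artifact
        if row[best] >= threshold:
            # Assume the unlabeled artifact shares the linked objects of its
            # most similar labeled artifact, per the hypothesis in the abstract.
            for target in labeled_links.get(best, set()):
                new_links.append((unlabeled_artifacts[u_idx], target))
    return new_links
```

According to the abstract, the contrastive learning variant plays the same role, with learned representations in place of the TF-IDF vectors; the link-generation step is conceptually unchanged.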