HINDBR: Heterogeneous information network based duplicate bug report prediction

Xiao, G; Du, X; Sui, Y; Yue, T

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: Proceedings - International Symposium on Software Reliability Engineering, ISSRE, 2020, 2020-October, pp. 195-206
Issue Date: 2020-10-01

Closed Access

	Filename	Description	Size	Format
	issre20b.pdf	Accepted version	966.68 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Xiao, G
dc.contributor.author	Du, X
dc.contributor.author	Sui, Y https://orcid.org/0000-0002-9510-6574
dc.contributor.author	Yue, T
dc.date	2020-10-12
dc.date.accessioned	2021-01-27T23:45:11Z
dc.date.available	2021-01-27T23:45:11Z
dc.date.issued	2020-10-01
dc.identifier.citation	Proceedings - International Symposium on Software Reliability Engineering, ISSRE, 2020, 2020-October, pp. 195-206
dc.identifier.isbn	9781728198705
dc.identifier.issn	1071-9458
dc.identifier.uri	http://hdl.handle.net/10453/145614
dc.description.abstract	©2020 IEEE. Duplicate bug reports often exist in bug tracking systems (BTSs). Almost all the existing approaches for automatically detecting duplicate bug reports are based on text similarity. A recent study found that such approaches may become ineffective in detecting duplicates in bug reports submitted after the just-in-time (JIT) retrieval, which is now a built-in feature of modern BTSs (e.g., Bugzilla). This is mainly because the embedded JIT feature suggests possible duplicates in a bug database when a bug reporter types in the new summary field, therefore minimizing the submission of textually similar reports. Although JIT filtering seems effective, a number of bug report duplicates remain undetected. Our hypothesis is that we can detect them using a semantic similarity-based approach. This paper presents HINDBR, a novel deep neural network (DNN) that accurately detects semantically similar duplicate bug reports using a heterogeneous information network (HIN). Instead of matching text similarity alone, HINDBR embeds semantic relations of bug reports into a low-dimensional embedding space where two duplicate bug reports represented by two vectors are close to each other in the latent space. Results show that HINDBR is effective.
dc.language	en
dc.publisher	IEEE
dc.relation	http://purl.org/au-research/grants/arc/DP200101328
dc.relation.ispartof	Proceedings - International Symposium on Software Reliability Engineering, ISSRE
dc.relation.ispartof	2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)
dc.relation.isbasedon	10.1109/ISSRE5003.2020.00027
dc.rights	info:eu-repo/semantics/closedAccess
dc.rights	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.title	HINDBR: Heterogeneous information network based duplicate bug report prediction
dc.type	Conference Proceeding
utslib.citation.volume	2020-October
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
dc.date.updated	2021-01-27T23:44:27Z
pubs.finish-date	2020-10-15
pubs.publication-status	Published
pubs.start-date	2020-10-12
pubs.volume	2020-October

Abstract:

©2020 IEEE. Duplicate bug reports often exist in bug tracking systems (BTSs). Almost all the existing approaches for automatically detecting duplicate bug reports are based on text similarity. A recent study found that such approaches may become ineffective in detecting duplicates in bug reports submitted after the just-in-time (JIT) retrieval, which is now a built-in feature of modern BTSs (e.g., Bugzilla). This is mainly because the embedded JIT feature suggests possible duplicates in a bug database when a bug reporter types in the new summary field, therefore minimizing the submission of textually similar reports. Although JIT filtering seems effective, a number of bug report duplicates remain undetected. Our hypothesis is that we can detect them using a semantic similarity-based approach. This paper presents HINDBR, a novel deep neural network (DNN) that accurately detects semantically similar duplicate bug reports using a heterogeneous information network (HIN). Instead of matching text similarity alone, HINDBR embeds semantic relations of bug reports into a low-dimensional embedding space where two duplicate bug reports represented by two vectors are close to each other in the latent space. Results show that HINDBR is effective.
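
The idea that duplicate reports lie close together in an embedding space can be illustrated with a minimal sketch. This is not the authors' model: the three-dimensional vectors, the use of cosine similarity, the function names, and the 0.9 threshold are all assumptions chosen for illustration, and the abstract does not state which distance measure or decision rule HINDBR uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def is_duplicate(u, v, threshold=0.9):
    """Hypothetical decision rule: flag a pair whose similarity reaches the threshold."""
    return cosine_similarity(u, v) >= threshold


# Made-up embeddings standing in for learned bug-report vectors.
report_a = [0.9, 0.1, 0.3]
report_b = [0.8, 0.2, 0.35]   # points in nearly the same direction as report_a
report_c = [-0.2, 0.9, -0.4]  # points in a different direction

print(is_duplicate(report_a, report_b))
print(is_duplicate(report_a, report_c))
```

In a real system the vectors would come from a trained model and the threshold would be tuned on labelled duplicate pairs; the sketch only shows how closeness in a latent space can be turned into a duplicate decision.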

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/145614