HINDBR: Heterogeneous information network based duplicate bug report prediction

Publication Type:
Conference Proceeding
Proceedings - International Symposium on Software Reliability Engineering, ISSRE, 2020, 2020-October, pp. 195-206
Filename: issre20b.pdf
Description: Accepted version
Size: 966.68 kB
Format: Adobe PDF
©2020 IEEE. Duplicate bug reports often exist in bug tracking systems (BTSs). Almost all existing approaches for automatically detecting duplicate bug reports are based on text similarity. A recent study found that such approaches may become ineffective at detecting duplicates among bug reports submitted after the introduction of just-in-time (JIT) retrieval, which is now a built-in feature of modern BTSs (e.g., Bugzilla). This is mainly because the embedded JIT feature suggests possible duplicates from the bug database as a bug reporter types in the summary field of a new report, thereby minimizing the submission of textually similar reports. Although JIT filtering seems effective, a number of duplicate bug reports remain undetected. Our hypothesis is that we can detect them using a semantic similarity-based approach. This paper presents HINDBR, a novel deep neural network (DNN) that accurately detects semantically similar duplicate bug reports using a heterogeneous information network (HIN). Instead of matching on text similarity alone, HINDBR embeds the semantic relations of bug reports into a low-dimensional embedding space, where the two vectors representing a pair of duplicate bug reports lie close to each other. Results show that HINDBR is effective.
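The idea that duplicate reports end up close together in the embedding space can be illustrated with a minimal sketch. This is not HINDBR's model or its prediction procedure, which the abstract does not detail. It assumes bug reports have already been mapped to vectors by some embedding method, and it uses cosine similarity with a hypothetical threshold as a stand-in closeness test. The vectors, the function names, and the threshold value of 0.8 are all made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)


def is_duplicate(u, v, threshold=0.8):
    """Flag a pair as a likely duplicate when the vectors are close.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return cosine_similarity(u, v) >= threshold


# Made-up 3-dimensional embeddings for three bug reports.
report_a = [0.9, 0.1, 0.3]
report_b = [0.8, 0.2, 0.35]   # points in nearly the same direction as report_a
report_c = [-0.2, 0.9, -0.4]  # points in a different direction

print(is_duplicate(report_a, report_b))
print(is_duplicate(report_a, report_c))
```

In this toy setting, reports A and B would be flagged as a candidate duplicate pair and reports A and C would not. A learned model would replace both the hand-written vectors and the fixed threshold.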