Exploring the genetic basis of diseases through a heterogeneous bibliometric network: A methodology and case study

Publication Type: Journal Article
Technological Forecasting and Social Change, 2021, 164
Literature-based knowledge discovery (LBD) is a practical approach to inferring associations between diseases and genetic factors from unstructured biomedical data, i.e., the literature. However, most contemporary LBD methods are designed for specific cases and rely heavily on prior knowledge. In this paper, we propose an adaptable and transferable methodology that not only summarizes the genetic factors known to be associated with a queried disease but also predicts likely associations that have yet to be identified. The framework incorporates different types of biomedical entities into a heterogeneous co-occurrence network. Three centrality indicators, coupled with a novel measure based on intersection ratios, capture the importance and specificity of each factor to the disease under study. Undiscovered but likely associations are identified through a semantic similarity matrix generated by our Bioentity2Vec model and an innovative weighted link prediction algorithm. The final outputs are ranked lists of the most relevant known and potential biomedical associations. To both test and showcase the methodology, we conducted a case study on atrial fibrillation. The analysis yields specific insights into the key biomedical entities associated with this disease. Moreover, it demonstrates the kind of decision support this framework can provide to medical researchers, policymakers and public health administrations.
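The pipeline outlined in the abstract (a co-occurrence network, a centrality score, a specificity ratio and similarity-weighted link prediction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The documents and entity names are hypothetical toy data. Weighted degree stands in for the three centrality indicators, which the abstract does not name. `intersection_ratio` is an assumed definition of the intersection-ratio measure. The two-dimensional vectors are placeholders for Bioentity2Vec embeddings. The link score is a generic weighted common-neighbours heuristic swapped in for the paper's algorithm.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Weighted co-occurrence graph: edge weight = number of documents mentioning both entities."""
    graph = defaultdict(lambda: defaultdict(int))
    for entities in docs:
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def weighted_degree(graph, node):
    """One example centrality: the sum of a node's edge weights."""
    return sum(graph.get(node, {}).values())


def intersection_ratio(docs, factor, disease):
    """ASSUMED specificity measure: share of documents mentioning `factor` that also mention `disease`."""
    with_factor = [set(d) for d in docs if factor in d]
    if not with_factor:
        return 0.0
    return sum(disease in d for d in with_factor) / len(with_factor)


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def link_score(graph, x, y, vectors=None):
    """Weighted common-neighbours score (min edge weight per shared neighbour),
    optionally scaled by the embedding similarity of x and y."""
    nbrs_x, nbrs_y = graph.get(x, {}), graph.get(y, {})
    score = sum(min(nbrs_x[z], nbrs_y[z]) for z in set(nbrs_x) & set(nbrs_y))
    if vectors is not None and x in vectors and y in vectors:
        score *= cosine(vectors[x], vectors[y])
    return score


def rank_candidates(graph, disease, vectors=None):
    """Rank entities that do not yet co-occur with `disease` by predicted link score."""
    neighbours = graph.get(disease, {})
    candidates = [n for n in graph if n != disease and n not in neighbours]
    scored = [(n, link_score(graph, disease, n, vectors)) for n in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus: each document is reduced to the entities it mentions.
docs = [
    ["AF", "GENE_A", "DRUG_X"],
    ["AF", "GENE_B", "DRUG_X"],
    ["AF", "GENE_A"],
    ["GENE_A", "GENE_C"],
    ["GENE_B", "GENE_C", "DRUG_X"],
]
graph = build_cooccurrence(docs)

# Placeholder embeddings standing in for Bioentity2Vec output.
vectors = {"AF": [1.0, 0.0], "GENE_C": [3.0, 4.0]}

print(weighted_degree(graph, "AF"))                # → 5
print(intersection_ratio(docs, "GENE_A", "AF"))    # 2 of GENE_A's 3 documents mention AF
print(rank_candidates(graph, "AF"))                # → [('GENE_C', 3)]
print(link_score(graph, "AF", "GENE_C", vectors))  # 3 scaled by cosine similarity 0.6
```

In this toy corpus GENE_C never co-occurs with AF directly, but it shares three neighbours with it. It is therefore the only candidate link, and the embedding similarity can shrink or preserve its score. Scaling a structural score by semantic similarity is one simple way to combine the two signals the abstract mentions. The paper's actual weighting may differ.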