A confidence-based entity resolution approach with incomplete information

Publisher:
Institute of Electrical and Electronics Engineers Inc.
Publication Type:
Conference Proceeding
Citation:
DSAA 2014 - Proceedings of the 2014 IEEE International Conference on Data Science and Advanced Analytics, 2014, pp. 97 - 103
Issue Date:
2014
Files in This Item:
Filename Description Size Format
DSAA14_07058058.pdf Published version 769.82 kB Adobe PDF
Entity resolution identifies records from different data sources that refer to the same real-world entity, and it is an important prerequisite for integrating data from multiple sources. Entity resolution relies mainly on similarity measures over data records. In practice, however, the quality of data sources is often poor. Web data sources in particular often provide only incomplete information, which makes it difficult to apply similarity measures directly to identify matching entities. To address this problem, the concept of confidence is introduced to measure the trustworthiness of a similarity calculation. An adaptive rule-based approach calculates the similarity between records and also derives its confidence. The similarity and confidence are then propagated over the entity relational graph until a fixed point is reached. Finally, each pair of records is classified as matched or unmatched based on a threshold. We performed a series of experiments on real data sets, and the results show that our approach outperforms the approaches it was compared against.
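The pipeline summarized in the abstract (similarity with confidence, propagation to a fixed point, thresholding) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the adaptive rules or the propagation formula, so everything below is an assumption made for illustration. In particular, the token-level Jaccard measure, the definition of confidence as the fraction of comparable attributes, the `alpha` blending weight, and the function names are all hypothetical. For brevity the sketch propagates only the similarity and holds the confidence fixed.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def similarity_with_confidence(r1, r2, attributes):
    """Return (similarity, confidence) for two records with missing values.

    Assumption: similarity is averaged over attributes present in both
    records, and confidence is the fraction of attributes that could
    actually be compared.
    """
    shared = [a for a in attributes if r1.get(a) and r2.get(a)]
    if not shared:
        return 0.0, 0.0
    sim = sum(jaccard(r1[a], r2[a]) for a in shared) / len(shared)
    return sim, len(shared) / len(attributes)


def propagate(scores, neighbors, alpha=0.5, eps=1e-6, max_iter=100):
    """Update pair similarities until no value changes by more than eps.

    scores:    {pair: (similarity, confidence)}
    neighbors: {pair: [related pairs in the relational graph]}
    Assumption: the lower a pair's confidence, the more it leans on the
    average similarity of its neighboring pairs.
    """
    current = {pair: sim for pair, (sim, _) in scores.items()}
    for _ in range(max_iter):
        updated = {}
        for pair, (sim0, conf) in scores.items():
            related = [q for q in neighbors.get(pair, []) if q in current]
            if not related:
                updated[pair] = sim0
                continue
            neighbor_avg = sum(current[q] for q in related) / len(related)
            weight = alpha * (1.0 - conf)  # trust neighbors more when conf is low
            updated[pair] = (1.0 - weight) * sim0 + weight * neighbor_avg
        delta = max((abs(updated[p] - current[p]) for p in updated), default=0.0)
        current = updated
        if delta < eps:  # fixed point reached (within tolerance)
            break
    return current


def decide(similarities, threshold=0.5):
    """Label each pair as matched (True) or unmatched (False)."""
    return {pair: sim >= threshold for pair, sim in similarities.items()}


# Hypothetical example: the pair (a, b) is missing an attribute, so its
# confidence is low and its score is pulled toward a confident neighbor.
scores = {("a", "b"): (0.4, 0.5), ("c", "d"): (1.0, 1.0)}
neighbors = {("a", "b"): [("c", "d")]}
final = propagate(scores, neighbors)
labels = decide(final, threshold=0.5)
```

In this toy input the low-confidence pair starts below the threshold and ends above it after propagation, which is the kind of effect the confidence concept is meant to capture when records are incomplete.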