Discovering conditional matching rules

Publication Type:
Journal Article
Citation:
ACM Transactions on Knowledge Discovery from Data, 2017, 11 (4)
Issue Date:
2017-06-01
File:
a46-wang.pdf, Published Version, 2.3 MB, Adobe PDF
Matching dependencies (MDs) have recently been proposed to make data dependencies tolerant to various information representations, and have been found useful in data quality applications such as record matching. Instead of the strict equality function used in traditional dependency syntax (e.g., functional dependencies), MDs specify constraints based on similarity and identification. In practice, however, MDs may still be too strict and may apply only to a subset of the tuples in a relation. We therefore study conditional matching dependencies (CMDs), which bind matching dependencies to a certain part of a table only, i.e., MDs that are conditionally applicable to a subset of tuples. Compared to MDs, CMDs have more expressive power, which enables them to satisfy wider application needs. In this article, we study several important theoretical and practical issues of CMDs, including irreducible CMDs with respect to implication, discovery of CMDs from data, reliable CMDs that a relation agrees with most, approximate CMDs almost satisfied in a relation, and finally applications of CMDs in record matching and missing value repairing. Through an extensive experimental evaluation on real data sets, we demonstrate the efficiency of the proposed CMD discovery algorithms and the effectiveness of CMDs in real applications.
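To make the idea of a conditional matching dependency more concrete, the following is a minimal Python sketch and not the article's algorithm. It assumes a toy relation, a string similarity based on `difflib.SequenceMatcher`, and a simplified reading of "identification" as plain equality of the right-hand-side values. The function name `cmd_violations`, the attribute names (`region`, `street`, `zip`), and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold):
    """Assumed similarity test: SequenceMatcher ratio at or above a threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def cmd_violations(rows, condition, lhs, rhs):
    """Return id pairs that violate a (simplified) conditional matching rule.

    condition: attribute -> constant; only tuples matching it are in scope.
    lhs:       attribute -> similarity threshold for the left-hand side.
    rhs:       attribute whose values should be identified (here: equal).
    """
    # Restrict the rule to the part of the table selected by the condition.
    scoped = [r for r in rows if all(r[k] == v for k, v in condition.items())]
    violations = []
    for r1, r2 in combinations(scoped, 2):
        lhs_similar = all(similar(r1[a], r2[a], t) for a, t in lhs.items())
        if lhs_similar and r1[rhs] != r2[rhs]:
            violations.append((r1["id"], r2["id"]))
    return violations


# Made-up toy relation, for illustration only.
rows = [
    {"id": 1, "region": "UK", "street": "12 Baker Street", "zip": "NW1"},
    {"id": 2, "region": "UK", "street": "12 Baker Str.", "zip": "NW1"},
    {"id": 3, "region": "US", "street": "12 Baker Street", "zip": "10001"},
    {"id": 4, "region": "US", "street": "12 Baker Street", "zip": "60601"},
]

# Unconditional rule: similar street implies the same zip, over all tuples.
unconditional = cmd_violations(rows, {}, {"street": 0.8}, "zip")

# Conditional rule: the same dependency, bound only to tuples with region = UK.
conditional = cmd_violations(rows, {"region": "UK"}, {"street": 0.8}, "zip")
```

On this toy data the unconditional rule is violated by several tuple pairs, while the rule restricted to `region = UK` has no violations, which illustrates how a dependency that is too strict for a whole relation can still hold on a subset of its tuples.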