Document similarity analysis via involving both explicit and implicit semantic couplings
- Publication Type:
- Conference Proceeding
- Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015, 2015
- Issue Date:
© 2015 IEEE. Document similarity analysis is increasingly critical since roughly 80% of big data is unstructured. Accordingly, semantic couplings (relatedness) have been recognized valuable for capturing the relationships between terms (words or phrases). Existing work focuses more on explicit relatedness, with respective models built. In this paper, we propose a comprehensive semantic similarity measure: Semantic Coupling Similarity (SCS), which (1) captures intra-term pair couplings within term pairs represented by patterns of explicit term co-occurrences in a document set, (2) extracts inter-term pair couplings between term pairs indicated by implicit couplings between term pairs through indirectly linked terms and paths between terms after term connections are converted to a graph presentation; and (3) semantic coupling similarity, integrating intra- and inter-term pair couplings towards a comprehensive capturing of explicit and implicit couplings between terms across documents. SCS caters for both synonymy and polysemy, and outperforms baseline methods consistently on all real data sets.
Please use this identifier to cite or link to this item: