Approximate Repeating Pattern Mining with Gap Requirements

Publisher:
IEEE Computer Society
Publication Type:
Conference Proceeding
Citation:
Proc. of the 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI-09), 2009, pp. 17-24
Issue Date:
2009-01
In this paper, we define a new research problem: mining approximate repeating patterns (ARP) with gap constraints, where an occurrence of a pattern need only match approximately, a situation that is very common in the biological sciences. To solve the problem, we propose ArpGap (Approximate repeating pattern mining with Gap constraints), an algorithm with three major components: (1) a data-driven pattern generation approach that avoids generating unnecessary patterns; (2) a back-tracking pattern search process that discovers approximate occurrences of a pattern under gap constraints; and (3) an Apriori-like deterministic pruning approach that progressively prunes patterns and stops the search process when necessary. Experimental results on synthetic and real-world protein sequences show that ArpGap is efficient in terms of memory consumption and computational cost.
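The back-tracking search under gap constraints can be illustrated with a small sketch. This is not the ArpGap algorithm itself, since the abstract does not give its formal definitions. It assumes that a gap is the number of sequence characters skipped between two consecutive pattern characters, bounded by min_gap and max_gap, and that "approximate" means at most max_mismatch character mismatches (Hamming distance). The function name find_occurrences and all parameter names are hypothetical.

```python
def find_occurrences(seq, pattern, min_gap, max_gap, max_mismatch):
    """Return all approximate occurrences of `pattern` in `seq`.

    Each occurrence is a tuple of sequence positions, one per pattern
    character. Assumptions (not taken from the paper): consecutive
    positions are separated by min_gap..max_gap skipped characters, and
    at most max_mismatch positions may differ from the pattern.
    """
    results = []
    if not pattern:
        return results

    def extend(positions, mismatches):
        k = len(positions)              # index of the next pattern character
        if k == len(pattern):
            results.append(tuple(positions))
            return
        prev = positions[-1]
        lo = prev + 1 + min_gap         # nearest position the gap allows
        hi = min(prev + 1 + max_gap, len(seq) - 1)
        for nxt in range(lo, hi + 1):
            cost = mismatches + (seq[nxt] != pattern[k])
            if cost <= max_mismatch:    # abandon branches over the budget
                positions.append(nxt)
                extend(positions, cost)
                positions.pop()         # back-track and try the next position

    for start in range(len(seq)):
        cost = int(seq[start] != pattern[0])
        if cost <= max_mismatch:
            extend([start], cost)
    return results


if __name__ == "__main__":
    # Exact matching of "AT" with one or two skipped characters in between.
    print(find_occurrences("ACGTACGT", "AT", 1, 2, 0))
```

Cutting a branch as soon as the mismatch budget is exceeded keeps the search from enumerating position tuples that can never become valid occurrences. An Apriori-like pruning step, as mentioned in the abstract, would act one level higher by discarding candidate patterns whose sub-patterns are already known to be infrequent.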