A Partial-Repeatability Approach to Data Mining

Publisher:
IEEE Computer Society
Publication Type:
Journal Article
Citation:
The IEEE Intelligent Informatics Bulletin, 2005, 5 (1), pp. 26 - 34
Issue Date:
2005-01
Filename:
2009005041OK.pdf
Size:
372.54 kB
Format:
Adobe PDF
Unlike the data handled in traditional data mining, software data are characterized by partial repeatability, or parepeatics: an invariant property that can neither be proved mathematically nor validated to high accuracy physically, but that still (partially) governs the behavior of the data. Parepeatics emerges as a result of an inaccurate universe. The universe comprising all possible C language programs is one example: it cannot be accurately characterized, since humans write defect-prone programs. In this paper we design a parepeatic mining framework for software data mining, in which the mined knowledge is represented in terms of parepeatic models. A parepeatic model consists of central knowledge, a knowledge fluctuation zone, and a correctness factor. Our approach generates the required parepeatic model from a given dataset as a new form of knowledge representation and applies it to software data mining. Experimental results with real C language programs show that the proposed approach is effective.