Mining frequent patterns in print logs with semantically alternative labels

Publication Type:
Conference Proceeding
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, 8347 LNAI (PART 2), pp. 107 - 119
Issue Date:
Filename: 2013002745OK.pdf
Description: Adobe PDF
Size: 340.47 kB
Printing useful information from web pages is now common, owing to the wide availability of printers and the internet. Many web printing tools, such as Smart Print and PrintUI, have therefore been developed for online printing. To improve users' printing experience, the interaction data between users and these tools are collected to form so-called print logs, where each record is the set of URLs that a user selected for printing within a certain period of time. Mining frequent patterns from print logs can capture user intentions for other applications, such as printing recommendation and behavior targeting. However, mining frequent patterns directly with URLs as items faces two challenges: data sparsity and pattern interpretability. To tackle these challenges, we use the Delicious API (Delicious is a social bookmarking web service) as an external thesaurus and expand the semantics of each URL by selecting tags associated with its domain. In this setting, frequent pattern mining operates on the tag representation of each URL rather than on the URL or domain representation. The semantically alternative tags enrich the semantics of each URL and thus yield useful frequent patterns. To this end, we propose a novel pattern mining problem, namely mining frequent patterns with semantically alternative labels, and an efficient algorithm for it named PaSAL (Frequent Patterns with Semantically Alternative Labels Mining Algorithm). Specifically, we propose a new constraint, the conflict matrix, to prune redundant patterns and thereby achieve high efficiency. Finally, we evaluate the proposed algorithm on a real print log data set. © 2013 Springer-Verlag.
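The idea of mining on tags instead of URLs can be illustrated with a minimal sketch. It is not the PaSAL algorithm, whose details the abstract does not give, but a plain level-wise (Apriori-style) miner under stated assumptions: the domain-to-tag table `TAGS` and the print log `LOGS` are invented toy data (no Delicious API call is made), a record supports a pattern if every tag in the pattern labels some domain in the record, and two tags are assumed to "conflict" when they are alternative labels of the same domain. The conflict set here is a simplified stand-in for the paper's conflict matrix.

```python
from itertools import combinations

# Hypothetical domain -> alternative tags (toy data, not from the paper).
TAGS = {
    "nytimes.com": {"news", "politics"},
    "bbc.co.uk": {"news", "world"},
    "allrecipes.com": {"recipes", "cooking"},
    "epicurious.com": {"recipes", "food"},
}

# Hypothetical print log: each record is the set of domains a user printed.
LOGS = [
    {"nytimes.com", "allrecipes.com"},
    {"bbc.co.uk", "epicurious.com"},
    {"nytimes.com", "bbc.co.uk"},
    {"bbc.co.uk", "allrecipes.com"},
]


def build_conflicts(tags_by_domain):
    """Assumed conflict rule: two tags conflict if they label the same domain."""
    conflicts = set()
    for tags in tags_by_domain.values():
        conflicts.update(combinations(sorted(tags), 2))
    return conflicts


def mine(logs, tags_by_domain, min_support):
    """Level-wise frequent pattern mining over tags; min_support is a record count."""
    conflicts = build_conflicts(tags_by_domain)
    # Replace each record by the union of the tags of its domains.
    transactions = [
        set().union(*(tags_by_domain.get(d, set()) for d in record))
        for record in logs
    ]

    def support(pattern):
        return sum(1 for t in transactions if pattern <= t)

    # Level 1: frequent single tags.
    level = {}
    for tag in sorted(set().union(*transactions)):
        pattern = frozenset([tag])
        count = support(pattern)
        if count >= min_support:
            level[pattern] = count
    frequent = dict(level)

    while level:
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != len(a) + 1:
                continue  # join only patterns that differ in exactly one tag
            if any(p in conflicts for p in combinations(sorted(union), 2)):
                continue  # skip patterns that contain a conflicting tag pair
            candidates.add(union)
        level = {}
        for cand in candidates:
            count = support(cand)
            if count >= min_support:
                level[cand] = count
        frequent.update(level)
    return frequent


if __name__ == "__main__":
    for pattern, count in mine(LOGS, TAGS, min_support=3).items():
        print(sorted(pattern), count)
```

In this toy log no pair of domains is printed together more than once, so mining on domains finds no frequent pair at a minimum support of 3, while the tag pattern {news, recipes} is supported by three records. The candidate {news, world} is never counted because both tags label `bbc.co.uk` under the assumed conflict rule.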
Please use this identifier to cite or link to this item: