Please use this identifier to cite or link to this item:
|Title:||Mining statistically important equivalence classes and delta-discriminative emerging patterns|
Itemsets with ranked statistical merit
|Source:||Li, J.,Liu, G.,Wong, L. (2007). Mining statistically important equivalence classes and delta-discriminative emerging patterns. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining : 430-439. ScholarBank@NUS Repository. https://doi.org/10.1145/1281192.1281240|
|Abstract:||The support-confidence framework is the most common measure used in itemset mining algorithms, for its antimonotonicity that effectively simplifies the search lattice. This computational convenience brings both quality and statistical flaws to the results as observed by many previous studies. In this paper, we introduce a novel algorithm that produces itemsets with ranked statistical merits under sophisticated test statistics such as chi-square, risk ratio, odds ratio, etc. Our algorithm is based on the concept of equivalence classes. An equivalence class is a set of frequent itemsets that always occur together in the same set of transactions. Therefore, itemsets within an equivalence class all share the same level of statistical significance regardless of the variety of test statistics. As an equivalence class can be uniquely determined and concisely represented by a closed pattern and a set of generators, we just mine closed patterns and generators, taking a simultaneous depth-first search scheme. This parallel approach has not been exploited by any prior work. We evaluate our algorithm on two aspects. In general, we compare to LCM and FPclose which are the best algorithms tailored for mining only closed patterns. In particular, we compare to epMiner which is the most recent algorithm for mining a type of relative risk patterns, known as minimal emerging patterns. Experimental results show that our algorithm is faster than all of them, sometimes even multiple orders of magnitude faster. These statistically ranked patterns and the efficiency have a high potential for real-life applications, especially in biomedical and financial fields where classical test statistics are of dominant interest. © 2007 ACM.|
|Source Title:||Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining|
|Appears in Collections:||Staff Publications|
Show full item record
Files in This Item:
There are no files associated with this item.
checked on Dec 13, 2017
checked on Dec 9, 2017
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.