Equivalence Class Based Parallel Algorithm for Mining MFI
We present PMFI, a novel parallel algorithm for mining all maximal frequent itemsets from a large database. PMFI combines several techniques to reduce I/O overhead substantially. Its key principle is to decompose the search space into prefix-based equivalence classes and to distribute the work among the processors according to equivalence class weights. It represents the database in vertical format, so that frequency counting reduces to simple tid-list intersection operations. It builds on a new serial algorithm, MaxMining, which uses a multiple-level backtrack pruning strategy; by selectively duplicating portions of the database, each processor can count its maximal frequent itemsets independently. Together, these techniques eliminate the need for synchronization. PMFI also applies a dynamic load-balancing scheme, which is expected to improve performance further.
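To make the ideas of vertical format, tid-list intersection, and prefix-based equivalence classes concrete, the following is a minimal Python sketch. It is not the PMFI or MaxMining implementation. The function names (`to_vertical`, `prefix_classes`, `class_weight`, `assign_classes`) are hypothetical, the weight formula (number of candidate pairs within a class) is an assumed heuristic, and the greedy static assignment stands in for whatever scheduling PMFI actually uses.

```python
from collections import defaultdict
from itertools import combinations


def to_vertical(transactions):
    """Convert a horizontal database into vertical format:
    each item maps to the set of transaction ids (its tid-list)."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def prefix_classes(tidlists, min_support):
    """Group frequent 2-itemsets by their first item (the class prefix).
    The support of a pair is the size of the intersection of two tid-lists."""
    frequent = sorted(i for i, t in tidlists.items() if len(t) >= min_support)
    classes = defaultdict(list)
    for a, b in combinations(frequent, 2):
        tids = tidlists[a] & tidlists[b]  # tid-list intersection
        if len(tids) >= min_support:
            classes[a].append(((a, b), tids))
    return dict(classes)


def class_weight(members):
    """Assumed weight heuristic: number of candidate joins inside the class."""
    n = len(members)
    return n * (n - 1) // 2


def assign_classes(classes, num_procs):
    """Greedy static assignment (an assumption, not PMFI's scheme):
    give the heaviest remaining class to the least-loaded processor."""
    loads = [0] * num_procs
    schedule = [[] for _ in range(num_procs)]
    order = sorted(classes, key=lambda p: (-class_weight(classes[p]), p))
    for prefix in order:
        target = loads.index(min(loads))
        schedule[target].append(prefix)
        loads[target] += class_weight(classes[prefix])
    return schedule
```

Because every itemset in a class shares the same prefix, a processor that holds the tid-lists of its assigned classes can extend them without consulting other processors, which is the intuition behind avoiding synchronization.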