high utility patterns
Recently Published Documents


Total documents: 50 (five years: 20)

H-index: 8 (five years: 2)

Author(s):  
Orabe Almanaseer

The huge volume of text documents available on the internet has made it difficult for users to find the information they need, so efficient applications for extracting knowledge of interest from textual documents are vitally important. This paper addresses the problem of responding to user queries by fetching the most relevant documents from a clustered set of documents. For this purpose, a cluster-based information retrieval framework is proposed in order to design and develop a system for analysing and extracting useful patterns from text documents. In this approach, a preprocessing step is first performed to find frequent and high-utility patterns in the data set. A Vector Space Model (VSM) is then used to represent the data set. The system was implemented in two main phases. In phase 1, a clustering analysis process groups the documents into several clusters. In phase 2, an information retrieval process ranks the clusters according to the user query and retrieves the relevant documents from the clusters deemed relevant to that query. The retrieved results are then evaluated using recall and precision (P@5, P@10): P@5 was 0.660 and P@10 was 0.655.
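The two phases and the P@k metric described above can be illustrated with a minimal sketch. It assumes plain term-frequency vectors, cosine similarity, and clusters that are already given; the abstract does not state the weighting scheme or clustering algorithm, so the function names (`tf_vector`, `retrieve`, `precision_at_k`) and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Represent a text as a term-frequency vector (a simple VSM; assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Sum of member vectors; cosine similarity ignores the scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def retrieve(query, clusters, docs):
    """Pick the cluster closest to the query, then rank its documents."""
    q = tf_vector(query)
    best = max(
        clusters,
        key=lambda c: cosine(q, centroid([tf_vector(docs[d]) for d in clusters[c]])),
    )
    return sorted(clusters[best], key=lambda d: cosine(q, tf_vector(docs[d])), reverse=True)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

# Toy data (hypothetical): two pre-built clusters of two documents each.
docs = {
    "d1": "utility pattern mining",
    "d2": "high utility itemset mining",
    "d3": "image segmentation network",
    "d4": "neural network training",
}
clusters = {"mining": ["d1", "d2"], "vision": ["d3", "d4"]}

ranked = retrieve("utility mining", clusters, docs)
p_at_2 = precision_at_k(ranked, {"d1"}, 2)
```

Restricting the ranking to the best-matching cluster is what makes the retrieval cluster-based: documents in other clusters are never scored against the query.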


2021 ◽  
Vol 135 ◽  
pp. 101924
Author(s):  
S. Mohammad Mirbagheri ◽  
Howard J. Hamilton

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Rashad Saeed ◽  
Azhar Rauf ◽  
Fahmi H. Quradaa ◽  
Syed Muhammad Asim

High Utility Itemset Mining (HUIM) is one of the most investigated tasks of data mining. It has broad applications in domains such as product recommendation, market basket analysis, e-learning, text mining, bioinformatics, and web click stream analysis. Insights from such pattern analysis provide numerous benefits, including cost cutting, improved competitive advantage, and increased revenue. However, HUIM methods may discover misleading patterns because they do not evaluate the correlation between the items of the extracted patterns. As a consequence, a number of algorithms have been proposed to mine correlated high utility itemsets (HUIs). These algorithms still suffer from high computational cost in terms of both time and memory consumption. This paper presents an algorithm, named Efficient Correlated High Utility Pattern Mining (ECoHUPM), to efficiently mine high utility patterns whose items are strongly correlated. A new data structure based on the utility tree (UTtree), named CoUTlist, is proposed to store sufficient information for mining the desired patterns. Three pruning properties are introduced to reduce the search space and improve the mining performance. Experiments on sparse, very sparse, dense, and very dense datasets indicate that the proposed ECoHUPM algorithm is more efficient than the state-of-the-art CoHUIM and CoHUI-Miner algorithms in terms of both time and memory consumption.
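To make the notion of a correlated high utility itemset concrete, the sketch below computes an itemset's utility and a correlation score over a toy transaction database. The abstract does not name the correlation measure, so the Kulczynski measure used here is an assumption, as are the function names, thresholds, and data; this is a brute-force check and not the ECoHUPM algorithm or its CoUTlist structure.

```python
def contains(tx, itemset):
    """True if the transaction holds every item of the itemset."""
    return all(item in tx for item in itemset)

def itemset_utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    return sum(
        sum(tx[item] * unit_profit[item] for item in itemset)
        for tx in transactions
        if contains(tx, itemset)
    )

def support(itemset, transactions):
    """Number of transactions containing the itemset."""
    return sum(1 for tx in transactions if contains(tx, itemset))

def kulc(itemset, transactions):
    """Kulczynski measure (assumed here): mean of sup(X) / sup({i}) over items i in X."""
    sup_x = support(itemset, transactions)
    if sup_x == 0:
        return 0.0
    return sum(sup_x / support({item}, transactions) for item in itemset) / len(itemset)

def is_correlated_high_utility(itemset, transactions, unit_profit, min_util, min_cor):
    """An itemset qualifies only if it passes both thresholds."""
    return (
        itemset_utility(itemset, transactions, unit_profit) >= min_util
        and kulc(itemset, transactions) >= min_cor
    )

# Toy database (hypothetical): each transaction maps item -> purchased quantity.
unit_profit = {"a": 5, "b": 2, "c": 1}
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"a": 1, "b": 1, "c": 1},
    {"b": 4},
]

ab = {"a", "b"}
u_ab = itemset_utility(ab, transactions, unit_profit)   # 9 + 7 = 16
k_ab = kulc(ab, transactions)                           # (2/3 + 2/3) / 2
```

With `min_util=15`, the itemset `{a, b}` is kept at `min_cor=0.6` but rejected at `min_cor=0.7`, which shows how the correlation threshold filters out itemsets that are profitable but only loosely associated.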


2021 ◽  
pp. 1-26
Author(s):  
Haodong Cheng ◽  
Meng Han ◽  
Ni Zhang ◽  
Xiaojuan Li ◽  
Le Wang

Traditional association rule mining has been widely studied, but it is not applicable to practical applications that must consider factors such as the unit profit of an item and the purchase quantity. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering both the number of items purchased and the unit profit. However, most high-utility itemset mining algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated dynamically by inserting new data. Some researchers have proposed algorithms for finding high-utility itemsets in dynamically updated databases. Unlike batch processing algorithms, which always process the database from scratch, incremental HUIM algorithms update and output high-utility itemsets incrementally, thereby reducing the cost of finding them. This paper reviews the latest research on incremental high-utility itemset mining algorithms, including methods of storing itemsets and utilities based on tree, list, array and hash set storage structures. It also points out several important derivative algorithms and research challenges for incremental high-utility itemset mining.
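The contrast between batch and incremental processing can be sketched as follows. This simplified illustration keeps running utilities in a hash map for a fixed, pre-chosen set of candidate itemsets and adds only the contribution of newly inserted transactions; real incremental HUIM algorithms must also discover itemsets that become high-utility later, which this sketch does not attempt. The class name, candidates, and data are hypothetical.

```python
class IncrementalUtility:
    """Running utilities for a fixed set of candidate itemsets (a simplification)."""

    def __init__(self, unit_profit, candidates):
        self.unit_profit = unit_profit
        # Hash-map storage: itemset -> accumulated utility.
        self.utility = {frozenset(c): 0 for c in candidates}

    def insert(self, new_transactions):
        """Process only the newly inserted transactions, not the whole database."""
        for tx in new_transactions:
            for itemset in self.utility:
                if all(item in tx for item in itemset):
                    self.utility[itemset] += sum(
                        tx[item] * self.unit_profit[item] for item in itemset
                    )

    def high_utility(self, min_util):
        """Itemsets whose accumulated utility reaches the threshold."""
        return {s: u for s, u in self.utility.items() if u >= min_util}

# Toy example (hypothetical data): utility = purchased quantity * unit profit.
miner = IncrementalUtility({"a": 5, "b": 2, "c": 1}, [("a", "b"), ("b", "c")])

miner.insert([{"a": 1, "b": 2}, {"b": 1, "c": 4}])    # first batch
after_first = miner.high_utility(10)                  # nothing reaches 10 yet

miner.insert([{"a": 1, "b": 1, "c": 2}])              # later insertion
after_second = miner.high_utility(10)                 # both itemsets now qualify
```

The second call to `insert` touches a single transaction, whereas a batch algorithm would rescan all three.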


2021 ◽  
Vol 169 ◽  
pp. 114464
Author(s):  
Nhan Vuong ◽  
Bac Le ◽  
Tin Truong ◽  
Duy-Phuong Nguyen

2021 ◽  
Vol 12 (2) ◽  
pp. 1-27
Author(s):  
Yoonji Baek ◽  
Unil Yun ◽  
Heonho Kim ◽  
Hyoju Nam ◽  
Hyunsoo Kim ◽  
...  

Real-world databases have various characteristics. New data is continuously inserted over time with no limit on the length of the database, the database contains a variety of information about its items, and recently generated data has a greater influence than previously generated data. Databases with these characteristics are called time-sensitive non-binary stream databases; examples include web-server click data, market sales data, data from sensor networks, and network traffic measurements. Many high utility pattern mining and stream pattern mining methods have been proposed so far. However, they are not suitable for analyzing these databases, because they find valid patterns by analyzing a database with only some of the features described above. Therefore, knowledge-based software that efficiently finds meaningful information in databases with these characteristics is required. In this article, we propose an intelligent information system that calculates the influence of the insertion time of each batch in a large-scale stream database by applying the sliding window model and mines recent high utility patterns without generating candidate patterns. In addition, a novel list-based data structure is proposed for fast and efficient management of time-sensitive stream databases. Moreover, our technique is compared with state-of-the-art algorithms through various experiments using real datasets and synthetic datasets. The experimental results show that our approach outperforms the previously proposed methods in terms of runtime, memory usage, and scalability.
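One way to picture a sliding window in which recent batches weigh more is sketched below. The abstract does not give the weighting formula, so the exponential decay factor is an assumption, and the class `DecayedWindow` with its parameters is hypothetical; the sketch recomputes utilities by scanning the window and does not reproduce the article's list-based structure.

```python
from collections import deque

class DecayedWindow:
    """Sliding window of batches; older batches contribute less (assumed decay)."""

    def __init__(self, window_size, decay, unit_profit):
        # deque with maxlen drops the oldest batch automatically.
        self.batches = deque(maxlen=window_size)
        self.decay = decay
        self.unit_profit = unit_profit

    def add_batch(self, transactions):
        self.batches.append(transactions)

    def recent_utility(self, itemset):
        """Utility of the itemset, weighted by decay ** (age of the batch)."""
        total = 0.0
        newest = len(self.batches) - 1
        for idx, batch in enumerate(self.batches):
            weight = self.decay ** (newest - idx)
            for tx in batch:
                if all(item in tx for item in itemset):
                    total += weight * sum(
                        tx[item] * self.unit_profit[item] for item in itemset
                    )
        return total

# Toy stream (hypothetical): window of 2 batches, decay factor 0.5.
window = DecayedWindow(window_size=2, decay=0.5, unit_profit={"a": 5, "b": 2})
window.add_batch([{"a": 1, "b": 1}])      # utility of {a, b}: 7
window.add_batch([{"a": 2, "b": 1}])      # utility of {a, b}: 12
two_batches = window.recent_utility({"a", "b"})    # 0.5 * 7 + 12

window.add_batch([{"a": 1, "b": 3}])      # utility 11; the first batch is evicted
three_batches = window.recent_utility({"a", "b"})  # 0.5 * 12 + 11
```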


Author(s):  
Jimmy Ming-Tai Wu ◽  
Gautam Srivastava ◽  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Min Wei ◽  
...  

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Mohammed A. Fouad ◽  
Wedad Hussein ◽  
Sherine Rady ◽  
Philip S. Yu ◽  
Tarek F. Gharib

2021 ◽  
pp. 205-216
Author(s):  
Jimmy Ming-Tai Wu ◽  
Gautam Srivastava ◽  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Min Wei ◽  
...  

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Le Wang ◽  
Shui Wang ◽  
Haiyan Li ◽  
Chunliang Zhou

High-utility pattern mining is a research hotspot in the field of pattern mining, and one of its main research topics is how to improve the efficiency of mining algorithms. Based on a study of state-of-the-art high-utility pattern mining algorithms, this paper proposes an improved strategy that removes noncandidate items from the global header table and local header table as early as possible, thus reducing the search space and improving the efficiency of the algorithm. The proposed strategy is applied to the EFIM (EFficient high-utility Itemset Mining) algorithm. Experimental verification was carried out on nine typical datasets (including two large datasets); the results show that the strategy can effectively improve the time efficiency of mining high-utility patterns.
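The general idea of discarding noncandidate items early can be illustrated with the classic transaction-weighted utilization (TWU) bound: an item whose TWU is below the minimum utility cannot appear in any high-utility itemset, so it can be dropped before the search begins. Whether the paper's header-table strategy relies on this particular bound is an assumption; the function name and data below are hypothetical.

```python
def prune_by_twu(transactions, unit_profit, min_util):
    """Remove items whose TWU is below min_util; return pruned data and the TWU table."""
    twu = {}
    for tx in transactions:
        # Transaction utility: total profit of everything in the transaction.
        tu = sum(qty * unit_profit[item] for item, qty in tx.items())
        for item in tx:
            twu[item] = twu.get(item, 0) + tu

    keep = {item for item, value in twu.items() if value >= min_util}
    pruned = [
        {item: qty for item, qty in tx.items() if item in keep}
        for tx in transactions
    ]
    # Drop transactions that became empty after pruning.
    return [tx for tx in pruned if tx], twu

# Toy database (hypothetical): item -> purchased quantity.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 1}
transactions = [
    {"a": 1, "b": 2},   # transaction utility 9
    {"a": 2, "c": 3},   # transaction utility 13
    {"d": 1, "b": 1},   # transaction utility 3
]

pruned, twu = prune_by_twu(transactions, unit_profit, min_util=10)
```

Item `d` has a TWU of 3 and is removed, so every later step of the search works on shorter transactions.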

