Optimization of Associative Knowledge Graph using TF-IDF based Ranking Score

2020, Vol 10 (13), pp. 4590
Author(s): Hyun-Jin Kim, Ji-Won Baek, Kyungyong Chung

This study proposes a method for optimizing an associative knowledge graph using TF-IDF-based ranking scores. The method calculates TF-IDF weights across all documents and ranks the terms. Optimized transactions are then generated from the terms with high TF-IDF ranking scores. News data are first collected by crawling and converted into a corpus through preprocessing, which removes unnecessary data by converting text to lowercase and removing punctuation marks and stop words. Words are extracted from the document-term matrix, and transactions are generated from them. In the data cleaning process, the Apriori algorithm is applied to generate association rules and build a knowledge graph. To optimize the generated knowledge graph, the proposed method uses the TF-IDF-based ranking scores to remove low-scoring terms and recreate the transactions. The association rule algorithm is then applied to these transactions to create an optimized knowledge model. Performance is evaluated in terms of rule generation speed and the usefulness of the association rules. The proposed method generates association rules about 22 seconds faster. Its lift value is the measure of usefulness, and it is about 0.43 to 2.51 higher than that of each conventional association rule algorithm.
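The ranking-and-filtering step can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on three assumptions. First, the documents are already tokenized and preprocessed. Second, TF-IDF uses the plain `count / len(doc) * log(N / df)` weighting. Third, a term's ranking score is its maximum weight in any document. `rank_terms`, `filter_transactions` and the `top_k` cut-off are hypothetical names.

```python
import math
from collections import Counter

def rank_terms(docs):
    """Rank terms by their highest TF-IDF weight in any document.

    docs: list of token lists, assumed already lowercased with punctuation and
    stop words removed. Assumption: tf = count / len(doc), idf = log(N / df),
    and a term's ranking score is its maximum weight; the paper may differ.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            weight = (count / len(doc)) * math.log(n_docs / df[term])
            scores[term] = max(scores.get(term, 0.0), weight)
    # Highest score first; ties are broken alphabetically so the order is stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def filter_transactions(docs, top_k):
    """Recreate one transaction per document, keeping only the top_k ranked terms.

    top_k is a hypothetical cut-off; the abstract only says low-scoring terms are removed.
    """
    keep = {term for term, _ in rank_terms(docs)[:top_k]}
    return [sorted(set(doc) & keep) for doc in docs]
```

Consider three toy documents, `["graph", "rule", "apriori", "news"]`, `["graph", "knowledge", "news"]` and `["rule", "lift", "news"]`, with `top_k=4`. Because `news` appears in every document, it gets an IDF of zero and is dropped. The recreated transactions are `[["apriori", "graph"], ["graph", "knowledge"], ["lift"]]`. These would then be fed to the association rule algorithm. Taking the maximum over documents is only one way to collapse per-document weights into a single term score. A sum or mean would be an equally plausible reading.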

Author(s): Asep Budiman Kusdinar, Daris Riyadi, Asriyanik Asriyanik

A buffet restaurant is a restaurant that provides buffet food served directly at the dining table, so that customers can order more food according to their needs. This study uses association rules, one of the methods of data mining, together with the Apriori algorithm. Data mining is the process of discovering patterns or rules in data, and the process must be automatic or semi-automatic. Association rules are a data mining technique used to look for relationships between items in a dataset. The Apriori algorithm is a well-known algorithm for finding high-frequency patterns and is one of the association rule techniques in data mining. High-frequency patterns are patterns of items whose frequency, or support, in the database meets a minimum threshold. These patterns are used to build rules and are also used in some other data mining techniques. The food menu at the Asgar restaurant is currently arranged randomly, without considering which dishes are taken together. The results of this research are intended to support the arrangement of the food menu at the Asgar restaurant, so that dishes that are often taken together are easier to take.
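The search for high-frequency patterns that Apriori performs can be sketched as follows. This is the generic textbook level-wise algorithm, not the authors' implementation. The menu items in the usage note are hypothetical, not Asgar restaurant data.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Textbook level-wise Apriori: frequent k-itemsets are joined into
    (k+1)-candidates, and a candidate is pruned if any of its k-subsets is infrequent.
    """
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # One pass over the data per level: count the baskets containing each candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        next_candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Join step: keep unions of two frequent k-itemsets that have size k + 1.
            if len(union) != k + 1:
                continue
            # Prune step (Apriori property): every k-subset must itself be frequent.
            if all(frozenset(s) in level for s in combinations(union, k)):
                next_candidates.add(union)
        candidates = next_candidates
        k += 1
    return frequent
```

Take five hypothetical trays: `[rice, chicken, sambal]`, `[rice, chicken]`, `[rice, fish, sambal]`, `[rice, chicken, sambal]` and `[noodles, chicken]`. With `min_support=0.4`, the sketch returns seven frequent itemsets, including `{rice, chicken, sambal}` with support 0.4. A pattern like this is the kind of evidence that could guide which dishes to place next to each other.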


Author(s): Paul D. McNicholas, Yanchang Zhao

Association rules are one of the most versatile techniques for the analysis of binary data, with applications in areas as diverse as retail, bioinformatics, and sociology. In this chapter, the origin of association rules is discussed along with the functions by which association rules are traditionally characterised. Following the formal definition of an association rule, these functions (support, confidence and lift) are defined. Various methods of rule generation, spanning 15 years of development, are then presented. Negations and negative association rules are discussed briefly, and an analogy between association rules and 2×2 tables is outlined. Pruning methods are then discussed, followed by an overview of measures of interestingness. Finally, the post-mining stage of the association rule paradigm is placed in the context of the preceding stages of the mining process.
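The three characterising functions, and their link to a 2×2 table, can be made concrete with a short sketch. It uses the standard definitions: support(A → C) = P(A ∧ C), confidence = P(C | A) and lift = P(C | A) / P(C). The four counts it tallies are the cells of the contingency table "contains A" × "contains C". The function name and the toy baskets are illustrative.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of antecedent -> consequent via a 2x2 table.

    Assumes the antecedent and the consequent each occur in at least one
    transaction; otherwise confidence or lift would divide by zero.
    """
    a, c = set(antecedent), set(consequent)
    n11 = n10 = n01 = n00 = 0
    for t in map(set, transactions):
        has_a, has_c = a <= t, c <= t
        if has_a and has_c:
            n11 += 1          # A and C
        elif has_a:
            n10 += 1          # A without C
        elif has_c:
            n01 += 1          # C without A
        else:
            n00 += 1          # neither
    n = n11 + n10 + n01 + n00
    support = n11 / n                      # P(A and C)
    confidence = n11 / (n11 + n10)         # P(C | A)
    lift = confidence / ((n11 + n01) / n)  # P(C | A) / P(C)
    return {"table": ((n11, n10), (n01, n00)), "support": support,
            "confidence": confidence, "lift": lift}
```

Consider the baskets `[bread, milk]`, `[bread, butter]`, `[bread, milk, butter]` and `[milk]`. The rule bread → milk has support 0.5 and confidence 2/3. Its lift is 8/9, just below 1, so buying bread slightly lowers the chance of buying milk relative to the baseline.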


2011, Vol 179-180, pp. 55-59
Author(s): Ping Shui Wang

Association rule mining is one of the most active research areas concerned with the automatic extraction of previously unknown patterns or rules from large amounts of data. Association rules are derived by mining large frequent itemsets. The classical Apriori algorithm is inefficient because it repeatedly scans the transaction database. To address this, we studied existing association rule mining algorithms and propose a new association rule mining algorithm based on a relation matrix. Theoretical analysis and experimental results show that the proposed algorithm is efficient and practical.
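The abstract does not describe the relation-matrix algorithm itself. As a hedged illustration of the general idea, the sketch below scans the database once to build a Boolean transaction-by-item matrix. Each item column is stored as a Python integer bitmask. After that scan, the support of any itemset is the number of set bits in the AND of its columns, and the database is never scanned again. The bitmask representation and the function names are assumptions, not the paper's design.

```python
def build_item_columns(transactions):
    """One database scan: bit i of columns[item] is set when transaction i contains item.

    Each Python int plays the role of one column of the Boolean
    transaction-by-item relation matrix (an illustrative representation).
    """
    columns = {}
    for i, t in enumerate(transactions):
        for item in set(t):
            columns[item] = columns.get(item, 0) | (1 << i)
    return columns

def support_count(columns, itemset):
    """Support count of a non-empty itemset: AND its item columns, then count set bits."""
    items = list(itemset)
    combined = columns.get(items[0], 0)
    for item in items[1:]:
        combined &= columns.get(item, 0)
    return bin(combined).count("1")
```

Candidate itemsets from any level-wise search can then be counted with `support_count` alone. The cost of counting a candidate becomes a few bitwise operations, rather than a pass over every transaction.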


2014, Vol 536-537, pp. 520-523
Author(s): Jia Liu, Zhen Ya Zhang, Hong Mei Cheng, Qian Sheng Fang

Non-trivial network access behaviors implicit in network access logs can usually be treated as frequent itemsets or association rules. Once the log data are transformed into transactions, association rule techniques can be used to mine the frequent itemsets that a user or application focuses on. To mine such behaviors effectively, this paper proposes an attention-based frequent itemset mining method. In the proposed method, the properties a user focuses on are described as an attention set. The design draws on the early-selection model of attention, in which attention acts as an information filter. Experimental results show that the proposed method mines the frequent itemsets within the attention focus faster than the Apriori algorithm.
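The abstract describes the attention set only as an early-selection information filter. One hedged reading is to project every transaction onto the attention set before any counting. Under that reading, items outside the user's focus never enter candidate generation. The sketch below follows it. The attention set, the log entries and the brute-force counting up to a hypothetical `max_size` are all illustrative assumptions, not the paper's method.

```python
from collections import Counter
from itertools import combinations

def attention_frequent_itemsets(transactions, attention, min_count, max_size=3):
    """Frequent itemsets restricted to an attention set (early-selection filter).

    Assumption: every transaction is first intersected with the attention set,
    so unattended items are discarded before counting. Itemsets up to max_size
    (a hypothetical limit) are then counted by brute-force enumeration.
    """
    focused = [sorted(set(t) & attention) for t in transactions]
    counts = Counter()
    for t in focused:
        for size in range(1, min(max_size, len(t)) + 1):
            counts.update(combinations(t, size))
    return {itemset: c for itemset, c in counts.items() if c >= min_count}
```

Take four hypothetical sessions: `[/login, /mail, /news]`, `[/login, /mail]`, `[/news, /sports]` and `[/login, /mail, /sports]`. Use an attention set of `{/login, /mail}` and `min_count=2`. Only `/login`, `/mail` and the pair of the two are counted, each appearing three times. `/news` and `/sports` are filtered out before they can generate candidates.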


Association rule mining techniques are an important part of data mining, used to derive relationships between attributes in large databases. Association rule mining has attracted considerable interest among researchers because many challenging problems can be solved with it. Numerous algorithms have been developed for deriving association rules efficiently. Not all algorithms give similar results in every scenario, so understanding their relative merits is important. In this paper, two association rule mining algorithms are analyzed: the popular Apriori algorithm and EARMGA (Evolutionary Association Rules Mining with Genetic Algorithm). The two algorithms were compared experimentally on different datasets. Results are reported for parameters such as the number of rules generated, average support, average confidence and covered records.
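The comparison parameters listed above can be computed for any rule set, whichever algorithm produced it. The sketch below assumes one plausible set of definitions. Average support and average confidence are plain means over the rules. A record counts as covered when it contains the antecedent of at least one rule. The paper may define coverage differently, for example by requiring the whole rule to match.

```python
def rule_set_summary(transactions, rules):
    """Summarize a rule set: rule count, average support/confidence, covered records.

    rules: list of (antecedent, consequent) pairs of item collections.
    Assumption: a record is covered if it contains at least one rule's antecedent.
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    supports, confidences = [], []
    covered = set()
    for antecedent, consequent in rules:
        a = set(antecedent)
        whole = a | set(consequent)
        matches_a = [i for i, b in enumerate(baskets) if a <= b]
        matches_whole = sum(1 for i in matches_a if whole <= baskets[i])
        supports.append(matches_whole / n)
        confidences.append(matches_whole / len(matches_a) if matches_a else 0.0)
        covered.update(matches_a)
    k = len(rules)
    return {
        "rules": k,
        "avg_support": sum(supports) / k if k else 0.0,
        "avg_confidence": sum(confidences) / k if k else 0.0,
        "covered_records": len(covered),
    }
```

Running the same summary on the Apriori rules and on the EARMGA rules for one dataset gives directly comparable numbers. That comparison only holds if both rule sets are evaluated on the same transactions.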


2021, Vol 2 (1), pp. 132-139
Author(s): Wiwit Pura Nurmayanti, Hanipar Mahyulis Sastriana, Abdul Rahim, Muhammad Gazali, Ristu Haiban Hirzi, ...

Indonesia is an equatorial country with abundant natural wealth, from the seabed to the mountaintops. Much of Indonesia's beauty also lies in the mountains of its various provinces. The province of West Nusa Tenggara, for example, is known for the beautiful Mount Rinjani. The growth of outdoor activities has led many people to open outdoor shops in the West Nusa Tenggara region. Sales transaction data from outdoor stores can be processed into information that benefits the stores themselves. This study uses market basket analysis to find associations (rules) among sales attributes. Its purpose is to determine the relationship patterns in the transactions that occur. The data are transaction records of outdoor goods. They are analyzed with association rules using the Apriori algorithm and the frequent pattern growth (FP-Growth) algorithm. The analysis produced 10 rules with the Apriori algorithm and 4 rules with the FP-Growth algorithm. One relationship pattern found is "if a consumer buys a portable stove, portable gas is likely to be purchased as well". Its rule strength is a minimum support of 0.296 and a confidence of 0.774 under Apriori, and 0.296 and 0.750 under FP-Growth.
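FP-Growth avoids Apriori's candidate generation by recursively mining conditional (projected) databases. The sketch below keeps that divide-and-conquer idea. For brevity, it works on plain projected transaction lists instead of a compressed FP-tree. It is therefore a simplified pattern-growth illustration, not a full FP-Growth implementation. The item names are hypothetical, not the study's data.

```python
from collections import Counter

def pattern_growth(transactions, min_count, prefix=()):
    """Yield (itemset, count) for every frequent itemset, without candidate generation.

    For each frequent item, the database is projected onto the transactions that
    contain it, restricted to items later in a fixed order, and mined recursively.
    Simplification: projected databases are plain lists, not an FP-tree.
    """
    counts = Counter(item for t in transactions for item in set(t))
    frequent = sorted(item for item, c in counts.items() if c >= min_count)
    for pos, item in enumerate(frequent):
        itemset = prefix + (item,)
        yield itemset, counts[item]
        # Restricting to later items ensures each itemset is produced exactly once.
        later = set(frequent[pos + 1:])
        projected = [[i for i in set(t) if i in later] for t in transactions if item in t]
        yield from pattern_growth(projected, min_count, itemset)
```

Take five hypothetical sales: `[stove, gas, tent]`, `[stove, gas]`, `[tent, lamp]`, `[stove, gas, lamp]` and `[tent]`. With `min_count=2`, the sketch finds `{gas, stove}` with count 3, alongside the frequent single items. Dividing a count by the number of transactions gives the relative support used in the abstract. The confidence of stove → gas is count(gas, stove) / count(stove).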


2012, Vol 562-564, pp. 876-881
Author(s): Guan Xun Cui, Qian Wu, Bo He, Wei Ni

Extraction of frequent patterns in transaction-oriented databases is crucial to several data mining tasks, such as association rule generation, time series analysis and classification. An Efficient Parallel algorithm for Mining frequent patterns (EPM) is proposed, improving on the Fast Distributed association rule Mining (FDM) algorithm. Hash table technology is used to make the generation of candidate 2-itemsets more efficient. A Tid table is used to reduce the number of transactions in the transaction database. The algorithm adopts a master-slave model for parallel association rule mining to reduce communication cost. Experimental results show that the algorithm is highly efficient on large databases.
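The abstract only says that a hash table speeds up generation of candidate 2-itemsets. A well-known way to do this is the DHP/PCY bucket-counting idea, sketched below on the assumption that EPM works similarly. During the first scan, every pair in each transaction is hashed into a bucket counter. A pair becomes a candidate only if both of its items are frequent and its bucket count reaches the threshold. The bucket count `n_buckets` and the CRC-32 hash are illustrative choices. The parallel master-slave and Tid-table parts are not shown.

```python
import zlib
from collections import Counter
from itertools import combinations

def bucket_of(pair, n_buckets):
    """Deterministic hash of an item pair into one of n_buckets buckets (illustrative)."""
    return zlib.crc32("|".join(pair).encode("utf-8")) % n_buckets

def frequent_pairs(transactions, min_count, n_buckets=101):
    """Return (frequent pair counts, candidate pairs) using hash-bucket pruning."""
    baskets = [sorted(set(t)) for t in transactions]
    # Pass 1: count single items and hash every pair into a bucket counter.
    item_counts = Counter(item for b in baskets for item in b)
    buckets = Counter()
    for b in baskets:
        for pair in combinations(b, 2):
            buckets[bucket_of(pair, n_buckets)] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_count}
    # A bucket's count is an upper bound on the support of every pair hashed to it,
    # so pairs in light buckets are discarded before they are ever counted.
    candidates = {
        pair
        for b in baskets
        for pair in combinations(b, 2)
        if set(pair) <= frequent_items and buckets[bucket_of(pair, n_buckets)] >= min_count
    }
    # Pass 2: count only the surviving candidate pairs.
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(b, 2) if pair in candidates
    )
    return {p: c for p, c in pair_counts.items() if c >= min_count}, candidates
```

Hash collisions can only let extra pairs through, never drop a frequent one, so the result is exact whatever `n_buckets` is. With `n_buckets=1` the filter degenerates to "both items frequent", the plain Apriori candidate set.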


2014, Vol 687-691, pp. 1282-1285
Author(s): Ying Sui

Information security is a concern in every sector and industry, and vulnerabilities are an important cause of security problems. It is therefore necessary to analyze and predict the occurrence of vulnerabilities. This paper uses data from the CNNVD vulnerability database and the Apriori algorithm to analyze and predict the occurrence of software vulnerabilities. In the data preprocessing stage, changing the level of the vulnerability rules allows more conceptual associations to be uncovered. In the association rule evaluation stage, filters are designed to find the rules that match the user's degree of interest. Finally, experiments demonstrate the effectiveness of this method.
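The paper describes two steps: raising the concept level of vulnerability attributes during preprocessing, and filtering rules by user interest afterwards. They might look roughly like the sketch below. The concept hierarchy, the attribute values and the filter criteria (`items_of_interest`, `min_lift`) are hypothetical examples, not taken from CNNVD or from the paper.

```python
def generalize(transactions, hierarchy):
    """Replace each item with its parent concept (if any) and drop duplicates.

    hierarchy: hypothetical mapping from a specific value to a broader concept.
    Lifting items to a coarser level merges rare specific values into more
    frequent concepts, so more associations can reach the support threshold.
    """
    return [sorted({hierarchy.get(item, item) for item in t}) for t in transactions]

def interest_filter(rules, items_of_interest, min_lift):
    """Keep rules that mention an item of interest and have lift >= min_lift.

    rules: list of dicts with 'antecedent' and 'consequent' (sets) and 'lift'.
    Both criteria are illustrative stand-ins for the paper's filters.
    """
    return [
        r for r in rules
        if (r["antecedent"] | r["consequent"]) & items_of_interest and r["lift"] >= min_lift
    ]
```

Suppose `sql_injection` and `command_injection` both map to `injection`. Then two records that previously shared no vulnerability type now share one, which is how generalization can expose associations that the specific values hide.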


Energies, 2021, Vol 14 (21), pp. 6889
Author(s): Yuxin Huang, Jingdao Fan, Zhenguo Yan, Shugang Li, Yanping Wang

In gas prediction and early warning, outliers in the data series are often discarded, which risks losing key information during analysis. To address this, this paper proposes an early warning model based on analyzing the multifactor coupling relationships of coal face gas. The model combines a k-means algorithm with optimized initial cluster centers and an Apriori algorithm with weight optimization. The k-means algorithm is optimized by using the cluster centers of a preorder data subset as the initial cluster centers for the full data set. The optimized algorithm is used to filter the outliers out of the collected data, yielding a data set of outliers. The Apriori algorithm is then optimized so that it can identify important information that appears infrequently in the events. It is used to mine the association rules of the abnormal values and to obtain interesting association rule events among gas outliers in different dimensions. Finally, four gas-risk warning levels are set according to different confidence intervals, yielding accurate and reliable warning results. Mining association rules between abnormal data in different dimensions verifies the validity and effectiveness of the proposed gas early warning model. Classifying early warnings of gas risk has important practical significance for improving coal mine safety.
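The abstract does not specify how the Apriori weights are optimized. One common formulation from weighted association rule mining is assumed in the sketch below. It multiplies an itemset's ordinary support by the mean weight of its items. Under this scheme, a rarely occurring but heavily weighted factor can outrank a frequent but unimportant one. The factor names and weights are hypothetical.

```python
def weighted_support(transactions, itemset, weights):
    """Weighted support = mean item weight x ordinary support.

    Assumption: this averaged-weight definition stands in for the paper's
    unspecified weight optimization. weights maps each item to a value in [0, 1].
    """
    items = set(itemset)
    support = sum(1 for t in transactions if items <= set(t)) / len(transactions)
    mean_weight = sum(weights[i] for i in items) / len(items)
    return mean_weight * support
```

Consider five hypothetical outlier events in which `temp_high` appears four times and the pair `{gas_spike, wind_low}` appears once. The weights are 0.2 for `temp_high`, 1.0 for `gas_spike` and 0.8 for `wind_low`. The rare pair then scores about 0.18, against about 0.16 for `temp_high`, even though its plain support (0.2) is far below that of `temp_high` (0.8).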

