Dynamic Graph based Method for Mining Text Data

This paper proposes an improved graph-based association rule mining (ARM) approach for extracting association rules from text databases. The technique reads the text document only once, looking for terms whose number of occurrences exceeds a threshold value. These terms are stored in a file with their frequencies and then represented as nodes in a weighted directed graph. The edges denote the associations between terms, and the edge weights denote the strength, or confidence, of the corresponding rules. The method is called Dynamic Graph based Rule Mining from Text (DGRMT) because the graph is built level by level according to the length of a sentence (its number of frequent terms). Weighted subgraph mining is used to ensure the efficiency and throughput of the technique; only the most frequent subgraphs are extracted. The technique is validated and evaluated on real-world textual data sets and compared with one of the best graph-based rule mining techniques, the algorithm for Generating Association Rules based on Weighting scheme (GARW). The results indicate that the proposed approach outperforms GARW on almost all textual data sets.
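As a rough illustration of the idea in this abstract (not the DGRMT algorithm itself, whose level-by-level construction and subgraph mining are not specified here), the sketch below makes a single pass over the sentences, keeps terms that occur in more than `threshold` sentences, and builds a weighted directed graph whose edge weight for a -> b is the confidence of that rule. The function name `build_term_graph`, whitespace tokenization, and sentence-level counting are all assumptions made for the example.

```python
from collections import Counter
from itertools import permutations


def build_term_graph(sentences, threshold=1):
    """Return (nodes, edges): nodes maps frequent terms to their counts,
    edges maps (a, b) to the confidence of the rule a -> b."""
    term_counts = Counter()
    tokenized = []
    # Single pass over the text: record each sentence's terms and count them.
    for sentence in sentences:
        terms = set(sentence.lower().split())  # assumption: whitespace tokens
        tokenized.append(terms)
        term_counts.update(terms)

    # Keep only terms whose sentence count exceeds the threshold.
    frequent = {t for t, c in term_counts.items() if c > threshold}

    # Count ordered co-occurrences of frequent terms within a sentence.
    pair_counts = Counter()
    for terms in tokenized:
        pair_counts.update(permutations(sorted(terms & frequent), 2))

    # Edge weight = confidence(a -> b) = count(a and b) / count(a).
    edges = {(a, b): n / term_counts[a] for (a, b), n in pair_counts.items()}
    nodes = {t: term_counts[t] for t in frequent}
    return nodes, edges
```

Because the weight of a -> b is normalized by the count of a, the edges a -> b and b -> a generally differ, which is why the graph is directed.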

Author(s):  
M. Yu. Bolgov ◽  
I. R. Yanchij ◽  
Yu. N. Taraschenko ◽  
N. Ya. Kobrynskaya ◽  
А. М. Lygina

An original menu designer is proposed for working effectively with the text data of instrumental examinations. For the analysis of this textual data, a mechanism for the automatic alignment of nominations is offered. The proposed approach can significantly reduce the time required, improve physician performance, and improve the analysis of large data sets.


Author(s):  
Maybin Muyeba ◽  
M. Sulaiman Khan ◽  
Frans Coenen

A novel approach is presented for effectively mining weighted fuzzy association rules (ARs). The authors address the invalidation of the downward closure property (DCP) in weighted association rule mining, where each item is assigned a weight according to its significance with respect to some user-defined criteria. Most works on weighted association rule mining do not address the downward closure property, while some make assumptions to validate the property. This chapter generalizes the weighted association rule mining problem with binary and fuzzy attributes with weighted settings. The methodology follows an Apriori approach but employs a T-tree data structure to improve the efficiency of counting itemsets. The authors' approach avoids pre- and post-processing, as opposed to most weighted association rule mining algorithms, thus eliminating the extra steps during rule generation. The chapter presents experimental results on both synthetic and real data sets and a discussion on evaluating the proposed approach.
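To see why item weights can invalidate the downward closure property, consider one simple weighting scheme, assumed here purely for illustration and not taken from the chapter: the weighted support of an itemset is its ordinary support multiplied by the mean weight of its items. Under that assumption a superset can score higher than one of its own subsets, as this small sketch shows.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Illustrative definition (an assumption): mean item weight x support."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}]
weights = {"a": 0.1, "b": 0.9}  # hypothetical significance weights

ws_a = weighted_support({"a"}, transactions, weights)        # 0.1 * 0.75
ws_ab = weighted_support({"a", "b"}, transactions, weights)  # 0.5 * 0.50
```

With a minimum weighted support of 0.2, {a, b} qualifies while its subset {a} does not, so Apriori-style pruning of every superset of an infrequent itemset would wrongly discard {a, b}.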


2008 ◽  
Vol 17 (06) ◽  
pp. 1109-1129 ◽  
Author(s):  
BASILIS BOUTSINAS ◽  
COSTAS SIOTOS ◽  
ANTONIS GEROLIMATOS

One of the most important data mining problems is learning association rules of the form "90% of the customers that purchase product x also purchase product y". Discovering association rules from huge volumes of data requires substantial processing power. In this paper we present an efficient distributed algorithm for mining association rules that reduces the time complexity by a magnitude that renders it suitable for scaling up to very large data sets. The proposed algorithm is based on partitioning the initial data set into subsets and processing each subset in parallel. By reducing the support threshold while processing the subsets, the proposed algorithm can maintain the set of association rules that would be extracted by applying an association rule mining algorithm to all the data. These claims are confirmed by empirical tests that we present, which also demonstrate the utility of the method.
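The partition-and-process idea can be sketched generically as follows. This is not the authors' algorithm: it is a plain partition-then-verify scheme in which each subset is mined independently (here in a thread pool), the local results are merged, and the merged candidates are recounted on the full data. The names `local_frequent`, `partitioned_frequent`, and the `local_factor` parameter that lowers the threshold inside each subset are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def local_frequent(partition, min_support, max_size=2):
    """Itemsets (up to max_size items) frequent within a single partition."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    needed = min_support * len(partition)
    return {itemset for itemset, n in counts.items() if n >= needed}


def partitioned_frequent(transactions, min_support, n_parts=2, local_factor=1.0):
    """Mine each partition independently, then verify candidates globally.

    Assumes a non-empty transaction list. local_factor <= 1 lowers the
    threshold used inside each partition (hypothetical parameter).
    """
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Each partition is processed independently, so the work can run in parallel.
    with ThreadPoolExecutor() as pool:
        local_sets = pool.map(
            lambda p: local_frequent(p, min_support * local_factor), parts
        )
        candidates = set().union(*local_sets)

    # Verification pass: keep candidates that are frequent on the full data.
    needed = min_support * len(transactions)
    result = {}
    for itemset in candidates:
        count = sum(set(itemset) <= set(t) for t in transactions)
        if count >= needed:
            result[itemset] = count / len(transactions)
    return result
```

With `local_factor` at or below 1, an itemset that is frequent in the whole data set must be frequent in at least one partition, so the merged candidates miss nothing up to `max_size`; the final recount removes itemsets that were frequent only locally.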


Text data is growing rapidly, generated by media such as social networking sites, the web, and other information sources. Clustering is an important part of data mining: it is the procedure of dividing a large body of text so that similar texts fall into the same group. Clustering is used in many applications, including medicine, biology, and signal processing. Traditional clustering algorithms include hierarchical clustering, density-based clustering, and self-organizing map clustering. Using k-means and DBSCAN, we are able to cluster documents. DBSCAN is one of a number of standard clustering methods. The data sets are evaluated automatically with DBSCAN and k-means, which shows the clustering power on the data. A document may consist of multiple topics. Document clustering involves the use of descriptors and descriptor extraction. Descriptors are the expressions used to describe the contents of a cluster.


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Xiaoyan Liu ◽  
Feng Feng ◽  
Qian Wang ◽  
Ronald R. Yager ◽  
Hamido Fujita ◽  
...  

Traditional association rule extraction may run into some difficulties due to ignoring the temporal aspect of the collected data. Particularly, it happens in many cases that some item sets are frequent during specific time periods, although they are not frequent in the whole data set. In this study, we make an effort to enhance conventional rule mining by introducing temporal soft sets. We define temporal granulation mappings to induce granular structures for temporal transaction data. Using this notion, we define temporal soft sets and their Q-clip soft sets to establish a novel framework for mining temporal association rules. A number of useful characterizations and results are obtained, including a necessary and sufficient condition for fast identification of strong temporal association rules. By combining temporal soft sets with NegNodeset-based frequent item set mining techniques, we develop the negFIN-based soft temporal association rule mining (negFIN-STARM) method to extract strong temporal association rules. Numerical experiments are conducted on commonly used data sets to show the feasibility of our approach. Moreover, comparative analysis demonstrates that the newly proposed method achieves higher execution efficiency than three well-known approaches in the literature.
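The motivating observation, that an item set can be frequent within a time period without being frequent overall, can be shown with a few lines of code. The sketch below only groups transactions by a period label and computes support per group; it does not implement temporal soft sets or negFIN-STARM, and the function name and the sample data are made up for the example.

```python
from collections import defaultdict


def support_by_period(timed_transactions, itemset):
    """Map each period label to the itemset's relative support in that period."""
    itemset = set(itemset)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for period, items in timed_transactions:
        totals[period] += 1
        hits[period] += itemset <= set(items)  # True counts as 1
    return {period: hits[period] / totals[period] for period in totals}


# Hypothetical transactions tagged with the month in which they occurred.
data = [
    ("dec", {"tree", "lights"}),
    ("dec", {"tree", "lights"}),
    ("jun", {"fan"}),
    ("jun", {"fan"}),
    ("jun", {"fan"}),
    ("jun", {"tree"}),
]
per_period = support_by_period(data, {"tree", "lights"})
overall = sum({"tree", "lights"} <= items for _, items in data) / len(data)
```

Here the pair has support 1.0 in December but only one third over the whole data set, so a global threshold of 0.5 would miss it.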


2021 ◽  
Vol 5 (2(61)) ◽  
pp. 6-8
Author(s):  
Olena Hryshchenko ◽  
Vadym Yaremenko

The object of research is the methods of fast classification for solving text data classification problems. The need for this study is due to the rapid growth of textual data, both in digital and printed forms. Such data must be processed by software, since human resources cannot process this amount of data in full. A large number of data classification approaches have been developed. The research applies the following text classification methods to a set of text data in order to classify it into categories: the Bloom filter, the naive Bayesian classifier, and neural networks. Each method has both advantages and disadvantages, and this paper illustrates the strengths and weaknesses of each on a specific example. The algorithms were compared with one another in terms of speed and efficiency, that is, the accuracy with which a text is assigned to a class. Each method was run on the same data sets while varying the amount of training and test data and the number of classification groups. The dataset used contains the following classes: world, business, sports, and science and technology. In real conditions, the number of categories is much larger than that considered in this work, and categories may have subcategories. In the course of the study, each method was analyzed using different parameter values to obtain the best result. The analysis shows that the best results for the classification of text data were obtained using a neural network.
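Of the three methods named, the Bloom filter is the least familiar as a classifier, so a minimal sketch of the data structure may help. The abstract does not describe how the filter is used for classification; the version below is a generic Bloom filter with hypothetical default sizes, and one plausible use (an assumption, not the paper's stated design) would be to keep one filter per category and score a text by how many of its tokens each filter reports.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive n_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


sports_terms = BloomFilter()
for word in ["match", "goal", "league"]:  # hypothetical category vocabulary
    sports_terms.add(word)
```

Membership tests need only a few hash computations and a fixed amount of memory, which is consistent with the abstract's emphasis on fast classification; the price is that a word never added may occasionally be reported as present.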


Author(s):  
Emad Alsukhni ◽  
Ahmed AlEroud ◽  
Ahmad A. Saifan

Association rule mining is a very useful knowledge discovery technique for identifying co-occurrence patterns in transactional data sets. In this article, the authors propose an ontology-based framework to discover multi-dimensional association rules at different levels of a given ontology, based on user-defined pre-processing constraints that may be identified using 1) a hierarchy discovered in datasets; 2) the dimensions of those datasets; or 3) the features of each dimension. The proposed framework has post-processing constraints to drill down or roll up based on the rule level, making it possible to check the validity of the discovered rules in terms of the support and confidence measures without re-applying association rule mining algorithms. The authors conducted several preliminary experiments to test the framework on the Titanic dataset by identifying the association rules after pre- and post-processing constraints are applied. The results show that the framework can be practically applied for rule pruning and for discovering novel association rules.


2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Zhicong Kou ◽  
Lifeng Xi

An effective data mining method to automatically extract association rules between manufacturing capabilities and product features from the available historical data is essential for efficient and cost-effective product development and production. This paper proposes a new binary particle swarm optimization- (BPSO-) based association rule mining (BPSO-ARM) method for discovering the hidden relationships between machine capabilities and product features. In particular, BPSO-ARM does not need predefined thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases. Moreover, a novel overlapping measure indication is proposed to eliminate lower-quality rules and further improve the applicability of BPSO-ARM. The effectiveness of BPSO-ARM is demonstrated on a benchmark case and an industrial case in automotive part manufacturing. The performance comparison indicates that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM. The experimental results indicate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features. This will help support planners and engineers in new product design and manufacturing.


2015 ◽  
Vol 2015 ◽  
pp. 1-14
Author(s):  
Mengling Zhao ◽  
Hongwei Liu

As a computational intelligence method, the artificial immune network (AIN) algorithm has been widely applied to pattern recognition and data classification. In existing artificial immune network algorithms, the affinity used for classification is computed from a distance measure, which may lead to unsatisfactory results on data with nominal attributes. To overcome this shortcoming, association rules are introduced into the AIN algorithm, and we propose a new classification algorithm, an association rule mining algorithm based on artificial immune network (ARM-AIN). The new method uses association rules to represent immune cells and mines the best association rules rather than searching for optimal clustering centers. The proposed algorithm has been extensively compared with the artificial immune network classification (AINC) algorithm, the artificial immune network classification algorithm based on self-adaptive PSO (SPSO-AINC), and PSO-AINC over several large-scale data sets, target recognition in remote sensing images, and segmentation of three different SAR images. The experimental results indicate the superiority of ARM-AIN in classification accuracy and running time.

