GA2RM: A GA-Based Action Rule Mining Method

Author(s):  
Shervin Hashemi ◽  
Pirooz Shamsinejad

Action mining is a subfield of data mining that tries to extract actions from traditional data sets. An action rule is a type of rule that suggests some changes in its consequent part. Extracting action rules from data has been an active research interest in recent years. Current state-of-the-art action rule mining methods such as DEAR typically take classification rules as their input. Because traditional classification methods were designed for prediction rather than manipulation, extracting action rules directly from data can yield more valuable action rules. Here, we propose a method that generates action rules directly from data. To tackle the huge search space of action rules, a genetic algorithm has been devised. Several metrics are defined to assess the effectiveness of the proposed method, and a large number of experiments have been run on real and synthetic data sets. The results show that our method finds from 20% to 10 times more interesting action rules (in terms of support and confidence) than its competitors.
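To make the support and confidence criteria concrete, here is a minimal sketch of how an action rule could be scored against a table of records. The abstract does not give the paper's metric definitions, so the scoring below is an assumption made for illustration: support is taken as the smaller of the two matching counts, and confidence as the product of the "before" and "after" confidences. The function names, the record layout, and the toy data are all hypothetical, and the genetic search itself is not shown.

```python
def matches(record, conditions):
    """True if the record has every attribute value listed in conditions."""
    return all(record.get(attr) == val for attr, val in conditions.items())


def score_action_rule(records, before, after, decision, from_value, to_value):
    """Score the rule: change `before` into `after` => decision from_value -> to_value.

    Assumed definitions (not taken from the paper):
      support    = min(#records matching before with from_value,
                       #records matching after with to_value)
      confidence = conf(before => from_value) * conf(after => to_value)
    """
    left = [r for r in records if matches(r, before)]
    right = [r for r in records if matches(r, after)]
    if not left or not right:
        return 0, 0.0
    left_hits = sum(1 for r in left if r[decision] == from_value)
    right_hits = sum(1 for r in right if r[decision] == to_value)
    support = min(left_hits, right_hits)
    confidence = (left_hits / len(left)) * (right_hits / len(right))
    return support, confidence


# Hypothetical toy data: does moving a customer to the premium plan reduce churn?
RECORDS = [
    {"plan": "basic", "churn": "yes"},
    {"plan": "basic", "churn": "yes"},
    {"plan": "basic", "churn": "no"},
    {"plan": "premium", "churn": "no"},
    {"plan": "premium", "churn": "no"},
    {"plan": "premium", "churn": "yes"},
]

rule_support, rule_confidence = score_action_rule(
    RECORDS, {"plan": "basic"}, {"plan": "premium"}, "churn", "yes", "no"
)
```

A genetic algorithm could use a score of this kind as part of its fitness function, but how GA2RM encodes and evaluates rules is not stated in the abstract.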

Data mining is the process of extracting useful information from various repositories such as relational databases, transaction databases, spatial databases, temporal and time-series databases, data warehouses, and the World Wide Web. Its functionalities include characterization and discrimination, classification and prediction, association rule mining, cluster analysis, and evolutionary analysis. Association rule mining is one of the most important data mining techniques and aims at extracting interesting relationships within the data. In this paper we study various association rule mining algorithms, compare them using synthetic data sets, and report the results of the experimental analysis.
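The two standard measures used to judge an association rule are support and confidence. The sketch below computes both over a list of transactions; the function names and the grocery-style transactions are illustrative assumptions and do not come from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    antecedent = frozenset(antecedent)
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    return support(transactions, antecedent | frozenset(consequent)) / sup_a


# Hypothetical transactions, one frozenset of items per basket.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

sup_bread_milk = support(TRANSACTIONS, {"bread", "milk"})       # 2 of 4 baskets
conf_bread_to_milk = confidence(TRANSACTIONS, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```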


2021 ◽  
Vol 7 ◽  
pp. e495
Author(s):  
Saleh Albahli ◽  
Hafiz Tayyab Rauf ◽  
Abdulelah Algosaibi ◽  
Valentina Emilia Balas

Artificial intelligence (AI) has played a significant role in image analysis and feature extraction, applied to detect and diagnose a wide range of chest-related diseases. Although several researchers have used current state-of-the-art approaches and have produced impressive chest-related clinical outcomes, specific techniques may not contribute many advantages if one type of disease is detected without the rest being identified. Attempts to identify multiple chest-related diseases have been ineffective because the available data were insufficient and unbalanced. This research contributes to the healthcare industry and the research community by proposing synthetic data augmentation in three deep Convolutional Neural Network (CNN) architectures for the detection of 14 chest-related diseases. The employed models are DenseNet121, InceptionResNetV2, and ResNet152V2; after training and validation, an average ROC-AUC score of 0.80 was obtained, which is competitive with previous models trained for multi-class classification to detect anomalies in x-ray images. This research illustrates how the proposed model uses state-of-the-art deep neural networks to classify 14 chest-related diseases with better accuracy.


Author(s):  
M. Sulaiman Khan ◽  
Maybin Muyeba ◽  
Frans Coenen ◽  
David Reid ◽  
Hissam Tawfik

In this paper, a composite fuzzy association rule mining mechanism (CFARM), directed at identifying patterns in datasets comprised of composite attributes, is described. Composite attributes are defined as attributes that can simultaneously take two or more values that subscribe to a common schema. The objective is to generate fuzzy association rules using “properties” associated with these composite attributes. The exemplar application is the analysis of the nutrients contained in items found in grocery data sets. The paper commences with a review of the background and related work, and a formal definition of the CFARM concepts. The CFARM algorithm is then fully described and evaluated using both real and synthetic data sets.


2019 ◽  
Vol 277 ◽  
pp. 01012 ◽  
Author(s):  
Clare E. Matthews ◽  
Paria Yousefi ◽  
Ludmila I. Kuncheva

Many existing methods for video summarisation are not suitable for on-line applications, where computational and memory constraints mean that feature extraction and frame selection must be simple and efficient. Our proposed method uses RGB moments to represent frames, and a control-chart procedure to identify shots from which keyframes are then selected. The new method produces summaries of higher quality than two state-of-the-art on-line video summarisation methods identified as the best among nine such methods in our previous study. The summary quality is measured against an objective ideal for synthetic data sets, and compared to user-generated summaries of real videos.
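As a rough illustration of the two ingredients named above, the sketch below represents each frame by per-channel RGB moments and flags a shot boundary when the change between consecutive frames leaves a control-chart limit. Everything specific here is an assumption: the abstract does not say which moments are used, how the control limits are set, or how keyframes are chosen, so this sketch uses the mean and standard deviation per channel, a mean plus `k` sigma limit, and stops at boundary detection. The names `rgb_moments`, `shot_boundaries`, `min_history`, and `min_sigma` are hypothetical.

```python
import math
from statistics import mean, pstdev


def rgb_moments(frame):
    """Describe a frame (list of (r, g, b) pixels) by mean and std of each channel."""
    channels = list(zip(*frame))
    return [mean(c) for c in channels] + [pstdev(c) for c in channels]


def shot_boundaries(frames, k=3.0, min_history=3, min_sigma=1.0):
    """Return indices of frames that start a new shot.

    The distance between consecutive frame descriptors is monitored; a distance
    above mean + k * sigma of the distances seen so far in the current shot is
    treated as out of control, i.e. a shot change.
    """
    feats = [rgb_moments(f) for f in frames]
    boundaries = []
    history = []  # distances observed inside the current shot
    for i in range(1, len(feats)):
        d = math.dist(feats[i - 1], feats[i])
        if len(history) >= min_history:
            limit = mean(history) + k * max(pstdev(history), min_sigma)
            if d > limit:
                boundaries.append(i)
                history = []  # start a fresh chart for the new shot
                continue
        history.append(d)
    return boundaries


# Hypothetical clip: five dark frames followed by five bright frames, 4 pixels each.
dark = [[(10 + i, 10, 10)] * 4 for i in range(5)]
bright = [[(200 + i, 200, 200)] * 4 for i in range(5)]
detected = shot_boundaries(dark + bright)
```

Only constant-size state per shot would be needed if the running mean and variance were updated incrementally, which is the kind of property an on-line setting calls for; the list-based history here is kept for readability.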


2009 ◽  
Vol 21 (7) ◽  
pp. 2049-2081 ◽  
Author(s):  
Takashi Takenouchi ◽  
Shin Ishii

In this letter, we present new methods of multiclass classification that combine multiple binary classifiers. Misclassification of each binary classifier is formulated as a bit inversion error with probabilistic models by making an analogy to the context of information transmission theory. Dependence between binary classifiers is incorporated into our model, which makes a decoder a type of Boltzmann machine. We performed experimental studies using a synthetic data set, data sets from the UCI repository, and bioinformatics data sets, and the results show that the proposed methods are superior to the existing multiclass classification methods.
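The bit-inversion view can be illustrated with a much simpler decoder than the one in the letter. The sketch below assumes each binary classifier flips its bit independently with a known probability and picks the class whose codeword is most likely given the observed outputs. This independence assumption is exactly what the letter moves beyond (its decoder models dependence and becomes a type of Boltzmann machine), so this is a baseline sketch, not the proposed method. The codebook, flip probabilities, and function name are hypothetical.

```python
import math


def decode(codebook, bits, flip_prob):
    """Pick the class whose codeword best explains the observed classifier bits.

    codebook:  dict mapping class label -> tuple of 0/1 target bits
    bits:      observed outputs of the binary classifiers
    flip_prob: per-classifier probability of a bit inversion, each in (0, 1)
    """
    def log_likelihood(codeword):
        total = 0.0
        for target, observed, p in zip(codeword, bits, flip_prob):
            total += math.log(p if target != observed else 1.0 - p)
        return total

    return max(codebook, key=lambda label: log_likelihood(codebook[label]))


# Hypothetical one-vs-rest code for three classes.
CODEBOOK = {"cat": (1, 0, 0), "dog": (0, 1, 0), "bird": (0, 0, 1)}

# Two classifiers fire at once; the less reliable one is assumed to be wrong.
label_a = decode(CODEBOOK, (1, 1, 0), [0.4, 0.05, 0.1])
label_b = decode(CODEBOOK, (1, 1, 0), [0.05, 0.4, 0.1])
```

With equal flip probabilities this reduces to minimum Hamming distance decoding; unequal probabilities let the decoder break ties in favour of the more reliable classifier.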


2021 ◽  
pp. 1-35
Author(s):  
Johanna Björklund ◽  
Frank Drewes ◽  
Anna Jonsson

We show that a previously proposed algorithm for the N-best trees problem can be made more efficient by changing how it arranges and explores the search space. Given an integer N and a weighted tree automaton (wta) M over the tropical semiring, the algorithm computes N trees of minimal weight with respect to M. Compared to the original algorithm, the modifications increase the laziness of the evaluation strategy, which makes the new algorithm asymptotically more efficient than its predecessor. The algorithm is implemented in the software Betty, and compared to the state-of-the-art algorithm for extracting the N best runs, implemented in the software toolkit Tiburon. The data sets used in the experiments are wtas resulting from real-world natural language processing tasks, as well as artificially created wtas with varying degrees of nondeterminism. We find that Betty outperforms Tiburon on all tested data sets with respect to running time, while Tiburon seems to be the more memory-efficient choice.


2015 ◽  
Vol 11 (1) ◽  
pp. 45-65 ◽  
Author(s):  
Heli Sun ◽  
Jianbin Huang ◽  
Xinwei She ◽  
Zhou Yang ◽  
Jiao Liu ◽  
...  

The problem of trip planning with time constraints aims to find the optimal routes that satisfy the maximum time requirement and have the highest attraction score. In this paper, a more efficient algorithm, TripRec, is proposed to solve this problem. Based on the principle of the Apriori algorithm for mining frequent item sets, our method constructs candidate attraction sets containing k attractions by applying the join rule to valid sets consisting of k-1 attractions. After all the valid routes from the valid k-1 attraction sets have been obtained, all of the candidate routes for the candidate k-sets can be acquired through a route extension approach. This method markedly improves the efficiency of the valid route generation process. Then, by determining whether at least one valid route exists, the method prunes some candidate attraction sets to obtain all the valid sets. The process continues until no more valid attraction sets can be obtained. In addition, several optimization strategies are employed to greatly enhance the performance of the algorithm. Experimental results on both real-world and synthetic data sets show that our algorithm has a better pruning rate and efficiency than the state-of-the-art method.
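The join step borrowed from Apriori can be sketched as follows: two valid sets of k-1 attractions are merged into a candidate k-set, and the candidate is kept only if every one of its (k-1)-subsets is itself valid. This is the generic Apriori candidate generation, written here as an assumption about what the abstract's "join rule" refers to; TripRec's route extension and time-constraint checks are not shown. The function name and the attraction names are hypothetical.

```python
from itertools import combinations


def join_candidates(valid_sets):
    """Build candidate k-sets from valid (k-1)-sets, Apriori style.

    valid_sets: iterable of frozensets, all of the same size k-1.
    Returns the set of k-sets whose every (k-1)-subset is in valid_sets.
    """
    valid = set(valid_sets)
    if not valid:
        return set()
    k = len(next(iter(valid))) + 1
    candidates = set()
    for a, b in combinations(valid, 2):
        union = a | b
        if len(union) != k:
            continue  # a and b differ in more than one attraction
        # Downward closure: a k-set can only be valid if all its subsets are.
        if all(frozenset(sub) in valid for sub in combinations(union, k - 1)):
            candidates.add(union)
    return candidates


# Hypothetical valid pairs of attractions.
VALID_PAIRS = {
    frozenset({"museum", "park"}),
    frozenset({"museum", "tower"}),
    frozenset({"park", "tower"}),
    frozenset({"park", "zoo"}),
}

triples = join_candidates(VALID_PAIRS)
```

In the example, any triple containing "zoo" is pruned because the pairs "museum" with "zoo" and "tower" with "zoo" are not valid, so no route needs to be generated for it.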


2011 ◽  
Vol 7 (3) ◽  
pp. 1-29 ◽  
Author(s):  
M. Sulaiman Khan ◽  
Maybin Muyeba ◽  
Frans Coenen ◽  
David Reid ◽  
Hissam Tawfik

In this paper, a composite fuzzy association rule mining mechanism (CFARM), directed at identifying patterns in datasets comprised of composite attributes, is described. Composite attributes are defined as attributes that can simultaneously take two or more values that subscribe to a common schema. The objective is to generate fuzzy association rules using “properties” associated with these composite attributes. The exemplar application is the analysis of the nutrients contained in items found in grocery data sets. The paper commences with a review of the background and related work, and a formal definition of the CFARM concepts. The CFARM algorithm is then fully described and evaluated using both real and synthetic data sets.


2018 ◽  
Vol 30 (2) ◽  
pp. 526-545
Author(s):  
Xiaowei Zhao ◽  
Zhigang Ma ◽  
Zhi Li ◽  
Zhihui Li

In recent years, multilabel classification has attracted significant attention in multimedia annotation. However, most multilabel classification methods focus only on the inherent correlations among multiple labels and concepts and ignore the relevance between features and the target concepts. To obtain more robust multilabel classification results, we propose a new multilabel classification method that captures the correlations among multiple concepts by leveraging a hypergraph, which has proved beneficial for relational learning. Moreover, we consider mining feature-concept relevance, which is often overlooked by multilabel learning algorithms. To better expose the feature-concept relevance, we impose a sparsity constraint on the proposed method. We compare the proposed method with several other multilabel classification methods and evaluate the classification performance by mean average precision on several data sets. The experimental results show that the proposed method outperforms the state-of-the-art methods.


2018 ◽  
Vol 44 (3) ◽  
pp. 403-446 ◽  
Author(s):  
Shervin Malmasi ◽  
Mark Dras

Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on several large data sets, evaluated in both intra-corpus and cross-corpus modes.

