GA2RM: A GA-Based Action Rule Mining Method

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026821500127 ◽

2021 ◽

pp. 2150012

Author(s):

Shervin Hashemi ◽

Pirooz Shamsinejad

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Search Space ◽

Data Sets ◽

Mining Method ◽

Classification Rules ◽

Classification Methods ◽

Rule Mining ◽

Current State ◽

Traditional Classification

Action mining is a subfield of data mining that tries to extract actions from traditional data sets. An action rule is a type of rule that suggests some changes in its consequent part. Extracting action rules from data has been one of the research interests in recent years. Current state-of-the-art action rule mining methods such as DEAR typically take classification rules as their input. Because traditional classification methods were designed for prediction rather than manipulation, extracting action rules directly from data can yield more valuable action rules. Here, we propose a method that generates action rules directly from data. To tackle the huge search space of action rules, a genetic algorithm has been devised. Several metrics are defined to investigate the effectiveness of the proposed method, and a large number of experiments have been carried out on real and synthetic data sets. The results show that our method can find from 20% to 10 times more interesting action rules (in terms of support and confidence) than its competitors.

Present State-of-The-Art of Association Rule Mining Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a2202.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 6398-6405

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

State Of The Art ◽

Synthetic Data ◽

Data Sets ◽

Evolutionary Analysis ◽

Rule Mining ◽

Transaction Database ◽

Mining Algorithms

Data mining is the process of extracting useful information from various repositories, such as relational databases, transaction databases, spatial databases, temporal and time-series databases, data warehouses, and the World Wide Web. Its functionalities include characterization and discrimination, classification and prediction, association rule mining, cluster analysis, and evolutionary analysis. Association rule mining is one of the most important techniques of data mining; it aims at extracting interesting relationships within the data. In this paper we study various association rule mining algorithms, compare them using synthetic data sets, and provide the results obtained from the experimental analysis.
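As a rough illustration of what "interesting relationships" usually means in association rule mining, the sketch below computes the two standard measures, support and confidence, for a rule X → Y. The transactions and the function names `support` and `confidence` are made up for this example and do not come from the paper.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate of P(consequent | antecedent) over the transactions."""
    left = frozenset(antecedent)
    both = left | frozenset(consequent)
    left_support = support(transactions, left)
    if left_support == 0:
        return 0.0
    return support(transactions, both) / left_support


# Hypothetical toy transaction database (not from the paper).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Here the rule bread → milk is present in two of four transactions, and two of the three transactions containing bread also contain milk.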

AI-driven deep CNN approach for multi-label pathology classification using chest X-Rays

PeerJ Computer Science ◽

10.7717/peerj-cs.495 ◽

2021 ◽

Vol 7 ◽

pp. e495

Author(s):

Saleh Albahli ◽

Hafiz Tayyab Rauf ◽

Abdulelah Algosaibi ◽

Valentina Emilia Balas

Keyword(s):

Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Synthetic Data ◽

X Rays ◽

Deep Convolutional Neural Networks ◽

Current State ◽

Pathology Classification ◽

Wide Range ◽

Multi Class Classification

Artificial intelligence (AI) has played a significant role in image analysis and feature extraction, applied to detect and diagnose a wide range of chest-related diseases. Although several researchers have used current state-of-the-art approaches and have produced impressive chest-related clinical outcomes, specific techniques may offer few advantages if they detect one type of disease without identifying the rest. Attempts to identify multiple chest-related diseases have been ineffective because the available data were insufficient and imbalanced. This research contributes to the healthcare industry and the research community by proposing synthetic data augmentation in three deep Convolutional Neural Network (CNN) architectures for the detection of 14 chest-related diseases. The employed models are DenseNet121, InceptionResNetV2, and ResNet152V2; after training and validation, an average ROC-AUC score of 0.80 was obtained, which is competitive with previous models trained for multi-class classification to detect anomalies in x-ray images. This research illustrates how the proposed model applies state-of-the-art deep neural networks to classify 14 chest-related diseases with better accuracy.

Finding Associations in Composite Data Sets

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch008 ◽

2013 ◽

pp. 162-186

Author(s):

M. Sulaiman Khan ◽

Maybin Muyeba ◽

Frans Coenen ◽

David Reid ◽

Hissam Tawfik

Keyword(s):

Synthetic Data ◽

Data Sets ◽

Formal Definition ◽

Rule Mining ◽

Fuzzy Association Rules ◽

Back Ground ◽

Fuzzy Association Rule ◽

Definition Of ◽

Fuzzy Association Rule Mining ◽

Composite Data

In this paper, a composite fuzzy association rule mining mechanism (CFARM), directed at identifying patterns in datasets comprised of composite attributes, is described. Composite attributes are defined as attributes that can simultaneously take two or more values that subscribe to a common schema. The objective is to generate fuzzy association rules using “properties” associated with these composite attributes. The exemplar application is the analysis of the nutrients contained in items found in grocery data sets. The paper commences with a review of the background and related work, and a formal definition of the CFARM concepts. The CFARM algorithm is then fully described and evaluated using both real and synthetic data sets.

Using control charts for on-line video summarisation

MATEC Web of Conferences ◽

10.1051/matecconf/201927701012 ◽

2019 ◽

Vol 277 ◽

pp. 01012 ◽

Cited By ~ 1

Author(s):

Clare E. Matthews ◽

Paria Yousefi ◽

Ludmila I. Kuncheva

Keyword(s):

Feature Extraction ◽

Control Chart ◽

Control Charts ◽

State Of The Art ◽

Synthetic Data ◽

New Method ◽

Data Sets ◽

On Line ◽

Memory Constraints ◽

Video Summarisation

Many existing methods for video summarisation are not suitable for on-line applications, where computational and memory constraints mean that feature extraction and frame selection must be simple and efficient. Our proposed method uses RGB moments to represent frames, and a control-chart procedure to identify shots from which keyframes are then selected. The new method produces summaries of higher quality than two state-of-the-art on-line video summarisation methods identified as the best among nine such methods in our previous study. The summary quality is measured against an objective ideal for synthetic data sets, and compared to user-generated summaries of real videos.
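The abstract does not spell out the control-chart procedure, so the following is only a generic Shewhart-style sketch of the idea, not the authors' method. It assumes each frame is already reduced to a short feature vector (standing in for RGB moments), monitors the distance between consecutive frames, and flags a shot boundary when that distance exceeds the running mean plus `n_sigma` standard deviations. The function name, the `warmup` parameter, and the toy feature values are all hypothetical.

```python
import math
from typing import List, Sequence


def detect_shot_boundaries(
    features: Sequence[Sequence[float]],
    n_sigma: float = 3.0,
    warmup: int = 3,
) -> List[int]:
    """Return indices of frames that start a new shot.

    Keeps running statistics (Welford's method) of consecutive-frame
    distances within the current shot, using constant memory.
    """
    boundaries: List[int] = []
    n, mean, m2 = 0, 0.0, 0.0  # count, running mean, sum of squared deviations
    for i in range(1, len(features)):
        d = math.dist(features[i - 1], features[i])
        if n >= warmup:
            std = math.sqrt(m2 / (n - 1))
            if d > mean + n_sigma * std:
                boundaries.append(i)
                n, mean, m2 = 0, 0.0, 0.0  # restart the chart for the new shot
                continue
        # Welford update with the in-control distance.
        n += 1
        delta = d - mean
        mean += delta / n
        m2 += delta * (d - mean)
    return boundaries


# Hypothetical one-dimensional "frame features": two visually distinct shots.
frames = [(0.10,), (0.11,), (0.10,), (0.12,), (0.11,),
          (0.90,), (0.91,), (0.90,), (0.92,), (0.91,)]
shot_starts = detect_shot_boundaries(frames)
```

Each frame is processed once and only three running numbers are stored, which matches the on-line setting the abstract describes; keyframe selection within each shot is left out of the sketch.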

A Multiclass Classification Method Based on Decoding of Binary Classifiers

Neural Computation ◽

10.1162/neco.2009.03-08-740 ◽

2009 ◽

Vol 21 (7) ◽

pp. 2049-2081 ◽

Cited By ~ 4

Author(s):

Takashi Takenouchi ◽

Shin Ishii

Keyword(s):

Probabilistic Models ◽

Experimental Studies ◽

Synthetic Data ◽

Multiclass Classification ◽

Data Sets ◽

Classification Methods ◽

Boltzmann Machine ◽

Data Set ◽

New Methods ◽

Binary Classifiers

In this letter, we present new methods of multiclass classification that combine multiple binary classifiers. Misclassification of each binary classifier is formulated as a bit inversion error with probabilistic models by making an analogy to the context of information transmission theory. Dependence between binary classifiers is incorporated into our model, which makes a decoder a type of Boltzmann machine. We performed experimental studies using a synthetic data set, data sets from the UCI repository, and bioinformatics data sets, and the results show that the proposed methods are superior to the existing multiclass classification methods.

Improved N-Best Extraction with an Evaluation on Language Data

Computational Linguistics ◽

10.1162/coli_a_00427 ◽

2021 ◽

pp. 1-35

Author(s):

Johanna Björklund ◽

Frank Drewes ◽

Anna Jonsson

Keyword(s):

Language Processing ◽

State Of The Art ◽

Search Space ◽

Data Sets ◽

Weighted Tree ◽

Original Algorithm ◽

Software Toolkit ◽

Minimal Weight ◽

Language Data ◽

Memory Efficient

We show that a previously proposed algorithm for the N-best trees problem can be made more efficient by changing how it arranges and explores the search space. Given an integer N and a weighted tree automaton (wta) M over the tropical semiring, the algorithm computes N trees of minimal weight with respect to M. Compared to the original algorithm, the modifications increase the laziness of the evaluation strategy, which makes the new algorithm asymptotically more efficient than its predecessor. The algorithm is implemented in the software Betty, and compared to the state-of-the-art algorithm for extracting the N best runs, implemented in the software toolkit Tiburon. The data sets used in the experiments are wtas resulting from real-world natural language processing tasks, as well as artificially created wtas with varying degrees of nondeterminism. We find that Betty outperforms Tiburon on all tested data sets with respect to running time, while Tiburon seems to be the more memory-efficient choice.

TripRec

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2015010103 ◽

2015 ◽

Vol 11 (1) ◽

pp. 45-65 ◽

Cited By ~ 1

Author(s):

Heli Sun ◽

Jianbin Huang ◽

Xinwei She ◽

Zhou Yang ◽

Jiao Liu ◽

...

Keyword(s):

Real World ◽

Efficient Algorithm ◽

State Of The Art ◽

Synthetic Data ◽

Time Constraints ◽

Data Sets ◽

Generation Process ◽

Time Requirement ◽

Trip Planning ◽

Frequent Item Sets

The problem of trip planning with time constraints aims to find the optimal routes that satisfy the maximum time requirement and have the highest attraction score. In this paper, a more efficient algorithm, TripRec, is proposed to solve this problem. Based on the principle of the Apriori algorithm for mining frequent item sets, our method constructs candidate attraction sets containing k attractions by applying the join rule to valid sets consisting of k-1 attractions. After all the valid routes from the valid (k-1)-attraction sets have been obtained, all of the candidate routes for the candidate k-sets can be acquired through a route extension approach. This method clearly improves the efficiency of the valid route generation process. Then, by determining whether at least one valid route exists, the algorithm prunes candidate attraction sets to obtain all the valid sets. The process continues until no more valid attraction sets can be obtained. In addition, several optimization strategies are employed to greatly enhance the performance of the algorithm. Experimental results on both real-world and synthetic data sets show that our algorithm achieves a better pruning rate and efficiency than the state-of-the-art method.
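The join rule mentioned above can be sketched in the classic Apriori style: two valid (k-1)-sets are merged when their union has exactly k elements, and the candidate is kept only if every (k-1)-subset of it is itself valid. This is a generic illustration under that assumption; the function name `join_candidates` and the sample attraction sets are hypothetical, and TripRec's route extension and time-constraint checks are not modelled.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def join_candidates(valid_sets: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Build candidate k-sets from valid (k-1)-sets (all of equal size)."""
    valid = {frozenset(s) for s in valid_sets}
    if not valid:
        return set()
    k = len(next(iter(valid))) + 1
    candidates: Set[FrozenSet[str]] = set()
    for a, b in combinations(valid, 2):
        union = a | b
        if len(union) != k:
            continue  # the two sets differ in more than one element
        # Apriori pruning: every (k-1)-subset must already be valid.
        if all(frozenset(sub) in valid for sub in combinations(union, k - 1)):
            candidates.add(union)
    return candidates


# Hypothetical valid 2-attraction sets.
valid_pairs = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
candidate_triples = join_candidates(valid_pairs)
```

With these pairs only {A, B, C} survives: {A, B, D} and {B, C, D} are dropped because {A, D} and {C, D} are not among the valid pairs.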

Finding Associations in Composite Data Sets

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2011070101 ◽

2011 ◽

Vol 7 (3) ◽

pp. 1-29 ◽

Cited By ~ 10

Author(s):

M. Sulaiman Khan ◽

Maybin Muyeba ◽

Frans Coenen ◽

David Reid ◽

Hissam Tawfik

Keyword(s):

Association Rule ◽

Synthetic Data ◽

Data Sets ◽

Rule Mining ◽

Fuzzy Association Rules ◽

Back Ground ◽

Fuzzy Association Rule ◽

Definition Of ◽

Fuzzy Association Rule Mining ◽

Composite Data

Joint Concept Correlation and Feature-Concept Relevance Learning for Multilabel Classification

Neural Computation ◽

10.1162/neco_a_01036 ◽

2018 ◽

Vol 30 (2) ◽

pp. 526-545

Author(s):

Xiaowei Zhao ◽

Zhigang Ma ◽

Zhi Li ◽

Zhihui Li

Keyword(s):

State Of The Art ◽

Relational Learning ◽

Classification Performance ◽

Data Sets ◽

Classification Methods ◽

Average Precision ◽

Multilabel Classification ◽

Multimedia Annotation ◽

Multilabel Learning ◽

Significant Attention

In recent years, multilabel classification has attracted significant attention in multimedia annotation. However, most multilabel classification methods focus only on the inherent correlations among multiple labels and concepts and ignore the relevance between features and the target concepts. To obtain more robust multilabel classification results, we propose a new multilabel classification method that captures the correlations among multiple concepts by leveraging a hypergraph, which has proved beneficial for relational learning. Moreover, we consider mining feature-concept relevance, which is often overlooked by multilabel learning algorithms. To better show the feature-concept relevance, we impose a sparsity constraint on the proposed method. We compare the proposed method with several other multilabel classification methods and evaluate the classification performance by mean average precision on several data sets. The experimental results show that the proposed method outperforms the state-of-the-art methods.

Native Language Identification With Classifier Stacking and Ensembles

Computational Linguistics ◽

10.1162/coli_a_00323 ◽

2018 ◽

Vol 44 (3) ◽

pp. 403-446 ◽

Cited By ~ 7

Author(s):

Shervin Malmasi ◽

Mark Dras

Keyword(s):

State Of The Art ◽

Native Language ◽

Ensemble Methods ◽

Large Data ◽

Language Identification ◽

Large Data Sets ◽

Data Sets ◽

Classification Models ◽

Multiple Classifiers ◽

Current State

Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on several large data sets, evaluated in both intra-corpus and cross-corpus modes.
