Variations on Associative Classifiers and Classification Results Analyses

Author(s):  
Maria-Luiza Antonie ◽  
David Chodos ◽  
Osmar Zaïane

The chapter introduces the associative classifier, a classification model based on association rules, and describes the three phases of the model building process: rule generation, pruning, and selection. In the first part of the chapter, these phases are described in detail, and several variations on the associative classifier model are presented within the context of the relevant phase. These variations are: mining data sets with re-occurring items, using negative association rules, and pruning rules using graph-based techniques. Each of these departs from the standard model in a crucial way, and thus expands the classification potential. The second part of the chapter describes a system, ARC-UI, that allows a user to analyze the results of classifying an item with an associative classifier. This system uses an intuitive, Web-based interface through which the user can see the rules that were used to classify an item, modify either the item being classified or the rule set used to classify it, view the relationships between attributes, rules, and classes in the rule set, and analyze the training data set with respect to the item being classified.
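A minimal sketch (not the authors' ARC-UI implementation) of the prediction step common to associative classifiers: each mined rule pairs an antecedent item set with a class and a confidence, and an item is assigned the class of the strongest matching rule. The rule values below are illustrative only.

```python
# Sketch of rule-based classification with class association rules.
from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: frozenset   # items that must all appear in the instance
    label: str              # predicted class
    confidence: float       # rule confidence from the training data

def classify(item_features: set, rules: list, default: str) -> str:
    # Keep only rules whose antecedent is fully contained in the item.
    matching = [r for r in rules if r.antecedent <= item_features]
    if not matching:
        return default  # fall back when no rule fires
    # Select the strongest rule; ties could also be broken by support.
    return max(matching, key=lambda r: r.confidence).label

rules = [
    Rule(frozenset({"outlook=sunny", "humidity=high"}), "no", 0.92),
    Rule(frozenset({"outlook=overcast"}), "yes", 0.97),
]
print(classify({"outlook=overcast", "wind=weak"}, rules, default="yes"))
```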

2019 ◽  
Vol 7 (3) ◽  
pp. SE113-SE122 ◽  
Author(s):  
Yunzhi Shi ◽  
Xinming Wu ◽  
Sergey Fomel

Salt boundary interpretation is important for the understanding of salt tectonics and for velocity model building for seismic migration. Conventional methods consist of computing salt attributes and extracting salt boundaries. We have formulated the problem as 3D image segmentation and evaluated an efficient approach based on deep convolutional neural networks (CNNs) with an encoder-decoder architecture. To train the model, we design a data generator that extracts randomly positioned subvolumes from a large-scale 3D training data set, applies data augmentation, and then feeds a large number of subvolumes into the network, using salt/nonsalt binary labels generated by thresholding the velocity model as ground truth. We test the model on validation data sets and compare the blind test predictions with the ground truth. Our results indicate that our method is capable of automatically capturing subtle salt features from the 3D seismic image with little or no manual input. We further test the model on a field example to demonstrate that this deep CNN method generalizes across different data sets.
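A minimal sketch (assumed patch size and salt-velocity threshold; not the authors' code) of the training data generator described above: randomly positioned subvolumes are cut from a large seismic volume, binary salt labels are produced by thresholding the co-located velocity model, and simple flips provide augmentation.

```python
import numpy as np

def subvolume_generator(seismic, velocity, patch=(64, 64, 64),
                        salt_velocity=4400.0, rng=None):
    """Yield (subvolume, binary salt mask) pairs indefinitely."""
    rng = rng or np.random.default_rng()
    nz, ny, nx = seismic.shape
    pz, py, px = patch
    while True:
        # Random corner such that the patch fits inside the volume.
        z = rng.integers(0, nz - pz)
        y = rng.integers(0, ny - py)
        x = rng.integers(0, nx - px)
        cube = seismic[z:z+pz, y:y+py, x:x+px]
        vel = velocity[z:z+pz, y:y+py, x:x+px]
        label = (vel >= salt_velocity).astype(np.float32)  # salt = 1
        # Simple augmentation: random flips along the horizontal axes.
        for axis in (1, 2):
            if rng.random() < 0.5:
                cube, label = np.flip(cube, axis), np.flip(label, axis)
        yield cube.copy(), label.copy()

# Stand-in volumes just to show usage.
gen = subvolume_generator(np.random.rand(128, 128, 128),
                          np.random.rand(128, 128, 128) * 5000.0)
cube, mask = next(gen)
print(cube.shape, mask.mean())
```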


Author(s):  
Jamolbek Mattiev ◽  
Branko Kavsek

Huge amounts of data are being collected and analyzed nowadays. Using popular rule-learning algorithms, the number of rules discovered on those "big" datasets can easily exceed thousands. To produce compact, understandable and accurate classifiers, such rules have to be grouped and pruned, so that only a reasonable number of them are presented to the end user for inspection and further analysis. In this paper, we propose new methods that reduce the number of class association rules produced by "classical" class association rule classifiers, while maintaining an accurate classification model comparable to the ones generated by state-of-the-art classification algorithms. More precisely, we propose new associative classifiers, called DC, DDC and CDC, that use distance-based agglomerative hierarchical clustering as a post-processing step to reduce the number of rules, and in the rule-selection step we use a different strategy (based on database coverage or cluster center) for each algorithm. Experimental results on selected datasets from the UCI ML repository show that our methods learn classifiers containing significantly fewer rules than state-of-the-art rule learning algorithms on datasets with a larger number of examples. On the other hand, the classification accuracy of the proposed classifiers is not significantly different from state-of-the-art rule learners on most of the datasets.
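A minimal sketch (assumed Jaccard distance between antecedents and medoid selection; the paper's own strategies differ per algorithm) of the post-processing idea: cluster class association rules hierarchically by distance and keep one representative rule per cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def jaccard_distance(a: frozenset, b: frozenset) -> float:
    return 1.0 - len(a & b) / len(a | b)

def prune_by_clustering(rules, n_clusters):
    """rules: list of (antecedent frozenset, class, confidence)."""
    n = len(rules)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = jaccard_distance(rules[i][0], rules[j][0])
            dist[i, j] = dist[j, i] = d
    # Agglomerative hierarchical clustering on the condensed matrix.
    labels = fcluster(linkage(squareform(dist), method="average"),
                      t=n_clusters, criterion="maxclust")
    kept = []
    for c in set(labels):
        members = [i for i in range(n) if labels[i] == c]
        # Medoid: rule with the smallest total distance to its cluster.
        medoid = min(members, key=lambda i: dist[i, members].sum())
        kept.append(rules[medoid])
    return kept

rules = [(frozenset({"a", "b"}), "yes", 0.9),
         (frozenset({"a", "c"}), "yes", 0.8),
         (frozenset({"d"}), "no", 0.7)]
print(prune_by_clustering(rules, n_clusters=2))
```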


Author(s):  
Piyasak Jeatrakul ◽  
Kok Wai Wong ◽  
Chun Che Fung

In many classification problems, data cleaning is used as a preprocessing technique to achieve better results. The purpose of data cleaning is to remove noise, inconsistent data and errors from the training data, enabling the use of a better and more representative data set to develop a reliable classification model. In most classification models, unclean data can reduce classification accuracy. In this paper, we investigate the use of misclassification analysis for data cleaning. To demonstrate our concept, we use an Artificial Neural Network (ANN) as the core computational intelligence technique. We use four benchmark data sets obtained from the University of California Irvine (UCI) machine learning repository to evaluate the proposed data cleaning technique. The data sets used in our experiments are binary classification problems: German credit data, BUPA liver disorders, Johns Hopkins Ionosphere and Pima Indians Diabetes. The results show that the proposed cleaning technique can be a good alternative for providing confidence when constructing a classification model.
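A minimal sketch (single-pass variant; a built-in sklearn data set stands in for the UCI sets named above) of misclassification analysis for cleaning: train a neural network, treat the training examples it misclassifies as likely noise, and retrain on the remainder.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in binary data set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Probe model flags suspect training examples via misclassification.
probe = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                      random_state=0).fit(X_tr, y_tr)
keep = probe.predict(X_tr) == y_tr

# Retrain the final model on the cleaned training data.
final = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                      random_state=0).fit(X_tr[keep], y_tr[keep])
print(f"removed {np.sum(~keep)} suspect examples; "
      f"test accuracy = {final.score(X_te, y_te):.3f}")
```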


Author(s):  
D T Pham ◽  
A A Afify

Overfitting the training data is a major problem in machine learning, particularly when noise is present. Overfitting increases learning time and reduces both the accuracy and the comprehensibility of the generated rules, making learning from large data sets more difficult. Pruning is a technique widely used for addressing such problems and consequently forms an essential component of practical learning algorithms. An important class of pruning techniques is that based on the minimum description length (MDL) principle. This paper presents three new techniques using the MDL principle for pruning rule sets. An important advantage of these techniques is that all of the training data can be used for both inducing and evaluating rule sets. The performance of the techniques is evaluated using three criteria: classification accuracy, rule set complexity, and execution time. The evaluation shows that the new techniques, when incorporated into a rule induction algorithm, are more efficient and lead to accurate rule sets that are significantly smaller than those produced before pruning.
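A minimal sketch (a simplified two-part code, not the paper's exact encoding) of MDL-based rule set scoring: the total description length is the cost of encoding the theory (the rules) plus the cost of encoding its exceptions (the misclassified examples). A pruning step keeps a rule only if removing it would increase this total.

```python
import math

def log2_binom(n: int, k: int) -> float:
    """Bits to identify k exception positions among n examples."""
    return math.log2(math.comb(n, k))

def description_length(rule_sizes, n_examples, n_errors,
                       bits_per_condition=4.0):
    # Theory cost: bits to encode every condition in every rule
    # (bits_per_condition is an assumed encoding cost).
    theory_bits = sum(size * bits_per_condition for size in rule_sizes)
    # Data cost: bits to point out which examples the rules get wrong.
    exception_bits = log2_binom(n_examples, n_errors)
    return theory_bits + exception_bits

# A 10-rule set (30 conditions, 12 errors) versus a pruned 6-rule set
# (15 conditions, 15 errors) on 1000 training examples:
full = description_length([3] * 10, 1000, 12)
pruned = description_length([2, 3, 2, 3, 2, 3], 1000, 15)
print(f"full: {full:.1f} bits, pruned: {pruned:.1f} bits")
```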


Geophysics ◽  
2020 ◽  
Vol 85 (4) ◽  
pp. WA77-WA86 ◽  
Author(s):  
Haibin Di ◽  
Zhun Li ◽  
Hiren Maniar ◽  
Aria Abubakar

Depicting geologic sequences from 3D seismic surveying is of significant value to subsurface reservoir exploration, but it is usually time- and labor-intensive for manual interpretation by experienced seismic interpreters. We have developed a semisupervised workflow for efficient seismic stratigraphy interpretation using state-of-the-art deep convolutional neural networks (CNNs). Specifically, the workflow consists of two components: (1) seismic feature self-learning (SFSL) and (2) stratigraphy model building (SMB), each of which is formulated as a deep CNN. Whereas the SMB is supervised by knowledge from domain experts and its CNN uses a network architecture similar to those typically used in image segmentation, the SFSL is designed as an unsupervised process and thus can be performed backstage while an expert prepares the training labels for the SMB CNN. Compared with conventional approaches, our workflow is superior in two aspects. First, the SMB CNN, initialized by the SFSL CNN, successfully inherits the prior knowledge of the seismic features in the target seismic data. It therefore becomes feasible to complete the supervised training of the SMB CNN more efficiently using only a small amount of training data, for example, less than 0.1% of the available seismic data as demonstrated in this paper. Second, for the convenience of seismic experts in translating their domain knowledge into training labels, our workflow is designed to be applicable to three annotation scenarios: trace-wise, paintbrushing, and full-sectional. The performance of the new workflow is verified through application to three real seismic data sets. We conclude that the new workflow is not only capable of providing robust stratigraphy interpretation for a given seismic volume, but it also holds great potential for other problems in seismic data analysis.
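A minimal sketch (assumed toy architectures, losses, and random stand-in data; not the authors' networks) of the two-component workflow: an autoencoder first self-learns seismic features without labels (SFSL), then its encoder weights initialize a supervised segmentation model (SMB) that is fine-tuned on a small labeled subset.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
decoder = nn.Conv2d(32, 1, 3, padding=1)    # reconstruction head

# Stand-in data: random patches in place of real seismic sections.
unlabeled = [torch.randn(8, 1, 64, 64) for _ in range(4)]
n_units = 6                                 # assumed number of sequences
labeled = [(torch.randn(8, 1, 64, 64),
            torch.randint(0, n_units, (8, 64, 64))) for _ in range(2)]

# Phase 1 (SFSL): unsupervised reconstruction pretraining.
autoencoder = nn.Sequential(encoder, decoder)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
for x in unlabeled:
    loss = nn.functional.mse_loss(autoencoder(x), x)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2 (SMB): the pretrained encoder plus a segmentation head,
# fine-tuned on the small expert-labeled subset.
segmenter = nn.Sequential(encoder, nn.Conv2d(32, n_units, 1))
opt = torch.optim.Adam(segmenter.parameters(), lr=1e-4)
for x, y in labeled:
    loss = nn.functional.cross_entropy(segmenter(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
```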


Author(s):  
Philip L. Winters ◽  
Rafael A. Perez ◽  
Ajay D. Joshi ◽  
Jennifer Perone

Today's transportation professionals often use the ITE Trip Generation Manual and the Parking Generation Manual to estimate future traffic volumes, on which off-site transportation improvements and parking requirements are based. But these manuals are inadequate for assessing the claims made by specific transportation demand management (TDM) programs that they reduce vehicle trips by a certain amount at particular work sites. This paper presents a work site trip reduction model (WTRM) that can help transportation professionals assess those claims. WTRM was built on data from three urban areas in the United States: Los Angeles, California; Tucson, Arizona; and nine counties in Washington State. The data consist of work sites’ employee modal characteristics aggregated at the employer level and a listing of incentives and amenities offered by employers. The dependent variable chosen was the change in vehicle trip rate, which corresponds to the goals of TDM programs. Two different approaches were used in the model-building process: linear statistical regression and nonlinear neural networks. For performance evaluation, the data were divided into two disjoint sets: a training set, used to build the models, and a validation set, used as unseen data to evaluate them. Because the number of data samples varied across the three areas, two training data sets were formed: one consisting of all training data samples from the three areas and one containing equally sampled training data from the three areas. The best model was the neural net model built on the equally sampled training data.
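A minimal sketch (synthetic stand-in data, not the WTRM data sets) of the model-building comparison described above: fit both a linear regression and a neural network on a training set, then compare them on a held-out validation set, with the change in vehicle trip rate as the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))    # stand-in incentive/modal features
# Target with a mild nonlinearity so the comparison is meaningful.
y = X @ rng.normal(size=8) + 0.3 * np.tanh(X[:, 0]) \
    + rng.normal(0, 0.2, 300)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

linear = LinearRegression().fit(X_tr, y_tr)
neural = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                      random_state=0).fit(X_tr, y_tr)

print(f"linear R^2 on validation: {linear.score(X_val, y_val):.3f}")
print(f"neural R^2 on validation: {neural.score(X_val, y_val):.3f}")
```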


2013 ◽  
Vol 12 (01) ◽  
pp. 1350008 ◽  
Author(s):  
Sunita Soni ◽  
O. P. Vyas

Associative classifiers are a new classification approach that uses association rules for classification. An important advantage of these classification systems is that, using association rule mining (ARM), they are able to examine several features at a time. Many applications can benefit from a good classification model, and associative classifiers are especially suited to applications where the model may assist domain experts in their decisions. Medical diagnosis is a domain where maximum model accuracy is desired. In this paper, we propose a framework, the weighted associative classifier (WAC), that assigns different weights to different attributes according to their predicting capability. We use maximum likelihood estimation (MLE) theory to calculate the weight of each attribute from the training data. We also show how the existing Apriori algorithm can be modified for the weighted environment to infer association rules from a medical dataset with numeric-valued attributes, as conventional ARM usually deals with transaction databases with categorical values. Experiments have been performed on benchmark data sets to evaluate the performance of WAC in terms of accuracy, the number of rules generated, and the impact of the minimum support threshold on WAC outcomes. The results reveal that WAC is a promising alternative in medical prediction and certainly deserves further attention.
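A minimal sketch (assumed weighting scheme and toy data; the paper derives its weights via MLE) of the weighted-support idea behind WAC: each attribute carries a weight reflecting its predictive capability, and an itemset's weighted support scales its transaction count by the mean weight of the attributes involved.

```python
def weighted_support(itemset, transactions, weights):
    """itemset: set of items; weights: item -> predictive weight."""
    mean_w = sum(weights[i] for i in itemset) / len(itemset)
    count = sum(1 for t in transactions if itemset <= t)
    return mean_w * count / len(transactions)

# Toy medical transactions and illustrative attribute weights.
transactions = [{"bp=high", "sugar=high", "risk=yes"},
                {"bp=high", "sugar=low", "risk=no"},
                {"bp=low", "sugar=high", "risk=yes"}]
weights = {"bp=high": 0.9, "bp=low": 0.9,
           "sugar=high": 0.7, "sugar=low": 0.7,
           "risk=yes": 1.0, "risk=no": 1.0}

print(weighted_support({"bp=high", "sugar=high"}, transactions, weights))
```

An itemset then qualifies as frequent when its weighted support clears the minimum support threshold, which is what the modified Apriori step checks in place of the plain count.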


2021 ◽  
Vol 11 (7) ◽  
pp. 864
Author(s):  
Catherine B. Ashley ◽  
Ryan D. Snyder ◽  
James E. Shepherd ◽  
Catalina Cervantes ◽  
Nitish Mittal ◽  
...  

Ultrasonic vocalizations (USVs) are known to reflect emotional processing, brain neurochemistry, and brain function. Collecting and processing USV data is manual, time-intensive, and costly, creating a significant bottleneck: it limits researchers’ ability to employ fully effective and nuanced experimental designs and serves as a barrier to entry for other researchers. In this report, we provide a snapshot of the current development and testing of Acoustilytix™, a web-based automated USV scoring tool. Acoustilytix implements machine learning methodology in the USV detection and classification process and is recording-environment-agnostic. We summarize the user features identified as desirable by USV researchers and how these were implemented. These include the ability to easily upload USV files, to output a list of detected USVs with associated parameters in CSV format, and to manually verify or modify an automatically detected call. With no user intervention or tuning, Acoustilytix achieves 93% sensitivity (a measure of how accurately Acoustilytix detects true calls) and 73% precision (a measure of how well Acoustilytix avoids false positives) in call detection across four unique recording environments, and it was superior to the popular DeepSqueak algorithm (sensitivity = 88%; precision = 41%). Future work will include integration and implementation of machine-learning-based call type classification that will recommend a call type to the user for each detected call. Call classification accuracy is currently in the 71–79% range and will continue to improve as more USV files are scored by expert scorers, providing more training data for the classification model. We also describe a recently developed feature of Acoustilytix that offers a fast and effective way to train hand-scorers using automated learning principles, without requiring an expert hand-scorer to be present, and is built upon a foundation of learning science. The key is that trainees practice classifying hundreds of calls with immediate corrective feedback based on an expert’s USV classification. We show that this approach is highly effective, with inter-rater reliability (i.e., kappa statistics) between trainees and the expert ranging from 0.30 to 0.75 (average = 0.55) after only 1000–2000 calls of training. We conclude with a brief discussion of future improvements to the Acoustilytix platform.
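A minimal sketch (assumed time-overlap matching rule; the report does not specify how detections are matched to hand-scored calls) of how the reported detection metrics can be computed: sensitivity is the fraction of true calls found, and precision is the fraction of detections that correspond to true calls.

```python
def evaluate_detections(detected, truth, tolerance=0.01):
    """detected/truth: lists of (start_s, end_s) call intervals."""
    def overlaps(a, b):
        return a[0] < b[1] + tolerance and b[0] < a[1] + tolerance
    tp = sum(any(overlaps(d, t) for t in truth) for d in detected)
    fn = sum(not any(overlaps(t, d) for d in detected) for t in truth)
    fp = len(detected) - tp
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision

truth = [(0.10, 0.18), (0.40, 0.52), (0.90, 1.05)]
detected = [(0.11, 0.17), (0.41, 0.50), (1.50, 1.60)]
print(evaluate_detections(detected, truth))  # -> (0.667, 0.667)
```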


2021 ◽  
pp. 175-186
Author(s):  
Bemarisika Parfait ◽  
André Totohasina

Given a large collection of transactions containing items, a basic problem common to association rule mining is the huge size of the extracted rule set. Pruning uninteresting and redundant association rules is a promising approach to solving this problem. In this paper, we propose a Condensed Representation for Positive and Negative Association Rules, representing non-redundant rules for both exact and approximate association rules, based on the sets of frequent generator itemsets, frequent closed itemsets, maximal frequent itemsets, and minimal infrequent itemsets in a database B. Experiments on dense (highly correlated) databases show a significant reduction in the size of the association rule set extracted from database B.
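A minimal sketch (toy data; simplified to exact positive rules only, whereas the paper also covers approximate and negative rules) of the condensed-representation idea: every exact rule can be written non-redundantly as g => (closure(g) \ g), where g is a frequent generator itemset and closure(g) its frequent closed itemset.

```python
from itertools import combinations

def support(itemset, db):
    return sum(1 for t in db if itemset <= t)

def closure(itemset, db):
    """Intersection of all transactions containing the itemset."""
    covering = [t for t in db if itemset <= t]
    return frozenset(set.intersection(*covering)) if covering else frozenset()

def exact_rules(db, min_sup):
    items = set().union(*db)
    rules = []
    for k in range(1, len(items) + 1):
        for cand in map(frozenset, combinations(items, k)):
            if support(cand, db) < min_sup:
                continue
            # Generator: no proper subset has the same support.
            if any(support(cand - {i}, db) == support(cand, db)
                   for i in cand):
                continue
            closed = closure(cand, db)
            if closed != cand:               # yields a confidence-1 rule
                rules.append((set(cand), set(closed - cand)))
    return rules

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
print(exact_rules(db, min_sup=2))  # e.g. {'b'} => {'a'}, {'c'} => {'a'}
```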


Author(s):  
Guoping Lei ◽  
Ke Xiao ◽  
Xiuying Luo ◽  
Feiyi Cui ◽  
Minlu Dai

Background: This paper puts forward a parallel association rule mining algorithm suitable for sales data analysis, based on the idea of database partitioning, and designs a sales management system for malls, including behavior recognition and data analysis functions, as the application model of this algorithm, with a clothing store data management system as the study object. Objective: To adapt to the particularity of the study object's data, the improved algorithm also considers the priority relations, weights, negative association rules, and other factors among the items of the database while mining the association rules. Method: The improvement is applied to the Apriori algorithm: the original database is divided into n local data sets, the local data sets are mined in parallel, the locally frequent itemsets are found in each local data set, and finally the supports are counted to determine the final global frequent itemsets. Result: Experiments verify that this algorithm reduces the number of database scans, shortens the mining time, and improves the effectiveness and adaptability of the mining results. Conclusion: With the addition of negative association rules, more diverse results can be mined when analyzing specific problems, mining efficiency is improved, and the accuracy and adaptability of the mining results are guaranteed while the algorithm remains efficient. Improving the efficiency of incremental mining as the database is continuously updated will be considered next.
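A minimal sketch (sequential stand-in for the parallel version; naive exhaustive local miner) of the partition-based Apriori scheme described in the Method: the database is split into n partitions, each partition is mined for locally frequent itemsets, the union of the local results forms the global candidate set, and one final counting pass keeps the candidates frequent in the whole database. This is sound because any globally frequent itemset must be locally frequent in at least one partition.

```python
from itertools import combinations

def frequent_itemsets(partition, min_sup_ratio):
    """Naive local miner: all itemsets meeting the local support ratio."""
    items = set().union(*partition)
    local = set()
    for k in range(1, len(items) + 1):
        for cand in map(frozenset, combinations(items, k)):
            if sum(cand <= t for t in partition) \
                    >= min_sup_ratio * len(partition):
                local.add(cand)
    return local

def partition_apriori(db, n_parts, min_sup_ratio):
    size = -(-len(db) // n_parts)            # ceiling division
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # Union of locally frequent itemsets is a complete candidate set.
    candidates = set().union(*(frequent_itemsets(p, min_sup_ratio)
                               for p in parts))
    # One global counting pass filters the candidates.
    return {c for c in candidates
            if sum(c <= t for t in db) >= min_sup_ratio * len(db)}

db = [{"shirt", "tie"}, {"shirt", "belt"}, {"shirt", "tie"}, {"coat"}]
print(partition_apriori(db, n_parts=2, min_sup_ratio=0.5))
```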

