A Fast Boosting Based Incremental Genetic Algorithm for Mining Classification Rules in Large Datasets

2011 ◽  
Vol 2 (1) ◽  
pp. 49-58
Author(s):  
Periasamy Vivekanandan ◽  
Raju Nedunchezhian

The genetic algorithm is a search technique based on the process of natural evolution. It is widely used by the data mining community for classification rule discovery in complex domains. During the learning process, it makes several passes over the data set to determine the accuracy of the potential rules, which makes it an extremely I/O-intensive, slow process. It is particularly difficult to apply a GA when the training data set is too large or not fully available. This paper proposes an incremental genetic algorithm based on boosting, which constructs a weak ensemble of classifiers in a fast, incremental manner and thereby reduces the learning cost considerably.
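The boosting idea the abstract builds on — incrementally growing an ensemble of weak classifiers while reweighting the examples the ensemble gets wrong — can be sketched generically. This is a plain AdaBoost-style loop with a hypothetical one-feature threshold stump as the weak learner, not the authors' GA-based algorithm:

```python
import math

def stump_learn(X, y, w):
    """Weak learner: best single-feature threshold rule under weights w."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[f] >= t else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best

def boost(X, y, rounds=5):
    """AdaBoost-style loop: reweight misclassified examples each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, sign = stump_learn(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, sign))
        # Increase weight on misclassified examples, then renormalize.
        for i, x in enumerate(X):
            p = sign if x[f] >= t else -sign
            w[i] *= math.exp(-alpha * y[i] * p)
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the weak rules."""
    score = sum(a * (s if x[f] >= t else -s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1

# Toy labelled data, separable by a threshold on the single feature.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
ens = boost(X, y)
```

The incremental flavour described in the abstract would feed new data chunks into such a loop instead of re-scanning the full training set each round.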


Author(s):  
Bijaya Kumar Nanda ◽  
Satchidananda Dehuri

In data mining, extracting classification rules from large data sets is an important task that is gaining considerable attention. This article presents a novel ant miner for classification rule mining, inspired by research on the behaviour of real ant colonies, simulated annealing, and several data mining concepts and principles. The paper takes a Pittsburgh-style approach to single-objective classification rule mining. The algorithm is tested on a few benchmark datasets drawn from the UCI repository. The experimental outcomes confirm that ant miner-HPB (Hybrid Pittsburgh Style Classification) is significantly better than ant miner-PB (Pittsburgh Style Classification).


2010 ◽  
Vol 34-35 ◽  
pp. 1961-1965
Author(s):  
You Qu Chang ◽  
Guo Ping Hou ◽  
Huai Yong Deng

Distributed data mining is widely used in industrial and commercial applications to analyze large datasets maintained over geographically distributed sites. This paper discusses the disadvantages of existing distributed data mining systems and puts forward a distributed data mining platform based on grid computing. Experiments on a data set show that the proposed approach produces meaningful results and has reasonable efficiency and effectiveness, providing a trade-off between runtime and rule interestingness.


2019 ◽  
Vol 8 (3) ◽  
pp. 4373-4378

Data belonging to different domains is being stored rapidly in various repositories across the globe. Extracting useful information from these huge volumes of data is difficult due to the dynamic nature of the data being stored. Data mining is a knowledge discovery process used to extract, in the form of patterns, the hidden information from data stored in various repositories, termed warehouses. One of the popular tasks of data mining is classification, which deals with assigning every instance of a data set to one of the predefined class labels. Banking is one real-world domain that collects huge amounts of client data on a daily basis. In this work, we have collected two variants of the bank marketing data set pertaining to a Portuguese financial institution, consisting of 41188 and 45211 instances, and performed classification on them using two data reduction techniques. Attribute subset selection has been performed on the first data set, and the training data with the selected features is used in classification. Principal Component Analysis has been performed on the second data set, and the training data with the extracted features is used in classification. A deep neural network classification algorithm based on backpropagation has been developed to perform classification on both data sets. Finally, the performance of each deep neural network classifier is compared with four standard classifiers, namely decision trees, Naïve Bayes, support vector machines, and k-nearest neighbors. The deep neural network classifier has been found to outperform the existing classifiers in terms of accuracy.
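The PCA-based reduction step described above can be sketched with NumPy. This is a minimal illustration of projecting data onto its leading principal components; the bank-marketing features and the deep network itself are not reproduced here, and the toy data below is synthetic:

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top n_components principal components."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    vals, vecs = np.linalg.eigh(cov)           # eigh: covariance is symmetric
    order = np.argsort(vals)[::-1]             # sort eigenvalues descending
    components = vecs[:, order[:n_components]]
    return Xc @ components                     # reduced representation

# Toy example: 6 samples, 4 features, reduced to 2 dimensions
# before being fed to a classifier.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Z = pca_project(X, 2)
```

In the workflow the abstract describes, `Z` (computed from the training data) would become the classifier's input in place of the original features.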


Author(s):  
Giovanni Felici ◽  
Klaus Truemper

The method described in this chapter is designed for data mining and learning on logic data. This type of data is composed of records that can be described by the presence or absence of a finite number of properties. Formally, such records can be described by variables that may assume only the values true or false, usually referred to as logic (or Boolean) variables. In real applications, it may also happen that the presence or absence of some property cannot be verified for a record; in such a case we consider that variable to be unknown (the capability to formally treat data with missing values is a feature of logic-based methods). For example, to describe patient records in medical diagnosis applications, one may use the logic variables healthy, old, and has_high_temperature, among many others. A very common data mining task is to find, based on training data, the rules that separate two subsets of the available records, or that explain why the data belong to one subset or the other. For example, one may wish to find a rule that, based on the many variables observed in patient records, is able to distinguish healthy patients from sick ones. Such a rule, if sufficiently precise, may then be used to classify new data and/or to gain information from the available data. This task is often referred to as machine learning or pattern recognition and accounts for a significant portion of the research conducted in the data mining community. When the data considered is in logic form, or can be transformed into it by some reasonable process, it is of great interest to determine explanatory rules in the form of combinations of logic variables, or logic formulas. In the example above, a rule derived from the data could be: if (has_high_temperature is true) and (running_nose is true) then (the patient is not healthy).
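The example rule at the end of the passage can be made concrete: a record is a mapping of Boolean variables (with None standing for an unknown value), and the rule is a conjunction of literals. The variable names are the chapter's own illustrative ones; the three-valued handling of unknowns is a minimal sketch, not the chapter's formal treatment:

```python
def rule_not_healthy(record):
    """if (has_high_temperature is true) and (running_nose is true)
    then (the patient is not healthy)."""
    a = record.get("has_high_temperature")  # True, False, or None (unknown)
    b = record.get("running_nose")
    if a is None or b is None:
        return None            # rule cannot be evaluated on missing values
    return a and b             # True means: classified as NOT healthy

patients = [
    {"has_high_temperature": True,  "running_nose": True},
    {"has_high_temperature": True,  "running_nose": False},
    {"has_high_temperature": None,  "running_nose": True},   # unknown value
]
fired = [rule_not_healthy(p) for p in patients]  # [True, False, None]
```

A learned logic formula is just such a conjunction (or a disjunction of conjunctions) discovered from the training records rather than written by hand.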


Author(s):  
M. Thangamani ◽  
P. Thangaraj

The increase in the number of documents has aggravated the difficulty of classifying them according to specific needs. Clustering analysis in a distributed environment is a thrust area in artificial intelligence and data mining. Its fundamental task is to use features to compute the degree of relationship between objects and to accomplish automatic classification without prior knowledge. Document clustering uses clustering techniques to group highly similar documents together by computing document similarity. Recent studies have shown that ontologies are useful in improving the performance of document clustering. Ontology is concerned with the conceptualization of a domain into an individually identifiable, machine-readable format containing entities, attributes, relationships, and axioms. By analyzing types of techniques for document clustering, a better clustering technique based on the Genetic Algorithm (GA) is determined. The Non-Dominated Ranked Genetic Algorithm (NRGA) is used in this paper for clustering, as it has the capability of providing a better classification result. The experiment is conducted on the 20 Newsgroups data set to evaluate the proposed technique. The results show that the proposed approach is very effective in clustering documents in a distributed environment.


2016 ◽  
Vol 25 (2) ◽  
pp. 263-282 ◽  
Author(s):  
Renu Bala ◽  
Saroj Ratnoo

Fuzzy rule-based systems (FRBSs) are proficient at dealing with cognitive uncertainties like vagueness and ambiguity, which are imperative in real-world decision-making situations. Fuzzy classification rules (FCRs) based on fuzzy logic provide a framework for flexible, human-like reasoning involving linguistic variables. Appropriate membership functions (MFs) and a suitable number of linguistic terms – chosen according to the actual distribution of the data – are useful to strengthen the knowledge base (rule base [RB] + data base [DB]) of FRBSs. An RB is expected to be accurate and interpretable, and a DB must contain appropriate fuzzy constructs (type of MFs, number of linguistic terms, and positioning of the parameters of MFs) for the success of any FRBS. Moreover, it would be fascinating to know how a system behaves in rare or exceptional circumstances, and what action ought to be taken in situations where generalized rules cease to work. In this article, we propose a three-phase approach for the discovery of FCRs augmented with intra- and inter-class exceptions. In the first phase, a pre-processing algorithm is suggested to tune the DB in terms of the MFs and the number of linguistic terms for each attribute of a data set. The second phase discovers FCRs employing a genetic algorithm approach. Subsequently, intra- and inter-class exceptions are incorporated into the rules in the third phase. The proposed approach is illustrated on an example data set and further validated on six UCI machine learning repository data sets. The results show that the approach is able to discover more accurate, interpretable, and interesting rules. Rules with intra-class exceptions tell us about the unique objects of a category, and rules with inter-class exceptions enable us to take the right decision in exceptional circumstances.
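The fuzzy constructs the DB holds can be illustrated minimally: triangular membership functions partitioning a normalized attribute into linguistic terms. The three-term partition below is a generic example, not the tuned one the paper's pre-processing phase would produce:

```python
def triangular(a, b, c):
    """Triangular membership function: peaks at b, zero outside (a, c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Three linguistic terms over a normalized attribute in [0, 1].
terms = {
    "low":    triangular(-0.5, 0.0, 0.5),
    "medium": triangular(0.0, 0.5, 1.0),
    "high":   triangular(0.5, 1.0, 1.5),
}
# An attribute value belongs to several terms with different degrees.
degrees = {name: mu(0.4) for name, mu in terms.items()}
```

Tuning the DB, as in the paper's first phase, amounts to choosing the number of such terms and the positions of the parameters a, b, c per attribute.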


Author(s):  
Lai Lai Yee ◽  
Myo Ma Ma

Data mining is the task of discovering interesting patterns from large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. It can be viewed as a result of the natural evolution of information technology. The key point is that data mining applies these and other AI and statistical techniques to common business problems in a fashion that makes them available to the skilled knowledge worker as well as the trained statistics professional. This paper presents a classification system for toxicology using C4.5. First, the input data are randomly partitioned into two independent sets, a training set and a test set: two thirds of the data are allocated to the training set and the remaining one third to the test set. The training set is then used to build the C4.5 decision tree, and the test set is used to estimate the accuracy of the resulting classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data.
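The holdout protocol the abstract describes (two thirds training, one third test, drawn at random) can be sketched as follows; only the partitioning step is shown, with a plain integer list standing in for labelled toxicology records, and the C4.5 induction itself is not reproduced:

```python
import random

def holdout_split(data, train_fraction=2/3, seed=42):
    """Randomly partition data into independent training and test sets."""
    shuffled = data[:]                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(30))              # stand-in for labelled records
train, test = holdout_split(records)   # 20 training records, 10 test records
```

Training on `train` and scoring on `test` keeps the accuracy estimate independent of the data the tree was grown on, which is the point of the split.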


2016 ◽  
Vol 25 (01) ◽  
pp. 1550028 ◽  
Author(s):  
Mete Celik ◽  
Fehim Koylu ◽  
Dervis Karaboga

In data mining, classification rule learning extracts knowledge in the form of IF-THEN rules, which are comprehensible and readable. It is a challenging problem due to the complexity of data sets. Various meta-heuristic machine learning algorithms have been proposed for rule learning. Cooperative rule learning is the discovery of all classification rules concurrently in a single run. In this paper, a novel cooperative rule learning algorithm based on the Artificial Bee Colony, called CoABCMiner, is introduced. The proposed algorithm handles the training data set and discovers the classification model containing the rule list. Token competition, a new updating strategy used in the onlooker and employed bee phases, and a new scout bee mechanism are proposed in CoABCMiner to achieve cooperative learning of different rules belonging to different classes. We compared the results of CoABCMiner with several state-of-the-art algorithms using 14 benchmark data sets. Nonparametric statistical tests, such as the Friedman test, post hoc tests, and contrast estimation based on medians, are performed; these tests determine the similarity of the control algorithm to the other algorithms on multiple problems. A sensitivity analysis of CoABCMiner is also conducted. It is concluded that CoABCMiner can efficiently discover classification rules for the data sets used in the experiments.
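The "classification model containing the rule list" can be illustrated generically: an ordered list of IF-THEN rules applied first-match-wins, with a default class when no rule fires. The rules and attribute name below are hypothetical, not ones discovered by CoABCMiner:

```python
def classify(rule_list, default_class, record):
    """Apply an ordered IF-THEN rule list; the first matching rule wins."""
    for condition, label in rule_list:
        if condition(record):
            return label
    return default_class

# Hypothetical discovered rules for a three-class problem.
rules = [
    (lambda r: r["petal_len"] < 2.5, "setosa"),
    (lambda r: r["petal_len"] < 5.0, "versicolor"),
]
label = classify(rules, "virginica", {"petal_len": 6.1})  # no rule fires
```

Rule-order matters in such a list, which is why cooperative learning of the whole list in one run differs from mining each rule in isolation.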

