A Fast Boosting Based Incremental Genetic Algorithm for Mining Classification Rules in Large Datasets

2011 ◽  
Vol 2 (1) ◽  
pp. 49-58
Author(s):  
Periasamy Vivekanandan ◽  
Raju Nedunchezhian

A genetic algorithm (GA) is a search technique based purely on the process of natural evolution. It is widely used by the data mining community for classification rule discovery in complex domains. During learning, it makes several passes over the data set to determine the accuracy of the potential rules, which makes it an extremely I/O-intensive and slow process. A GA is particularly difficult to apply when the training data set is too large and not fully available. This paper proposes an incremental genetic algorithm based on boosting, which constructs an ensemble of weak classifiers in a fast, incremental manner and thereby aims to reduce the learning cost considerably.
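The boosting-based incremental idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes binary labels in {-1, +1} and data arriving in chunks. An exhaustive decision-stump search (`fit_weak_rule`, a hypothetical name) stands in for the GA that would evolve each weak rule. Each new chunk is reweighted by how badly the current ensemble handles it. One weak rule with an AdaBoost-style vote weight is then added per chunk, so earlier chunks never need to be rescanned.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """One-feature threshold rule (a weak classifier)."""
    feature: int
    threshold: float
    polarity: int  # +1: predict +1 when x[feature] > threshold; -1: the reverse

    def predict(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def fit_weak_rule(X, y, w):
    """Hypothetical stand-in for the GA step: exhaustive search for the stump
    with the lowest weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for p in (1, -1):
                stump = Stump(f, t, p)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump.predict(xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


class IncrementalBoost:
    """Adds one weighted weak rule per incoming data chunk."""

    def __init__(self):
        self.members = []  # list of (alpha, stump) pairs

    def score(self, x):
        return sum(alpha * stump.predict(x) for alpha, stump in self.members)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def partial_fit(self, X, y):
        # Weight each new instance by how badly the current ensemble handles it.
        w = [math.exp(-yi * self.score(xi)) for xi, yi in zip(X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        stump, err = fit_weak_rule(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)  # AdaBoost vote weight
        self.members.append((alpha, stump))
        return alpha


# Usage: two chunks of a 1-D problem arriving one after another.
chunks = [
    ([[1.0], [2.0], [3.0], [4.0]], [-1, -1, 1, 1]),
    ([[1.5], [2.5], [3.5], [4.5]], [-1, -1, 1, 1]),
]
model = IncrementalBoost()
for X, y in chunks:
    model.partial_fit(X, y)
print([model.predict([v]) for v in (0.5, 5.0)])
```

Each chunk is visited once. The memory held between chunks is only the list of weighted rules, which is what makes the scheme incremental.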


2016 ◽  
Vol 25 (2) ◽  
pp. 263-282 ◽  
Author(s):  
Renu Bala ◽  
Saroj Ratnoo

Fuzzy rule-based systems (FRBSs) are proficient in dealing with cognitive uncertainties, such as vagueness and ambiguity, that are inherent in real-world decision-making situations. Fuzzy classification rules (FCRs) based on fuzzy logic provide a framework for flexible, human-like reasoning involving linguistic variables. Appropriate membership functions (MFs) and a suitable number of linguistic terms, chosen according to the actual distribution of the data, help strengthen the knowledge base (rule base [RB] + data base [DB]) of FRBSs. For any FRBS to succeed, the RB is expected to be accurate and interpretable, and the DB must contain appropriate fuzzy constructs (type of MFs, number of linguistic terms, and positioning of the MF parameters). Moreover, it would be fascinating to know how a system behaves in rare or exceptional circumstances and what action ought to be taken in situations where generalized rules cease to work. In this article, we propose a three-phase approach for discovering FCRs augmented with intra- and inter-class exceptions. In the first phase, a pre-processing algorithm tunes the DB in terms of the MFs and the number of linguistic terms for each attribute of a data set. The second phase discovers FCRs using a genetic algorithm. In the third phase, intra- and inter-class exceptions are incorporated into the rules. The proposed approach is illustrated on an example data set and further validated on six data sets from the UCI machine learning repository. The results show that the approach is able to discover more accurate, interpretable, and interesting rules. Rules with intra-class exceptions tell us about the unique objects of a category, and rules with inter-class exceptions enable us to make the right decision in exceptional circumstances.
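A fuzzy data base and rule of the kind described above can be sketched with uniformly spaced triangular MFs and a min-based firing strength. Everything here is an illustrative assumption rather than the article's method. The attribute ranges and linguistic labels are hypothetical, and the DB is not tuned to the data distribution. The exception is modeled as "IF antecedent THEN class UNLESS exception", with strength min(antecedent, 1 - exception).

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def uniform_partition(lo, hi, labels):
    """Evenly spaced triangular MFs over [lo, hi], one per linguistic label."""
    step = (hi - lo) / (len(labels) - 1)
    return {
        label: (lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
        for i, label in enumerate(labels)
    }


def firing_strength(sample, antecedent, partitions):
    """Min t-norm over the (attribute IS term) conditions of a rule."""
    return min(
        triangular(sample[attr], *partitions[attr][term])
        for attr, term in antecedent.items()
    )


def strength_with_exception(sample, antecedent, exception, partitions):
    """Hypothetical 'IF antecedent THEN class UNLESS exception' semantics."""
    base = firing_strength(sample, antecedent, partitions)
    if not exception:
        return base
    return min(base, 1.0 - firing_strength(sample, exception, partitions))


# Hypothetical DB: three linguistic terms per attribute.
partitions = {
    "temp": uniform_partition(0.0, 40.0, ["low", "medium", "high"]),
    "humidity": uniform_partition(0.0, 100.0, ["low", "medium", "high"]),
}
rule_if = {"temp": "high"}          # IF temp IS high THEN class = "hot"
rule_unless = {"humidity": "high"}  # UNLESS humidity IS high

for sample in ({"temp": 30.0, "humidity": 20.0}, {"temp": 30.0, "humidity": 90.0}):
    print(sample, strength_with_exception(sample, rule_if, rule_unless, partitions))
```

Tuning the DB, as the first phase describes, would amount to changing the number of labels and the (a, b, c) parameters per attribute instead of using a fixed uniform partition.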


Author(s):  
Lai Lai Yee ◽  
Myo Ma Ma

Data mining is the task of discovering interesting patterns from large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. It can be viewed as a result of the natural evolution of information technology. The key point is that data mining applies artificial intelligence (AI) and statistical techniques to common business problems in a way that makes these techniques available to the skilled knowledge worker as well as to the trained statistics professional. This paper presents a classification system for toxicology using C4.5. First, the input data are randomly partitioned into two independent sets, a training set and a test set: two-thirds of the data are allocated to the training set and the remaining one-third to the test set. Next, in the C4.5 algorithm process, the training set is used to derive classification rules with the C4.5 algorithm. Finally, in the classification process, the test set is used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data.
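The two-thirds/one-third holdout split and the attribute-selection criterion of C4.5 can be sketched as follows. This is a simplified illustration under stated assumptions. Attributes are categorical, and the toy rows are hypothetical, not the paper's toxicology data. Tree growing, continuous-attribute thresholds, and pruning are omitted. `gain_ratio` computes information gain divided by split information, which is the criterion C4.5 uses to choose splitting attributes.

```python
import math
import random
from collections import Counter


def holdout_split(rows, train_fraction=2 / 3, seed=0):
    """Random holdout: about two-thirds for training, the rest for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attr, target):
    """C4.5's criterion: information gain divided by split information."""
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


def accuracy(predict, test_rows, target):
    """Share of held-out rows whose label is predicted correctly."""
    return sum(predict(r) == r[target] for r in test_rows) / len(test_rows)


# Hypothetical toy rows (not the paper's toxicology data).
rows = [
    {"dose": "high", "route": "oral", "toxic": "yes"},
    {"dose": "high", "route": "skin", "toxic": "yes"},
    {"dose": "low", "route": "oral", "toxic": "no"},
    {"dose": "low", "route": "skin", "toxic": "no"},
]
print(gain_ratio(rows, "dose", "toxic"), gain_ratio(rows, "route", "toxic"))

train, test = holdout_split(range(6))
print(len(train), len(test))
```

Here "dose" separates the classes perfectly and would be chosen as the root split, while "route" carries no information about the label.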


2016 ◽  
Vol 25 (01) ◽  
pp. 1550028 ◽  
Author(s):  
Mete Celik ◽  
Fehim Koylu ◽  
Dervis Karaboga

In data mining, classification rule learning extracts knowledge in the form of IF-THEN rules, which are comprehensible and readable. It is a challenging problem due to the complexity of data sets. Various meta-heuristic machine learning algorithms have been proposed for rule learning. Cooperative rule learning is the concurrent discovery of all classification rules in a single run. In this paper, a novel cooperative rule learning algorithm based on the Artificial Bee Colony (ABC) algorithm, called CoABCMiner, is introduced. The proposed algorithm processes the training data set and discovers a classification model consisting of a rule list. Token competition, a new updating strategy for the onlooker and employed bee phases, and a new scout bee mechanism are proposed in CoABCMiner to achieve cooperative learning of different rules belonging to different classes. We compared the results of CoABCMiner with several state-of-the-art algorithms on 14 benchmark data sets. Nonparametric statistical tests, such as the Friedman test, post hoc tests, and contrast estimation based on medians, are performed; these tests determine how the control algorithm compares with the other algorithms across multiple problems. A sensitivity analysis of CoABCMiner is also conducted. It is concluded that CoABCMiner can efficiently discover classification rules for the data sets used in the experiments.
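Token competition, one of the mechanisms named above, can be sketched in a few lines. This follows the commonly described scheme, and CoABCMiner's exact variant may differ. Each training instance is a token, and rules are visited in decreasing order of fitness. Each rule seizes the not-yet-seized tokens it covers, and its fitness is scaled by the fraction of its covered tokens it actually won. Rules that win no tokens are dropped as redundant. The `Rule` type and the coverage sets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    fitness: float
    covered: frozenset  # indices of training instances (tokens) the rule covers


def token_competition(rules):
    """Return (name, adjusted_fitness) for rules that win at least one token."""
    seized = set()
    survivors = []
    # Stronger rules choose first.
    for rule in sorted(rules, key=lambda r: r.fitness, reverse=True):
        won = rule.covered - seized
        if not won:
            continue  # every covered token already taken: rule is redundant
        seized |= won
        # Scale fitness by the share of its covered tokens the rule actually won.
        survivors.append((rule.name, rule.fitness * len(won) / len(rule.covered)))
    return survivors


rules = [
    Rule("R1", 0.9, frozenset({0, 1, 2, 3})),
    Rule("R2", 0.8, frozenset({2, 3})),
    Rule("R3", 0.6, frozenset({3, 4, 5})),
]
print(token_competition(rules))
```

R2 covers only instances already claimed by the fitter R1, so it is removed. R3 survives with reduced fitness because it wins two of its three tokens. This is how token competition keeps a population of rules diverse.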


Author(s):  
H. Sheikhian ◽  
M. R. Delavar ◽  
A. Stein

Uncertainty is one of the main concerns in geospatial data analysis, and it affects different aspects of decision making based on such data. In this paper, a new methodology to handle uncertainty in multi-criteria decision-making problems is proposed. It integrates hierarchical rough granulation and rule extraction to build an accurate classifier. Rough granulation provides information granules with a detailed quality assessment. The granules are the basis for rule extraction in granular computing, which applies quality measures to the rules to obtain the best set of classification rules. The proposed methodology is applied to assess seismic physical vulnerability in Tehran. Six effective criteria, reflecting building age, height, and material, topographic slope, and the earthquake intensity of the North Tehran fault, were tested. The criteria were discretized, and the data set was granulated using a hierarchical rough method, where the best describing granules are determined according to the quality measures. The granules are fed into the granular computing algorithm, resulting in the classification rules that provide the highest prediction quality. This detailed uncertainty management resulted in 84% prediction accuracy on a training data set. The classifier was then applied to the whole study area to obtain the seismic vulnerability map of Tehran. A sensitivity analysis showed that earthquake intensity is the most effective criterion in the seismic vulnerability assessment of Tehran.
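The lower and upper approximations at the heart of rough granulation can be sketched in Python. The building records, attribute names, and discretized values below are hypothetical. The accuracy of approximation, |lower| / |upper|, is only one standard rough-set quality measure. The paper's hierarchical granulation and its specific quality measures are not reproduced. Objects with identical values on the chosen attributes form one granule (an indiscernibility class).

```python
from collections import defaultdict


def granules(objects, attributes):
    """Indiscernibility classes: objects with identical values on `attributes`."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Rough lower and upper approximations of the set `target`."""
    lower, upper = set(), set()
    for g in granules(objects, attributes):
        if g <= target:   # granule lies entirely inside the target set
            lower |= g
        if g & target:    # granule overlaps the target set
            upper |= g
    return lower, upper


def approximation_accuracy(lower, upper):
    """|lower| / |upper|; 1.0 means the target set is exactly definable."""
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical discretized building records.
buildings = {
    "b1": {"age": "old", "height": "low"},
    "b2": {"age": "old", "height": "low"},
    "b3": {"age": "old", "height": "high"},
    "b4": {"age": "new", "height": "high"},
    "b5": {"age": "new", "height": "low"},
}
vulnerable = {"b1", "b3"}

lower, upper = approximations(buildings, ["age", "height"], vulnerable)
print(sorted(lower), sorted(upper), approximation_accuracy(lower, upper))
```

Granulating with fewer attributes (for example, `["age"]` alone) yields coarser granules and a lower approximation accuracy. A hierarchical method can use this trade-off to pick the best describing granules.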


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Haitham Elwahsh ◽  
Mona Gamal ◽  
A. A. Salama ◽  
I. M. El-Henawy

Designing an effective intrusion detection system (IDS) for mobile ad hoc networks (MANETs) has recently become a requirement because of the amount of indeterminacy and doubt that exists in that environment. A neutrosophic system provides a mathematical formulation of the indeterminacy found in such complex situations. Neutrosophic rules compute with symbols instead of numeric values, making them a good basis for symbolic reasoning. These symbols should be carefully designed, as they form the proposition base for the neutrosophic rules (NR) in the IDS. In a neutrosophic system, each attack is determined by degrees of membership, nonmembership, and indeterminacy. This research proposes MANET attack inference using a hybrid framework of self-organizing feature maps (SOFM) and genetic algorithms (GA). The hybrid uses the unsupervised learning capabilities of the SOFM to define the MANET neutrosophic conditional variables. The neutrosophic variables, along with the training data set, are fed into the genetic algorithm to find the fittest neutrosophic rule set from a number of initial subattacks according to the fitness function. This method is designed to detect unknown attacks in MANETs. The simulations and experiments are conducted on the KDD-99 network attack data, available in the UCI machine learning repository, for further processing in knowledge discovery. The experiments demonstrated the feasibility of the proposed hybrid, with an average accuracy of 99.3608%, which is higher than that of other IDSs found in the literature.
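A minimal one-dimensional self-organizing map shows how SOFM-style unsupervised learning can place ordered prototypes over a single feature. Such prototypes could, for example, serve as centers of linguistic terms. That use, the evenly spaced initialization, the linear decay schedules, and the Gaussian neighborhood are all assumptions for illustration. How the paper derives membership, indeterminacy, and nonmembership degrees from the map is not shown.

```python
import math


def train_som_1d(values, n_units, epochs=50, lr0=0.5):
    """Train a 1-D Kohonen map on scalar values; returns unit prototypes."""
    lo, hi = min(values), max(values)
    # Assumed initialization: prototypes evenly spaced over the data range.
    weights = [lo + (hi - lo) * i / (n_units - 1) for i in range(n_units)]
    radius0 = n_units / 2
    total = epochs * len(values)
    step = 0
    for _ in range(epochs):
        for x in values:
            frac = step / total
            lr = lr0 * (1 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighborhood
            winner = best_matching_unit(weights, x)
            for j in range(n_units):
                h = math.exp(-((j - winner) ** 2) / (2 * radius ** 2))
                weights[j] += lr * h * (x - weights[j])
            step += 1
    return weights


def best_matching_unit(weights, x):
    """Index of the prototype closest to x."""
    return min(range(len(weights)), key=lambda j: abs(weights[j] - x))


# Hypothetical feature values with three loose groups.
values = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 8.8, 9.0, 9.2]
prototypes = train_som_1d(values, n_units=3)
print([round(w, 2) for w in prototypes])
print([best_matching_unit(prototypes, v) for v in values])
```

Each update moves a unit only part of the way toward the sample (lr * h is at most 0.5), and units nearer the winner move at least as far. As a result, the initial ascending order of the prototypes is preserved throughout training.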


Author(s):  
Sufal Das ◽  
Hemanta Kumar Kalita

Breast cancer is the second leading cause of cancer deaths among women. It is mainly a tumor-related cause of death in women. Early detection of breast cancer may protect women from death. Various computational methods have been used to enhance diagnostic procedures. In this paper, we present a genetic algorithm (GA)-based association rule mining method that can be applied to detect breast cancer efficiently. In this work, each candidate solution is represented as a chromosome for GA-based rule mining. Association rules that imply classification rules are encoded as binary strings to form the chromosomes. Finally, optimal solutions are found by the developed GA-based approach, which uses a feedback linkage between feature selection and association rule mining.
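The binary chromosome encoding described above can be sketched as follows. The encoding is an assumption: one bit per binary feature, where a 1 means the feature must be present in the rule antecedent and the consequent is fixed to a target class. Fitness is taken as support times confidence. The GA uses truncation selection, one-point crossover, and bit-flip mutation. The records are hypothetical, and the paper's feedback linkage with feature selection is not modeled.

```python
import random


def covers(chromosome, features):
    """A rule covers a record if every feature whose bit is 1 is present."""
    return all(f == 1 for bit, f in zip(chromosome, features) if bit == 1)


def support_confidence(chromosome, records, target=1):
    matched = [label for feats, label in records if covers(chromosome, feats)]
    hits = sum(1 for label in matched if label == target)
    support = hits / len(records)
    confidence = hits / len(matched) if matched else 0.0
    return support, confidence


def fitness(chromosome, records, target=1):
    if not any(chromosome):
        return 0.0  # empty antecedent is not a useful rule
    support, confidence = support_confidence(chromosome, records, target)
    return support * confidence


def one_point_crossover(a, b, point):
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, rate, rng):
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def evolve(records, n_bits, pop_size=8, generations=20, rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, records), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            c1, c2 = one_point_crossover(a, b, rng.randint(1, n_bits - 1))
            children += [mutate(c1, rate, rng), mutate(c2, rate, rng)]
        pop = parents + children[: pop_size - len(parents)]
    return max(pop, key=lambda c: fitness(c, records))


# Hypothetical binarized records: (features, label), label 1 = malignant.
records = [
    ((1, 1, 0), 1),
    ((1, 1, 1), 1),
    ((1, 0, 0), 0),
    ((0, 1, 1), 0),
]
best = evolve(records, n_bits=3)
print(best, support_confidence(best, records))
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.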

