An Innovative GA-Based Decision Tree Classifier in Large Scale Data Mining

With the development of the Internet, the various kinds of large-scale Internet-based data face increasing competition. To meet the demand for data queries, data mining and distributed processing are necessary. This paper therefore proposes a large-scale data mining and distributed processing method based on a decision tree algorithm.

Analysis of Iterative Dichotomiser 3 Algorithm Uses Fuzzy Curves Shoulder as a Determinant of Grade Value

10.31219/osf.io/5tme4 ◽

2021 ◽

Author(s):

Arina Prima Silalahi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Iterative Algorithm ◽

Large Scale ◽

Large Data ◽

Target Variable ◽

Large Scale Data ◽

New Knowledge ◽

Input Variables ◽

Scale Data

Data mining is a process that combines statistics, artificial intelligence, mathematics and machine learning to extract information from large-scale data stored in a database. It analyzes the data to find meaningful relationships, and it examines large-scale data stored in the database to find meaningful patterns or rules. The growing volume of available data is often not used to produce new knowledge, so large accumulations of data remain meaningless. The purpose of this research is to extract information and produce knowledge through a decision tree, and to show the accuracy or influence of the Iterative Dichotomiser 3 (ID3) algorithm when it is used to predict a situation. The classes or attributes used by ID3 are broken down into relative categories. A fuzzy shoulder curve is used as the function that forms the categories of each attribute value. Using the fuzzy shoulder curve, the dataset is processed with a decision tree, which is useful for exploring large amounts of data and for finding hidden relationships between multiple candidate input variables and a target variable. The results of this study are decision trees that provide predictive data using the ID3 algorithm.
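The abstract above does not give the exact membership functions or dataset, so the following is only a minimal sketch of the general idea: a linear shoulder curve turns a numeric value into a category, and the ID3 information gain is then computed on the categorized attribute. The breakpoints `a` and `b`, the labels `low` and `high`, and the toy exam scores are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def left_shoulder(x, a, b):
    """Membership is 1 up to a, then falls linearly to 0 at b (assumes a < b)."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def right_shoulder(x, a, b):
    """Membership is 0 up to a, then rises linearly to 1 at b."""
    return 1.0 - left_shoulder(x, a, b)

def categorize(x, a, b):
    """Assign the label with the larger membership (ties go to 'low')."""
    return "low" if left_shoulder(x, a, b) >= right_shoulder(x, a, b) else "high"

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """ID3 split criterion: target entropy minus weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy data: exam scores with a pass/fail target.
scores = [(35, "fail"), (48, "fail"), (62, "pass"), (90, "pass")]
rows = [{"score": categorize(s, 40, 70), "result": y} for s, y in scores]
gain = information_gain(rows, "score", "result")
```

In a full ID3 run, the attribute with the highest gain would be chosen as the split at each node and the procedure repeated on each subset.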

Multi-GPU approach to global induction of classification trees for large-scale data mining

Applied Intelligence ◽

10.1007/s10489-020-01952-5 ◽

2021 ◽

Author(s):

Krzysztof Jurczuk ◽

Marcin Czajkowski ◽

Marek Kretowski

Keyword(s):

Data Mining ◽

Large Scale ◽

Real Life ◽

Population Based ◽

Tree Structure ◽

Global Approach ◽

Data Parallel ◽

Large Scale Data ◽

The Impact ◽

Scale Data

This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously, and thus gives improvements in the prediction and size of the resulting classifiers in many situations. However, this population-based, iterative approach can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
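The data-parallel decomposition described in this abstract can be illustrated with a small CPU-only sketch. It is not the authors' CUDA implementation: worker threads stand in for GPUs, the fitness is assumed to be plain accuracy (the paper's fitness function is not given here), and the nested-dict tree layout and all function names are hypothetical. The data set is assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor

def predict(tree, x):
    """Walk a tree of nested dicts; anything that is not a dict is a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def count_correct(tree, chunk):
    """Stand-in for a per-device kernel: correct predictions on one data chunk."""
    return sum(1 for x, y in chunk if predict(tree, x) == y)

def fitness(tree, data, n_workers=4):
    """Data-parallel accuracy: split the instances, count per chunk, reduce by summing."""
    size = -(-len(data) // n_workers)  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(lambda c: count_correct(tree, c), chunks)
        return sum(partial) / len(data)

# Hypothetical one-split tree and five labeled instances.
tree = {"feature": 0, "threshold": 0.5, "left": "a", "right": "b"}
data = [((0.1,), "a"), ((0.4,), "a"), ((0.6,), "b"), ((0.9,), "a"), ((0.7,), "b")]
score = fitness(tree, data, n_workers=2)
```

The point of the decomposition is that each worker only needs its own slice of the data plus the (small) tree, so the per-chunk counts can be computed independently and combined with a cheap reduction.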

Classification and metaclassification in large scale data mining application for estimation of software projects

2010 IEEE 9th International Conference on Cybernetic Intelligent Systems ◽

10.1109/ukricis.2010.5898136 ◽

2010 ◽

Cited By ~ 1

Author(s):

Dorota Dzega ◽

Wieslaw Pietruszkiewicz

Keyword(s):

Data Mining ◽

Large Scale ◽

Software Projects ◽

Large Scale Data ◽

Data Mining Application ◽

Scale Data

Large-Scale Data Mining to Optimize Patient-Centered Scheduling at Health Centers

Journal of Healthcare Informatics Research ◽

10.1007/s41666-018-0030-0 ◽

2018 ◽

Vol 3 (1) ◽

pp. 1-18

Author(s):

Kislaya Kunjan ◽

Huanmei Wu ◽

Tammy R. Toscos ◽

Bradley N. Doebbeling

Keyword(s):

Data Mining ◽

Large Scale ◽

Health Centers ◽

Patient Centered ◽

Large Scale Data ◽

Scale Data

Large scale data mining approach for gene-specific standardization of microarray gene expression data

Bioinformatics ◽

10.1093/bioinformatics/btl500 ◽

2006 ◽

Vol 22 (23) ◽

pp. 2898-2904 ◽

Cited By ~ 12

Author(s):

S. Yoon ◽

Y. Yang ◽

J. Choi ◽

J. Seong

Keyword(s):

Gene Expression ◽

Data Mining ◽

Large Scale ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Large Scale Data ◽

Data Mining Approach ◽

Microarray Gene ◽

Scale Data

Data Mining: A Bagged Decision Tree Classifier Algorithm For Ids Intrusion Detection System Based Attacks Classification

Design Engineering ◽

10.17762/de.v2021i04.1800 ◽

2021 ◽

pp. 1826-1839

Author(s):

Sandeep Adhikari ◽

Dr. Sunita Chaudhary

Keyword(s):

Data Mining ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Detection System ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Decision Tree Classifier ◽

Tree Classifier

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. Attackers take advantage of security flaws in all operating systems that are both technically difficult and costly to fix. As a result, intrusion is a key threat to a computer resource's credibility, availability, and confidentiality. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, the data mining principle is combined with IDS to efficiently and quickly identify important, hidden data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial of service attacks. We are also working on a decision tree classifier that has a variety of parameters. The previous algorithm classified IDS data with up to 90% accuracy and was not suitable for large data sets. Our proposed algorithm is designed to accurately classify large data sets. In addition, we quantify a few more decision tree classifier parameters.
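The title of this entry names a bagged decision tree classifier, but the abstract gives no implementation details. As a rough illustration of bagging in general, and not of the authors' algorithm, the sketch below trains one-level decision trees (stumps) on bootstrap samples and combines them by majority vote. The two features (failed logins, packets per second), the labels, and all function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-level tree: the (feature, threshold) with the best training accuracy."""
    best = None
    for f in range(len(sample[0][0])):
        for t in sorted({x[f] for x, _ in sample}):
            left = Counter(y for x, y in sample if x[f] <= t)
            right = Counter(y for x, y in sample if x[f] > t)
            correct = max(left.values(), default=0) + max(right.values(), default=0)
            if best is None or correct > best[0]:
                l_lab = left.most_common(1)[0][0]
                r_lab = right.most_common(1)[0][0] if right else l_lab
                best = (correct, f, t, l_lab, r_lab)
    return best[1:]

def stump_predict(stump, x):
    """Return the left label if the feature is at or below the threshold, else the right."""
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab

def bagging_fit(data, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement, same size as data)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def bagging_predict(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(stump_predict(s, x) for s in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical toy traffic records: (failed logins, packets per second) -> label.
data = [((1, 20), "normal"), ((2, 35), "normal"), ((3, 30), "normal"), ((4, 25), "normal"),
        ((10, 400), "attack"), ((11, 380), "attack"), ((12, 500), "attack"), ((13, 450), "attack")]
model = bagging_fit(data)
quiet = bagging_predict(model, (0, 5))
noisy = bagging_predict(model, (30, 900))
```

Because each stump sees a different resample of the data, the vote tends to be more stable than any single tree; a practical system would use deeper trees and a held-out set to measure accuracy.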
