An Innovative GA-Based Decision Tree Classifier in Large Scale Data Mining

Author(s):  
Zhiwei Fu
2014 ◽  
Vol 989-994 ◽  
pp. 4594-4597
Author(s):  
Chun Zhi Xing

With the development of the Internet, the various kinds of large-scale Internet-based data face increasing competition. To satisfy the demand for data queries, data mining and distributed processing are necessary. This paper therefore proposes a large-scale data mining and distributed processing method based on a decision tree algorithm.
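The abstract does not say how the decision tree algorithm is distributed. One common pattern, given here only as a hedged sketch, is data-parallel split evaluation: each worker builds class-count histograms for its own partition, the histograms are merged, and the best split is chosen from the merged counts. The helper names (`partition_counts`, `merge_counts`, `best_split`, `distributed_best_split`), the use of Gini impurity, and the toy records are all assumptions for illustration, not details from the paper.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_counts(rows, attributes, label_key):
    """Map step (hypothetical): (attribute, value) -> Counter of class labels for one partition."""
    counts = defaultdict(Counter)
    for row in rows:
        for attr in attributes:
            counts[(attr, row[attr])][row[label_key]] += 1
    return counts

def merge_counts(partials):
    """Reduce step: add up the histograms produced by every worker."""
    merged = defaultdict(Counter)
    for part in partials:
        for key, counter in part.items():
            merged[key].update(counter)
    return merged

def gini(counter):
    total = sum(counter.values())
    return 1.0 - sum((n / total) ** 2 for n in counter.values()) if total else 0.0

def best_split(merged, attributes):
    """Choose the attribute whose multiway split has the lowest weighted Gini impurity."""
    best_attr, best_score = None, float("inf")
    for attr in attributes:
        groups = [c for (a, _), c in merged.items() if a == attr]
        total = sum(sum(c.values()) for c in groups)
        score = sum(sum(c.values()) / total * gini(c) for c in groups)
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr, best_score

def distributed_best_split(partitions, attributes, label_key="label"):
    # Threads stand in for cluster nodes; each partition is counted independently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: partition_counts(p, attributes, label_key), partitions))
    return best_split(merge_counts(partials), attributes)

# Toy data split across two hypothetical nodes.
partitions = [
    [{"proto": "tcp", "size": "big", "label": "attack"},
     {"proto": "udp", "size": "small", "label": "normal"}],
    [{"proto": "tcp", "size": "small", "label": "attack"},
     {"proto": "udp", "size": "big", "label": "normal"}],
]
print(distributed_best_split(partitions, ["proto", "size"]))  # → ('proto', 0.0)
```

Only the small per-node histograms travel between machines, never the raw rows, which is what lets this style of split evaluation scale with the number of partitions.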


2021 ◽  
Author(s):  
Arina Prima Silalahi

Data mining is a process that combines statistics, artificial intelligence, mathematics, and machine learning to extract information from large-scale data in a database. It analyzes stored data to find meaningful relationships, examining large volumes of data for meaningful patterns or rules. The growing availability of data is often not used to produce new knowledge, so large accumulations of data remain meaningless. The purpose of this research is to extract information and produce knowledge through a decision tree, and to show the accuracy or influence of the Iterative Dichotomiser 3 (ID3) algorithm when it is used to predict a situation. In ID3, the classes or attributes are repeatedly split into relative categories. A fuzzy shoulder curve is used as the function that forms the categories of each attribute value. After categorization with the fuzzy shoulder curve, the dataset is processed with a decision tree, which is useful for mining large amounts of data and for finding hidden links between multiple potential input variables and a target variable. The result of this study is a decision tree that provides predictions using the ID3 algorithm.
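The two steps in this abstract, fuzzy categorization followed by ID3 attribute selection, can be sketched as follows. This is a minimal illustration under stated assumptions: the low/medium/high categories, their breakpoints, the "largest membership wins" rule, and the example records are all hypothetical, and only the root split is shown (full ID3 repeats the same gain computation recursively on each subset).

```python
import math
from collections import Counter

def left_shoulder(x, a, b):
    """Membership 1 up to a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def right_shoulder(x, a, b):
    """Membership 0 up to a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def triangle(x, a, b, c):
    """Membership peaking at b and zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def categorize(x, low, mid, high):
    """Assumed rule: label a numeric value with its highest-membership category."""
    degrees = {
        "low": left_shoulder(x, *low),
        "medium": triangle(x, *mid),
        "high": right_shoulder(x, *high),
    }
    return max(degrees, key=degrees.get)

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """ID3 gain: parent entropy minus the weighted entropy of each attribute value's subset."""
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

LOW, MID, HIGH = (40, 60), (40, 60, 80), (60, 80)  # hypothetical breakpoints
raw = [(35, "rarely", "fail"), (58, "often", "pass"), (62, "rarely", "pass"), (85, "often", "pass")]
rows = [{"score_level": categorize(s, LOW, MID, HIGH), "attendance": a, "result": r} for s, a, r in raw]
best = max(["score_level", "attendance"], key=lambda attr: information_gain(rows, attr, "result"))
print(best)  # → score_level
```

Using shoulder curves at the two ends means values far below or above the breakpoints still receive full membership in "low" or "high", so every numeric value lands in some category before ID3, which only handles discrete attributes, sees it.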


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DTs) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously, and thus in many situations improves both the prediction quality and the size of the resulting classifiers. However, as a population-based, iterative approach, it can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The search for the tree structure and tests is performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory: the solution can process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that the data size boundaries for evolutionary DT mining are fading.
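The division of labour described above (evolution on the CPU, fitness evaluation split across devices by rows) can be illustrated with a small CPU-only sketch. It is not the authors' CUDA implementation: threads stand in for GPUs, and the tuple tree encoding, the `fitness` function, and its size-penalty term are hypothetical choices made only to show data-parallel decomposition of one fitness evaluation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical encoding: internal node = ("split", feature_index, threshold, left, right),
# leaf = ("leaf", class_label).
def predict(node, x):
    while node[0] == "split":
        _, feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node[1]

def count_errors(tree, chunk):
    """Work done by one device: misclassifications on its own slice of the rows."""
    return sum(1 for x, y in chunk if predict(tree, x) != y)

def count_leaves(node):
    return 1 if node[0] == "leaf" else count_leaves(node[3]) + count_leaves(node[4])

def fitness(tree, data, n_devices=4, size_penalty=0.01):
    """Data-parallel fitness (lower is better): split rows across devices, sum partial errors.

    The size penalty is an assumed accuracy-versus-size trade-off, not the paper's formula.
    """
    if not data:
        raise ValueError("data must be non-empty")
    chunk_size = -(-len(data) // n_devices)  # ceiling division
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        errors = sum(pool.map(lambda c: count_errors(tree, c), chunks))
    return errors / len(data) + size_penalty * count_leaves(tree)

# Toy dataset: class 0 for x < 5, class 1 otherwise.
data = [((x,), 0 if x < 5 else 1) for x in range(10)]
stump = ("split", 0, 4.5, ("leaf", 0), ("leaf", 1))
print(fitness(stump, data) < fitness(("leaf", 0), data))  # → True
```

In a full evolutionary loop the CPU would mutate and recombine trees and call `fitness` for every individual in each generation; only the per-chunk error counts need to come back from the devices, which is why adding devices can scale nearly linearly.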


2018 ◽  
Vol 3 (1) ◽  
pp. 1-18
Author(s):  
Kislaya Kunjan ◽  
Huanmei Wu ◽  
Tammy R. Toscos ◽  
Bradley N. Doebbeling

2021 ◽  
pp. 1826-1839
Author(s):  
Sandeep Adhikari ◽
Dr. Sunita Chaudhary

The exponential growth in the use of computers over networks, together with the proliferation of applications that run on different platforms, has drawn attention to network security. Attacks in this setting exploit security flaws, present in all operating systems, that are both technically difficult and costly to fix. As a result, intrusion detection has become key to protecting the integrity, availability, and confidentiality of computer resources. An Intrusion Detection System (IDS) is critical for detecting network anomalies and attacks. In this paper, data mining principles are combined with an IDS to identify, efficiently and quickly, important hidden data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial-of-service (DDoS) attacks. We also develop a decision tree classifier with a variety of parameters. The previous algorithm achieved up to 90% classification accuracy on IDS data and was not appropriate for large data sets, whereas our proposed algorithm was designed to classify large data sets accurately. In addition, we quantify several further decision tree classifier parameters.
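The abstract reports a figure of "up to 90%" without naming the metric, and on imbalanced network traffic a high accuracy can hide many missed attacks. The sketch below computes the measures commonly reported for IDS classifiers from a confusion matrix. The `Confusion` class and the "attack"/"normal" labels are hypothetical; the paper's own evaluation parameters are not specified in the abstract.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # attacks flagged as attacks
    fp: int  # normal traffic flagged as attacks (false alarms)
    tn: int  # normal traffic passed as normal
    fn: int  # attacks missed

    @classmethod
    def from_labels(cls, y_true, y_pred, positive="attack"):
        """Count the four outcomes, treating `positive` as the attack class."""
        pairs = list(zip(y_true, y_pred))
        return cls(
            tp=sum(t == positive and p == positive for t, p in pairs),
            fp=sum(t != positive and p == positive for t, p in pairs),
            tn=sum(t != positive and p != positive for t, p in pairs),
            fn=sum(t == positive and p != positive for t, p in pairs),
        )

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def detection_rate(self):
        """Recall on the attack class."""
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def false_alarm_rate(self):
        return self.fp / (self.fp + self.tn) if self.fp + self.tn else 0.0

    def precision(self):
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

y_true = ["attack", "attack", "normal", "normal", "attack"]
y_pred = ["attack", "normal", "normal", "attack", "attack"]
c = Confusion.from_labels(y_true, y_pred)
print(c)  # → Confusion(tp=2, fp=1, tn=1, fn=1)
```

Reporting detection rate and false alarm rate alongside accuracy makes a percentage such as the 90% above comparable across datasets with different attack-to-normal ratios.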

