Data mining fool’s gold

2020 ◽  
Vol 35 (3) ◽  
pp. 182-194
Author(s):  
Gary Smith

The scientific method is based on the rigorous testing of falsifiable conjectures. Data mining, in contrast, puts data before theory by searching for statistical patterns without being constrained by prespecified hypotheses. Artificial intelligence and machine learning systems, for example, often rely on data-mining algorithms to construct models with little or no human guidance. However, a plethora of patterns are inevitable in large data sets, and computer algorithms have no effective way of assessing whether the patterns they unearth are truly useful or meaningless coincidences. While data mining sometimes discovers useful relationships, the data deluge has caused the number of possible patterns that can be discovered to grow exponentially relative to the number that are genuinely useful, which makes it increasingly likely that what data mining unearths is fool’s gold.
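The multiple-comparisons problem behind this claim can be illustrated with the short Python sketch below, which is not drawn from the article: it generates purely random series and counts how many pairs nonetheless look strongly "related"; the 200 series, 50 observations, and 0.4 correlation threshold are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_series, n_obs = 200, 50
    data = rng.normal(size=(n_series, n_obs))      # pure noise: no real relationships exist

    corr = np.corrcoef(data)                       # all pairwise correlations
    upper = corr[np.triu_indices(n_series, k=1)]   # unique pairs only
    spurious = np.sum(np.abs(upper) > 0.4)         # "discoveries" passing a naive cutoff

    print(f"{spurious} of {upper.size} random pairs look 'related'")

With 200 series there are 19,900 pairs, so even a small per-pair false-positive rate yields many coincidental "patterns".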

2020 ◽  
Author(s):  
Isha Sood ◽  
Varsha Sharma

Essentially, data mining concerns the computational analysis of data and the identification of patterns and trends in the information so that we can make decisions or judgements. Data mining concepts have been in use for years, but with the emergence of big data they have become even more common. In particular, the scalable mining of such large data sets is a difficult problem that has attracted several recent studies. A few of these recent works use the MapReduce methodology to construct data mining models across the data set. In this article, we examine current approaches to large-scale data mining and compare their output to the MapReduce model. Based on our research, a system for data mining that combines MapReduce and sampling is implemented and discussed.
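A minimal sketch of the general idea (a MapReduce-style map/reduce pass over a sample of the data) is given below; the transaction data, the 10% sampling rate, and the four simulated map tasks are illustrative assumptions, not the authors' system.

    import random
    from collections import Counter
    from functools import reduce

    transactions = [["milk", "bread"], ["milk", "eggs"],
                    ["bread", "eggs"], ["milk", "bread", "eggs"]] * 250

    def mapper(chunk):
        # Map phase: emit partial item counts for one chunk of the sample.
        return Counter(item for tx in chunk for item in tx)

    def reducer(c1, c2):
        # Reduce phase: merge partial counts from two map tasks.
        return c1 + c2

    sample = random.sample(transactions, k=len(transactions) // 10)  # mine a 10% sample
    chunks = [sample[i::4] for i in range(4)]                        # simulate 4 map tasks
    frequent = reduce(reducer, map(mapper, chunks))
    print(frequent.most_common(3))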


2016 ◽  
Vol 15 (6) ◽  
pp. 6806-6813 ◽  
Author(s):  
Sethunya R Joseph ◽  
Hlomani Hlomani ◽  
Keletso Letsholo

Research on data mining has successfully yielded numerous tools, algorithms, methods and approaches for handling large amounts of data for various purposes and problem solving. Data mining has become an integral part of many application domains such as data warehousing, predictive analytics, business intelligence, bio-informatics and decision support systems. The prime objective of data mining is to effectively handle large-scale data, extract actionable patterns, and gain insightful knowledge. Data mining is part and parcel of the knowledge discovery in databases (KDD) process. Success and improved decision making normally depend on how quickly one can discover insights from data. These insights can be used to drive better actions in operational processes and even to predict future behaviour. This paper presents an overview of various algorithms necessary for handling large data sets. These algorithms define various structures and methods implemented to handle big data. The review also discusses the general strengths and limitations of these algorithms. This paper can serve as a quick guide and an eye-opener for data mining researchers on which algorithm(s) to select and apply in solving the problems they are investigating.


2021 ◽  
pp. 1826-1839
Author(s):  
Sandeep Adhikari ◽  
Sunita Chaudhary

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. Attackers take advantage of security flaws in all operating systems, flaws that are both technically difficult and costly to fix. As a result, intrusion has become a worldwide threat to the credibility, availability, and confidentiality of computer resources. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, the data mining principle is combined with IDS to efficiently and quickly identify important, secret data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial of service attacks. We also develop a decision tree classifier with a variety of tunable parameters. The previous algorithm classified IDS data correctly up to 90% of the time and was not appropriate for large data sets. Our proposed algorithm is designed to accurately classify large data sets. In addition, we quantify a few more decision tree classifier parameters.
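A minimal sketch of a parameterised decision tree classifier for intrusion-style data is shown below, assuming scikit-learn and a synthetic stand-in for labeled network traffic; the features, class balance, and parameter values are illustrative assumptions, not the authors' configuration.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Synthetic stand-in for labeled traffic: roughly 90% normal, 10% attack.
    X, y = make_classification(n_samples=5000, n_features=20,
                               weights=[0.9, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # A decision tree with several tunable parameters, as the abstract describes.
    clf = DecisionTreeClassifier(criterion="entropy", max_depth=8,
                                 min_samples_leaf=20, random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))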


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Ivan Kholod ◽  
Ilya Petukhov ◽  
Andrey Shorov

This paper describes the construction of a Cloud for Distributed Data Analysis (CDDA) based on the actor model. The design maps data mining algorithms onto decomposed functional blocks, which are assigned to actors. Using actors allows users to move the computation close to the stored data. The process does not require loading data sets into the cloud and allows users to analyze confidential information locally. Experimental results show that the proposed approach outperforms established solutions in efficiency.
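A toy sketch of the underlying idea (assigning decomposed processing steps to actors that work next to their own data partition) follows; the thread-and-mailbox Actor class and the partial-sum task are illustrative stand-ins, not the CDDA implementation.

    import threading
    import queue

    class Actor:
        """Toy actor: a thread with a mailbox that applies one function to each message."""
        def __init__(self, func):
            self.func = func
            self.mailbox = queue.Queue()
            self.results = queue.Queue()
            threading.Thread(target=self._run, daemon=True).start()

        def _run(self):
            while True:
                msg = self.mailbox.get()
                if msg is None:
                    break
                self.results.put(self.func(msg))   # process one block of "local" data

        def send(self, msg):
            self.mailbox.put(msg)

    # Each actor computes a partial result close to its own data partition.
    partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    actors = [Actor(sum) for _ in partitions]
    for actor, part in zip(actors, partitions):
        actor.send(part)
    print(sum(actor.results.get() for actor in actors))   # combined result: 45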


Author(s):  
Balazs Feil ◽  
Janos Abonyi

This chapter aims to give a comprehensive view of the links between fuzzy logic and data mining. It is shown that knowledge extracted from simple data sets or huge databases can be represented by fuzzy rule-based expert systems. It is highlighted that both model performance and interpretability of the mined fuzzy models are of major importance, and effort is required to keep the resulting rule bases small and comprehensible. Therefore, in recent years, soft computing based data mining algorithms have been developed for feature selection, feature extraction, model optimization, and model reduction (rule-base simplification). Application of these techniques is illustrated using the wine data classification problem. The results illustrate that fuzzy tools can be applied in a synergistic manner through the nine steps of knowledge discovery.
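The flavour of such a fuzzy rule base can be conveyed with the hand-rolled sketch below; the two rules, the triangular membership functions, and the attribute ranges are illustrative assumptions loosely inspired by the wine data, not the chapter's mined rule base.

    def triangular(x, a, b, c):
        # Triangular membership function peaking at b, zero outside (a, c).
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def classify(alcohol, color_intensity):
        # Rule 1: IF alcohol is high AND color intensity is high THEN class 1
        r1 = min(triangular(alcohol, 12.5, 14.0, 15.5),
                 triangular(color_intensity, 5.0, 9.0, 13.0))
        # Rule 2: IF alcohol is low AND color intensity is low THEN class 2
        r2 = min(triangular(alcohol, 10.5, 11.5, 13.0),
                 triangular(color_intensity, 1.0, 3.0, 5.5))
        return 1 if r1 >= r2 else 2   # winner-takes-all over rule firing strengths

    print(classify(13.8, 8.2))  # -> 1
    print(classify(11.8, 2.9))  # -> 2

Keeping the rule base this small is exactly the interpretability concern the chapter raises: each rule can be read as a plain linguistic statement about the attributes.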


2014 ◽  
Vol 644-650 ◽  
pp. 2120-2123 ◽  
Author(s):  
De Zhi An ◽  
Guang Li Wu ◽  
Jun Lu

At present there are many data mining methods. This paper studies the application of the rough set method in data mining, focusing on attribute reduction algorithms based on rough sets in the rule extraction stage of data mining. In data mining, rough sets are often used for knowledge reduction and, in turn, for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, the traditional attribute reduction algorithm based on rough sets is studied and improved, and a new attribute reduction algorithm is proposed for data mining over large data sets.
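A minimal sketch of rough-set-style attribute reduction on a toy decision table is given below; the table and the exhaustive subset search are illustrative only (brute-force search is precisely what fails to scale to large data sets, which motivates improved reduction algorithms such as the one proposed here).

    from itertools import combinations

    # Toy decision table: (condition attribute values ..., decision)
    table = [
        (("high", "yes", "old"),   "risk"),
        (("high", "no",  "young"), "risk"),
        (("low",  "yes", "old"),   "safe"),
        (("low",  "no",  "young"), "safe"),
        (("high", "yes", "young"), "risk"),
    ]

    def consistent(attr_idx):
        # A subset of attributes is consistent if equal attribute values imply equal decisions.
        seen = {}
        for values, decision in table:
            key = tuple(values[i] for i in attr_idx)
            if seen.setdefault(key, decision) != decision:
                return False
        return True

    # A reduct is a minimal consistent subset of the condition attributes.
    n = len(table[0][0])
    for size in range(1, n + 1):
        reducts = [subset for subset in combinations(range(n), size) if consistent(subset)]
        if reducts:
            print("reducts of size", size, ":", reducts)   # here attribute 0 alone suffices
            break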

