High performance spatial data mining for very large data-sets

This article discusses data mining, which draws upon extensive work in areas such as statistics, machine learning, pattern recognition, databases, and high-performance computing to discover interesting and previously unknown information in data. More specifically, data mining is the analysis of large data sets to find relationships and patterns that aren’t readily apparent, and to summarize the data in new and useful ways. Data mining technology has enabled earth scientists from NASA to discover changes in the global carbon cycle and climate system, and biologists to map and explore the human genome. Data mining is not restricted solely to vast banks of data with unlimited ways of analyzing it. Manufacturers such as W.L. Gore (the maker of GoreTex) use commercially available data mining tools to warehouse and analyze their data and to improve their manufacturing processes. Gore uses data mining tools from analytic software vendor SAS for statistical modeling in its manufacturing process.

Outlier data Mining of large Data Sets relying on fast decomposition simulated annealing algorithm

10.1109/icris52159.2020.00170 ◽

2020 ◽

Author(s):

Wenjie Jia ◽

Zhihong He

Keyword(s):

Data Mining ◽

Simulated Annealing ◽

Simulated Annealing Algorithm ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Annealing Algorithm ◽

Outlier Data ◽

Fast Decomposition

Knowledge Discovery in Large Data Sets: A Primer for Data Mining Applications in Health Care

Health Informatics - Nursing Informatics ◽

10.1007/978-1-4757-3252-8_10 ◽

2000 ◽

pp. 139-148 ◽

Cited By ~ 2

Author(s):

Patricia A. Abbott

Keyword(s):

Data Mining ◽

Health Care ◽

Knowledge Discovery ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Data Mining: A Bagged Decision Tree Classifier Algorithm For IDS Intrusion Detection System Based Attacks Classification

Design Engineering ◽

10.17762/de.v2021i04.1800 ◽

2021 ◽

pp. 1826-1839

Author(s):

Sandeep Adhikari, Dr. Sunita Chaudhary

Keyword(s):

Data Mining ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Detection System ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Decision Tree Classifier ◽

Tree Classifier

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. This paradigm takes advantage of security flaws in all operating systems that are both technically difficult and costly to fix. As a result, intrusion is used as a key to compromise the credibility, availability, and confidentiality of computer resources worldwide. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, the data mining principle is combined with IDS to efficiently and quickly identify important, secret data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial of service attacks. We are also working on a decision tree classifier that has a variety of parameters. The previous algorithm classified IDS data correctly up to 90% of the time and was not appropriate for large data sets. Our proposed algorithm was designed to accurately classify large data sets. In addition, we quantify a few more decision tree classifier parameters.
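
The abstract above does not give the details of its bagged classifier, so the following is only a minimal sketch of the general bagging idea: train several simple trees on bootstrap samples and combine them by majority vote. It makes several assumptions that are not from the paper: depth-1 trees (decision stumps) stand in for full decision trees, and the feature names (packets per second, failed logins), the toy records, and all function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Return the (feature, threshold, left_label, right_label) split
    with the fewest training errors (a depth-1 decision tree)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def fit_bagging(X, y, n_estimators=15, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict_bagging(stumps, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical records: (packets per second, failed logins).
X = [(10, 0), (12, 1), (15, 0), (11, 2), (14, 1),
     (300, 8), (280, 9), (350, 7), (320, 10), (310, 6)]
y = ["normal"] * 5 + ["attack"] * 5

stumps = fit_bagging(X, y)
print(predict_bagging(stumps, (5, 0)))
print(predict_bagging(stumps, (400, 12)))
```

Voting across bootstrap samples is what reduces the variance of a single tree. A realistic intrusion detector would use deeper trees and far more data than this toy set.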

The Integral of Spatial Data Mining in the Era of Big Data

Advances in Business Information Systems and Analytics - Handbook of Research on Advanced Data Mining Techniques and Applications for Business Intelligence ◽

10.4018/978-1-5225-2031-3.ch006 ◽

2017 ◽

pp. 90-126

Author(s):

Gebeyehu Belay Gebremeskel ◽

Chai Yi ◽

Zhongshi He

Keyword(s):

Data Mining ◽

Data Warehouse ◽

Spatial Data ◽

High Volume ◽

Spatial Data Mining ◽

Research Field ◽

Data Sets ◽

Data Types ◽

Basic Principles ◽

GIS Data

Data Mining (DM) is a rapidly expanding field in many disciplines, and it is well suited to analyzing massive data of many types, including geospatial, image and other forms of data sets. Such fast-growing data, characterized by high volume, velocity, variety, variability, value and other properties, are collected and generated from various sources and are too complex and too large for traditional tools to capture, store, and analyze. Spatial data mining (SDM) is, therefore, the process of searching and discovering valuable information and knowledge in large volumes of spatial data, which draws basic principles from concepts in databases, machine learning, statistics, pattern recognition and 'soft' computing. Using DM techniques enables a more efficient use of the data warehouse. It is thus becoming an emerging research field in Geosciences because of the increasing amount of data, which leads to promising new applications. The integral SDM on which this chapter focuses is inference on geospatial and GIS data.

Intelligent Data Analysis

Intelligent Information Technologies ◽

10.4018/978-1-59904-941-0.ch015 ◽

2011 ◽

pp. 308-314 ◽

Cited By ~ 1

Author(s):

Xiaohui Liu

Keyword(s):

Data Analysis ◽

High Performance ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Intelligent Data Analysis ◽

Statistical Knowledge ◽

Interdisciplinary Study ◽

Performance Computing ◽

Effective Analysis

Intelligent Data Analysis (IDA) is an interdisciplinary study concerned with the effective analysis of data. IDA draws techniques from diverse fields, including artificial intelligence, databases, high-performance computing, pattern recognition, and statistics. These fields often complement each other (e.g., many statistical methods, particularly those for large data sets, rely on computation, but brute computing power is no substitute for statistical knowledge) (Berthold & Hand, 2003; Liu, 1999).

From Visualisation to Data Mining with Large Data Sets

Proceedings of the 2005 Particle Accelerator Conference ◽

10.1109/pac.2005.1591735 ◽

2006 ◽

Author(s):

A. Adelmann ◽

R.D. Ryne ◽

J.M. Shalf ◽

C. Siegerist

Keyword(s):

Data Mining ◽

Large Data ◽

Large Data Sets ◽

Data Sets

Research of Improved Attribute Reduction Algorithm Based on Data Mining of Rough Set

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.644-650.2120 ◽

2014 ◽

Vol 644-650 ◽

pp. 2120-2123 ◽

Cited By ~ 2

Author(s):

De Zhi An ◽

Guang Li Wu ◽

Jun Lu

Keyword(s):

Data Mining ◽

Rough Set ◽

Rough Set Theory ◽

Attribute Reduction ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Reduction Algorithm ◽

The Core ◽

Rules Extraction

At present there are many data mining methods. This paper studies the application of the rough set method in data mining, mainly the application of rough-set-based attribute reduction in the rule extraction stage of data mining. In data mining, rough sets are often used for knowledge reduction and thus for rule extraction. Attribute reduction is one of the core research topics of rough set theory. In this paper, the traditional attribute reduction algorithm based on rough sets is studied and improved, and a new attribute reduction algorithm is proposed for data mining on large data sets.
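
To make attribute reduction concrete, here is a minimal sketch of the classical greedy approach based on the rough set dependency degree, that is, the fraction of rows whose equivalence class under the chosen attributes maps to a single decision value. This is not the improved algorithm the paper proposes, which the abstract does not describe. The table, attribute names, and function names are hypothetical.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Fraction of rows in the positive region: rows whose equivalence
    class under `attrs` contains exactly one decision value."""
    decisions = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        decisions[key].add(row[decision])
        counts[key] += 1
    positive = sum(counts[k] for k, d in decisions.items() if len(d) == 1)
    return positive / len(rows)

def greedy_reduct(rows, condition_attrs, decision):
    """Add the attribute that raises the dependency degree most until it
    matches the dependency of the full attribute set."""
    target = dependency(rows, condition_attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in condition_attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical decision table: "flu" is the decision attribute.
rows = [
    {"fever": "yes", "cough": "yes", "age": "young", "flu": "yes"},
    {"fever": "yes", "cough": "no",  "age": "old",   "flu": "yes"},
    {"fever": "no",  "cough": "yes", "age": "young", "flu": "no"},
    {"fever": "no",  "cough": "no",  "age": "old",   "flu": "no"},
    {"fever": "yes", "cough": "yes", "age": "old",   "flu": "yes"},
    {"fever": "no",  "cough": "no",  "age": "young", "flu": "no"},
]

print(greedy_reduct(rows, ["fever", "cough", "age"], "flu"))
```

In this table, "fever" alone determines "flu", so the other two attributes can be dropped. The greedy search preserves the dependency degree but does not guarantee a minimal reduct in general.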

A dynamic K-means clustering for data mining

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i2.pp521-526 ◽

2019 ◽

Vol 13 (2) ◽

pp. 521

Author(s):

Md. Zakir Hossain ◽

Md.Nasim Akhtar ◽

R.B. Ahmad ◽

Mostafijur Rahman

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Large Data ◽

Threshold Value ◽

Specific Pattern ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Number Of Clusters ◽

Data Points

Data mining is the process of finding structure in large data sets. With this process, decision makers can make informed decisions about real-world problems. Several data clustering techniques are used in data mining to find specific patterns in data. The K-means method is one of the familiar clustering techniques for clustering large data sets. The K-means clustering method partitions the data set based on the assumption that the number of clusters is fixed. The main problem of this method is that if the number of clusters is chosen to be small, then there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters is chosen to be high, then there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm. The proposed method performs data clustering dynamically. The proposed method first calculates a threshold value as a centroid of K-Means, and the number of clusters is determined from this value. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, then these two data points will be in the same group. Otherwise, the proposed method will create a new cluster with the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.
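
The threshold rule described above can be illustrated with a minimal sketch. It rests on assumptions beyond the abstract: the threshold is passed in as a parameter, since the abstract does not say exactly how it is computed, each point is compared against running cluster centroids, and the function name and sample points are hypothetical.

```python
import math

def threshold_cluster(points, threshold):
    """Put each point in the nearest cluster if its centroid lies within
    `threshold` (Euclidean distance); otherwise open a new cluster."""
    centroids = []  # running mean of each cluster
    members = []    # points assigned to each cluster
    for p in points:
        best, best_d = None, float("inf")
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            members[best].append(p)
            n = len(members[best])
            # Recompute the centroid as the mean of the cluster's members.
            centroids[best] = tuple(
                sum(q[k] for q in members[best]) / n for k in range(len(p))
            )
        else:
            # Dissimilar point: it seeds a new cluster.
            centroids.append(tuple(p))
            members.append([p])
    return centroids, members

points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.0, 10.5), (0.0, 0.5)]
centroids, members = threshold_cluster(points, threshold=2.0)
print(len(centroids))  # → 2
```

Unlike standard K-means, the number of clusters is not fixed in advance here. It follows from the threshold: a smaller value yields more, tighter clusters.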
