Data mining from extreme data sets: very large and/or very skewed data sets

This paper is the outcome of an attempt to mine recorded power system operational data in order to gain new insight into practical power system behavior. Data mining, in general, is essentially finding new relations between data sets by analyzing well-known or recorded data. In this effort we make use of the recorded data of the Southern regional grid of India. Some interesting relations at the total system level between frequency, total MW/MVAr generation, and average system voltage have been obtained. The aim of this work is to highlight the potential of data mining for power system applications and also some of the concerns that need to be addressed to make such efforts more useful.

Outlier data Mining of large Data Sets relying on fast decomposition simulated annealing algorithm

10.1109/icris52159.2020.00170 ◽ 2020

Author(s): Wenjie Jia ◽ Zhihong He

Keyword(s): Data Mining ◽ Simulated Annealing ◽ Simulated Annealing Algorithm ◽ Large Data ◽ Large Data Sets ◽ Data Sets ◽ Annealing Algorithm ◽ Outlier Data ◽ Fast Decomposition

Knowledge Discovery in Large Data Sets: A Primer for Data Mining Applications in Health Care

Health Informatics - Nursing Informatics ◽ 10.1007/978-1-4757-3252-8_10 ◽ 2000 ◽ pp. 139-148 ◽ Cited By ~ 2

Author(s): Patricia A. Abbott

Keyword(s): Data Mining ◽ Health Care ◽ Knowledge Discovery ◽ Large Data ◽ Large Data Sets ◽ Data Sets

Improving Rule Induction Precision for Automated Annotation by Balancing Skewed Data Sets

Knowledge Exploration in Life Science Informatics - Lecture Notes in Computer Science ◽ 10.1007/978-3-540-30478-4_3 ◽ 2004 ◽ pp. 20-32 ◽ Cited By ~ 5

Author(s): Gustavo E. A. P. A. Batista ◽ Maria C. Monard ◽ Ana L. C. Bazzan

Keyword(s): Rule Induction ◽ Data Sets ◽ Skewed Data ◽ Automated Annotation

Data Mining Based Intelligent System for Voting Behavior Analysis

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.284-287.3070 ◽ 2013 ◽ Vol 284-287 ◽ pp. 3070-3073

Author(s): Duen Kai Chen

Keyword(s): Data Mining ◽ Behavior Analysis ◽ Voting Behavior ◽ Intelligent System ◽ Data Sets ◽ Tree Model ◽ Mining Technology ◽ Identification Rate ◽ Voter Identification ◽ Election Studies

In this study, we report an intelligent system for voting behavior analysis based on data mining technology. Previous literature shows an increasing number of studies that apply information technology to facilitate voting behavior analysis. We built a likely voter identification model using data mining technology; the classification algorithm used here constructs a decision tree model to distinguish voters from non-voters. The model is evaluated by its accuracy and by the number of attributes it needs to correctly identify likely voters. Our goal is to use only a small number of survey questions while maintaining the accuracy rates of other similar models. The model was built and tested on Taiwan’s Election and Democratization Study (TEDS) data sets. According to the experimental results, the proposed model can improve the likely voter identification rate, and this finding is consistent with previous studies based on the American National Election Studies.

Stamping Plant 4.0 – Basics for the Application of Data Mining Methods in Manufacturing Car Body Parts

Key Engineering Materials ◽ 10.4028/www.scientific.net/kem.639.21 ◽ 2015 ◽ Vol 639 ◽ pp. 21-30 ◽ Cited By ~ 7

Author(s): Stephan Purr ◽ Josef Meinhardt ◽ Arnulf Lipp ◽ Axel Werner ◽ Martin Ostermair ◽ ...

Keyword(s): Data Mining ◽ Data Acquisition ◽ Data Driven ◽ Quality Analysis ◽ Process Conditions ◽ Data Sets ◽ Body Parts ◽ Car Body ◽ Sample Data ◽ Mining Methods

Data-driven quality evaluation in the stamping process of car body parts is quite promising because dependencies in the process have not yet been sufficiently researched. However, applying data mining methods to the process in stamping plants would require a large number of sample data sets. Today, acquiring these data represents a major challenge, because the necessary data are inadequately measured, recorded or stored. Thus, the preconditions for sample data acquisition must first be created before any correlations can be investigated. In addition, the process conditions change over time due to wear mechanisms. Therefore, the results do not remain valid and continuous data acquisition is required. In this publication, the current situation in stamping plants regarding process robustness is discussed first and the need for data-driven methods is shown. Subsequently, the state of technology for collecting the sample data sets for quality analysis in producing car body parts is reviewed. At the end of this work, an overview is provided of how this data collection was implemented at BMW and what kind of potential can be expected.

Present State-of-The-Art of Association Rule Mining Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.a2202.109119 ◽ 2019 ◽ Vol 9 (1) ◽ pp. 6398-6405

Keyword(s): Data Mining ◽ Association Rule ◽ Association Rule Mining ◽ State Of The Art ◽ Synthetic Data ◽ Data Sets ◽ Evolutionary Analysis ◽ Rule Mining ◽ Transaction Database ◽ Mining Algorithms

Data mining is the method of extracting useful information from various repositories such as relational databases, transaction databases, spatial databases, temporal and time-series databases, data warehouses, and the World Wide Web. The functionalities of data mining include characterization and discrimination, classification and prediction, association rule mining, cluster analysis, and evolutionary analysis. Association rule mining is one of the most important data mining techniques; it aims at extracting interesting relationships within the data. In this paper we study various association rule mining algorithms, compare them using synthetic data sets, and report the results obtained from the experimental analysis.
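To make the idea of "interesting relationships" concrete, the following is a minimal sketch of support and confidence, the two measures most association rule miners build on. It is not one of the algorithms compared in the paper: it brute-forces rules over pairs of items only, and the function names, thresholds, and toy transactions are illustrative assumptions.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs.

    Brute force over item pairs; practical miners such as Apriori prune
    the search space instead of enumerating every candidate.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent pair: no rule can come from it
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical toy transaction database, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = association_rules(transactions)
```

On this toy data, the only rule that clears both thresholds is butter -> bread: every transaction containing butter also contains bread.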

Improved minimum-minimum roughness algorithm for clustering categorical data

International Journal of ADVANCED AND APPLIED SCIENCES ◽ 10.21833/ijaas.2021.10.006 ◽ 2021 ◽ Vol 8 (10) ◽ pp. 43-50

Author(s): Truong et al.

Keyword(s): Machine Learning ◽ Data Mining ◽ Hierarchical Clustering ◽ Categorical Data ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Experimental Results ◽ Data Sets ◽ Top Down ◽ Hierarchical Clustering Algorithm

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), which is a top-down hierarchical clustering algorithm and can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.
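As a rough illustration of the attribute-selection step that gives MMR its name, the sketch below picks the splitting attribute whose minimum mean roughness is smallest. It assumes the standard rough-set definition of roughness, 1 - |lower approximation| / |upper approximation|; the abstract does not give the paper's exact formulas, and the IMMR modifications are not shown. All function names and the toy records are hypothetical.

```python
from collections import defaultdict


def partition(rows, attr):
    """Equivalence classes of row indices induced by the values of `attr`."""
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        classes[row[attr]].add(idx)
    return list(classes.values())


def roughness(target, classes):
    """1 - |lower| / |upper| of `target` with respect to `classes`."""
    lower = sum(len(c) for c in classes if c <= target)  # classes fully inside
    upper = sum(len(c) for c in classes if c & target)   # classes overlapping
    return 1 - lower / upper


def mean_roughness(rows, a_i, a_j):
    """Average roughness of each value set of a_i with respect to a_j."""
    classes_j = partition(rows, a_j)
    targets = partition(rows, a_i)
    return sum(roughness(t, classes_j) for t in targets) / len(targets)


def select_split_attribute(rows, attrs):
    """Return (attribute, score) with the smallest minimum mean roughness."""
    best = None
    for a_i in attrs:
        score = min(mean_roughness(rows, a_i, a_j) for a_j in attrs if a_j != a_i)
        if best is None or score < best[1]:
            best = (a_i, score)
    return best


# Hypothetical categorical records, for illustration only.
rows = [
    {"color": "red", "size": "S", "shape": "round"},
    {"color": "red", "size": "S", "shape": "square"},
    {"color": "blue", "size": "L", "shape": "round"},
    {"color": "blue", "size": "L", "shape": "square"},
]
best = select_split_attribute(rows, ["color", "size", "shape"])
```

Here color and size induce the same partition, so each describes the other with zero roughness, while shape cuts across both. The sketch therefore selects color, the first attribute to reach the minimum.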

A Survey on Preparing Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL

PCA for heterogeneous data sets in a distributed data mining

Understanding Power System Behavior through Mining Archived Operational Data
