Data mining from extreme data sets: very large and/or very skewed data sets

Author(s): L.O. Hall
Author(s): Sarasij Das, Nagendra Rao P S

This paper is the outcome of an attempt to mine recorded power system operational data in order to gain new insight into practical power system behavior. In general, data mining is the process of finding new relations between data sets by analyzing known or recorded data. In this effort, we use recorded data from the Southern Regional Grid of India. Several interesting system-level relations among frequency, total MW/MVAr generation, and average system voltage have been obtained. The aim of this work is to highlight the potential of data mining for power system applications and to point out some of the concerns that must be addressed to make such efforts more useful.
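
One simple way to look for system-level relations of the kind described above is to compute pairwise correlations between recorded quantities. The sketch below is illustrative only: it uses the Pearson correlation coefficient as an assumed measure of association, and the series names (`frequency_hz`, `total_mw`, `avg_voltage_pu`) and values are hypothetical placeholders, not data from the Southern Regional Grid or results from the paper.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def pairwise_correlations(
    records: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlation for every unordered pair of recorded quantities."""
    names = sorted(records)
    return {
        (a, b): pearson(records[a], records[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical, made-up readings purely to show the call pattern.
records = {
    "frequency_hz": [49.85, 49.92, 50.01, 49.95, 49.88],
    "total_mw": [21000.0, 20500.0, 19800.0, 20200.0, 20800.0],
    "avg_voltage_pu": [0.97, 0.98, 1.00, 0.99, 0.97],
}
for pair, r in pairwise_correlations(records).items():
    print(pair, round(r, 3))
```

Correlation only captures linear association; real operational data would also need the cleaning and alignment concerns the abstract alludes to.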


2013, Vol 284-287, pp. 3070-3073
Author(s): Duen Kai Chen

In this study, we report an intelligent system for voting behavior analysis based on data mining technology. Previous literature shows an increasing number of studies applying information technology to facilitate voting behavior analysis. Here, we built a likely-voter identification model using data mining technology; the classification algorithm constructs a decision tree model to distinguish voters from non-voters. The model is evaluated by its accuracy and by the number of attributes needed to correctly identify likely voters. Our goal is to use only a small number of survey questions while maintaining accuracy comparable to that of similar models. The model was built and tested on Taiwan’s Election and Democratization Study (TEDS) data sets. According to the experimental results, the proposed model can improve the likely-voter identification rate, a finding consistent with previous studies based on the American National Election Studies.
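
A decision tree keeps the number of survey questions small by splitting first on the most informative attribute. The abstract does not name the tree-induction algorithm, so the sketch below assumes an ID3-style entropy and information-gain criterion; the question names (`voted_before`, `interest`) and the four toy respondents are hypothetical, not TEDS data.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, str]  # one respondent: survey question -> answer


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows: List[Row], labels: List[str], attr: str) -> float:
    """Entropy reduction obtained by splitting the respondents on one question."""
    groups: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_question(rows: List[Row], labels: List[str], attrs: List[str]) -> str:
    """Question to place at the root of the tree (highest information gain)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Toy, hypothetical respondents.
rows = [
    {"voted_before": "yes", "interest": "high"},
    {"voted_before": "yes", "interest": "low"},
    {"voted_before": "no", "interest": "high"},
    {"voted_before": "no", "interest": "low"},
]
labels = ["voter", "voter", "non-voter", "non-voter"]
print(best_question(rows, labels, ["voted_before", "interest"]))
```

A full tree would recurse on each branch; stopping once the branches are pure is one way such a model can end up using only a few questions.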


2015, Vol 639, pp. 21-30
Author(s): Stephan Purr, Josef Meinhardt, Arnulf Lipp, Axel Werner, Martin Ostermair, ...

Data-driven quality evaluation in the stamping process of car body parts is quite promising because dependencies in the process have not yet been sufficiently researched. However, applying data mining methods to this process in stamping plants would require a large number of sample data sets. Acquiring these data is currently a major challenge because the necessary data are inadequately measured, recorded, or stored. Thus, the preconditions for sample data acquisition must be established before any correlations can be investigated. In addition, the process conditions change over time due to wear mechanisms, so results do not remain valid and continuous data acquisition is required. This publication first discusses the current situation in stamping plants with regard to process robustness and shows the need for data-driven methods. It then reviews the state of technology for collecting sample data sets for quality analysis in car body part production. Finally, it gives an overview of how this data collection was implemented at BMW and what potential can be expected.


Data mining is the process of extracting useful information from various repositories such as relational databases, transaction databases, spatial databases, temporal and time-series databases, data warehouses, and the World Wide Web. The functionalities of data mining include characterization and discrimination, classification and prediction, association rule mining, cluster analysis, and evolutionary analysis. Association rule mining, one of the most important data mining techniques, aims to extract interesting relationships within the data. In this paper, we study various association rule mining algorithms, compare them using synthetic data sets, and report the results of the experimental analysis.
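
The abstract does not list which algorithms were compared, so the sketch below uses Apriori, a classical association rule mining algorithm, purely as an illustration: it finds frequent itemsets level by level (join, then prune candidates with an infrequent subset) and then derives rules above a confidence threshold. The transactions and thresholds are hypothetical toy values.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent: Dict[Itemset, float],
          min_confidence: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Rules lhs -> rhs as (lhs, rhs, support, confidence)."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, supp, conf))
    return out


# Hypothetical toy transactions.
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(transactions, min_support=0.5)
for lhs, rhs, s, c in rules(freq, min_confidence=0.6):
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

Other algorithms (for example, FP-Growth) avoid Apriori's repeated candidate generation; a comparison like the one described would typically measure runtime as support thresholds and data set sizes vary.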


2021, Vol 8 (10), pp. 43-50
Author(s): Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One successful and pioneering clustering algorithm is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose as the leaf node the category with a lower value and more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from the UCI repository show that the IMMR algorithm outperforms MMR in clustering categorical data.
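
Roughness-based methods such as MMR build on rough-set approximations of categorical data. The sketch below does not reproduce MMR's or IMMR's attribute-selection formulas; it only shows the underlying building block under the standard Pawlak definitions: the lower and upper approximations of a set of objects with respect to one attribute, and roughness as 1 minus the ratio of their sizes. Attribute names and rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, str]  # one categorical object: attribute -> value


def equivalence_classes(rows: List[Row], attr: str) -> List[FrozenSet[int]]:
    """Group object indices that share the same value of attr."""
    classes: Dict[str, Set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        classes[row[attr]].add(i)
    return [frozenset(c) for c in classes.values()]


def approximations(rows: List[Row], target: FrozenSet[int],
                   attr: str) -> Tuple[FrozenSet[int], FrozenSet[int]]:
    """Lower approximation (classes inside target) and upper (classes touching it)."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for cls in equivalence_classes(rows, attr):
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return frozenset(lower), frozenset(upper)


def roughness(rows: List[Row], target: FrozenSet[int], attr: str) -> float:
    """Pawlak roughness: 0 means attr describes target exactly."""
    lower, upper = approximations(rows, target, attr)
    if not upper:  # empty target set: treated as exactly definable
        return 0.0
    return 1.0 - len(lower) / len(upper)


# Hypothetical objects; target = objects whose color is "red".
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "blue", "shape": "square"},
    {"color": "red", "shape": "square"},
]
red = frozenset(i for i, r in enumerate(rows) if r["color"] == "red")
print(roughness(rows, red, "shape"))
```

Here the "square" class mixes red and blue objects, so "shape" can only approximate the red objects; a roughness-based clusterer aggregates values like this across attributes to decide where to split.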

