Forming categories in exploratory data analysis and data mining

Author(s): P. D. Scott, R. J. Williams, K. M. Ho

iBusiness, 2011, Vol 03 (04), pp. 372-382
Author(s): Rosaria Lombardo, Ermelinda Della Valle

Author(s): Brian D. Haig

Chapter 2 is concerned with modern data analysis. It focuses primarily on the nature, role, and importance of exploratory data analysis, although it gives some attention to computer-intensive resampling methods. Exploratory data analysis is a process in which data are examined to reveal potential patterns of interest. However, the use of traditional confirmatory methods in data analysis remains the dominant practice. Different perspectives on data analysis, as they are shaped by four different accounts of scientific method, are provided. A brief discussion of John Tukey’s philosophy of teaching data analysis is presented. The chapter does not consider the more recent exploratory data analytic developments, such as the practice of statistical modeling, the employment of data-mining techniques, and more flexible resampling methods.


Author(s): Tom Burr, S. Tobin

Data mining is a term used to describe various types of exploratory data analysis whose purposes are to select data models, estimate model parameters, and generate hypotheses that can be tested on future data. It is known that model predictions are overly optimistic when generated from the same data that are used to select a model and estimate its parameters. Therefore, most statistical procedures assume that the data model is selected prior to data collection. Alternatively, to adjust for data mining, we describe steps that should be taken to account for “choosing the best” among many candidate models.
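The optimism described above can be illustrated with a small simulation. This is a minimal sketch and not the authors' procedure: the response and all candidate predictors are pure noise, the "model" is simply the single predictor with the largest absolute correlation, and the names pearson, one_trial, and selection_optimism are hypothetical. The apparent fit of the selected predictor is compared with its fit on fresh data.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def one_trial(rng, n=40, n_candidates=50):
    """Pick the best of many noise predictors, then score it on new data."""
    # Assumption for illustration: the response is unrelated to every predictor.
    y_train = [rng.gauss(0, 1) for _ in range(n)]
    x_train = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_candidates)]

    # "Choosing the best": keep the candidate with the largest |correlation|.
    scores = [abs(pearson(x, y_train)) for x in x_train]
    apparent = max(scores)

    # Future data for the selected predictor. Because every predictor is
    # independent noise, fresh draws stand in for new observations of it.
    y_new = [rng.gauss(0, 1) for _ in range(n)]
    x_new = [rng.gauss(0, 1) for _ in range(n)]
    honest = abs(pearson(x_new, y_new))
    return apparent, honest


def selection_optimism(trials=200, seed=0):
    """Average apparent and honest |correlation| over repeated trials."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(trials)]
    mean_apparent = sum(a for a, _ in results) / trials
    mean_honest = sum(h for _, h in results) / trials
    return mean_apparent, mean_honest


if __name__ == "__main__":
    apparent, honest = selection_optimism()
    print(f"mean apparent |r| of selected predictor: {apparent:.2f}")
    print(f"mean |r| on fresh data:                  {honest:.2f}")
```

Averaged over many trials, the apparent correlation of the selected predictor is noticeably larger than its correlation on fresh data, even though no real relationship exists. Scoring the selected model on data that played no part in selection is one simple way to see the size of that gap.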


2010, Vol 3 (1), pp. 9-22
Author(s): Chong Ho Yu

Several misconceptions about exploratory data analysis (EDA) are widespread. One is that EDA is opposed to statistical modeling. In fact, EDA does not require setting aside all modeling and preconceptions; rather, researchers are urged not to begin the analysis with only a strong preconception, so modeling remains legitimate in EDA. In addition, the nature of EDA has been changing with the emergence of new methods and its convergence with other methodologies, such as data mining and resampling. Conventional conceptual frameworks of EDA might therefore no longer be able to accommodate this trend. In this article, EDA is introduced in the context of data mining and resampling, with an emphasis on three goals: cluster detection, variable selection, and pattern recognition. TwoStep clustering, classification trees, and neural networks, powerful techniques for accomplishing these three goals, respectively, are illustrated with concrete examples.
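As a rough illustration of how a classification tree can serve variable selection, the sketch below searches for the single split that minimizes weighted Gini impurity and reports which variable it uses. It is only one node of a tree, written for illustration under the assumption of numeric predictors. It is not the article's example, and the function names gini and best_split and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (weighted_gini, variable_index, threshold) of the best split.

    Every midpoint between consecutive distinct values of every variable is
    tried; the variable chosen is the one the tree would "select" first.
    """
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[j] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[j] > t]
            w = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or w < best[0]:
                best = (w, j, t)
    return best


if __name__ == "__main__":
    # Toy data: variable 0 separates the classes, variable 1 does not.
    rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0),
            (4.0, 2.0), (5.0, 6.0), (6.0, 3.0)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(best_split(rows, labels))  # (0.0, 0, 3.5)
```

A full classification tree would apply the same search recursively to each resulting subset; variables that never appear in any split are the ones the tree effectively screens out.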

