Forming categories in exploratory data analysis and data mining

Chapter 2 is concerned with modern data analysis. It focuses primarily on the nature, role, and importance of exploratory data analysis, although it gives some attention to computer-intensive resampling methods. Exploratory data analysis is a process in which data are examined to reveal potential patterns of interest. However, the use of traditional confirmatory methods in data analysis remains the dominant practice. Different perspectives on data analysis, as they are shaped by four different accounts of scientific method, are provided. A brief discussion of John Tukey’s philosophy of teaching data analysis is presented. The chapter does not consider the more recent exploratory data analytic developments, such as the practice of statistical modeling, the employment of data-mining techniques, and more flexible resampling methods.

Making sense of data: a practical approach to exploratory data analysis and data mining

Choice Reviews Online ◽ 10.5860/choice.44-5095 ◽ 2007 ◽ Vol 44 (09) ◽ pp. 44-5095-44-5095

Keyword(s): Data Mining ◽ Data Analysis ◽ Exploratory Data Analysis ◽ Practical Approach ◽ Making Sense ◽ Exploratory Data

Measuring the Effects of Data Mining on Inference

Encyclopedia of Information Science and Technology, Third Edition ◽ 10.4018/978-1-4666-5888-2.ch176 ◽ 2015 ◽ pp. 1825-1833 ◽ Cited By ~ 1

Author(s): Tom Burr ◽ S. Tobin

Keyword(s): Data Mining ◽ Data Analysis ◽ Data Collection ◽ Data Model ◽ Exploratory Data Analysis ◽ Model Parameters ◽ Statistical Procedures ◽ Model Predictions ◽ Future Data ◽ Exploratory Data

Data mining is a term used to describe various types of exploratory data analysis whose purposes are to select data models, estimate model parameters, and generate hypotheses that can be tested on future data. It is known that model predictions are overly optimistic when generated from the same data that are used to select a model and estimate its parameters. Therefore, most statistical procedures assume that the data model is selected prior to data collection. Alternatively, to adjust for data mining, we describe steps that should be taken to account for “choosing the best” among many candidate models.
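The optimism described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure; it is a minimal example under assumed settings (the function names, sample sizes, and seed are all hypothetical). Every candidate predictor is pure noise, yet the one chosen as "best" on the training data appears clearly correlated with the outcome until it is checked against fresh data.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def selection_optimism(n_train=50, n_fresh=2000, n_candidates=200, seed=0):
    """Pick the best of many noise predictors, then re-check it on fresh data.

    All sizes and the seed are illustrative assumptions, not values from
    the source. Returns (in_sample_abs_corr, fresh_abs_corr).
    """
    rng = random.Random(seed)
    y_train = [rng.gauss(0, 1) for _ in range(n_train)]
    # Candidate "models": single predictors that are unrelated to y by design.
    candidates = [
        [rng.gauss(0, 1) for _ in range(n_train)] for _ in range(n_candidates)
    ]
    in_sample = [abs(correlation(x, y_train)) for x in candidates]
    best = max(range(n_candidates), key=lambda j: in_sample[j])

    # Future data: the selected predictor is still unrelated to the outcome,
    # so its fresh values are simply new independent noise.
    x_fresh = [rng.gauss(0, 1) for _ in range(n_fresh)]
    y_fresh = [rng.gauss(0, 1) for _ in range(n_fresh)]
    return in_sample[best], abs(correlation(x_fresh, y_fresh))


in_sample_r, fresh_r = selection_optimism()
print(f"best-of-many in-sample |r| = {in_sample_r:.2f}")
print(f"same predictor on fresh data |r| = {fresh_r:.2f}")
```

The gap between the two printed values is produced entirely by the selection step, which is why an honest performance estimate needs data that played no part in choosing the model.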

Exploratory data analysis in the context of data mining and resampling.

International journal of psychological research ◽ 10.21500/20112084.819 ◽ 2010 ◽ Vol 3 (1) ◽ pp. 9-22 ◽ Cited By ~ 11

Author(s): Chong Ho Yu

Keyword(s): Data Mining ◽ Neural Networks ◽ Pattern Recognition ◽ Data Analysis ◽ Variable Selection ◽ Statistical Modeling ◽ Exploratory Data Analysis ◽ Cluster Detection ◽ New Methods ◽ Exploratory Data

Several misconceptions about exploratory data analysis (EDA) are widespread today. One is that EDA is opposed to statistical modeling. In fact, the essence of EDA is not to set aside all modeling and preconceptions; rather, researchers are urged not to begin the analysis from strong preconceptions alone, so modeling remains legitimate in EDA. In addition, the nature of EDA has been changing with the emergence of new methods and the convergence of EDA with other methodologies, such as data mining and resampling. Conventional conceptual frameworks of EDA might therefore no longer be adequate to this trend. In this article, EDA is introduced in the context of data mining and resampling, with an emphasis on three goals: cluster detection, variable selection, and pattern recognition. TwoStep clustering, classification trees, and neural networks, which are powerful techniques for accomplishing these goals, respectively, are illustrated with concrete examples.
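As a rough illustration of how a classification tree can serve variable selection, the sketch below finds the single best split (a one-level tree, or "stump") by weighted Gini impurity. It is a deliberate simplification and not the article's own example; the function names and the toy data are hypothetical, and labels are assumed to be coded 0/1.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # proportion of class 1
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return (weighted_impurity, feature_index, threshold) of the best split.

    Tries every midpoint between adjacent distinct values of every feature.
    If no split improves on the unsplit data, the feature index and
    threshold are None.
    """
    n = len(rows)
    best = (gini(labels), None, None)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, j, t)
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
toy_rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (6.0, 2.0), (7.0, 5.0), (8.0, 1.0)]
toy_labels = [0, 0, 0, 1, 1, 1]
result = best_split(toy_rows, toy_labels)
print(result)
```

On this toy data the stump selects feature 0, since splitting on it leaves both branches pure. A full tree would apply the same search recursively within each branch, and the features chosen near the root are the ones the tree treats as most informative.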
