Exploratory data analysis in the context of data mining and resampling.

Several misconceptions about exploratory data analysis (EDA) are widespread today. One is that EDA is opposed to statistical modeling. In fact, the essence of EDA is not to set aside all modeling and preconceptions; rather, researchers are urged not to begin the analysis from a strong preconception alone, so modeling remains legitimate in EDA. In addition, the nature of EDA has been changing with the emergence of new methods and the convergence between EDA and other methodologies, such as data mining and resampling. Conventional conceptual frameworks of EDA might therefore no longer be capable of coping with this trend. In this article, EDA is introduced in the context of data mining and resampling with an emphasis on three goals: cluster detection, variable selection, and pattern recognition. TwoStep clustering, classification trees, and neural networks, which are powerful techniques for accomplishing these respective goals, are illustrated with concrete examples.
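As a rough illustration of how a classification tree can serve variable selection, the sketch below finds the single best split (a one-level tree, or "stump") by Gini impurity and reports which variable it chose. It is not taken from the article: the function names `gini` and `best_split` and the toy data are assumptions made for this example, and a real analysis would grow a full tree with a dedicated library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, impurity_decrease) for the best single split."""
    parent = gini(labels)
    n = len(rows)
    best = (None, None, 0.0)
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if gain > best[2]:
                best = (j, t, gain)
    return best


# Hypothetical toy data: variable 0 separates the classes, variable 1 is noise.
rows = [(1.0, 5.2), (1.5, 3.1), (2.0, 4.8), (6.0, 5.0), (6.5, 3.3), (7.0, 4.9)]
labels = ["a", "a", "a", "b", "b", "b"]

feature, threshold, gain = best_split(rows, labels)
print(f"selected variable {feature} at threshold {threshold} (impurity decrease {gain:.2f})")
```

The variable picked at the top split is the one that most reduces impurity, which is the sense in which a tree ranks candidate predictors.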

Exploratory Data Analysis

DOI: 10.1093/oso/9780190222055.003.0002

Year: 2018

Author(s): Brian D. Haig

Keyword(s): Data Mining, Data Analysis, Statistical Modeling, Scientific Method, Exploratory Data Analysis, Resampling Methods, Data Mining Techniques, Exploratory Data, Data Analytic

Chapter 2 is concerned with modern data analysis. It focuses primarily on the nature, role, and importance of exploratory data analysis, although it gives some attention to computer-intensive resampling methods. Exploratory data analysis is a process in which data are examined to reveal potential patterns of interest. However, the use of traditional confirmatory methods in data analysis remains the dominant practice. Different perspectives on data analysis, as they are shaped by four different accounts of scientific method, are provided. A brief discussion of John Tukey’s philosophy of teaching data analysis is presented. The chapter does not consider the more recent exploratory data analytic developments, such as the practice of statistical modeling, the employment of data-mining techniques, and more flexible resampling methods.
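To make "computer-intensive resampling" concrete, here is a minimal sketch of one such method, the percentile bootstrap, applied to a sample median. It is an illustration only and does not come from the chapter: the function name `bootstrap_ci`, its parameters, and the sample values are all assumptions chosen for this example.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` computed on `data`."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same interval
    n = len(data)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical sample containing one outlier (9.7).
sample = [2.1, 2.4, 2.5, 2.9, 3.0, 3.3, 3.8, 4.1, 4.4, 9.7]
low, high = bootstrap_ci(sample)
print(f"median = {statistics.median(sample):.2f}, approx. 95% interval = ({low:.2f}, {high:.2f})")
```

The interval comes from the empirical distribution of the resampled statistic rather than from a distributional formula, which is what makes the approach computer-intensive.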

The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE)

Journal of Chemical Information and Modeling, 2021, Vol 61 (9), pp. 4173-4189

DOI: 10.1021/acs.jcim.1c00244

Author(s): Tahereh G. Avval, Behnam Moeini, Victoria Carver, Neal Fairley, Emily F. Smith, ...

Keyword(s): Pattern Recognition, Data Analysis, Exploratory Data Analysis, Summary Statistics, Exploratory Data

Hyperbolic Space for Interactive Visualization

Encyclopedia of Data Warehousing and Mining, 2011, pp. 575-581

DOI: 10.4018/978-1-59140-557-3.ch109

Author(s): Jörg Andreas Walter

Keyword(s): Pattern Recognition, Data Analysis, Hyperbolic Space, Exploratory Data Analysis, Strong Support, Interactive Visualization, User Interaction, Efficient Integration, Exploratory Data, The Given

Visualization plays an important role in many exploratory data analysis tasks. It is key to the efficient integration of human expertise: not only the user's background knowledge, intuition, and creativity, but also the user's powerful pattern recognition and processing capabilities. The design goals for optimal user interaction depend strongly on the given visualization task, but they certainly include easy and intuitive navigation with strong support for the user's orientation.

Reciprocalspaceship: A Python Library for Crystallographic Data Analysis

DOI: 10.1101/2021.02.03.429617

Year: 2021

Author(s): Jack B. Greisman, Kevin M. Dalton, Doeke R. Hekstra

Keyword(s): Data Analysis, Exploratory Data Analysis, X Ray, X Ray Crystallography, New Methods, Reflection Data, Software Packages, Automated Processing, Exploratory Data, Closed Source

X-ray crystallography is an invaluable technique for studying the atomic structure of macromolecules. Much of crystallography's success is due to the software packages developed to enable the automated processing of diffraction data. However, the analysis of unconventional diffraction experiments can still pose significant challenges: many existing programs are closed-source, sparsely documented, or challenging to integrate with modern libraries for scientific computing and machine learning. Here we describe reciprocalspaceship, a Python library for exploring reciprocal space. It provides a tabular representation for reflection data from diffraction experiments that extends the widely used pandas library with built-in methods for handling space group, unit cell, and symmetry-based operations. As we illustrate, this library facilitates new modes of exploratory data analysis while supporting the prototyping, development, and release of new methods.
