Exploratory data analysis in the context of data mining and resampling.

2010, Vol 3 (1), pp. 9-22
Author(s):  
Chong Ho Yu

Today there are quite a few widespread misconceptions about exploratory data analysis (EDA). One of them is that EDA is opposed to statistical modeling. In fact, the essence of EDA is not to set aside all modeling and preconceptions; rather, researchers are urged not to begin an analysis with only a strong preconception, and thus modeling is still legitimate in EDA. In addition, the nature of EDA has been changing due to the emergence of new methods and the convergence between EDA and other methodologies, such as data mining and resampling. Conventional conceptual frameworks of EDA might therefore no longer be capable of coping with this trend. In this article, EDA is introduced in the context of data mining and resampling with an emphasis on three goals: cluster detection, variable selection, and pattern recognition. TwoStep clustering, classification trees, and neural networks, which are powerful techniques for accomplishing these goals, respectively, are illustrated with concrete examples.

Author(s):  
Brian D. Haig

Chapter 2 is concerned with modern data analysis. It focuses primarily on the nature, role, and importance of exploratory data analysis, although it gives some attention to computer-intensive resampling methods. Exploratory data analysis is a process in which data are examined to reveal potential patterns of interest. However, the use of traditional confirmatory methods in data analysis remains the dominant practice. Different perspectives on data analysis, as they are shaped by four different accounts of scientific method, are provided. A brief discussion of John Tukey’s philosophy of teaching data analysis is presented. The chapter does not consider the more recent exploratory data analytic developments, such as the practice of statistical modeling, the employment of data-mining techniques, and more flexible resampling methods.


Author(s):  
Jörg Andreas Walter

For many tasks of exploratory data analysis, visualization plays an important role. It is key to the efficient integration of human expertise: not only the user's background knowledge, intuition, and creativity, but also the user's powerful pattern recognition and processing capabilities. The design goals for optimal user interaction depend strongly on the given visualization task, but they certainly include easy and intuitive navigation with strong support for the user's orientation.


2021
Author(s):  
Jack B. Greisman, Kevin M. Dalton, Doeke R. Hekstra

X-ray crystallography is an invaluable technique for studying the atomic structure of macromolecules. Much of crystallography's success is due to the software packages developed to enable the automated processing of diffraction data. However, the analysis of unconventional diffraction experiments can still pose significant challenges: many existing programs are closed-source, sparsely documented, or challenging to integrate with modern libraries for scientific computing and machine learning. Here we describe reciprocalspaceship, a Python library for exploring reciprocal space. It provides a tabular representation for reflection data from diffraction experiments that extends the widely used pandas library with built-in methods for handling space group, unit cell, and symmetry-based operations. As we illustrate, this library facilitates new modes of exploratory data analysis while supporting the prototyping, development, and release of new methods.

