Quantitative and Visual Exploratory Data Analysis for Machine Intelligence

Methodologies and Applications of Computational Statistics for Machine Intelligence - Advances in Systems Analysis, Software Engineering, and High Performance Computing ◽

10.4018/978-1-7998-7701-1.ch006 ◽

2021 ◽

pp. 97-117

Author(s):

Dharmendra Trikamlal Patel

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Machine Intelligence ◽

Statistical Techniques ◽

Data Sets ◽

Future Trends ◽

Data Set ◽

Conventional View ◽

Exploratory Data ◽

Analyze Data

Exploratory data analysis (EDA) is a technique for analyzing data sets in order to summarize their main characteristics using quantitative and visual methods. The chapter begins with an introduction to exploratory data analysis, discusses the conventional view of it, and describes its main limitations. It explores the features of quantitative and visual exploratory data analysis in detail, covers the statistical techniques relevant to EDA, and highlights the main visual techniques for representing data efficiently. R has strong capabilities for summarizing the main characteristics of a data set both quantitatively and visually, and the chapter provides practical exposure to various plotting systems in R. Finally, the chapter addresses current research and future trends in EDA.

Techniques for Exploring and Presenting Data Applied to Lake Phosphorus Concentration

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/f80-038 ◽

1980 ◽

Vol 37 (2) ◽

pp. 290-294 ◽

Cited By ~ 18

Author(s):

K. H. Reckhow

Keyword(s):

Water Quality ◽

Data Analysis ◽

Information Transfer ◽

Exploratory Data Analysis ◽

Statistical Significance ◽

Phosphorus Concentration ◽

Data Set ◽

Exploratory Data ◽

Water Quality Models ◽

Single Graph

Water quality sampling and data analysis are undertaken to acquire and convey information. Therefore, when data are presented, the form of this presentation should be such that information transfer is high. For example, a graph or table of average values is often an inadequate summary of batches of data. As an alternative, a technique is presented (that was developed for exploratory data analysis purposes) that can be used to display several sets of data on a single graph, indicating median, spread, skew, size of data set, and statistical significance of the median. This technique is useful in the study of phosphorus concentration variability in lakes. Additions to, and modifications of, this procedure are easily made and will often enhance the analysis of a particular problem. Some suggestions are made for useful modifications of the plots in the study and display of phosphorus lake data and models.

Key words: limnology, exploratory data analysis, statistics, phosphorus, water quality, models, lakes

Box Plots: Basic and Advanced

Mathematics Teacher ◽

10.5951/mt.83.2.0108 ◽

1990 ◽

Vol 83 (2) ◽

pp. 108-112

Author(s):

James L. Mullenex

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Statistical Techniques ◽

Graphical Representations ◽

Box Plots ◽

Exploratory Data

Box plots are used for the purpose of analyzing and displaying important features of sets of data. More specifically, box plots are used as graphical representations of five-number summaries. Box plots and five-number summaries are new statistical techniques that were developed by John W. Tukey of Bell Telephone Laboratories. They are parts of a larger set of modern statistical techniques known collectively as exploratory data analysis, or EDA.
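
As an illustration of the five-number summary that a box plot draws, here is a minimal Python sketch. It is not taken from the article: the function name `five_number_summary` and the sample values are hypothetical, and the quartiles are computed as Tukey-style hinges (medians of the lower and upper halves, with the overall median included in both halves when the count is odd), which is one of several quartile conventions in use.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum).

    Hypothetical helper for illustration; hinges follow the convention of
    including the median in both halves when the number of values is odd.
    """
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    n = len(data)
    half = (n + 1) // 2          # size of each half, median included if n is odd
    lower = data[:half]          # smallest `half` values
    upper = data[n - half:]      # largest `half` values
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


# Made-up sample data, only to show the call.
summary = five_number_summary([7, 1, 3, 9, 5, 11, 13])
print(summary)
```

The box of a box plot spans the two hinges, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.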

Exploratory data analysis and clustering of multivariate spatial hydrogeological data by means of GEO3DSOM, a variant of Kohonen’s Self-Organizing Map

Hydrology and Earth System Sciences Discussions ◽

10.5194/hessd-3-1487-2006 ◽

2006 ◽

Vol 3 (4) ◽

pp. 1487-1516 ◽

Cited By ~ 9

Author(s):

L. Peeters ◽

F. Bação ◽

V. Lobo ◽

A. Dassargues

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Three Dimensional ◽

Spatial Knowledge ◽

Self Organizing Map ◽

Data Set ◽

Hydrochemical Data ◽

Som Algorithm ◽

Exploratory Data ◽

Self Organizing

Abstract. The use of unsupervised artificial neural network techniques like the self-organizing map (SOM) algorithm has proven to be a useful tool in exploratory data analysis and clustering of multivariate data sets. In this study a variant of the SOM-algorithm is proposed, the GEO3DSOM, capable of explicitly incorporating three-dimensional spatial knowledge into the algorithm. The performance of the GEO3DSOM is compared to the performance of the standard SOM in analyzing an artificial data set and a hydrochemical data set. The hydrochemical data set consists of 141 groundwater samples collected in two detritic, phreatic, Cenozoic aquifers in Central Belgium. The standard SOM proves to be more adequate in representing the structure of the data set and to explore relationships between variables. The GEO3DSOM on the other hand performs better in creating spatially coherent groups based on the data.

Gender and Ultimatum in Pakistan: Revisited

The Pakistan Development Review ◽

10.30541/v53i1pp.1-14 ◽

2014 ◽

Vol 53 (1) ◽

pp. 1-14 ◽

Cited By ~ 2

Author(s):

Saima Naeem ◽

Asad Zaman

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Ultimatum Game ◽

Original Data ◽

Jel Classification ◽

Data Set ◽

Analysis Techniques ◽

Exploratory Data ◽

Confirmatory Data Analysis

Razzaque (2009) studied the role of gender in the ultimatum game by running experiments on students in various cities in Pakistan. He used standard confirmatory data analysis techniques, which work well in familiar contexts, where relevant hypotheses of interest are known in advance. Our goal in this paper is to demonstrate that exploratory data analysis is much better suited to the study of experimental data where the goal is to discover patterns of interest. Our exploratory re-analysis of the original data set of Razzaque (2009) leads to several new insights. While we re-confirm the main finding of Razzaque regarding the greater generosity of males, additional analysis suggests that this is driven by student subculture in Pakistan, and would not generalise to the population at large. In addition, we find a strong effect of urbanisation. Our exploratory data analysis also offers considerable additional insights into the learning process that takes place over the course of a sequence of games.

JEL Classification: C78, C81, C91, J16

Keywords: Ultimatum Game, Gender Differences, Exploratory Data Analysis

Exploratory data analysis and clustering of multivariate spatial hydrogeological data by means of GEO3DSOM, a variant of Kohonen's Self-Organizing Map

Hydrology and Earth System Sciences ◽

10.5194/hess-11-1309-2007 ◽

2007 ◽

Vol 11 (4) ◽

pp. 1309-1321 ◽

Cited By ~ 26

Author(s):

L. Peeters ◽

F. Bação ◽

V. Lobo ◽

A. Dassargues

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Spatial Knowledge ◽

Quality Data ◽

Self Organizing Map ◽

Data Set ◽

Hydrochemical Data ◽

Som Algorithm ◽

Exploratory Data ◽

Self Organizing

Abstract. The use of unsupervised artificial neural network techniques like the self-organizing map (SOM) algorithm has proven to be a useful tool in exploratory data analysis and clustering of multivariate data sets. In this study a variant of the SOM-algorithm is proposed, the GEO3DSOM, capable of explicitly incorporating three-dimensional spatial knowledge into the algorithm. The performance of the GEO3DSOM is compared to the performance of the standard SOM in analyzing an artificial data set and a hydrochemical data set. The hydrochemical data set consists of 131 groundwater samples collected in two detritic, phreatic, Cenozoic aquifers in Central Belgium. Both techniques succeed very well in providing more insight in the groundwater quality data set, visualizing the relationships between variables, highlighting the main differences between groups of samples and pointing out anomalous wells and well screens. The GEO3DSOM however has the advantage to provide an increased resolution while still maintaining a good generalization of the data set.

A New Look at the Structures of Old Sepsis Actors by Exploratory Data Analysis Tools

Antibiotics ◽

10.3390/antibiotics8040225 ◽

2019 ◽

Vol 8 (4) ◽

pp. 225 ◽

Cited By ~ 2

Author(s):

Antonio Gnoni ◽

Emanuele De Nitto ◽

Salvatore Scacco ◽

Luigi Santacroce ◽

Luigi Leonardo Palese

Keyword(s):

Septic Shock ◽

Data Analysis ◽

Degrees Of Freedom ◽

Exploratory Data Analysis ◽

Protein Structures ◽

Random Projection ◽

Projection Algorithm ◽

Data Sets ◽

Exploratory Data ◽

Sepsis And Septic Shock

Sepsis is a life-threatening condition that accounts for numerous deaths worldwide, usually as a complication of common community infections (e.g., pneumonia) or of infections acquired during a hospital stay. Sepsis and septic shock, its most severe evolution, involve the whole organism, recruiting and producing many molecules, mostly proteins. Proteins are dynamic entities, and a large number of techniques and studies have been devoted to elucidating the relationship between the conformations adopted by proteins and their function. Although molecular dynamics has a key role in understanding these relationships, the number of protein structures available in the databases is so high that it is currently possible to build data sets from experimentally determined structures. Techniques for dimensionality reduction and clustering can be applied in exploratory data analysis to obtain information on the function of these molecules, and this may be very useful in immunology to better understand the structure-activity relationship of the numerous proteins involved in host defense, particularly in septic patients. The large number of degrees of freedom that characterizes biomolecules requires special techniques able to analyze this kind of data set (with a small number of entries with respect to the number of degrees of freedom). In this work we analyzed the ability of two different types of algorithms to provide information on the structures present in three data sets built using the experimental structures of allosteric proteins involved in sepsis. The results obtained by means of a principal component analysis algorithm and those obtained by a random projection algorithm are largely comparable, proving the effectiveness of random projection methods in structural bioinformatics. The usefulness of random projection in exploratory data analysis is discussed, including validation of the obtained clusters. We chose these proteins because of their involvement in sepsis and septic shock, aiming to highlight the potential of bioinformatics to point out new diagnostic and prognostic tools for patients.
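
To make the idea of random projection concrete, the following Python sketch shows one common variant, a Gaussian random projection, applied to a data set with few entries and many degrees of freedom. It is an assumption-laden illustration, not the algorithm or code used in the paper: the function name `random_projection`, the dimensions (5 entries, 2000 coordinates, 200 projected dimensions), and the synthetic data are all hypothetical.

```python
import math
import random


def random_projection(rows, k, seed=0):
    """Project each d-dimensional row onto k dimensions.

    Uses a d x k matrix of independent Gaussian entries with variance 1/k,
    so squared distances are preserved in expectation. Hypothetical helper.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [
        [sum(x * matrix[i][j] for i, x in enumerate(row)) for j in range(k)]
        for row in rows
    ]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Synthetic stand-in data: 5 entries, each with 2000 coordinates.
data_rng = random.Random(42)
points = [[data_rng.uniform(-1.0, 1.0) for _ in range(2000)] for _ in range(5)]
projected = random_projection(points, k=200)

# Ratio of projected to original distance for one pair; values near 1
# indicate that the projection roughly preserves the geometry.
ratio = distance(projected[0], projected[1]) / distance(points[0], points[1])
print(round(ratio, 3))
```

Because pairwise distances survive the projection approximately, clustering run on the reduced coordinates can recover similar groups to those found in the original space, which is the property that makes the method attractive for exploratory analysis.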

Exploratory Data Analysis Using Supervised Learning Techniques on Credit Card Default Data Set in Machine Learning

Lecture Notes in Electrical Engineering - ICDSMLA 2019 ◽

10.1007/978-981-15-1420-3_181 ◽

2020 ◽

pp. 1759-1769

Author(s):

K. Ulaga Priya ◽

S. Pushpa

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Supervised Learning ◽

Credit Card ◽

Exploratory Data Analysis ◽

Data Set ◽

Learning Techniques ◽

Exploratory Data ◽

Credit Card Default ◽

Default Data

On the application of Exploratory Data Analysis for characterization of space weather data sets

Advances in Space Research ◽

10.1016/j.asr.2011.03.017 ◽

2011 ◽

Vol 47 (12) ◽

pp. 2199-2209 ◽

Cited By ~ 1

Author(s):

L. Habash Krause ◽

A. Franz ◽

A. Stevenson

Keyword(s):

Data Analysis ◽

Space Weather ◽

Exploratory Data Analysis ◽

Weather Data ◽

Data Sets ◽

Exploratory Data

Extracting activity patterns: Exploratory data analysis on a fucoidan extract data set with mixed variables

Algal Research ◽

10.1016/j.algal.2021.102220 ◽

2021 ◽

Vol 54 ◽

pp. 102220

Author(s):

Signe H. Ptak ◽

Massimiliano Errico ◽

Knud V. Christensen

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Activity Patterns ◽

Data Set ◽

Mixed Variables ◽

Exploratory Data

R and exploratory data analysis

Statistical Thinking from Scratch ◽

10.1093/oso/9780198827627.003.0003 ◽

2019 ◽

pp. 11-25

Author(s):

M. D. Edge

Keyword(s):

Data Analysis ◽

Software Package ◽

Exploratory Data Analysis ◽

Free Software ◽

Exploratory Data ◽

Analyze Data

R is a powerful, free software package for performing statistical tasks. It will be used to simulate data, analyze data, and make data displays. More details about R are given in Appendix B.
