A Survey of Unsupervised Generative Models for Exploratory Data Analysis and Representation Learning

For more than a century, methods for representing data and exploring its intrinsic structure have developed remarkably; they comprise both supervised and unsupervised methods. Recent years, however, have witnessed the flourishing of big data, where datasets are typically high-dimensional and the data can be messy, incomplete, unlabeled, or corrupted. Consequently, discovering the hidden structure buried in such data is highly challenging. From this perspective, exploratory data analysis plays a substantial role in learning the hidden structures that capture the significant features of the data in an ordered manner, by extracting patterns and testing hypotheses to identify anomalies. Unsupervised generative learning models are a class of machine learning models characterized by their potential to reduce dimensionality, discover the exploratory factors, and learn representations without any predefined labels; moreover, such models can generate data from the domain of the reduced factors. In this survey, beginning researchers can find recent unsupervised generative learning models for data exploration and representation learning. Specifically, the article covers three families of methods, chosen for their use in the era of big data: blind source separation, manifold learning, and neural networks, from shallow to deep architectures.

Extracting Value from Industrial Alarms and Events: A Data-Driven Approach Based on Exploratory Data Analysis

Sensors ◽

10.3390/s19122772 ◽

2019 ◽

Vol 19 (12) ◽

pp. 2772 ◽

Cited By ~ 3

Author(s):

Aguinaldo Bezerra ◽

Ivanovitch Silva ◽

Luiz Affonso Guedes ◽

Diego Silva ◽

Gustavo Leitão ◽

...

Keyword(s):

Big Data ◽

Data Analysis ◽

Data Exchange ◽

Data Science ◽

Exploratory Data Analysis ◽

Data Driven ◽

Industrial Data ◽

Industrial Big Data ◽

Data Driven Approach ◽

Exploratory Data

Alarm and event logs are an immense but latent source of knowledge that is commonly undervalued in industry. However, the current landscape of massive data exchange, high efficiency, and strong competitiveness, boosted by the Industry 4.0 and IIoT (Industrial Internet of Things) paradigms, does not accommodate such misuse of data and demands more incisive approaches to analyzing industrial data. Advances in Data Science and Big Data (or, more precisely, Industrial Big Data) have enabled novel approaches to data analysis that can be great allies in extracting hitherto hidden information from plant operation data. In response, this work proposes Exploratory Data Analysis (EDA) as a promising data-driven approach to industrial alarm and event analysis. The approach proved able to increase industrial perception by extracting insights and valuable information from real-world industrial data without making prior assumptions.

Big Data Analytics: A Spotify Case Study

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38702 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1823-1829

Author(s):

Suraj Ingle

Keyword(s):

Big Data ◽

Data Analysis ◽

Customer Satisfaction ◽

Customer Loyalty ◽

Data Analytics ◽

Exploratory Data Analysis ◽

Big Data Analytics ◽

Exploratory Data ◽

Insight Into

Big Data has opened up many possibilities for building customer loyalty and growing business: companies can develop and manufacture products in line with consumer needs, anticipate their profitability, and proactively engage customers while streamlining offers across all customer touch points. This paper discusses how big data can be used to determine the best and most efficient ways for companies to engage and interact with their customers. It also provides insight into how Spotify intends to give music lovers additional ways to find their favourite songs, interact with artists, and improve Spotify's recommendations.

Firework Plot as a Graphical Exploratory Data Analysis Tool for Evaluating the Impact of Outliers in Data Exploration and Regression

Quality and Reliability Engineering International ◽

10.1002/qre.1563 ◽

2013 ◽

Vol 30 (8) ◽

pp. 1409-1425 ◽

Cited By ~ 2

Author(s):

Dae-Heung Jang ◽

Christine M. Anderson-Cook

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Data Exploration ◽

Analysis Tool ◽

Exploratory Data ◽

Data Analysis Tool ◽

The Impact

Using Exploratory Data Analysis and Big Data Analytics for Detecting Anomalies in Cloud Computing

Journal of Natural Sciences and Engineering ◽

10.14706/jonsae2021320 ◽

2021 ◽

Vol 3 (1)

Author(s):

Ibrahim Muzaferija ◽

Zerina Mašetić

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Analysis ◽

Anomaly Detection ◽

Data Analytics ◽

Large Scale ◽

Exploratory Data Analysis ◽

Big Data Analytics ◽

Detection Methods ◽

Exploratory Data

While leveraging cloud computing for large-scale distributed applications allows seamless scaling, many companies struggle to keep up with the amount of data generated, in terms of both efficient processing and anomaly detection, which is a necessary part of managing modern applications. As records of user behavior, web logs are a natural subject for anomaly detection research. Many anomaly detection methods based on automated log analysis have been proposed, but not in the context of big data applications, where anomalous behavior needs to be detected in the understanding phases, before a system is modeled for such use. Big Data Analytics often ignores anomalous points because of the high volume of data. To address this problem, we propose a complementary methodology for Big Data Analytics, Exploratory Data Analysis, which helps in gaining insight into data relationships without classical hypothesis modeling. In that way, we can better understand the patterns and spot anomalies. Results show that Exploratory Data Analysis facilitates anomaly detection and the CRISP-DM Business Understanding phase, making it one of the key steps in the Data Understanding phase.
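
One simple way to see how an exploratory step can surface anomalous points before any modeling is Tukey's fences, which flag values lying far outside the interquartile range. The Python sketch below is an illustration only and is not the method of the paper above; the function name `tukey_outliers`, the multiplier `k=1.5`, and the request counts are all hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical per-minute request counts, as might be parsed from a web log.
requests_per_minute = [52, 48, 50, 47, 55, 51, 49, 53, 310, 50, 46, 54]
print(tukey_outliers(requests_per_minute))  # prints [310]
```

The rule makes no distributional assumption and needs no labels, which is why it suits an understanding phase; the cost is that a fixed `k` may flag too much or too little on heavily skewed log data.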

An exploratory data analysis of airport wait times using big data visualisation techniques

2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS) ◽

10.1109/csitss.2016.7779379 ◽

2016 ◽

Cited By ~ 2

Author(s):

Hari Bhaskar Sankaranarayanan ◽

Gaurav Agarwal ◽

Viral Rathod

Keyword(s):

Big Data ◽

Data Analysis ◽

Exploratory Data Analysis ◽

Wait Times ◽

Data Visualisation ◽

Exploratory Data

John Tukey, Exploratory Data Analysis, and Its Possibilities for Participatory Action Research

PsycEXTRA Dataset ◽

10.1037/e567862014-001 ◽

2014

Author(s):

Brett Stoudt

Keyword(s):

Data Analysis ◽

Action Research ◽

Participatory Action Research ◽

Exploratory Data Analysis ◽

Participatory Action ◽

Exploratory Data

Graphical Exploratory Data Analysis for Categorical Longitudinal and Time Series Data

PsycEXTRA Dataset ◽

10.1037/e634372013-001 ◽

2013

Author(s):

Stephen J. Tueller ◽

Richard A. Van Dorn ◽

Georgiy Bobashev ◽

Barry Eggleston

Keyword(s):

Time Series ◽

Data Analysis ◽

Exploratory Data Analysis ◽

Time Series Data ◽

Series Data ◽

Exploratory Data

Covid-19 Cases in India: A Visual Exploratory Data Analysis Model (Preprint)

10.2196/preprints.24226 ◽

2020 ◽

Cited By ~ 2

Author(s):

Jayesh S

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Case Fatality ◽

Public Health Emergency ◽

Virus Spread ◽

Analysis Model ◽

Case Fatality Ratio ◽

First Case ◽

Exploratory Data ◽

The Government

The Covid-19 outbreak was first reported in Wuhan, China. The deadly virus spread not just the disease but also fear around the globe. In January 2020, WHO declared Covid-19 a Public Health Emergency of International Concern (PHEIC). The first case of Covid-19 in India was reported on January 30, 2020. By that time, India was prepared to fight the virus, and it has taken various measures to tackle the situation. In this paper, an exploratory data analysis of Covid-19 cases in India is carried out. Data on the number of cases, testing done, case fatality ratio, number of deaths, change in visits, stringency index, and measures taken by the government are used for modelling and visual exploratory data analysis.

Follow The Clicks: Learning and Anticipating Mouse Interactions During Exploratory Data Analysis

Computer Graphics Forum ◽

10.1111/cgf.13670 ◽

2019 ◽

Vol 38 (3) ◽

pp. 41-52 ◽

Cited By ~ 4

Author(s):

Alvitta Ottley ◽

Roman Garnett ◽

Ran Wan

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Exploratory Data

Multivariate Statistical Approach for Nephrines in Women with Obesity

Molecules ◽

10.3390/molecules26051393 ◽

2021 ◽

Vol 26 (5) ◽

pp. 1393

Author(s):

Ralitsa Robeva ◽

Miroslava Nedyalkova ◽

Georgi Kirilov ◽

Atanaska Elenkova ◽

Sabina Zacharieva ◽

...

Keyword(s):

Type 2 Diabetes ◽

Metabolic Syndrome ◽

Data Analysis ◽

Stress Response ◽

Exploratory Data Analysis ◽

Principal Component ◽

Hierarchical Cluster ◽

Obese Women ◽

Exploratory Data

Catecholamines are physiological regulators of carbohydrate and lipid metabolism during stress, but their chronic influence on metabolic changes in obese patients is still not clarified. The present study aimed to establish the associations between catecholamine metabolites and metabolic syndrome (MS) components in obese women, and to reveal possible hidden subgroups of patients through hierarchical cluster analysis and principal component analysis. The 24-h urine excretion of metanephrine and normetanephrine was investigated in 150 obese women (54 non-diabetic without MS, 70 non-diabetic with MS, and 26 with type 2 diabetes). The interrelations between carbohydrate disturbances, metabolic syndrome components, and stress response hormones were studied. Exploratory data analysis was used to determine different patterns of similarity among the patients. Normetanephrine concentrations were significantly increased in postmenopausal patients and in women with morbid obesity, type 2 diabetes, and hypertension, but not in those with prediabetes. Both metanephrine and normetanephrine levels were positively associated with glucose concentrations one hour after glucose load, irrespective of insulin levels. The exploratory data analysis showed different risk subgroups among the investigated obese women. The development of predictive tools that include not only traditional metabolic risk factors but also markers of stress response systems might help with specific risk estimation in patients with obesity.
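
The abstract above names principal component analysis as one of its exploratory tools. As a rough illustration of what the first principal component is, the Python sketch below estimates it by power iteration on the sample covariance matrix. This is a toy under stated assumptions, not the study's procedure: the function name, the iteration count, and the four two-column rows are hypothetical and are not measurements from the paper. Power iteration can also fail to find the leading axis if the starting vector happens to be orthogonal to it.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the leading principal axis of `rows` and each row's score on it."""
    n, d = len(rows), len(rows[0])
    # Center every column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variance along the current direction")
        v = [x / norm for x in w]
    # Project each centered row onto the estimated axis.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical two-variable data in which the second column is twice the first.
axis, scores = first_principal_component(
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
)
print(axis, scores)
```

In practice a library routine based on the singular value decomposition would return all components at once and would usually standardize variables measured in different units first; the scores could then be passed to a hierarchical clustering routine to look for subgroups.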
