Clustering with Scikit-Learn in Python

Keyword(s): Clustering Algorithms, Use Cases, Use Case, Greco Roman, Textual Data, Exploratory Data, Second Use

This tutorial demonstrates how to apply clustering algorithms in Python to two concrete use cases. The first example uses clustering to identify meaningful groups of Greco-Roman authors based on their publications and their reception. The second use case applies clustering algorithms to textual data in order to discover thematic groups. After finishing this tutorial, you will be able to apply clustering with scikit-learn to your own data, adding a valuable method for exploratory data analysis to your toolbox.
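The two use cases follow the same general workflow: turn each item into a feature vector, then group the vectors. The sketch below makes two assumptions that are not taken from the tutorial. First, it uses scikit-learn's `KMeans` only as a familiar example algorithm. Second, the small author table and text snippets are hypothetical toy data, not the tutorial's datasets. The helper names `cluster_numeric` and `cluster_texts` are also illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler


def cluster_numeric(features, n_clusters=2, seed=0):
    """Hypothetical helper: standardize numeric columns, then run k-means."""
    # Scaling keeps large-valued columns from dominating the distances.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled)


def cluster_texts(texts, n_clusters=2, seed=0):
    """Hypothetical helper: vectorize documents with TF-IDF, then run k-means."""
    # TfidfVectorizer returns a sparse matrix, which KMeans accepts directly.
    vectors = TfidfVectorizer().fit_transform(texts)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(vectors)


# Invented toy data: one row per author, columns could stand for
# (number of known works, number of modern editions). Not real data.
authors = np.array([
    [2, 3], [3, 2], [2, 2],        # rarely published authors
    [40, 55], [45, 60], [38, 50],  # widely published authors
])
print(cluster_numeric(authors))

# Invented short "documents" with two obvious themes.
texts = [
    "war battle army soldiers",
    "army battle victory war",
    "love poetry verse elegy",
    "elegy verse love poems",
]
print(cluster_texts(texts))
```

The cluster IDs that k-means assigns are arbitrary. Only which items share a label is meaningful, not whether a group is called 0 or 1.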

System Design to Utilize Domain Expertise for Visual Exploratory Data Analysis

Information, Vol 12 (4), pp. 140, 2021. DOI: 10.3390/info12040140

Author(s): Tristan Langer, Tobias Meisen

Keyword(s): Data Analysis, System Design, Domain Knowledge, User Interaction, Use Cases, Conceptual System, Domain Experts, Domain Expertise

Exploratory data analysis (EDA) is an iterative process in which data scientists interact with data to learn about its quality and shape and to derive knowledge and new insights about the domain the dataset comes from. However, data scientists are rarely experienced domain experts with tangible knowledge of that domain. Integrating domain knowledge into the analytic process is a complex challenge that usually requires constant communication between data scientists and domain experts. For this reason, it is desirable to reuse the domain insights gained in exploratory analyses for similar use cases. With this objective in mind, we present a conceptual system design for extracting domain expertise while EDA is performed and using it to guide other data scientists in similar use cases. Our system design introduces two concepts, interaction storage and analysis context storage, to record user interactions and interesting data points during an exploratory analysis. For a new use case, the system identifies historical interactions from similar use cases and uses the recorded data to construct candidate interaction sequences and predict their potential insight (i.e., the insight generated by performing the sequence). Based on these predictions, the system recommends the sequences with the highest predicted insight to data scientists. We implement a prototype to test the general feasibility of our system design and to enable further research in this area. Within the prototype, we present an exemplary use case that demonstrates the usefulness of the recommended interactions. Finally, we critically reflect on our first prototype and discuss research opportunities that result from our system design.

John Tukey, Exploratory Data Analysis, and Its Possibilities for Participatory Action Research

PsycEXTRA Dataset, 2014. DOI: 10.1037/e567862014-001

Author(s): Brett Stoudt

Keyword(s): Data Analysis, Action Research, Participatory Action Research, Participatory Action

Graphical Exploratory Data Analysis for Categorical Longitudinal and Time Series Data

PsycEXTRA Dataset, 2013. DOI: 10.1037/e634372013-001

Author(s): Stephen J. Tueller, Richard A. Van Dorn, Georgiy Bobashev, Barry Eggleston

Keyword(s): Time Series, Data Analysis, Time Series Data, Series Data

Covid-19 Cases in India: A Visual Exploratory Data Analysis Model (Preprint)

2020. DOI: 10.2196/preprints.24226


Author(s): Jayesh S

Keyword(s): Data Analysis, Case Fatality, Public Health Emergency, Virus Spread, Analysis Model, Case Fatality Ratio, First Case, Exploratory Data, The Government

The COVID-19 outbreak was first reported in Wuhan, China. The deadly virus spread not only disease but also fear around the globe. In January 2020, the WHO declared COVID-19 a Public Health Emergency of International Concern (PHEIC). The first case of COVID-19 in India was reported on January 30, 2020. By that time, India was prepared to fight the virus, and it has since taken various measures to tackle the situation. In this paper, an exploratory data analysis of COVID-19 cases in India is carried out. Data on the number of cases, tests performed, the case fatality ratio, the number of deaths, changes in visits, the stringency index, and the measures taken by the government are used for modelling and visual exploratory data analysis.

Follow The Clicks: Learning and Anticipating Mouse Interactions During Exploratory Data Analysis

Computer Graphics Forum, Vol 38 (3), pp. 41-52, 2019. DOI: 10.1111/cgf.13670


Author(s): Alvitta Ottley, Roman Garnett, Ran Wan

Keyword(s): Data Analysis

Multivariate Statistical Approach for Nephrines in Women with Obesity

Molecules, Vol 26 (5), pp. 1393, 2021. DOI: 10.3390/molecules26051393

Author(s): Ralitsa Robeva, Miroslava Nedyalkova, Georgi Kirilov, Atanaska Elenkova, Sabina Zacharieva, ...

Keyword(s): Type 2 Diabetes, Metabolic Syndrome, Data Analysis, Stress Response, Principal Component, Hierarchical Cluster, Obese Women

Catecholamines are physiological regulators of carbohydrate and lipid metabolism during stress, but their chronic influence on metabolic changes in obese patients has not yet been clarified. The present study aimed to establish the associations between catecholamine metabolites and metabolic syndrome (MS) components in obese women and to reveal possible hidden subgroups of patients through hierarchical cluster analysis and principal component analysis. The 24-h urinary excretion of metanephrine and normetanephrine was investigated in 150 obese women (54 non-diabetic without MS, 70 non-diabetic with MS, and 26 with type 2 diabetes). The interrelations between carbohydrate disturbances, metabolic syndrome components, and stress response hormones were studied. Exploratory data analysis was used to determine different patterns of similarity among the patients. Normetanephrine concentrations were significantly increased in postmenopausal patients and in women with morbid obesity, type 2 diabetes, and hypertension, but not in women with prediabetes. Both metanephrine and normetanephrine levels were positively associated with glucose concentrations one hour after a glucose load, irrespective of insulin levels. The exploratory data analysis revealed different risk subgroups among the investigated obese women. The development of predictive tools that include not only traditional metabolic risk factors but also markers of stress response systems might help with specific risk estimation in patients with obesity.
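The abstract names hierarchical cluster analysis and principal component analysis as its exploratory tools. The sketch below shows how those two generic steps might look with SciPy and scikit-learn. Everything in it is an assumption for illustration and not the authors' data or settings: the three example columns, the invented numbers, Ward linkage, and the cut into two clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Invented matrix: one row per patient; columns are illustrative variables
# (e.g. a urinary metabolite, fasting glucose, BMI). NOT study data.
X = np.array([
    [1.0, 5.0, 31.0],
    [1.2, 5.2, 32.0],
    [0.9, 4.9, 30.5],
    [3.0, 8.0, 42.0],
    [3.2, 8.4, 43.5],
    [2.9, 7.9, 41.0],
])

# Standardize so variables measured in different units are comparable.
Z = StandardScaler().fit_transform(X)

# Hierarchical clustering: Ward linkage builds the tree (dendrogram),
# and fcluster cuts it into at most two flat groups (labels start at 1).
tree = linkage(Z, method="ward")
groups = fcluster(tree, t=2, criterion="maxclust")

# PCA projects patients onto the directions of largest variance,
# which is useful for plotting the subgroups in two dimensions.
pca = PCA(n_components=2)
scores = pca.fit_transform(Z)

print("cluster labels:", groups)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

In practice, the number of clusters is usually chosen by inspecting the dendrogram rather than being fixed in advance as it is here.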