Clustering with Scikit-Learn in Python

2021 ◽  
Author(s):  
Thomas Jurczyk

This tutorial demonstrates how to apply clustering algorithms with Python to a dataset with two concrete use cases. The first example uses clustering to identify meaningful groups of Greco-Roman authors based on their publications and their reception. The second use case applies clustering algorithms to textual data in order to discover thematic groups. After finishing this tutorial, you will be able to use clustering in Python with Scikit-learn applied to your own data, adding an invaluable method to your toolbox for exploratory data analysis.
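As a rough illustration of the kind of workflow the abstract describes, the following minimal sketch groups rows of a numeric feature table with k-means in scikit-learn. The synthetic data, the helper name `cluster_features`, and the choice of k-means with two clusters are assumptions made for this example only; the abstract does not state which algorithms or datasets the tutorial itself uses.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_features(features, n_clusters, seed=0):
    """Standardize the columns, then assign each row to a k-means cluster.

    Hypothetical helper written for this sketch, not taken from the tutorial.
    """
    # Scale features so that no single column dominates the distance measure.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled)


# Synthetic stand-in data: two well-separated groups of 20 points each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(20, 2))
features = np.vstack([group_a, group_b])

labels = cluster_features(features, n_clusters=2)
print(labels)
```

With real data, the number of clusters is usually not known in advance and would need to be chosen by inspection or with a quality measure such as the silhouette score.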

Information ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 140
Author(s):  
Tristan Langer ◽  
Tobias Meisen

Exploratory data analysis (EDA) is an iterative process in which data scientists interact with data to extract information about their quality and shape as well as derive knowledge and new insights into the related domain of the dataset. However, data scientists are rarely experienced domain experts who have tangible knowledge about a domain. Integrating domain knowledge into the analytic process is a complex challenge that usually requires constant communication between data scientists and domain experts. For this reason, it is desirable to reuse the domain insights from exploratory analyses in similar use cases. With this objective in mind, we present a conceptual system design on how to extract domain expertise while performing EDA and utilize it to guide other data scientists in similar use cases. Our system design introduces two concepts, interaction storage and analysis context storage, to record user interaction and interesting data points during an exploratory analysis. For new use cases, it identifies historical interactions from similar use cases and uses the recorded data to construct candidate interaction sequences and predict their potential insight, i.e., the insight generated by performing the sequence. Based on these predictions, the system recommends the sequences with the highest predicted insight to the data scientist. We implement a prototype to test the general feasibility of our system design and enable further research in this area. Within the prototype, we present an exemplary use case that demonstrates the usefulness of recommended interactions. Finally, we give a critical reflection of our first prototype and discuss research opportunities resulting from our system design.


2013 ◽  
Author(s):  
Stephen J. Tueller ◽  
Richard A. Van Dorn ◽  
Georgiy Bobashev ◽  
Barry Eggleston

Author(s):  
Jayesh S

The COVID-19 outbreak was first reported in Wuhan, China. The deadly virus spread not just disease but also fear around the globe. In January 2020, the WHO declared COVID-19 a Public Health Emergency of International Concern (PHEIC). The first case of COVID-19 in India was reported on January 30, 2020. By that time, India was prepared to fight the virus, and it has since taken various measures to tackle the situation. In this paper, an exploratory data analysis of COVID-19 cases in India is carried out. Data on the number of cases, testing done, case fatality ratio, number of deaths, change in visits, stringency index, and measures taken by the government are used for modelling and visual exploratory data analysis.


Molecules ◽  
2021 ◽  
Vol 26 (5) ◽  
pp. 1393
Author(s):  
Ralitsa Robeva ◽  
Miroslava Nedyalkova ◽  
Georgi Kirilov ◽  
Atanaska Elenkova ◽  
Sabina Zacharieva ◽  
...  

Catecholamines are physiological regulators of carbohydrate and lipid metabolism during stress, but their chronic influence on metabolic changes in obese patients has not yet been clarified. The present study aimed to establish the associations between catecholamine metabolites and metabolic syndrome (MS) components in obese women and to reveal possible hidden subgroups of patients through hierarchical cluster analysis and principal component analysis. The 24-h urinary excretion of metanephrine and normetanephrine was investigated in 150 obese women (54 non-diabetic without MS, 70 non-diabetic with MS, and 26 with type 2 diabetes). The interrelations between carbohydrate disturbances, metabolic syndrome components, and stress response hormones were studied. Exploratory data analysis was used to determine different patterns of similarity among the patients. Normetanephrine concentrations were significantly increased in postmenopausal patients and in women with morbid obesity, type 2 diabetes, or hypertension, but not in those with prediabetes. Both metanephrine and normetanephrine levels were positively associated with glucose concentrations one hour after glucose load, irrespective of insulin levels. The exploratory data analysis showed different risk subgroups among the investigated obese women. The development of predictive tools that include not only traditional metabolic risk factors but also markers of stress response systems might help with specific risk estimation in patients with obesity.
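The combination of hierarchical cluster analysis and principal component analysis named in the abstract can be sketched roughly as follows. This is only an illustration on synthetic numbers: the abstract does not say which software, linkage method, or variables the authors used, so the scikit-learn calls, Ward linkage, two clusters, and the variable names below are all assumptions, and the data are not study data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a table of patient measurements:
# 30 rows (patients) and 4 columns (measured variables), in two groups.
rng = np.random.default_rng(1)
low = rng.normal(loc=0.0, scale=0.5, size=(15, 4))
high = rng.normal(loc=4.0, scale=0.5, size=(15, 4))
measurements = np.vstack([low, high])

# Standardize so that variables on different scales contribute equally.
scaled = StandardScaler().fit_transform(measurements)

# Project onto the first two principal components for inspection or plotting.
components = PCA(n_components=2).fit_transform(scaled)

# Agglomerative (hierarchical) clustering; Ward linkage is an assumed choice.
subgroups = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(scaled)

print(components.shape)
print(subgroups)
```

Plotting the two principal components colored by subgroup label is a common way to check visually whether the clusters correspond to separable patterns.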

