A Case Study: Natural Clustering Among Indian States

2018 ◽  
Vol 6 (2) ◽  
pp. 3-8
Author(s):  
Varghese F

India, with a population of 1.34 billion, is the second most populous country in the world, and about 51 births take place there every minute. Child health plays a vital role in the development of a country, and the health of the population significantly affects both social development and economic progress. Given the relevance of health for human well-being and social welfare, it is important to ensure equitable access to health care services by identifying priority areas and improving the quality of those services. Recent studies have shown that the neighbourhood plays a crucial role in health status. The socio-economic status of the neighbourhood has been linked with mortality, general health status, disability, birth rate, chronic conditions, health behaviour and other risk factors for chronic disease, as well as mental health, injuries, violence and other indicators of health [4]. This study aims to determine whether, on the basis of maternal and child health status, there is any natural clustering among the different districts of India. K-means clustering was used to find the number of clusters among Indian states. According to the majority rule, 2 is the best number of clusters in the data set: 10 of the 27 indices select 2 as the optimal number of clusters. The majority rule therefore appears to be a reliable way of selecting the best number of clusters, and the districts are grouped into two natural clusters. This implies that the health status of children in these districts is interdependent: it depends not only on factors within a district but also, to a great extent, on the neighbouring districts. To shape a better future generation, the focus should be on the entire country.
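The majority rule mentioned in this abstract can be illustrated with a minimal Python sketch. Only the figure "10 of 27 indices choose 2" comes from the abstract; the function name `majority_rule` and the way the remaining 17 votes are split are invented for illustration.

```python
from collections import Counter

def majority_rule(votes):
    """Return (k, count) for the most frequently suggested number of clusters."""
    counts = Counter(votes)
    # Break ties in favour of the smaller k so the result is deterministic.
    best_k = min(counts, key=lambda k: (-counts[k], k))
    return best_k, counts[best_k]

# Hypothetical votes from 27 validity indices. Only "10 of 27 choose k = 2"
# is taken from the abstract; the split of the other 17 votes is made up.
votes = [2] * 10 + [3] * 6 + [4] * 5 + [5] * 3 + [6] * 3

best_k, n_votes = majority_rule(votes)
print(f"k = {best_k} chosen by {n_votes} of {len(votes)} indices")
```

Note that a majority here means a plurality: the winning value needs more votes than any other candidate, not more than half of all indices.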

2020 ◽  
Vol 11 (3) ◽  
pp. 42-67
Author(s):  
Soumeya Zerabi ◽  
Souham Meshoul ◽  
Samia Chikhi Boucherkha

Cluster validation aims both to evaluate the results of clustering algorithms and to predict the number of clusters. It is usually achieved using several indexes. Traditional internal clustering validation indexes (CVIs) are mainly based on computing pairwise distances, which results in quadratic complexity of the related algorithms. Existing CVIs cannot handle large data sets properly and need to be revisited to take account of ever-increasing data set volumes. Parallel and distributed implementations of these indexes are therefore required. To address this issue, the authors propose two parallel and distributed models for internal CVIs, namely the Silhouette and Dunn indexes, using the MapReduce framework under Hadoop. The proposed models, termed MR_Silhouette and MR_Dunn, have been tested on both evaluating clustering results and identifying the optimal number of clusters. The experimental results are very promising and show that the proposed models accomplish both tasks successfully.
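To make the quadratic cost concrete, here is a minimal single-machine sketch of the Dunn index in Python. It is not the authors' MR_Dunn model: it assumes Euclidean distance, the common "closest pair over widest cluster" definition, and a made-up toy data set.

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """Dunn index: smallest between-cluster distance / largest cluster diameter.

    `clusters` is a list of at least two clusters, each a list of coordinate
    tuples. Every pair of points is visited once, hence the quadratic cost.
    """
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    if max_diameter == 0.0:
        raise ValueError("all clusters are single points; Dunn index undefined")
    return min_separation / max_diameter

# Hypothetical toy data: two compact, well-separated clusters.
clusters = [
    [(0, 0), (0, 1), (1, 0)],
    [(5, 5), (5, 6), (6, 5)],
]
score = dunn_index(clusters)
print(round(score, 4))
```

Larger values indicate compact, well-separated clusters. Because both the numerator and the denominator are simple min/max reductions over point pairs, the computation splits naturally into independent partial results, which is what makes a distributed formulation attractive.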


2020 ◽  
Vol 10 (6) ◽  
pp. 1401-1407
Author(s):  
Hyungtai Kim ◽  
Minhee Lee ◽  
Min Kyun Sohn ◽  
Jongmin Lee ◽  
Deog Yung Kim ◽  
...  

This paper presents simultaneous clustering and classification, carried out to discover internal groupings in an unlabeled data set while classifying the data with the discovered clusters as class labels. During the process, silhouette and F1 scores were calculated for clustering and classification, respectively, as a function of the number of clusters, in order to find an optimal number of clusters that guarantees the desired level of classification performance. We applied this approach to a data set of ischemic stroke patients in order to discover functional recovery patterns where clear diagnoses do not exist. In addition, we developed a classifier that predicts the type of functional recovery for new patients from early clinical test scores with clinically meaningful accuracy. This classifier can be a helpful tool for clinicians in the rehabilitation field.
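The idea of scoring each candidate number of clusters on both criteria can be sketched as follows. The abstract does not state how F1 was averaged or how the two scores were combined, so the macro-averaged F1, the `pick_k` selection rule and every number below are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def pick_k(scores, min_f1):
    """Hypothetical rule: among k whose F1 reaches min_f1, take the best silhouette.

    `scores` maps k -> (silhouette, f1). Returns None if no k qualifies.
    """
    eligible = {k: sil for k, (sil, f1) in scores.items() if f1 >= min_f1}
    return max(eligible, key=eligible.get) if eligible else None

# Cluster labels used as class labels, and a classifier's predictions (invented).
cluster_labels = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]
f1 = macro_f1(cluster_labels, predicted)

# Invented (silhouette, F1) pairs for three candidate values of k.
scores = {2: (0.48, 0.95), 3: (0.57, 0.91), 4: (0.62, 0.79)}
chosen_k = pick_k(scores, min_f1=0.85)
print(round(f1, 4), chosen_k)
```

In this made-up example k = 4 has the best silhouette but fails the F1 requirement, so the rule settles on k = 3.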


2021 ◽  
Vol 6 (1) ◽  
pp. 41
Author(s):  
I Kadek Dwi Gandika Supartha ◽  
Adi Panca Saputra Iskandar

This study clusters data on STMIK STIKOM Indonesia alumni using the Fuzzy C-Means and Fuzzy Subtractive methods. Cluster validity is tested with the Modified Partition Coefficient (MPC) and Classification Entropy (CE) indices. Clustering is carried out to find hidden patterns or information in a fairly large data set, given that the alumni data at STMIK STIKOM Indonesia have not previously undergone a data mining process. According to the MPC and CE indices, the Fuzzy C-Means clustering algorithm has a higher level of validity than the Fuzzy Subtractive clustering algorithm, so Fuzzy C-Means can be said to cluster the alumni data better than the Fuzzy Subtractive method. The optimal number of clusters based on the CE and MPC validity indices is 5. The cluster with the best characteristics is the 1st cluster, which has 514 members (36.82% of the total alumni), an average GPA of 3.3617, an average study period of 7.8102 semesters and an average TA work period of 4.9596 months.
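For readers unfamiliar with the two validity indices, the sketch below computes them from a fuzzy membership matrix using their usual textbook definitions. The function names and the example matrices are illustrative and are not taken from the study.

```python
from math import log

def partition_coefficient(u):
    """PC = (1/N) * sum of squared memberships; u[i][j] is point i in cluster j."""
    return sum(x * x for row in u for x in row) / len(u)

def modified_partition_coefficient(u):
    """MPC = 1 - c/(c-1) * (1 - PC); rescales PC to the range [0, 1]."""
    c = len(u[0])
    return 1.0 - (c / (c - 1)) * (1.0 - partition_coefficient(u))

def classification_entropy(u):
    """CE = -(1/N) * sum of u * ln(u); zero memberships contribute nothing."""
    return -sum(x * log(x) for row in u for x in row if x > 0) / len(u)

# Hypothetical membership matrices for 2 clusters (each row sums to 1).
crisp = [[1.0, 0.0], [0.0, 1.0]]    # every point belongs fully to one cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]    # memberships carry no information
mixed = [[0.9, 0.1], [0.2, 0.8]]

for name, u in [("crisp", crisp), ("fuzzy", fuzzy), ("mixed", mixed)]:
    mpc = modified_partition_coefficient(u)
    ce = classification_entropy(u)
    print(name, round(mpc, 4), round(abs(ce), 4))
```

A higher MPC and a lower CE both indicate a crisper, better-defined partition, which is the sense in which one algorithm can be said to have higher validity than another.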


2021 ◽  
Vol 9 (1) ◽  
pp. e001889
Author(s):  
Rodrigo M Carrillo-Larco ◽  
Manuel Castillo-Cara ◽  
Cecilia Anza-Ramirez ◽  
Antonio Bernabé-Ortiz

Introduction: We aimed to identify clusters of people with type 2 diabetes mellitus (T2DM) and to assess whether the frequency of these clusters was consistent across selected countries in Latin America and the Caribbean (LAC). Research design and methods: We analyzed 13 population-based national surveys in nine countries (n=8361). We used k-means to develop a clustering model; predictors were age, sex, body mass index (BMI), waist circumference (WC), systolic/diastolic blood pressure (SBP/DBP), and T2DM family history. The training data set included all surveys, and the clusters were then predicted in each country-year data set. We used Euclidean distance, elbow and silhouette plots to select the optimal number of clusters and described each cluster according to the underlying predictors (mean and proportions). Results: The optimal number of clusters was 4. Cluster 0 grouped more men and those with the highest mean SBP/DBP. Cluster 1 had the highest mean BMI and WC, as well as the largest proportion of T2DM family history. We observed the smallest values of all predictors in cluster 2. Cluster 3 had the highest mean age. When we reflected the four clusters in each country-year data set, a different distribution was observed. For example, cluster 3 was the most frequent in the training data set, and so it was in 7 out of 13 other country-year data sets. Conclusions: Using unsupervised machine learning algorithms, it was possible to cluster people with T2DM from the general population in LAC; clusters showed unique profiles that could be used to identify the underlying characteristics of the T2DM population in LAC.
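The selection procedure described in the methods (k-means with Euclidean distance, judged by elbow and silhouette) can be sketched in plain Python as follows. The data are invented two-dimensional points rather than survey predictors, and the farthest-first seeding is an assumption made only to keep the example deterministic.

```python
from math import dist

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on a list of coordinate tuples."""
    # Farthest-first seeding keeps this sketch deterministic (an assumption;
    # the abstract does not say how the centers were initialised).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[j]
            )
        if updated == centers:
            break
        centers = updated
    return labels, centers

def inertia(points, labels, centers):
    """Within-cluster sum of squared distances (the quantity on an elbow plot)."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        # Mean distance from p to each cluster, excluding p itself.
        mean_to = {}
        for c in clusters:
            ds = [dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            if ds:
                mean_to[c] = sum(ds) / len(ds)
        a = mean_to.pop(labels[i], None)
        if a is None or not mean_to:
            continue
        b = min(mean_to.values())
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]

results = {}
for k in range(2, 5):
    labels, centers = kmeans(points, k)
    results[k] = (inertia(points, labels, centers), silhouette(points, labels))
    print(k, round(results[k][0], 3), round(results[k][1], 3))

best_k = max(results, key=lambda k: results[k][1])
print("best k by silhouette:", best_k)
```

Inertia always falls as k grows, so the elbow plot shows where the improvement flattens, while the silhouette gives a single value that can be maximised directly. Using both, as the study does, guards against over-reading either one.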


2020 ◽  
pp. 089976402093777
Author(s):  
Stefania Capecchi ◽  
Francesca Di Iorio ◽  
Nunzia Nappo

The effects of voluntary activities on individual well-being have been investigated extensively in the literature. In this study, the relationship between self-assessed health and volunteering is examined from a cross-country perspective by considering respondents’ characteristics and other voluntary liabilities, employing the Sixth European Working Conditions Survey. This data set allows us to explore, by implementing an Ordered Probit model, the association of self-assessed health status with charity activities performed specifically by workers. Among the working population in the European Union, our results show that, although volunteering—as well as other unpaid tasks, such as informal helping—are statistically significant, voluntary activities do not seem to be strongly associated with individual perceived health status.


Author(s):  
M. Arif Wani ◽  
Romana Riyaz

Purpose – The most commonly used approaches for cluster validation are based on indices, but the majority of the existing cluster validity indices do not work well on data sets of different complexities. The purpose of this paper is to propose a new cluster validity index (ARSD index) that works well on all types of data sets. Design/methodology/approach – The authors introduce a new compactness measure that depicts the typical behaviour of a cluster, where more points are located around the centre and fewer points towards the outer edge of the cluster. A novel penalty function is proposed for determining the distinctness measure of clusters. A random linear search algorithm is employed to evaluate and compare the performance of five commonly known validity indices and the proposed validity index. The values of the six indices are computed for every number of clusters nc in the range (nc min, nc max) to obtain the optimal number of clusters present in a data set. The data sets used in the experiments include shaped, Gaussian-like and real data sets. Findings – An extensive experimental study shows that the proposed validity index is more consistent and reliable in indicating the correct number of clusters than the other validity indices. This is demonstrated experimentally on 11 data sets, where the proposed index achieved better results. Originality/value – The originality of the research paper lies in proposing a novel cluster validity index that is used to determine the optimal number of clusters present in data sets of different complexities.


This research work proposes an integrated approach that uses fuzzy clustering to discover the optimal number of clusters. The proposed technique is an innovative clustering algorithm for marketing and could be used to determine the best groups of customers, similar items and products. The new approach can independently determine the initial distribution of cluster centers. The task of finding the number of clusters is converted into the task of determining the size of a neural network, which is then translated into identifying the optimal groups of clusters. The approach has been tested on four business data sets and shows outstanding results compared with traditional approaches. The proposed method is able to find the expected number of clusters without any significant error. Further, we believe that this work has business value: it can increase market efficiency by identifying which group of clusters is more cost-effective.


2008 ◽  
Vol 41 (4) ◽  
pp. 457-467 ◽  
Author(s):  
G. K. MINI

Summary: Kerala State in India is the most advanced in terms of demographic and epidemiological transition and has the highest proportion of elderly population. The study examines the socio-demographic correlates of the health status of elderly persons in Kerala in terms of three components: perceived health status, physical mobility and morbidity level. Overall health status was measured by combining these three components. Data from the 60th National Sample Survey Organization (NSSO) on Condition and Health Care of the Aged in Kerala in 2004 were used for the study. Significant socio-demographic differentials in health status were noted. While women report less morbidity, perceived well-being and physical mobility were better for men. This anomaly can be explained by variations in the components of socio-demographic factors. The findings call for urgent health care strategies for elderly persons in different socio-demographic groups in transitional Indian states like Kerala.


2013 ◽  
Vol 392 ◽  
pp. 803-807 ◽  
Author(s):  
Xue Bo Feng ◽  
Fang Yao ◽  
Zhi Gang Li ◽  
Xiao Jing Yang

The fuzzy C-means clustering algorithm (FCM) clusters a data set according to the number of clusters, the initial cluster centers, the fuzzy factor, the number of iterations and the threshold. FCM suffers from the problem of initializing the cluster prototypes. First, the article combines the maximum and minimum distance algorithm with the K-means algorithm to determine the number of clusters and the initial cluster centers. Second, it determines the optimal number of clusters with Silhouette indicators. Finally, it improves the convergence rate of FCM by constantly revising the memberships. The improved FCM has a good clustering effect, enhances the optimization capability, and improves the efficiency and effectiveness of the clustering. Compared with the traditional FCM clustering method, it has better within-class tightness, between-class scatter and cluster stability, and a faster convergence rate.
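A rough Python sketch of the first ingredient, max-min distance seeding followed by standard FCM updates, is given below. It is not the improved algorithm of the article: the K-means refinement, the Silhouette check and the membership revision are omitted, and the threshold `theta`, the fuzzy factor `m = 2` and the data are assumptions.

```python
from math import dist

def max_min_centers(points, theta=0.5):
    """Max-min distance seeding: add a center while the farthest remaining
    point is more than theta * (distance between the first two centers) away."""
    centers = [points[0]]
    second = max(points, key=lambda p: dist(p, centers[0]))
    base = dist(second, centers[0])
    if base == 0.0:
        return centers
    centers.append(second)
    while True:
        candidate = max(points, key=lambda p: min(dist(p, c) for c in centers))
        if min(dist(candidate, c) for c in centers) <= theta * base:
            return centers
        centers.append(candidate)

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with a center: give it full membership there.
            zeros = d.count(0.0)
            u.append([1.0 / zeros if x == 0.0 else 0.0 for x in d])
        else:
            u.append([1.0 / sum((dj / dl) ** power for dl in d) for dj in d])
    return u

def fcm_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(sum(wi * x for wi, x in zip(w, col)) / total
                             for col in zip(*points)))
    return centers

def fuzzy_c_means(points, theta=0.5, m=2.0, iters=100, tol=1e-6):
    centers = max_min_centers(points, theta)
    u = fcm_memberships(points, centers, m)
    for _ in range(iters):
        centers = fcm_centers(points, u, m)
        new_u = fcm_memberships(points, centers, m)
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return u, centers

# Hypothetical data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]
u, centers = fuzzy_c_means(points)
print(len(centers), [tuple(round(x, 2) for x in c) for c in centers])
```

Seeding from mutually distant points gives FCM a starting configuration that already spans the data, which is the motivation the abstract gives for addressing the initialization problem.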


Medicina ◽  
2013 ◽  
Vol 49 (1) ◽  
pp. 5
Author(s):  
Irena Misevičienė ◽  
Loreta Strumylaitė ◽  
Birutė Pajarskienė ◽  
Kristina Žalnieraitienė

Background and Objective. Scientific evidence indicates that patient safety and access to health care are linked to the well-being of health professionals. Self-assessed health status has been widely used as a health measure in different surveys. The aim of this study was to examine and determine the factors related to the self-assessed health status of health professionals. Material and Methods. Cross-sectional questionnaire surveys of nurses and physicians were carried out in randomly selected hospitals. A total of 1025 health professionals (739 nurses and 286 physicians) from 3 hospitals of different size located in 1 geographical region of Lithuania participated in the survey. The response rate among the nurses and the physicians was 89.2% and 52.5%, respectively. The overall response rate was 74.7%. The data on self-assessed health, demographic factors, anthropometric data, blood pressure, cholesterol level in blood, personal history of diseases, smoking, and alcohol consumption were gathered with the help of the questionnaire. Results. About two-thirds (64.1%) of the health professionals reported good or quite good health, and only 1.5% of the respondents reported quite poor or poor health. Multivariate logistic regression analysis revealed that the SAH status of health professionals was dependent on age (odds ratio [OR], 1.03; 95% confidence interval [CI], 1.02–1.05 [Model 1]; OR, 1.04; 95% CI, 1.02–1.06 [Model 2]), diseases (OR, 7.32; 95% CI, 5.18–10.35), heart diseases (OR, 12.09; 95% CI, 2.9–50.35), hypertension (OR, 2.53; 95% CI, 1.55–4.14), cancer (OR, 6.19; 95% CI, 1.27–30.13), gastrointestinal (OR, 3.54; 95% CI, 1.59–7.86) and musculoskeletal diseases (OR, 3.21; 95% CI, 1.71–6.02), smoking (OR, 2.1; 95% CI, 1.28–3.45 [Model 1]; OR, 2.00; 95% CI, 1.26–3.16 [Model 2]), and occupation (OR, 1.47; 95% CI, 1.04–2.07 [Model 1]; OR, 1.54; 95% CI, 1.11–2.16 [Model 2]). Conclusions. Diseases are the main predictors of self-assessed health in health professionals. Advancing age and smoking also contribute to poorer self-assessed health.

