Simultaneous Clustering and Classification of Function Recovery Patterns of Ischemic Stroke

2020, Vol 10 (6), pp. 1401-1407
Author(s): Hyungtai Kim, Minhee Lee, Min Kyun Sohn, Jongmin Lee, Deog Yung Kim, ...

This paper presents simultaneous clustering and classification, performed to discover internal groupings in an unlabeled data set; the data are then classified using the discovered clusters as class labels. During simultaneous clustering and classification, silhouette and F1 scores were calculated for clustering and classification, respectively, across different numbers of clusters in order to find an optimal number of clusters that guarantees the desired level of classification performance. In this study, we applied this approach to a data set of ischemic stroke patients in order to discover function recovery patterns for which no clear diagnoses exist. In addition, we developed a classifier that predicts the type of function recovery for new patients from early clinical test scores with clinically meaningful accuracy. This classifier can be a helpful tool for clinicians in the rehabilitation field.
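The loop over candidate cluster counts can be sketched as follows. This is a minimal stand-in using scikit-learn, with synthetic data in place of the clinical scores, k-means for clustering, and logistic regression as the classifier; the paper's exact algorithms are not specified in the abstract, so these choices are assumptions:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score, f1_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the clinical-score data set.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

results = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    sil = silhouette_score(X, labels)              # clustering quality
    # Treat the discovered clusters as class labels and check how well a
    # classifier can reproduce them from the raw features.
    pred = cross_val_predict(LogisticRegression(max_iter=1000), X, labels, cv=5)
    f1 = f1_score(labels, pred, average="macro")   # classification quality
    results[k] = (sil, f1)

# Pick a k that balances both criteria (one possible selection rule).
best_k = max(results, key=lambda k: results[k][0] + results[k][1])
```

Tracking both scores per k is what lets the method trade off cluster separability against downstream predictability.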

2020, Vol 11 (3), pp. 42-67
Author(s): Soumeya Zerabi, Souham Meshoul, Samia Chikhi Boucherkha

Cluster validation aims both to evaluate the results of clustering algorithms and to predict the number of clusters. It is usually achieved using several indexes. Traditional internal clustering validation indexes (CVIs) are mainly based on computing pairwise distances, which results in quadratic complexity of the related algorithms. The existing CVIs cannot handle large data sets properly and need to be revisited to take account of the ever-increasing data set volumes. Therefore, the design of parallel and distributed solutions to implement these indexes is required. To cope with this issue, the authors propose two parallel and distributed models for internal CVIs, namely for the Silhouette and Dunn indexes, using the MapReduce framework under Hadoop. The proposed models, termed MR_Silhouette and MR_Dunn, have been tested to solve both the issue of evaluating clustering results and that of identifying the optimal number of clusters. The results of the experimental study are very promising and show that the proposed parallel and distributed models achieve the expected tasks successfully.
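A toy illustration of the map/reduce decomposition behind an index such as MR_Silhouette: mappers compute per-point silhouette values over chunks of the data, and a reducer averages them. This sketch runs the phases sequentially in plain Python rather than on Hadoop, and the decomposition is an assumption about the general approach, not the authors' implementation:

```python
import numpy as np

def mapper(chunk_idx, X, labels):
    # "Map" phase: emit the silhouette value s(i) for each point in the chunk.
    out = []
    for i in chunk_idx:
        d = np.linalg.norm(X - X[i], axis=1)
        same = labels == labels[i]
        a = d[same & (np.arange(len(X)) != i)].mean()   # mean intra-cluster distance
        b = min(d[labels == c].mean()                   # nearest other cluster
                for c in set(labels) if c != labels[i])
        out.append((b - a) / max(a, b))
    return out

def reducer(partials):
    # "Reduce" phase: average the per-point silhouettes into the global index.
    flat = [s for part in partials for s in part]
    return sum(flat) / len(flat)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
chunks = np.array_split(np.arange(len(X)), 4)   # one chunk per "mapper"
sil = reducer([mapper(c, X, labels) for c in chunks])
```

Because each point's silhouette depends only on distances to all points, the per-point work parallelizes cleanly; only the final average requires aggregation.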


2021, Vol 6 (1), pp. 41
Author(s): I Kadek Dwi Gandika Supartha, Adi Panca Saputra Iskandar

In this study, data on STMIK STIKOM Indonesia alumni were clustered using the Fuzzy C-Means and Fuzzy Subtractive methods. Cluster validity was tested with the Modified Partition Coefficient (MPC) and Classification Entropy (CE) indexes. Clustering was carried out with the aim of finding hidden patterns or information in a fairly large data set, considering that the alumni data at STMIK STIKOM Indonesia had not previously undergone any data mining process. Measured by the MPC and CE indexes, the Fuzzy C-Means algorithm has a higher level of validity than the Fuzzy Subtractive algorithm, so it can be said that Fuzzy C-Means clusters the alumni data better than the Fuzzy Subtractive method. The optimal number of clusters according to the CE and MPC validity indexes is 5. The cluster with the best characteristics is cluster 1, which has 514 members (36.82% of all alumni), with an average GPA of 3.3617, an average study period of 7.8102 semesters, and an average final-project (TA) completion time of 4.9596 months.
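A minimal sketch of the validity-measurement step: a hand-rolled fuzzy c-means followed by the MPC and CE indexes computed from the membership matrix U. Synthetic data stands in for the alumni records, and the FCM implementation is a textbook version, not the authors' code:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    # Minimal fuzzy c-means: returns cluster centres and membership matrix U (c x n).
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)
    for _ in range(iters):
        um = U ** m
        centres = um @ X / um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centres[:, None, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))   # standard FCM membership update
        U /= U.sum(axis=0)
    return centres, U

def mpc(U):
    # Modified Partition Coefficient: 1 = crisp partition, 0 = fully fuzzy.
    c, n = U.shape
    pc = (U ** 2).sum() / n
    return 1 - c / (c - 1) * (1 - pc)

def ce(U):
    # Classification Entropy: lower values indicate crisper partitions.
    n = U.shape[1]
    return -(U * np.log(U + 1e-12)).sum() / n

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(5, 0.5, (40, 2))])
_, U = fcm(X, c=2)
```

On well-separated data like this, memberships are nearly crisp, so MPC is close to 1 and CE close to 0; comparing these values across candidate cluster counts is how the optimal number of clusters is chosen.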


The problem of medical data classification is analyzed and classification methods are reviewed from various aspects. However, the efficiency of classification algorithms remains in question. To improve classification performance, a Class Level Disease Convergence and Divergence (CLDC) measure based algorithm is presented in this paper. For any dimension of the medical data, its convergence or divergence indicates support for a disease class. Initially, the data set is preprocessed to remove noisy data points. The method then estimates the disease convergence/divergence measure on different dimensions. The convergence measure is computed from the frequency of dimensional matches within a class, whereas divergence is estimated from dimensional matches with other classes. From these measures a disease support factor is estimated, whose value is used to classify each data point and improves classification performance.
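The convergence/divergence idea can be illustrated on categorical data. This is only an interpretation of the description above (the exact CLDC formulas are not given in the abstract): per-dimension match frequencies within the candidate class versus the other classes, summed into a support score:

```python
import numpy as np

def disease_support(x, X, y, cls):
    # Convergence: how often each dimension of x matches samples of class `cls`.
    # Divergence: how often it matches samples of the other classes.
    same, other = X[y == cls], X[y != cls]
    conv = (same == x).mean(axis=0)
    div = (other == x).mean(axis=0)
    return (conv - div).sum()   # net support for the disease class

def classify(x, X, y):
    classes = np.unique(y)
    return classes[np.argmax([disease_support(x, X, y, c) for c in classes])]

# Toy categorical medical records: rows = patients, columns = test outcomes (0/1/2).
X = np.array([[1, 0, 2], [1, 0, 1], [1, 1, 2],
              [0, 2, 0], [0, 2, 1], [0, 1, 0]])
y = np.array([0, 0, 0, 1, 1, 1])
```

A dimension that matches the candidate class often but other classes rarely pushes the support factor up, which is the intuition behind using convergence minus divergence as the classification signal.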


2018, Vol 7 (4.5), pp. 40
Author(s): Sathish Kumar.P.J, Dr R.Jagadeesh Kan

The problems of high-dimensional clustering and classification have been well studied in previous articles. This research also considers generating treatment recommendations from input symptoms. A number of approaches to disease prediction and recommendation generation have been discussed in the literature; still, the efficiency of such recommendation systems is not yet satisfactory. To improve performance, an efficient multi-level symptom similarity based disease prediction and recommendation generation method is presented. The method reads the input data set and performs preprocessing to remove noisy records. In the second stage, it performs Class Level Feature Similarity Clustering. The input symptom set is classified using an MLSS (Multi Level Symptom Similarity) measure estimated between different classes of samples. According to the selected class, the method chooses the most frequent medicine set as the recommendation, using drug success rate and frequency measures. The proposed method improves the performance of clustering and disease prediction and yields more efficient medicine recommendations.
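The final recommendation step, ranking medicines by frequency weighted by observed drug success rate, might look like the following sketch. The records and medicine names are hypothetical, and the scoring rule is an assumption consistent with the abstract's description:

```python
from collections import Counter

# Hypothetical prescription records for one predicted disease class:
# (medicine, treatment_succeeded).
records = [
    ("med_A", True), ("med_A", True), ("med_A", False), ("med_A", False),
    ("med_B", True), ("med_B", True), ("med_B", True),
    ("med_C", False),
]

freq = Counter(med for med, _ in records)          # prescription frequency
success = Counter(med for med, ok in records if ok)  # successful outcomes

def score(med):
    # Rank by frequency weighted by the observed success rate.
    return freq[med] * (success[med] / freq[med])

recommendation = max(freq, key=score)
```

Here med_A is prescribed most often but succeeds only half the time, so the more reliable med_B wins the ranking; weighting frequency by success rate is what keeps a popular but ineffective drug from being recommended.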


2021, Vol 9 (1), pp. e001889
Author(s): Rodrigo M Carrillo-Larco, Manuel Castillo-Cara, Cecilia Anza-Ramirez, Antonio Bernabé-Ortiz

Introduction: We aimed to identify clusters of people with type 2 diabetes mellitus (T2DM) and to assess whether the frequency of these clusters was consistent across selected countries in Latin America and the Caribbean (LAC).

Research design and methods: We analyzed 13 population-based national surveys in nine countries (n=8361). We used k-means to develop a clustering model; predictors were age, sex, body mass index (BMI), waist circumference (WC), systolic/diastolic blood pressure (SBP/DBP), and T2DM family history. The training data set included all surveys, and the clusters were then predicted in each country-year data set. We used Euclidean distance and elbow and silhouette plots to select the optimal number of clusters, and described each cluster according to the underlying predictors (means and proportions).

Results: The optimal number of clusters was 4. Cluster 0 grouped more men and those with the highest mean SBP/DBP. Cluster 1 had the highest mean BMI and WC, as well as the largest proportion of T2DM family history. We observed the smallest values of all predictors in cluster 2. Cluster 3 had the highest mean age. When we reflected the four clusters in each country-year data set, a different distribution was observed. For example, cluster 3 was the most frequent in the training data set, and so it was in 7 out of 13 other country-year data sets.

Conclusions: Using unsupervised machine learning algorithms, it was possible to cluster people with T2DM from the general population in LAC; the clusters showed unique profiles that could be used to identify the underlying characteristics of the T2DM population in LAC.
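The model-selection step described above (fit k-means for a range of k, record inertia for the elbow plot and the silhouette score, then predict clusters in new data sets) can be sketched with scikit-learn. Synthetic data stands in for the pooled survey predictors:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7 predictors
# (age, sex, BMI, WC, SBP, DBP, family history), with 4 latent groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, (100, 7)) for m in (0, 4, 8, 12)])
X = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                       # for the elbow plot
    silhouettes[k] = silhouette_score(X, km.labels_)

best_k = max(silhouettes, key=silhouettes.get)
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
# Clusters can then be predicted in each country-year data set:
new_labels = model.predict(X[:10])
```

Fitting once on the pooled training data and then calling `predict` on each country-year subset is what makes the cluster definitions comparable across countries.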


Author(s): M. Arif Wani, Romana Riyaz

Purpose – The most commonly used approaches for cluster validation are based on indices, but the majority of existing cluster validity indices do not work well on data sets of different complexities. The purpose of this paper is to propose a new cluster validity index (the ARSD index) that works well on all types of data sets.

Design/methodology/approach – The authors introduce a new compactness measure that captures the typical behaviour of a cluster, where more points are located around the centre and fewer points towards the outer edge. A novel penalty function is proposed for determining the distinctness measure of clusters. A random linear search algorithm is employed to evaluate and compare the performance of five commonly known validity indices and the proposed index. The values of the six indices are computed for all nc in the range (nc_min, nc_max) to obtain the optimal number of clusters present in a data set. The data sets used in the experiments include shaped, Gaussian-like, and real data sets.

Findings – Through an extensive experimental study, the proposed validity index is observed to be more consistent and reliable in indicating the correct number of clusters than the other validity indices. This is demonstrated experimentally on 11 data sets, where the proposed index achieves better results.

Originality/value – The originality of the paper lies in proposing a novel cluster validity index for determining the optimal number of clusters present in data sets of different complexities.
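The search over candidate cluster counts that any such index drives can be sketched generically. Here the silhouette index stands in for the proposed ARSD index, whose exact compactness and penalty formulas are not reproduced in the abstract:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def optimal_clusters(X, index_fn, nc_min=2, nc_max=10):
    # Evaluate a validity index for every candidate nc and return the argmax.
    scores = {}
    for nc in range(nc_min, nc_max + 1):
        labels = KMeans(n_clusters=nc, n_init=10, random_state=0).fit_predict(X)
        scores[nc] = index_fn(X, labels)
    return max(scores, key=scores.get), scores

# Three well-separated Gaussian-like blobs as a test data set.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [-6, 6]],
                  cluster_std=0.7, random_state=42)
best_nc, scores = optimal_clusters(X, silhouette_score)
```

Any index with the signature `index_fn(X, labels)` can be dropped in, which is the structure the paper's comparison of six indices relies on.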


This research work proposes an integrated approach using fuzzy clustering to discover the optimal number of clusters. The proposed technique is an innovative clustering algorithm for marketing and could be used to determine the best groups of customers, similar items, and products. The new approach can independently determine the initial distribution of cluster centers. The task of finding the number of clusters is converted into the task of determining the size of a neural network, which in turn identifies the optimal groups of clusters. The approach has been tested on four business data sets and shows outstanding results compared to traditional approaches, finding the expected number of clusters without significant error. Further, we believe this work has business value in increasing market efficiency by finding out which group of clusters is more cost-effective.


2021, Vol 10 (2), pp. 155
Author(s): Michael Jacobs, Ali Arfan, Alaa Sheta

Diagnosis of brain tumors is one of the most severe medical problems, affecting thousands of people each year in the United States. Manual classification of cancerous tumors through examination of MRI images is a difficult task even for trained professionals: it is an error-prone procedure that depends on the experience of the radiologist, and brain tumors in particular have a high level of complexity. Therefore, computer-aided diagnosis systems designed to assist with this task are of specific interest to physicians. Accurate detection and classification of brain tumors via magnetic resonance imaging (MRI) examination is a well-established approach. This paper proposes a method to classify different brain tumors using a Convolutional Neural Network (CNN). We explore the performance of several CNN architectures and examine whether decreasing the input image resolution affects the model's accuracy. The data set used to train the model initially comprised 3064 MRI scans; we augmented it to 8544 MRI scans to balance the available classes of images. The results show that a suitably designed CNN architecture can significantly improve the diagnosis of medical images: the developed model achieved classification accuracy of up to 97%.
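Class balancing via label-preserving augmentation, as used to grow the data set from 3064 to 8544 scans, can be sketched with simple flips and rotations; the abstract does not specify the exact augmentation pipeline, so these transforms are illustrative only. Reduced-resolution inputs (the other variable the paper examines) can be produced by naive downsampling:

```python
import numpy as np

def augment(image):
    # Simple label-preserving augmentations commonly used to grow an
    # imaging data set: horizontal flip and 90/180-degree rotations.
    return [
        np.fliplr(image),
        np.rot90(image, k=1),
        np.rot90(image, k=2),
    ]

# Toy "scan": a 4x4 grayscale image standing in for an MRI slice.
scan = np.arange(16, dtype=float).reshape(4, 4)
augmented = [scan] + augment(scan)          # 1 original -> 4 training images

low_res = scan[::2, ::2]                    # naive 2x downsampling
```

Applying such transforms only to under-represented classes is one common way to balance class counts without collecting new scans.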


Geophysics, 2013, Vol 78 (1), pp. E41-E46
Author(s): Laurens Beran, Barry Zelt, Leonard Pasion, Stephen Billings, Kevin Kingdon, ...

We have developed practical strategies for discriminating between buried unexploded ordnance (UXO) and metallic clutter. These methods are applicable to time-domain electromagnetic data acquired with multistatic, multicomponent sensors designed for UXO classification. Each detected target is characterized by dipole polarizabilities estimated via inversion of the observed sensor data. The polarizabilities are intrinsic target features and so are used to distinguish between UXO and clutter. We tested this processing with four data sets from recent field demonstrations, with each data set characterized by metrics of data and model quality. We then developed techniques for building a representative training data set and determined how the variable quality of estimated features affects overall classification performance. Finally, we devised a technique to optimize classification performance by adapting features during target prioritization.

