K-Means Algorithm: An Unsupervised Clustering Approach Using Various Similarity/Dissimilarity Measures

2021 ◽  
pp. 805-813
Author(s):  
Surendra Singh Patel ◽  
Navjot Kumar ◽  
J. Aswathy ◽  
Sai Krishna Vaddadi ◽  
S. A. Akbar ◽  
...  
2007 ◽  
Vol 11 (2) ◽  
pp. 175-188 ◽  
Author(s):  
Simone Garatti ◽  
Sergio Bittanti ◽  
Diego Liberati ◽  
Andrea Maffezzoli

2012 ◽  
Vol 45 (23) ◽  
pp. 50-55 ◽  
Author(s):  
Francesco A. Cuzzola ◽  
Claudio Aurora ◽  
Daniele Sclauzero

2009 ◽  
Vol 17 (03) ◽  
pp. 329-347 ◽  
Author(s):  
HONGJUN YANG ◽  
JIANXIN CHEN ◽  
SHIHUAN TANG ◽  
ZHENKUN LI ◽  
YISONG ZHEN ◽  
...  

Traditional Chinese Medicine (TCM) has documented about 100,000 formulae over the past 2,500 years. To help the modern pharmaceutical industry use and customize them, we make an interdisciplinary effort to study new drug research and development (R&D) in TCM by introducing data mining approaches to it. We used migraine formulae as a training set to investigate whether new prescriptions can be developed by means of data mining. New drug R&D in TCM consists of two steps. The first step is to discover new prescriptions (drug candidates) from the migraine formulae. We present an unsupervised clustering approach based on data mining theory to address this step and automatically discover ten new prescriptions from the formulae data. The second step is to develop and optimize the discovered prescriptions using current biomedical approaches. Since Ligusticum chuanxiong Hort (LCH), an herb, is often used to treat migraine and appears in the new prescriptions, we use it as an example and apply a supervised regression method based on data mining theory to study the drug R&D activity of TCM. We revised two linear regression methods to establish the nonlinear association between three chemical ingredients of LCH and the corresponding pharmacological activity, and used that association to predict the activities. The association was validated by in vitro experiments, and the experimental results are consistent with the prediction. Unsupervised clustering and supervised regression cover most of data mining theory, which means that data mining approaches play a crucial role in new drug R&D in TCM and offer a better way to establish a platform for drug R&D in TCM.


2019 ◽  
Author(s):  
Hossein Estiri ◽  
Shawn N. Murphy

Abstract

Background: Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs.

Objective: The primary objectives of this research were to develop and test an unsupervised clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data as an alternative algorithmic solution to the existing procedures.

Methods: Our approach is built upon two underlying hypotheses: (i) when there is a large number of observations, implausible records should be sparse, and therefore (ii) if these data are clustered properly, clusters with sparse populations should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results from the proposed approach with conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance.

Results: We found that the clustering approach produced results with exceptional specificity and high sensitivity. Compared with the conventional anomaly detection approaches, our proposed clustering approach resulted in a significantly smaller number of false positive cases.

Conclusion: Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on the i2b2 star schema, and (iv) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used in other studies for validation. The proposed algorithmic solution can augment human decisions to improve data quality. Therefore, a workflow is needed to complement the algorithm and initiate the actions required to improve the quality of the data.
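
The two hypotheses in the abstract above (implausible records are sparse, so sparsely populated clusters should hold them) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means on scalar values is assumed here, and the function names `kmeans_1d` and `flag_sparse_clusters`, the `min_share` cutoff, and the sample values are all hypothetical.

```python
from collections import Counter


def kmeans_1d(values, k, iterations=50):
    """Lloyd's k-means on scalar values (assumed algorithm, not from the paper)."""
    ordered = sorted(values)
    # Seed centroids at evenly spaced positions in the sorted data.
    # This seeding is an assumption, chosen to keep the sketch deterministic.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def flag_sparse_clusters(values, k=3, min_share=0.05):
    """Return the values that fall in clusters holding under min_share of the data."""
    labels, _ = kmeans_1d(values, k)
    sizes = Counter(labels)
    sparse = {c for c, n in sizes.items() if n / len(values) < min_share}
    return [v for v, lab in zip(values, labels) if lab in sparse]


# Made-up example: 40 plausible readings plus two implausible ones.
readings = [4.0 + 0.025 * i for i in range(40)] + [45.0, 47.0]
flagged = flag_sparse_clusters(readings, k=3, min_share=0.05)
print(flagged)
```

In this toy data the two extreme readings end up alone in one cluster, which holds less than 5% of the observations and is therefore flagged. The number of clusters and the sparsity cutoff are tuning choices; the abstract reports testing different specifications of the clustering approach but does not state the values used.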


2013 ◽  
Vol 13 (4) ◽  
pp. 2017-2036 ◽  
Author(s):  
Khang Siang Tan ◽  
Nor Ashidi Mat Isa ◽  
Wei Hong Lim
