Facioscapulohumeral Muscular Dystrophy Diagnosis Using Hierarchical Clustering Algorithm and K-Nearest Neighbor Based Methodology

Author(s):  
Divya Anand ◽  
Babita Pandey ◽  
Devendra K. Pandey

The genetic diagnosis of neuromuscular disorders is an active area of research. Microarrays are used to detect changes in genes for accurate diagnosis. Unfortunately, the number of genes in gene expression data is very large compared with the number of samples, so the number of genes must be reduced for correct diagnosis. In the present paper, the authors build an intelligent integrated model for clustering and diagnosis of neuromuscular diseases. The Wilcoxon signed-rank test is used to preselect the genes. K-means and hierarchical clustering algorithms with different distance metrics are employed to cluster the genes. Three classifiers, namely linear discriminant analysis, quadratic discriminant analysis, and k-nearest neighbor, are used. The integrated techniques are applied to a balanced facioscapulohumeral muscular dystrophy dataset. A comparative analysis of the integrated algorithms demonstrates that hierarchical clustering with the cosine distance metric, combined with k-nearest neighbor, gives the best performance measures.
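The clustering step named in this abstract can be illustrated with a minimal sketch of agglomerative hierarchical clustering under the cosine distance. Everything below is an assumption made for illustration: the abstract does not state the linkage rule (average linkage is used here), the function names are hypothetical, and the four "gene" profiles are made-up numbers, not data from the paper.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def average_linkage(profiles, cluster_a, cluster_b):
    """Mean pairwise cosine distance between two clusters of row indices."""
    total = sum(cosine_distance(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(profiles, n_clusters):
    """Naive agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(profiles, clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return [sorted(c) for c in clusters]

# Made-up expression profiles (rows = genes, columns = samples).
toy_profiles = [
    [1.0, 2.0, 3.0],  # gene 0
    [2.0, 4.1, 6.0],  # gene 1: nearly proportional to gene 0
    [3.0, 1.0, 0.5],  # gene 2
    [6.0, 2.2, 1.0],  # gene 3: nearly proportional to gene 2
]
gene_clusters = agglomerate(toy_profiles, 2)
print(gene_clusters)
```

Because the cosine distance ignores vector magnitude, genes whose profiles differ mainly by a scale factor end up in the same cluster, which is one plausible reason to prefer it for expression data.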

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Yaohui Liu ◽  
Dong Liu ◽  
Fang Yu ◽  
Zhengming Ma

Clustering is widely used in data analysis, and density-based methods have developed rapidly over the past 10 years. Although state-of-the-art density peak clustering algorithms are efficient and can detect arbitrarily shaped clusters, they are essentially nonspherical centroid-based methods. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, the extended core regions are found and taken as initial clusters. Then, a new local density metric is defined to calculate the density of each object; meanwhile, the density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is assigned to one of the initial clusters or labeled as noise. Results of experiments on synthetic and real data sets show that RNN-LDH outperforms current clustering methods based on density peaks or reverse nearest neighbors.
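The reverse nearest neighbor relation that this abstract builds on can be sketched as follows. This is not the RNN-LDH algorithm itself, whose density metric and core-region rules are not given here; it only shows, on made-up 2-D points and with hypothetical function names, how to find which points list a given point among their k nearest neighbors.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    # Ties in distance are broken by index so the result is deterministic.
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]

def reverse_nearest_neighbors(points, k):
    """Map each index to the points that count it among their k nearest neighbors."""
    reverse = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            reverse[j].append(i)
    return reverse

# Made-up data: three points close together and one far away.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
rnn = reverse_nearest_neighbors(toy_points, k=2)
print(rnn)
```

In this toy example the distant point has no reverse neighbors at all, while the three close points each have several, which hints at why reverse neighbor counts are a natural signal for separating dense regions from noise.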


2015 ◽  
Vol 09 (03) ◽  
pp. 307-331 ◽  
Author(s):  
Wei Zhang ◽  
Gongxuan Zhang ◽  
Yongli Wang ◽  
Zhaomeng Zhu ◽  
Tao Li

Nearest neighbor search is a key technique used in hierarchical clustering, and its computational complexity determines the performance of the hierarchical clustering algorithm. The time complexity of standard agglomerative hierarchical clustering is O(n³), while the time complexity of more advanced hierarchical clustering algorithms (such as nearest neighbor chain, SLINK and CLINK) is O(n²). This paper presents a new nearest neighbor search method called nearest neighbor boundary (NNB), which first divides a large dataset into independent subsets and then finds the nearest neighbor of each point within its subset. When NNB is used, the time complexity of hierarchical clustering can be reduced to O(n log² n). Based on NNB, we propose a fast hierarchical clustering algorithm called nearest-neighbor boundary clustering (NBC), which can be adapted to parallel and distributed computing frameworks. The experimental results demonstrate that our algorithm is practical for large datasets.


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. For tagging and stemming, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity via the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for the adjectives and a comparison is performed. SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a substantial improvement, reaching 91.2%.
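The cosine similarity measure and mean threshold mentioned in this abstract can be sketched on plain word counts. This is an illustration under stated assumptions only: it uses whitespace tokenization instead of the POS tagger, Porter stemmer, and WordNet-based Jiang-Conrath measure the authors describe, and the sentences and variable names are invented.

```python
import math
from collections import Counter
from itertools import combinations

def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between the word-count vectors of two sentences."""
    counts_a = Counter(sentence_a.lower().split())
    counts_b = Counter(sentence_b.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up sentences in which "bank" carries two different senses.
sentences = [
    "the bank approved the loan",
    "the bank rejected the loan",
    "the river bank was muddy",
]

# Score every pair, then keep the pairs whose similarity exceeds the mean score.
scores = {(i, j): cosine_similarity(sentences[i], sentences[j])
          for i, j in combinations(range(len(sentences)), 2)}
mean_threshold = sum(scores.values()) / len(scores)
similar_pairs = [pair for pair, score in scores.items() if score > mean_threshold]
print(similar_pairs)
```

Only the two financial sentences clear the mean threshold here, so they would be grouped together while the river sentence stays apart.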


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and a few k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2213
Author(s):  
Ahyeong Lee ◽  
Saetbyeol Park ◽  
Jinyoung Yoo ◽  
Jungsook Kang ◽  
Jongguk Lim ◽  
...  

Biofilms formed on the surface of agro-food processing facilities can cause food poisoning by providing an environment in which bacteria can be cultured. Therefore, hygiene management through early detection is important. This study aimed to assess the feasibility of detecting Escherichia coli (E. coli) and Salmonella typhimurium (S. typhimurium) on the surface of food processing facilities by using fluorescence hyperspectral imaging. E. coli and S. typhimurium were cultured on high-density polyethylene and stainless steel coupons, which are the main materials used in food processing facilities. We obtained fluorescence hyperspectral images for the range of 420–730 nm by emitting UV light from a 365 nm UV light source. The images were used to perform discriminant analyses (linear discriminant analysis, k-nearest neighbor analysis, and partial least squares discriminant analysis) to identify and classify coupons on which bacteria could be cultured. The specificity and sensitivity for E. coli (1–4 log CFU·cm⁻²) and S. typhimurium (1–6 log CFU·cm⁻²) were over 90% for most machine learning models used, and the highest performances were generally obtained from the k-nearest neighbor (k-NN) model. Applying the learned model to the hyperspectral images confirmed that biofilms were detected well. This result indicates the possibility of rapidly inspecting biofilms using fluorescence hyperspectral images.
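The k-NN model and the two reported metrics can be sketched as follows. This is a toy illustration, not the study's pipeline: the two-value "spectra", the labels, the choice of k = 3, and the function names are all assumptions made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Predict a label by majority vote among the k nearest training samples."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up two-band fluorescence intensities for training coupons.
train_x = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.80, 0.90), (0.90, 0.80), (0.85, 0.85)]
train_y = ["clean", "clean", "clean", "biofilm", "biofilm", "biofilm"]

# Made-up test coupons; the last one is a weak biofilm signal that k-NN misses.
test_x = [(0.12, 0.18), (0.20, 0.20), (0.88, 0.82), (0.30, 0.25)]
test_y = ["clean", "clean", "biofilm", "biofilm"]

predictions = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
sensitivity, specificity = sensitivity_specificity(test_y, predictions, "biofilm")
print(predictions, sensitivity, specificity)
```

Reporting both metrics matters in this setting because a missed biofilm (lower sensitivity) and a false alarm on a clean surface (lower specificity) carry different costs for hygiene management.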


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

Fuzzy clustering has become a research hotspot in many fields because of its better clustering effect and data expression ability. However, little research focuses on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill this research gap, we extend the data type of clustering to hesitant fuzzy linguistic information. A hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, where they provide decision support for the executive judge in determining the focus of investigation and control. A clustering example verifies the effectiveness of the clustering algorithms in the context of hesitant fuzzy linguistic decision information.

