Analysis of new nosological models from disease similarities using clustering

Abstract. While classical disease nosology is based on phenotypical characteristics, the increasing availability of biological and molecular data is providing a new understanding of diseases and their underlying relationships, which could lead to a more comprehensive paradigm for modern medicine. In the present work, similarities between diseases are used to study the generation of possible new nosological models that include both phenotypical and biological information. To this end, disease similarity is measured in terms of disease feature vectors, which represent genes, proteins, metabolic pathways and protein-protein interactions (PPIs) in the case of biological similarity, and symptoms in the case of phenotypical similarity. An improvement in similarity computation is proposed that uses weighted instead of Boolean feature vectors. Unsupervised learning methods were applied to these data, specifically the density-based DBSCAN clustering algorithm. The silhouette coefficient was chosen as the evaluation metric, although the number of clusters and the number of outliers were also considered. To validate the results, a comparison with randomly distributed data was performed. The results suggest that weighted biological similarities based on proteins, computed with the cosine index, may provide a good starting point for rearranging disease taxonomy and nosology.
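
The difference between Boolean and weighted feature vectors under the cosine index can be illustrated with a small sketch. This is not the authors' code: the disease names, protein identifiers, weights, and the helper names `cosine_similarity` and `to_boolean` are hypothetical, chosen only to show how weighting can separate pairs that look equally similar when features are merely present or absent.

```python
import math


def cosine_similarity(u, v):
    """Cosine index between two sparse feature vectors given as {feature: weight} dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def to_boolean(vec):
    """Drop the weights: every feature with a non-zero weight becomes 1.0."""
    return {f: 1.0 for f, w in vec.items() if w != 0}


# Hypothetical diseases described by hypothetical protein weights.
disease_a = {"P1": 0.9, "P2": 0.1, "P3": 0.1}
disease_b = {"P1": 0.8, "P2": 0.2}
disease_c = {"P2": 0.9, "P3": 0.8}

# Boolean vectors: A shares two proteins with B and two with C.
bool_ab = cosine_similarity(to_boolean(disease_a), to_boolean(disease_b))
bool_ac = cosine_similarity(to_boolean(disease_a), to_boolean(disease_c))

# Weighted vectors: A and B are both dominated by P1, while C is not.
w_ab = cosine_similarity(disease_a, disease_b)
w_ac = cosine_similarity(disease_a, disease_c)

print(f"Boolean:  sim(A,B)={bool_ab:.3f}  sim(A,C)={bool_ac:.3f}")
print(f"Weighted: sim(A,B)={w_ab:.3f}  sim(A,C)={w_ac:.3f}")
```

With these toy values the Boolean similarities of the two pairs coincide, whereas the weighted similarity ranks A much closer to B than to C. A matrix of such similarities (converted to distances) is the kind of input a density-based algorithm such as DBSCAN could then cluster.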

Research on Anomaly Detection Method Based on DBSCAN Clustering Algorithm

2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT) ◽ 10.1109/isctt51595.2020.00083 ◽ 2020

Author(s): Dingsheng Deng

Keyword(s): Anomaly Detection ◽ Clustering Algorithm ◽ Detection Method ◽ DBSCAN Clustering

Resolvable Cluster Target Tracking Based on the DBSCAN Clustering Algorithm and Labeled RFS

IEEE Access ◽ 10.1109/access.2021.3066629 ◽ 2021 ◽ Vol 9 ◽ pp. 43364-43377

Author(s): Xirui Xue ◽ Shucai Huang ◽ Jiahao Xie ◽ Jiashun Ma ◽ Ning Li

Keyword(s): Target Tracking ◽ Clustering Algorithm ◽ DBSCAN Clustering

AN EFFICIENT CLUSTERING METHOD FOR DBSCAN GEOGRAPHIC SPATIO-TEMPORAL LARGE DATA WITH IMPROVED PARAMETER OPTIMIZATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽ 10.5194/isprs-archives-xlii-3-w10-581-2020 ◽ 2020 ◽ Vol XLII-3/W10 ◽ pp. 581-584

Author(s): J. W. Li ◽ X. Q. Han ◽ J. W. Jiang ◽ Y. Hu ◽ L. Liu

Keyword(s): Parameter Optimization ◽ Clustering Algorithm ◽ Optimal Solution ◽ Large Data ◽ Parameter Selection ◽ Physical Analysis ◽ Clustering Method ◽ K Value ◽ DBSCAN Clustering ◽ Spatio Temporal

Abstract. Establishing an effective method for analysing large geographic spatio-temporal data, and quickly and accurately finding the hidden value behind geographic information, has become a current research focus. Researchers have found that clustering analysis methods from the data mining field can effectively mine the knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important of these methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome. For example, its two important parameters, the Eps neighborhood and the MinPts density, need to be set manually. The guiding principles for parameter setting in traditional DBSCAN do not always lead to suitable parameters, in which case the algorithm cannot produce accurate clustering results. To solve the problems of misclassification and density sparsity caused by unreasonable parameter selection in DBSCAN, this paper proposes an efficient DBSCAN-based density clustering method with improved parameter optimization. Its evaluation index function (Optimal Distance) is obtained by running k-clustering for successive values of k, and the optimal solution is selected. The optimal k-value is then used to cluster the samples. Through mathematical and physical analysis, appropriate values of Eps and MinPts are determined. Finally, the clustering results are obtained by DBSCAN clustering. Experiments show that this method can select reasonable parameters for DBSCAN clustering, which demonstrates the superiority of the proposed method.
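
To make the sensitivity to Eps and MinPts concrete, the following is a minimal pure-Python sketch of standard DBSCAN. It is not the parameter-optimization method proposed in the paper; the function names (`region_query`, `dbscan`), the toy points, and the three Eps values are assumptions made for illustration only.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

labels_ok = dbscan(points, eps=1.5, min_pts=3)      # a reasonable Eps
labels_small = dbscan(points, eps=0.5, min_pts=3)   # Eps too small
labels_large = dbscan(points, eps=20.0, min_pts=3)  # Eps too large

print(labels_ok)
print(labels_small)
print(labels_large)
```

On this toy set, the middle Eps value recovers the two groups and flags the isolated point as noise, the small value marks every point as noise, and the large value merges everything into one cluster. This is the kind of behaviour that motivates automatic parameter selection.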

Lane Formation Beyond Intuition: Towards an Automated Characterization of Lanes in Counter-flows

Collective Dynamics ◽ 10.17815/cd.2020.29 ◽ 2020 ◽ Vol 5

Author(s): Luca Crociani ◽ Giuseppe Vizzari ◽ Andrea Gorrini ◽ Stefania Bandini

Keyword(s): Clustering Algorithm ◽ Variable Density ◽ Automatic Identification ◽ Computing Power ◽ DBSCAN Clustering ◽ Significant Difference ◽ Lane Formation ◽ Human Coder ◽ Behavioural Dynamics

Pedestrian behavioural dynamics have been increasingly investigated by means of (semi-)automated computing techniques for almost two decades, exploiting advances in computing power, sensor accuracy and availability, and computer vision algorithms. This has led to a consensus on the existence of significant differences between unidirectional and bidirectional flows of pedestrians, in which the phenomenon of lane formation seems to play a major role. The collective behaviour of lane formation emerges in conditions of variable density and is due to a self-organisation dynamic, in which pedestrians are induced to follow preceding persons in order to avoid and minimize conflict situations. Although the formation of lanes is a well-known phenomenon in this field of study, there is still a lack of methods that provide an (even semi-)automatic identification and a quantitative characterization. In this context, the paper proposes an unsupervised learning approach for the automatic detection of lanes in multi-directional pedestrian flows, based on the DBSCAN clustering algorithm. The reliability of the approach is evaluated through an inter-rater agreement test between the results achieved by a human coder and by the algorithm.

Research on Characteristics of Chinese Herbal Medicine Compounds Based on Bisecting k-Means Algorithm

Fuzzy Systems and Data Mining VI - Frontiers in Artificial Intelligence and Applications ◽ 10.3233/faia200713 ◽ 2020

Author(s): Yushu Wu ◽ Fenfen Xie ◽ Lu Wang ◽ Shoude Zhang ◽ Lei Zhang ◽ ...

Keyword(s): Herbal Medicine ◽ Chinese Herbal Medicine ◽ Clustering Algorithm ◽ Molecular Descriptors ◽ Molecular Data ◽ Experimental Comparison ◽ Chemical Components ◽ Chinese Herbal ◽ Molecular Features ◽ Molecular Compounds

The properties of Chinese Herbal Medicine (CHM) are determined to some extent by the properties of their molecular compounds, so it is of great significance to study CHM from the perspective of molecular compounds. In this paper, clustering algorithms from data mining are used to study the relationship between the properties of CHM and its chemical components. First, the molecular data are collected from the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform, and the data set is preprocessed to extract the key molecular descriptors of the chemical components. Second, the k-means algorithm and the Bisecting k-means algorithm are used to cluster the chemical components based on the CHM molecular descriptors, and the representative molecular features of cold and hot CHM are extracted. Finally, an experimental comparison finds that the clustering results obtained by the Bisecting k-means algorithm are better. The clustering results show that the average values of molecular composition descriptors and charge descriptors in cold CHM are significantly higher than those in hot CHM. Therefore, the properties of CHM may be affected by molecular structure and molecular charge properties.
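
For readers unfamiliar with Bisecting k-means, the idea can be sketched in a few lines of pure Python: start with one cluster and repeatedly split the cluster with the largest within-cluster squared error using a 2-means step. This is a generic illustration, not the authors' pipeline; the two-dimensional points stand in for molecular descriptors and are hypothetical, as are the helper names and the deterministic initialisation (practical implementations usually use random restarts).

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    c = mean(points)
    return sum(math.dist(p, c) ** 2 for p in points)


def two_means(points, iters=20):
    """Split one cluster in two with a plain 2-means loop.

    Deterministic start (an assumption of this sketch): the first point and
    the point farthest from it serve as the initial centroids.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not left or not right:
            break
        new_c0, new_c1 = mean(left), mean(right)
        if (new_c0, new_c1) == (c0, c1):
            break  # centroids stopped moving: converged
        c0, c1 = new_c0, new_c1
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        if sse(worst) == 0.0:
            break  # nothing left to split
        clusters.remove(worst)
        clusters.extend(two_means(worst))
    return clusters


# Hypothetical 2-D "descriptor" vectors forming three well-separated groups.
data = [(0, 0), (0, 1), (1, 0),
        (10, 10), (10, 11), (11, 10),
        (20, 0), (20, 1), (21, 0)]

clusters = bisecting_kmeans(data, k=3)
for c in clusters:
    print(sorted(c))
```

Because each step only ever solves a two-cluster problem, the procedure tends to be less sensitive to initial centroid placement than running k-means directly with k centroids.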

ExTaxsI: an exploration tool of biodiversity molecular data

10.1101/2020.11.05.369983 ◽ 2020

Author(s): Giulia Agostinetto ◽ Anna Sandionigi ◽ Adam Chahed ◽ Alberto Brusati ◽ Elena Parladori ◽ ...

Keyword(s): Molecular Data ◽ Data Availability ◽ Future Research ◽ Distribution Data ◽ Biodiversity Data ◽ Related Information ◽ New Information ◽ Starting Point ◽ Key Factor ◽ New Research

Abstract. Background: The increasing availability of multi-omics data is leading to continual revision of estimates of existing biodiversity. In particular, molecular data make it possible to characterize previously unknown species and to enrich the information on those already observed with new genomic data. For this reason, the management and visualization of existing molecular data and their related metadata, through easy-to-use IT tools, have become key to the development of future research. The more users are able to access biodiversity-related information, the greater the ability of the scientific community to expand knowledge in this area. Results: In our research we have focused on the development of ExTaxsI (Exploring Taxonomies Information), an IT tool able to retrieve biodiversity data stored in NCBI databases and provide a simple and explorable visualization. Through the three case studies presented here, we have shown how an efficient organization of data already present can yield new information that is fundamental as a starting point for new research. Our approach was also able to highlight the limits in the availability of distribution data, a key factor to consider in the experimental design phase of broad-spectrum studies such as metagenomics. Conclusions: ExTaxsI can easily produce explorable visualizations of molecular data and their metadata, with the aim of helping researchers improve experimental designs and highlighting the main gaps in the coverage of available data.

Analysis of curve parameters to characterize multidirectional fabric wrinkling by a double extraction method

Textile Research Journal ◽ 10.1177/0040517518805372 ◽ 2018 ◽ Vol 89 (15) ◽ pp. 2973-2982 ◽ Cited By ~ 3

Author(s): Fengxin Sun ◽ Mingrui Guo ◽ Xiaorui Hu ◽ Lei Wang ◽ Weidong Gao

Keyword(s): Extraction Method ◽ Clustering Algorithm ◽ Mechanical Performance ◽ Cluster Method ◽ Good Shape ◽ Starting Point ◽ Double Extraction ◽ Shape Retention ◽ Residual Force ◽ Force Displacement

Fabrics with good shape-retention properties are expected to improve the aesthetics, comfort and easy-care performance of clothing in daily life, and efficient characterization of the wrinkle recovery property of fabrics is necessary to facilitate the development of garments with good shape retention. Here, a double extraction method is presented to evaluate fabric wrinkling based on the wrinkling-induced residual force–displacement curves. Correlation analysis was used to determine applicable evaluation indices, which were then used to cluster fabrics by wrinkle recovery property with a K-means clustering algorithm. Moreover, subjective judgements were conducted and compared with the objective K-means cluster method. The results show good consistency between objective K-means clustering and subjective judgements, indicating that the indices extracted from wrinkling-induced residual force–displacement curves can be used to evaluate the wrinkle recovery of fabrics. Therefore, the double extraction method is a starting point for the rapid identification of the wrinkle recovery of fabrics from the mechanical performance of textiles.

An Efficient and Adaptive Method for Collision Probability of Ships, Icebergs Using CNN and DBSCAN Clustering Algorithm

Emerging Technologies in Computer Engineering: Microservices in Big Data Analytics - Communications in Computer and Information Science ◽ 10.1007/978-981-13-8300-7_3 ◽ 2019 ◽ pp. 20-33

Author(s): Syed Zishan Ali ◽ Monica Makhija ◽ Daljeet Choudhary ◽ Hitesh Singh

Keyword(s): Clustering Algorithm ◽ Adaptive Method ◽ Collision Probability ◽ DBSCAN Clustering

Monophyly of Verbascum (Scrophularieae : Scrophulariaceae): evidence from nuclear and plastid phylogenetic analyses

Australian Journal of Botany ◽ 10.1071/bt14159 ◽ 2014 ◽ Vol 62 (8) ◽ pp. 638 ◽ Cited By ~ 2

Author(s): Farrokh Ghahremaninejad ◽ Mehrshid Riahi ◽ Melina Babaei ◽ Faride Attar ◽ Lütfi Behçet ◽ ...

Keyword(s): Molecular Markers ◽ DNA Sequences ◽ Phylogenetic Relationships ◽ Phylogenetic Analyses ◽ Morphological Characters ◽ Molecular Data ◽ nrDNA ITS ◽ Starting Point ◽ High Species Diversity ◽ Phylogenetic Estimation

Verbascum is one of the main genera of Scrophulariaceae, but delimitation and phylogenetic relationships of this genus are unclear and have not yet been studied using DNA sequences. Here, using four selected molecular markers (nrDNA ITS and the plastid spacers trnS/G, psbA-trnH and trnY/T), we present a phylogeny of Verbascum and test previous infrageneric taxonomic hypotheses as well as its monophyly with respect to Scrophularia. We additionally discuss morphological variation and the utility of morphological characters as predictors of phylogenetic relationships. Our results show that while molecular data unambiguously support the circumscription of Verbascum inferred from morphology, they prove to be of limited utility in resolving infrageneric relationships, suggesting that Verbascum's high species diversity is due to rapid and recent radiation. Our work provides phylogenetic estimation of the genus Verbascum using molecular data and can serve as a starting point for future investigations of Verbascum and relatives.

Membership determination of open cluster NGC 188 based on the DBSCAN clustering algorithm

Research in Astronomy and Astrophysics ◽ 10.1088/1674-4527/14/2/004 ◽ 2014 ◽ Vol 14 (2) ◽ pp. 159-164 ◽ Cited By ~ 9

Author(s): Xin-Hua Gao

Keyword(s): Clustering Algorithm ◽ Open Cluster ◽ DBSCAN Clustering
