Analysis of new nosological models from disease similarities using clustering

2020 ◽  
Author(s):  
Lucía Prieto Santamaría ◽  
Eduardo P. García del Valle ◽  
Gerardo Lagunes García ◽  
Massimiliano Zanin ◽  
Alejandro Rodríguez González ◽  
...  

Abstract. While classical disease nosology is based on phenotypical characteristics, the increasing availability of biological and molecular data is providing a new understanding of diseases and their underlying relationships, which could lead to a more comprehensive paradigm for modern medicine. In the present work, similarities between diseases are used to study the generation of possible new nosological models that include both phenotypical and biological information. To this aim, disease similarity is measured in terms of disease feature vectors, which stand for genes, proteins, metabolic pathways and PPIs in the case of biological similarity, and for symptoms in the case of phenotypical similarity. An improvement in similarity computation is proposed, considering weighted instead of Boolean feature vectors. Unsupervised learning methods were applied to these data, specifically the density-based DBSCAN clustering algorithm. The silhouette coefficient was chosen as the evaluation metric, although the number of clusters and the number of outliers were also considered. To validate the results, a comparison with randomly distributed data was performed. Results suggest that weighted biological similarities based on proteins, computed according to the cosine index, may provide a good starting point to rearrange disease taxonomy and nosology.
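The difference between Boolean and weighted feature vectors can be illustrated with a minimal sketch. It assumes each disease is stored as a dictionary mapping feature identifiers to weights. The protein identifiers, the weights, and the helper names `cosine_similarity` and `to_boolean` are hypothetical, since the abstract does not say how the weights are derived.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[f] * v[f] for f in shared)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)

def to_boolean(vector):
    """Drop the weights and keep only feature presence."""
    return {feature: 1.0 for feature, weight in vector.items() if weight > 0}

# Hypothetical protein weights for two diseases (illustrative values only).
disease_a = {"P1": 0.9, "P2": 0.1, "P3": 0.1}
disease_b = {"P1": 0.8, "P2": 0.1, "P4": 0.1}

boolean_sim = cosine_similarity(to_boolean(disease_a), to_boolean(disease_b))
weighted_sim = cosine_similarity(disease_a, disease_b)
print(f"Boolean: {boolean_sim:.3f}  weighted: {weighted_sim:.3f}")
```

With these illustrative values the Boolean vectors only register that two of three proteins are shared, whereas the weighted vectors are dominated by the heavily weighted shared protein, so the weighted similarity comes out higher.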

IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 43364-43377
Author(s):  
Xirui Xue ◽  
Shucai Huang ◽  
Jiahao Xie ◽  
Jiashun Ma ◽  
Ning Li

Author(s):  
J. W. Li ◽  
X. Q. Han ◽  
J. W. Jiang ◽  
Y. Hu ◽  
L. Liu

Abstract. How to establish an effective method for analysing large volumes of geographic spatio-temporal data, and how to quickly and accurately find the hidden value behind geographic information, has become a current research focus. Researchers have found that clustering analysis methods from the data mining field can effectively mine the knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important clustering methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome. For example, its two important parameters, the Eps neighborhood and the MinPts density, need to be set manually. Suitable parameters for a reasonable clustering result cannot be reliably selected from the guiding principles for parameter setting of the traditional DBSCAN clustering algorithm, so the algorithm may fail to produce accurate clustering results. To solve the problem of misclassification and density sparsity caused by unreasonable parameter selection in the DBSCAN clustering algorithm, this paper proposes a DBSCAN-based data efficient density clustering method with improved parameter optimization. Its evaluation index function (Optimal Distance) is obtained by cycling through k-clustering in turn, and the optimal solution is selected. The optimal k-value in k-clustering is used to cluster samples. Through mathematical and physical analysis, the appropriate Eps and MinPts parameters are determined. Finally, the clustering results are obtained by DBSCAN clustering. Experiments show that this method can select parameters reasonably for DBSCAN clustering, which supports the superiority of the method described in this paper.
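The abstract does not give the Optimal Distance function, so the sketch below does not reproduce the proposed method. It illustrates the Eps selection problem with a different and commonly used technique, the k-distance heuristic: sort each point's distance to its k-th nearest neighbour and take Eps just before the largest jump. The largest-jump rule, the sample points, and all function names are assumptions made for illustration, and the DBSCAN routine is a plain textbook version.

```python
import math

def k_distance(points, k):
    """Sorted distance from each point to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # The point itself counts towards min_pts, as in the usual definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels

# Hypothetical 2-D points: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
min_pts = 3
kd = k_distance(points, k=min_pts - 1)
jumps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
eps = kd[jumps.index(max(jumps))]  # crude knee estimate (assumption)
labels = dbscan(points, eps, min_pts)
print(eps, labels)
```

On this toy data the heuristic settles on an Eps of 1.0, which separates the two squares into two clusters and marks the isolated point as noise. Real spatio-temporal data rarely shows such a clean jump, which is the difficulty the abstract refers to.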


2020 ◽  
Vol 5 ◽  
Author(s):  
Luca Crociani ◽  
Giuseppe Vizzari ◽  
Andrea Gorrini ◽  
Stefania Bandini

Pedestrian behavioural dynamics have been increasingly investigated by means of (semi)automated computing techniques for almost two decades, exploiting advancements in computing power, sensor accuracy and availability, and computer vision algorithms. This has led to a consensus on the existence of significant differences between unidirectional and bidirectional flows of pedestrians, where the phenomenon of lane formation seems to play a major role. The collective behaviour of lane formation emerges under conditions of variable density through a self-organisation dynamic in which pedestrians tend to follow preceding persons to avoid or minimize conflicts. Although the formation of lanes is a well-known phenomenon in this field of study, methods that provide (even semi-)automatic identification and quantitative characterization of lanes are still lacking. In this context, the paper proposes an unsupervised learning approach for the automatic detection of lanes in multi-directional pedestrian flows, based on the DBSCAN clustering algorithm. The reliability of the approach is evaluated through an inter-rater agreement test between the results achieved by a human coder and by the algorithm.
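The agreement test between the human coder and the algorithm can be sketched as follows. The abstract does not name the statistic used, so Cohen's kappa is assumed here as one common choice for two raters. The lane labels are invented for illustration, and the sketch presumes that algorithm clusters have already been matched to the coder's lane names.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical lane assignments for eight pedestrians ("none" = not in a lane).
human = ["L1", "L1", "L1", "L2", "L2", "L2", "none", "none"]
algorithm = ["L1", "L1", "L2", "L2", "L2", "L2", "none", "L1"]

kappa = cohen_kappa(human, algorithm)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement (6 of 8 items here) for the agreement expected by chance, which matters when one lane contains most of the pedestrians.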


Author(s):  
Yushu Wu ◽  
Fenfen Xie ◽  
Lu Wang ◽  
Shoude Zhang ◽  
Lei Zhang ◽  
...  

The properties of Chinese Herbal Medicine (CHM) are determined to some extent by the properties of its molecular compounds, so it is of great significance to study CHM from the perspective of molecular compounds. In this paper, clustering algorithms from data mining are used to study the relationship between the properties of CHM and its chemical components. Firstly, the molecular data are collected from the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform, and the data set is preprocessed to extract the key molecular descriptors of chemical components. Secondly, the k-means algorithm and the Bisecting k-means algorithm are used to cluster the chemical components based on the CHM molecular descriptors, and the representative molecular features of the cold and hot CHM are extracted. Finally, through experimental comparison, it is found that the clustering results obtained by the Bisecting k-means algorithm are better. The clustering results show that the average values of molecular composition descriptors and charge descriptors in cold CHM are significantly higher than those in hot CHM. Therefore, the properties of CHM may be affected by molecular structure and molecular charge properties.
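A minimal sketch of bisecting k-means, the variant the abstract reports as performing better, is given below under stated assumptions. The cluster with the largest sum of squared errors is split at each step and the 2-means step is seeded deterministically from the two most distant points. Both are common choices rather than details taken from the paper, and the two-dimensional descriptor values are invented for illustration.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    centre = mean(points)
    return sum(math.dist(p, centre) ** 2 for p in points)

def two_means(points, iterations=20):
    """Split points into two groups with plain k-means (k = 2)."""
    centre = mean(points)
    # Deterministic seeding (assumption): farthest point from the mean,
    # then the point farthest from that one.
    c1 = max(points, key=lambda p: math.dist(p, centre))
    c2 = max(points, key=lambda p: math.dist(p, c1))
    left, right = list(points), []
    for _ in range(iterations):
        left = [p for p in points if math.dist(p, c1) <= math.dist(p, c2)]
        right = [p for p in points if math.dist(p, c1) > math.dist(p, c2)]
        if not left or not right:
            break
        new_c1, new_c2 = mean(left), mean(right)
        if (new_c1, new_c2) == (c1, c2):
            break  # assignments have stabilised
        c1, c2 = new_c1, new_c2
    return left, right

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        left, right = two_means(clusters[worst])
        if not left or not right:
            break  # cannot split further, e.g. all points are identical
        clusters.pop(worst)
        clusters.extend([left, right])
    return clusters

# Hypothetical two-dimensional molecular descriptors (illustrative values only).
descriptors = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
               (5.0, 5.0), (5.2, 4.8), (4.8, 5.2),
               (9.0, 1.0), (9.2, 0.8), (8.8, 1.2)]
clusters = bisecting_kmeans(descriptors, k=3)
for cluster in clusters:
    print(sorted(cluster))
```

Because each step only solves a two-cluster problem, bisecting k-means is less sensitive to the initial placement of k centroids than standard k-means, which is one plausible reason for the better results the abstract reports.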


2020 ◽  
Author(s):  
Giulia Agostinetto ◽  
Anna Sandionigi ◽  
Adam Chahed ◽  
Alberto Brusati ◽  
Elena Parladori ◽  
...  

Abstract. Background: The increasing availability of multi-omics data is leading to continual revision of estimates of existing biodiversity data. In particular, molecular data make it possible to characterize previously unknown species and to enrich the information on those already observed with new genomic data. For this reason, the management and visualization of existing molecular data and their related metadata through easy-to-use IT tools have become key to the development of future research. The more users are able to access biodiversity-related information, the greater the ability of the scientific community to expand knowledge in this area. Results: In our research we have focused on the development of ExTaxsI (Exploring Taxonomies Information), an IT tool able to retrieve biodiversity data stored in NCBI databases and provide a simple and explorable visualization. Through the three case studies presented here, we have shown how an efficient organization of the data already available can yield new information that serves as a fundamental starting point for new research. Our approach was also able to highlight the limits in the availability of distribution data, a key factor to consider in the experimental design phase of broad-spectrum studies such as metagenomics. Conclusions: ExTaxsI can easily produce explorable visualizations of molecular data and their metadata, with the aim of helping researchers to improve experimental designs and highlight the main gaps in the coverage of available data.


2018 ◽  
Vol 89 (15) ◽  
pp. 2973-2982 ◽  
Author(s):  
Fengxin Sun ◽  
Mingrui Guo ◽  
Xiaorui Hu ◽  
Lei Wang ◽  
Weidong Gao

Fabrics with good shape-retention properties are expected to improve the aesthetics, comfort and easy-care performance of clothing in daily life, and efficient characterization of the wrinkle recovery property of fabrics is necessary to facilitate the development of garments with good shape retention. Here, a double extraction method is presented to evaluate fabric wrinkling based on the wrinkling-induced residual force–displacement curves. Correlation analysis was used to determine applicable evaluation indices in order to cluster the wrinkle recovery property of fabrics with a K-means clustering algorithm. Moreover, subjective judgements were collected and compared with the objective K-means cluster method. The results show good consistency between objective K-means clustering and subjective judgements, indicating that the indices extracted from wrinkling-induced residual force–displacement curves can be used to evaluate the wrinkle recovery of fabrics. Therefore, the double extraction method is a starting point for the rapid identification of wrinkle recovery of fabrics from the mechanical performance of textiles.


2014 ◽  
Vol 62 (8) ◽  
pp. 638 ◽  
Author(s):  
Farrokh Ghahremaninejad ◽  
Mehrshid Riahi ◽  
Melina Babaei ◽  
Faride Attar ◽  
Lütfi Behçet ◽  
...  

Verbascum is one of the main genera of Scrophulariaceae, but the delimitation and phylogenetic relationships of this genus are unclear and have not yet been studied using DNA sequences. Here, using four selected molecular markers (nrDNA ITS and the plastid spacers trnS/G, psbA-trnH and trnY/T), we present a phylogeny of Verbascum and test previous infrageneric taxonomic hypotheses as well as its monophyly with respect to Scrophularia. We additionally discuss morphological variation and the utility of morphological characters as predictors of phylogenetic relationships. Our results show that while molecular data unambiguously support the circumscription of Verbascum inferred from morphology, they prove to be of limited utility in resolving infrageneric relationships, suggesting that Verbascum's high species diversity is due to rapid and recent radiation. Our work provides a phylogenetic estimation of the genus Verbascum using molecular data and can serve as a starting point for future investigations of Verbascum and relatives.

