A Hybrid Deep Clustering Approach for Robust Cell Type Profiling Using Single-cell RNA-seq Data

2019 ◽  
Author(s):  
Suhas Srinivasan ◽  
Nathan T. Johnson ◽  
Dmitry Korkin

Abstract Single-cell RNA sequencing (scRNA-seq) is a recent technology that enables fine-grained discovery of cellular subtypes and specific cell states. It routinely relies on machine learning methods, such as feature learning, clustering, and classification, to uncover novel information from scRNA-seq data. However, current methods are not well suited to handling the substantial noise introduced by the experiments or the variation among cells of the same type. Here, we develop a new hybrid approach, Deep Unsupervised Single-cell Clustering (DUSC), that integrates feature generation based on a deep learning architecture with a model-based clustering algorithm to find a compact and informative representation of single-cell transcriptomic data that yields robust clusters. We also include a technique to estimate an efficient number of latent features in the deep learning model. Our method outperforms both classical and state-of-the-art feature learning and clustering methods, approaching the accuracy of supervised learning. The method is freely available to the community and will hopefully facilitate our understanding of the cellular atlas of living organisms as well as provide the means to improve patient diagnostics and treatment.
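
DUSC couples learned latent features with model-based clustering. The sketch below illustrates only the clustering stage, and it rests on simplifying assumptions. The latent features are taken as already computed, using a hypothetical one-dimensional value per cell. The model-based clustering is a plain two-component Gaussian mixture fitted by expectation-maximization (EM). The helper `fit_gmm_1d` and the toy values are illustrative and are not taken from the DUSC implementation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, n_iter=100, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, variances, labels); n_iter must be at least 1.
    """
    # Initialise the means at the extremes and both variances at the overall variance.
    means = [min(xs), max(xs)]
    mu = sum(xs) / len(xs)
    overall_var = max(sum((x - mu) ** 2 for x in xs) / len(xs), var_floor)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])

        # M-step: re-estimate mixture weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                var_floor,
            )

    # Hard cluster labels: the component with the larger responsibility.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Hypothetical 1-D latent features for eight cells (two obvious groups).
latent = [0.1, 0.3, -0.2, 0.0, 5.1, 4.8, 5.3, 5.0]
weights, means, variances, labels = fit_gmm_1d(latent)
print(labels)
```

A mixture model gives each cell a posterior probability for every component, so hard labels are only one read-out. A real pipeline would work in the full latent space and would choose the number of components rather than fixing it at two.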

Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1763
Author(s):  
Miroslava Nedyalkova ◽  
Costel Sarbu ◽  
Marek Tobiszewski ◽  
Vasil Simeonov

The present study describes a simple procedure to separate a large group of 259 solvents, characterized by 15 specific descriptors (experimentally found and theoretically predicted physicochemical parameters), into patterns of similarity. Solvent data are usually characterized by high variability, different molecular symmetry, and spatial orientation. Chemometric methods can be used to extract and explore the information contained in such data accurately. To this end, advanced fuzzy divisive hierarchical clustering methods were applied to the large group of solvents using the specific descriptors. The fuzzy divisive hierarchical associative clustering algorithm provides a fuzzy partition of the solvents investigated and also a fuzzy partition of the descriptors considered. In this way, it is possible to identify the most specific descriptors (in terms of highest, lowest, or intermediate values) for each fuzzy partition (group) of solvents. Additionally, the partitioning performed could be interpreted with respect to molecular symmetry. The chemometric approach used for this goal is the fuzzy c-means method, a semi-supervised clustering procedure. The advantage of such a clustering process is that it separates the solvents into similarity patterns, assigning each solvent a certain degree of membership in a given pattern while also allowing the same object (solvent) to belong partly to another cluster. Partitioning based on a hybrid of theoretical and experimentally obtained molecular descriptors permits a more straightforward separation into groups of similarity and an acceptable interpretation. It was shown that an important link is established between the similarity groups of objects and the similarity groups of variables.
Ten classes of solvents are interpreted on the basis of their specific descriptors; one of the classes includes a single object and could be interpreted as an outlier. In a broader perspective, the results show that the fuzzy clustering approach provides a useful tool for partitioning solvents by variables related to their main physicochemical properties. This makes it possible to offer a simple guide for solvent recognition based on theoretically calculated or experimentally found descriptors related to the physicochemical properties of the solvents.
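
Fuzzy c-means is the core of the approach above, and it gives every object a degree of membership in every cluster. The following is a minimal sketch of the standard fuzzy c-means iteration. It assumes Euclidean distance, a fuzzifier m = 2, and a hypothetical two-descriptor toy set in place of the 15 solvent descriptors. It does not reproduce the divisive hierarchical or associative variants used in the study.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means clustering.

    Returns (centers, u) where u[k][i] is the membership of point k in
    cluster i; every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    power = 2.0 / (m - 1.0)

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / sw for d in range(dim)])

        # Memberships: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        max_change = 0.0
        for k in range(n):
            dists = [math.dist(points[k], ctr) for ctr in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: share membership among those centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [h / sum(hits) for h in hits]
            else:
                new_row = [1.0 / sum((dists[i] / dists[j]) ** power for j in range(c)) for i in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[k])))
            u[k] = new_row
        if max_change < tol:
            break
    return centers, u


# Hypothetical two-descriptor data for six solvents forming two groups.
solvents = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fuzzy_c_means(solvents, c=2)
for k, row in enumerate(u):
    print(k, [round(v, 3) for v in row])
```

Rows of `u` with values near 0.5 would flag objects that plausibly belong to more than one group. This is the kind of shared membership the abstract describes.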


Author(s):  
Ye. V. Bodyanskiy ◽  
A. Yu. Shafronenko ◽  
I. N. Klymova

Context. Big data clustering is today a highly relevant area of artificial intelligence. This task arises in many applications related to data mining, deep learning, etc. To solve these problems, traditional approaches and methods require that the entire data sample be submitted in batch form. Objective. The aim of the work is to propose a method of fuzzy probabilistic data clustering using cat swarm evolutionary optimization that is free of the drawbacks of traditional data clustering approaches. Method. A procedure of fuzzy probabilistic data clustering using evolutionary algorithms is proposed for faster determination of sample extrema, cluster centroids, and fitness functions. It does not spend machine resources on storing intermediate calculations and does not require additional time to solve the clustering problem, regardless of the data dimension and the way the data are presented for processing. Results. The proposed data clustering algorithm based on evolutionary optimization is simple to implement numerically. It is free of the drawbacks inherent in traditional fuzzy clustering methods and can handle large volumes of input data processed online in real time. Conclusions. The experimental results allow us to recommend the developed method for automatic clustering and classification of big data when the extrema of the sample must be found as quickly as possible, regardless of how the data are submitted for processing. The proposed method of online probabilistic fuzzy data clustering based on cat swarm evolutionary optimization is intended for use in hybrid computational intelligence systems and neuro-fuzzy systems, in training artificial neural networks, and in clustering and classification problems.
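
The abstract does not give the update equations, and the sketch below does not implement cat swarm optimization. It shows only a generic online probabilistic fuzzy clustering step of the kind such a method is meant to drive. Each incoming observation receives memberships from the current centroids. Each centroid then moves toward the observation by a step proportional to its membership raised to the fuzzifier, so no past data are stored. The class name, the fuzzifier β = 2, and the learning-rate schedule η = 1/(t + 1) are assumptions.

```python
class OnlineFuzzyClustering:
    """Hypothetical one-pass probabilistic fuzzy clusterer; stores no past data."""

    def __init__(self, initial_centroids, beta=2.0):
        self.centroids = [list(c) for c in initial_centroids]
        self.beta = beta  # fuzzifier, must be > 1
        self.t = 0  # number of observations seen so far

    def memberships(self, x):
        """Probabilistic memberships of x in each cluster (they sum to 1)."""
        d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in self.centroids]
        if min(d2) == 0.0:
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            return [h / sum(hits) for h in hits]
        # u_j is proportional to (squared distance) ** (-1 / (beta - 1)).
        w = [d ** (-1.0 / (self.beta - 1.0)) for d in d2]
        total = sum(w)
        return [wj / total for wj in w]

    def update(self, x):
        """Consume one observation and move every centroid toward it."""
        self.t += 1
        eta = 1.0 / (self.t + 1)  # assumed decaying learning rate
        u = self.memberships(x)
        for j, c in enumerate(self.centroids):
            step = eta * u[j] ** self.beta
            self.centroids[j] = [ci + step * (xi - ci) for xi, ci in zip(x, c)]
        return u


# Usage: stream hypothetical 2-D observations one at a time.
model = OnlineFuzzyClustering([[0.0, 0.0], [10.0, 10.0]])
for point in [[1.0, 1.0], [9.0, 9.0], [1.0, 0.0], [9.0, 10.0]]:
    print(model.update(point))
print(model.centroids)
```

Because each centroid only needs its current position and the step counter, memory use is independent of how many observations have been streamed.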


2017 ◽  
Author(s):  
Debajyoti Sinha ◽  
Akhilesh Kumar ◽  
Himanshu Kumar ◽  
Sanghamitra Bandyopadhyay ◽  
Debarka Sengupta

ABSTRACT Droplet-based single-cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale to such high-dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbor search technique, to develop a de novo clustering algorithm for large-scale single-cell data. On a number of real datasets, dropClust outperformed the existing best-practice methods in terms of execution time, clustering accuracy, and detectability of minor cell subtypes.
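
dropClust relies on Locality Sensitive Hashing to find approximate nearest neighbours. As an illustration of the general idea, and not dropClust's own code or parameters, the sketch below uses random-hyperplane hashing for cosine similarity. Each vector gets a bit signature from the signs of its projections onto random hyperplanes, and only vectors that share a bucket are compared exactly. The number of planes, the seed, and the helper names are assumptions.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(n_planes, dim, seed=0):
    """Random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_index(vectors, planes):
    """Hash every vector into a bucket keyed by its signature."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        buckets[signature(vec, planes)].append(idx)
    return buckets


def approx_neighbours(query_idx, vectors, planes, buckets, k=2):
    """Rank by exact cosine only the candidates that share the query's bucket."""
    key = signature(vectors[query_idx], planes)
    candidates = [i for i in buckets[key] if i != query_idx]
    candidates.sort(key=lambda i: cosine(vectors[query_idx], vectors[i]), reverse=True)
    return candidates[:k]


# Hypothetical expression profiles for four cells.
cells = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0], [1.0, 2.5, 3.0]]
planes = make_hyperplanes(n_planes=8, dim=3)
buckets = build_index(cells, planes)
print(approx_neighbours(0, cells, planes, buckets))
```

More hyperplanes make buckets smaller and queries cheaper, but they also raise the chance that true neighbours are split apart. Practical LSH schemes usually counter this by querying several independent hash tables.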


Genes ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 98 ◽  
Author(s):  
Xiaoshu Zhu ◽  
Hong-Dong Li ◽  
Yunpei Xu ◽  
Lilu Guo ◽  
Fang-Xiang Wu ◽  
...  

Single-cell RNA sequencing (scRNA-seq) has recently brought new insight into cell differentiation processes and functional variation in cell subtypes from homogeneous cell populations. A lack of prior knowledge makes unsupervised machine learning methods, such as clustering, suitable for analyzing scRNA-seq data. However, several limitations remain to be overcome, including high dimensionality, instability of clustering results, and the complexity of parameter adjustment. In this study, we propose a method that combines structure entropy and k-nearest neighbors to identify cell subpopulations in scRNA-seq data. In contrast to existing clustering methods for identifying cell subtypes, minimizing structure entropy yields natural communities without specifying the number of clusters. To investigate the performance of our model, we applied it to eight scRNA-seq datasets and compared it with three existing methods (nonnegative matrix factorization, single-cell interpretation via multikernel learning, and the structural entropy minimization principle). The experimental results showed that our approach achieves, on average, better performance on these datasets than the benchmark methods.
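
The method above scores partitions of a k-nearest-neighbour graph by structure entropy. The following minimal sketch rests on several assumptions. It uses Euclidean distance and an unweighted kNN graph, symmetrised by keeping an edge if either endpoint selects it. It also uses the common two-level definition of structural entropy for a partition P = {X_1, ..., X_L}: H = -Σ_j Σ_{i∈X_j} (d_i / vol(G)) log2(d_i / vol(X_j)) - Σ_j (g_j / vol(G)) log2(vol(X_j) / vol(G)). Here d_i is a node degree, vol is a sum of degrees, and g_j counts the edges leaving X_j. The search for the minimising partition is omitted, and the function names and toy points are hypothetical.

```python
import math


def knn_graph(points, k):
    """Symmetric unweighted kNN graph as an adjacency set per node."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrise: keep the edge if either endpoint selects it
    return adj


def structural_entropy(adj, partition):
    """Two-level structural entropy of a graph under a node partition.

    partition is a list of lists of node indices covering every node.
    """
    degree = [len(nbrs) for nbrs in adj]
    vol_g = sum(degree)  # equals 2 * number of edges
    h = 0.0
    for module in partition:
        members = set(module)
        vol = sum(degree[i] for i in module)
        # g_j: number of edges with exactly one endpoint inside the module.
        cut = sum(1 for i in module for j in adj[i] if j not in members)
        for i in module:
            if degree[i] > 0:
                h -= degree[i] / vol_g * math.log2(degree[i] / vol)
        if cut > 0:
            h -= cut / vol_g * math.log2(vol / vol_g)
    return h


# Hypothetical 2-D embeddings of eight cells in two groups.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
adj = knn_graph(pts, k=2)
print(structural_entropy(adj, [[0, 1, 2, 3], [4, 5, 6, 7]]))
print(structural_entropy(adj, [[0, 1, 4, 5], [2, 3, 6, 7]]))
```

For two well-separated groups, the partition that matches the groups has lower entropy than one that mixes them. This is why minimising the quantity can reveal communities without fixing the number of clusters in advance.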


Author(s):  
J. Avanija ◽  
K. Ramar

With the massive growth and large volume of the web, it is very difficult to retrieve results based on user preferences. The next-generation web architecture, the semantic web, reduces the burden on the user by performing search based on semantics instead of keywords. Even in the context of semantic technologies, optimization problems arise but are rarely considered. In this paper, document clustering is applied to retrieve relevant documents. We propose an ontology-based clustering algorithm using a semantic similarity measure and Fuzzy C-Means, which is applied to the annotated documents to optimize the result. The proposed method uses the Jena API and the GATE tool API, and documents can be retrieved based on their annotation features and relations. A preliminary experiment comparing the proposed method with K-Means, PSO, and the hybrid PSO-K-Means approach shows that the proposed method is feasible and performs better than these clustering methods.
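
The abstract does not say which semantic similarity measure is used. One common ontology-based option, assumed here purely for illustration, is the Wu-Palmer measure. It scores two concepts by the depth of their lowest common ancestor relative to their own depths: sim(a, b) = 2 · depth(lca) / (depth(a) + depth(b)). The toy taxonomy and helper names below are hypothetical. In the described system, concepts would instead come from the document annotations.

```python
# Hypothetical toy ontology: child -> parent ("thing" is the root).
PARENT = {
    "vehicle": "thing",
    "car": "vehicle",
    "bicycle": "vehicle",
    "animal": "thing",
    "dog": "animal",
}


def ancestors(concept):
    """Path from a concept up to the root, including the concept itself."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth of a concept, with the root at depth 1."""
    return len(ancestors(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    anc_b = set(ancestors(b))
    # Walking up from a, the first node also above b is the lowest common ancestor.
    lca = next(c for c in ancestors(a) if c in anc_b)
    return 2.0 * depth(lca) / (depth(a) + depth(b))


print(wu_palmer("car", "bicycle"))
print(wu_palmer("car", "dog"))
```

Pairwise scores like these can be turned into distances (for example 1 - sim) before a clustering step such as Fuzzy C-Means. How the paper combines the two is not stated in the abstract.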


Author(s):  
Charles Bouveyron ◽  
Gilles Celeux ◽  
T. Brendan Murphy ◽  
Adrian E. Raftery

2020 ◽  
pp. 1-10
Author(s):  
Colin J. McMahon ◽  
Justin T. Tretter ◽  
Theresa Faulkner ◽  
R. Krishna Kumar ◽  
Andrew N. Redington ◽  
...  

Abstract Objective: This study investigated the impact of the Webinar on deep human learning of CHD. Materials and methods: This cross-sectional survey design study used an open- and closed-ended questionnaire to assess the impact of the Webinar on deep learning of topical areas within the management of post-operative tetralogy of Fallot patients. This was a quantitative research methodology using descriptive statistical analyses with a sequential explanatory design. Results: One thousand three hundred and seventy-four participants from 100 countries on 6 continents joined the Webinar, 557 (40%) of whom completed the questionnaire. Over 70% of participants reported that they “agreed” or “strongly agreed” that the Webinar format promoted deep learning for each of the topics compared with other standard learning methods (textbook and journal learning). Two-thirds expressed a preference for attending a Webinar rather than an international conference. Over 80% of participants highlighted significant barriers to attending conferences, including cost (79%), distance to travel (49%), time commitment (51%), and family commitments (35%). Strengths of the Webinar included expertise, concise high-quality presentations often discussing contentious issues, and the platform quality. The main weakness was limited time for questions. Just over 53% expressed concern about the carbon footprint involved in attending conferences and preferred to attend a Webinar. Conclusion: E-learning Webinars represent a disruptive innovation that promotes deep learning, greater multidisciplinary participation, and greater attendee satisfaction with fewer barriers to participation. Although Webinars will never fully replace conferences, a hybrid approach may reduce the need for conferencing, reduce the carbon footprint, and promote a “sustainable academia”.


Author(s):  
Kevin Y. Huang ◽  
Enrico Petretto

Single-cell transcriptomics analyses of the fibrotic lung uncovered two cell states in the alveolar epithelium that are critical to lung injury recovery: a reparative transitional cell state in the mouse and a disease-specific cell state (KRT5-/KRT17+) in human idiopathic pulmonary fibrosis (IPF). The murine transitional cell state lies on the differentiation path from type 2 (AT2) to type 1 (AT1) pneumocytes, and the human KRT5-/KRT17+ cell state may arise from dysregulation of this differentiation process. We review major findings of single-cell transcriptomics analyses of the fibrotic lung and re-analyze data from 7 single-cell RNA sequencing studies of human and murine models of IPF, focusing on the alveolar epithelium. Our comparative and cross-species single-cell transcriptomics analyses allowed us to further delineate the differentiation trajectories from AT2 to AT1 and from AT2 to the KRT5-/KRT17+ cell state. We observed that AT1 cells in human IPF retain the transcriptional signature of the murine transitional cell state. Using pseudotime analysis, we recapitulated the differentiation trajectories from AT2 to AT1 and from AT2 to the KRT5-/KRT17+ cell state in multiple human IPF studies. We further delineated the transcriptional programs underlying cell state transitions and determined the molecular phenotypes at terminal differentiation. We hypothesize that, in addition to the reactivation of developmental programs (SOX4, SOX9), senescence (TP63, SOX4) and the Notch pathway (HES1) steer intermediate progenitors toward the KRT5-/KRT17+ cell state. Our analyses suggest that activation of SMAD3 later in the differentiation process may explain the fibrotic molecular phenotype typical of KRT5-/KRT17+ cells.
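
The abstract does not name the pseudotime method used. A minimal sketch of one generic approach assumes that cells are points in a reduced expression space. It builds a kNN graph, picks a root cell (for example an AT2 cell), and takes each cell's shortest-path distance from the root as its pseudotime, rescaled to [0, 1]. The function name, k, and the toy coordinates are hypothetical, and dedicated trajectory tools use more elaborate models.

```python
import heapq
import math


def geodesic_pseudotime(cells, root, k=2):
    """Pseudotime as normalised shortest-path distance from a root cell on a kNN graph."""
    n = len(cells)
    # Symmetric kNN graph with Euclidean edge weights.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: math.dist(cells[i], cells[j]))[:k]
        for j in nearest:
            w = math.dist(cells[i], cells[j])
            graph[i][j] = w
            graph[j][i] = w

    # Dijkstra's algorithm from the root cell.
    dist = [math.inf] * n
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))

    # Rescale reachable cells to [0, 1]; unreachable cells stay at infinity.
    finite = [d for d in dist if d < math.inf]
    top = max(finite) or 1.0
    return [d / top if d < math.inf else math.inf for d in dist]


# Hypothetical cells along a single differentiation path; cell 0 is the root.
path_cells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(geodesic_pseudotime(path_cells, root=0))
```

Following the graph rather than straight-line distances lets the ordering respect curved or branching trajectories, such as the two branches from AT2 described above.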


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 311
Author(s):  
Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool for measuring the expression patterns of individual cells and discovering heterogeneity and functional diversity among cell populations. Because of this variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed that use at least one free parameter. Different choices of free parameters may lead to substantially different visualizations and clusters, and tuning them is time consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and it is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion), respectively, without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated on simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. When applied to glioblastoma scRNA-seq data, it can detect inter-sample cell heterogeneity.
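
RGGC builds on partial correlations. In a Gaussian graphical model, the partial correlation between variables i and j given all the others follows from the precision matrix Θ = Σ⁻¹ as ρ_ij = -Θ_ij / √(Θ_ii Θ_jj). The sketch below computes this for a small covariance matrix without any regularization. It does not reproduce RGGC's regularized estimator, its use of λ, or the subsequent Louvain step. The example covariance is hypothetical, constructed so that variables 1 and 3 are conditionally independent given variable 2 even though they are marginally correlated.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation matrix derived from the inverse of a covariance matrix."""
    theta = invert(cov)
    n = len(cov)
    return [
        [1.0 if i == j else -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical covariance of three genes whose precision matrix is tridiagonal.
cov = [
    [0.75, 0.5, 0.25],
    [0.5, 1.0, 0.5],
    [0.25, 0.5, 0.75],
]
pcor = partial_correlations(cov)
for row in pcor:
    print([round(v, 3) for v in row])
```

The marginal covariance between genes 1 and 3 is nonzero (0.25), but their partial correlation is zero. This is the direct-versus-indirect distinction that partial correlations add over ordinary correlations.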

