scDeepHash: An automatic cell type annotation and cell retrieval method for large-scale scRNA-seq datasets using neural network-based hashing

2021 ◽  
Author(s):  
Shihao Ma ◽  
Yanyi Zhang ◽  
Bohao Wang ◽  
Zian Hu ◽  
Jingwei Zhang ◽  
...  

Single-cell RNA-sequencing technologies measure transcriptomic expression, quantifying cell-to-cell heterogeneity at an unprecedented resolution. As these technologies become more readily available, the number of scRNA-seq datasets is increasing drastically. Prior work has demonstrated that bias-free, holistic single-cell profiling infrastructures are essential to emerging automatic cell-type annotation methods. We propose scDeepHash, a scalable scRNA-seq analytic tool that employs content-based deep hashing to index single-cell gene expression. scDeepHash allows for fast and accurate automated cell-type annotation and similar-cell retrieval. We also demonstrate the performance of scDeepHash by benchmarking it against current state-of-the-art methods across multiple public scRNA-seq datasets.
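The retrieval idea behind hashing-based annotation can be illustrated with a minimal sketch. It assumes each cell has already been mapped to a short binary code; in scDeepHash that mapping is learned by a neural network, which is not reproduced here. The 8-bit codes, the labels, and the function names `hamming`, `retrieve`, and `annotate` are hypothetical and exist only for illustration.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def retrieve(query_code, index, k=3):
    """Return the k indexed (code, label) entries closest to the query code."""
    ranked = sorted(index, key=lambda entry: hamming(query_code, entry[0]))
    return ranked[:k]


def annotate(query_code, index, k=3):
    """Assign the majority label among the k nearest codes."""
    labels = [label for _, label in retrieve(query_code, index, k)]
    return Counter(labels).most_common(1)[0][0]


# Hypothetical reference index: (hash code, cell-type label) pairs.
index = [
    (0b11110000, "T cell"),
    (0b11100000, "T cell"),
    (0b00001111, "B cell"),
    (0b00011111, "B cell"),
    (0b11110001, "T cell"),
]
query = 0b11110010
print(annotate(query, index))
```

Because codes are compared with bitwise operations, lookup cost depends on the code length rather than on the number of genes, which is what makes this style of indexing attractive for large reference sets.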

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Ruoxin Li ◽  
Gerald Quon

Technical variation in feature measurements, such as gene expression and locus accessibility, is a key challenge of large-scale single-cell genomic datasets. We show that this technical variation in both scRNA-seq and scATAC-seq datasets can be mitigated by analyzing feature detection patterns alone and ignoring feature quantification measurements. This result holds when datasets have low detection noise relative to quantification noise. We demonstrate state-of-the-art performance of detection pattern models using our new framework, scBFA, for both cell type identification and trajectory inference. Performance gains can also be realized in one line of R code in existing pipelines.
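As a rough illustration of what "feature detection patterns" means, the sketch below converts a count matrix into a binary detected/not-detected matrix. It is written in Python rather than R, and it shows only this preprocessing step under the assumption that any count above zero counts as detection; the scBFA model itself is not reproduced. The function name `detection_pattern` and the toy counts are hypothetical.

```python
def detection_pattern(counts):
    """Replace each count with 1 if the feature was detected (count > 0), else 0."""
    return [[1 if value > 0 else 0 for value in row] for row in counts]


# Toy matrix: rows are cells, columns are features (made-up values).
counts = [
    [0, 3, 12],
    [5, 0, 0],
]
pattern = detection_pattern(counts)
print(pattern)
```

Downstream steps in an existing pipeline would then operate on `pattern` instead of on the raw counts.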


2021 ◽  
Author(s):  
Xiangchun Li ◽  
Xilin Shen

Integration of the evolving large-scale single-cell transcriptomes requires scalable batch-correction approaches. Here we propose a simple batch-correction method that scales to integrating very large single-cell transcriptome collections from diverse sources. The core idea of the method is to encode the batch information of each cell as a trainable parameter and add it to the cell's expression profile; a contrastive learning approach is then used to learn a feature representation of the resulting additive expression profile. We demonstrate the scalability of the proposed method by integrating 18 million cells obtained from the Human Cell Atlas. Our benchmark comparisons with current state-of-the-art single-cell integration methods demonstrate that our method achieves comparable data alignment and cluster preservation. Our study should facilitate the integration of very large single-cell transcriptome collections. The source code is available at https://github.com/xilinshen/Fugue.
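The additive step described above can be sketched as follows. This is an illustration only: the embedding values are fixed made-up numbers, whereas in the method they are trainable parameters, and the contrastive learning stage is not shown. The names `add_batch_embedding` and `batch_embeddings` are hypothetical and are not taken from the Fugue code base.

```python
def add_batch_embedding(expression, batch_id, batch_embeddings):
    """Add the embedding vector of the cell's batch to its expression profile."""
    embedding = batch_embeddings[batch_id]
    if len(embedding) != len(expression):
        raise ValueError("embedding and expression profile must have equal length")
    return [x + b for x, b in zip(expression, embedding)]


# Hypothetical per-batch embeddings (one value per gene); trainable in practice.
batch_embeddings = {
    "batch_a": [0.5, -0.5, 0.0],
    "batch_b": [-1.0, 0.25, 0.5],
}
cell = [1.0, 2.0, 3.0]
print(add_batch_embedding(cell, "batch_a", batch_embeddings))
```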


2018 ◽  
Author(s):  
Ruoxin Li ◽  
Gerald Quon

Technical variation in feature measurements such as gene expression and locus accessibility is a key challenge of large-scale single cell genomic datasets. We show that this technical variation in both scRNA-seq and scATAC-seq datasets can be mitigated by performing analysis on feature detection patterns alone and ignoring feature quantification measurements. This result holds when datasets have low detection noise relative to quantification noise. We demonstrate state-of-the-art performance of detection pattern models using our new framework, scBFA, for both cell type identification and trajectory inference. Performance gains can also be realized in one line of R code in existing pipelines.


2021 ◽  
Vol 43 ◽  
pp. e58283
Author(s):  
Clístenes Williams Araújo do Nascimento ◽  
Caroline Miranda Biondi ◽  
Fernando Bruno Vieira da Silva ◽  
Luiz Henrique Vieira Lima

Soil contamination by metals threatens both the environment and human health and hence requires remedial action. The conventional approach of removing polluted soils and replacing them with clean soils (excavation) is very costly for low-value sites and not feasible on a large scale. In this scenario, phytoremediation emerged as a promising cost-effective and environmentally friendly technology to render metals less bioavailable (phytostabilization) or clean up metal-polluted soils (phytoextraction). Phytostabilization has demonstrable successes in mining sites and brownfields. On the other hand, phytoextraction still has few examples of successful application. Whether hyperaccumulating plants are used or high-biomass plants are induced to accumulate metals through chelator addition to the soil, major phytoextraction bottlenecks remain, mainly the extended time frame to remediation and the lack of revenue from the land during the process. Due to these drawbacks, phytomanagement has been proposed to provide economic, environmental, and social benefits until the contaminated site returns to productive usage. Here, we review the evolution, promises, and limitations of these phytotechnologies. Despite the lack of commercial phytoextraction operations, there have been significant advances in understanding the main constraints of phytotechnologies. Further investigation of new plant species, especially in the tropics, and of soil amendments can potentially provide the basis to transform phytoextraction into an operational metal clean-up technology in the future. However, at the current state of the art, phytotechnology is shifting its focus from remediation to pollution attenuation and palliative care.


2020 ◽  
Vol 17 (6) ◽  
pp. 621-628 ◽  
Author(s):  
Zhichao Miao ◽  
Pablo Moreno ◽  
Ni Huang ◽  
Irene Papatheodorou ◽  
Alvis Brazma ◽  
...  

Author(s):  
Krzysztof Karsznia ◽  
Konrad Podawca

Monitoring of structures and other field objects is undoubtedly one of the main issues of modern engineering. Technologies that enable structural monitoring make it possible to build an integrated risk management approach combining instrumental solutions with geoinformation systems. In studies of engineering structures, physical monitoring, known as SHM ("Structural Health Monitoring"), is mainly used to examine the physical state of the object. However, geodetic monitoring systems (GMS) also play a very important role. Progress in the fields of IT and automation has opened new possibilities for using integrated systems on other, often large-scale, objects. Based on the current state of the art, the article presents a concept for integrating physical and geodetic monitoring systems in order to develop useful guidelines for the further construction of an expert risk management system.


Author(s):  
William Prescott

This paper investigates the use of large-scale multibody dynamics (MBD) models for real-time vehicle simulation. The current state of the art in real-time vehicle simulation uses 15-degree-of-freedom models, but there is a need for higher-fidelity systems. To increase model fidelity, this paper proposes the use of the following techniques: implicit integration, parallel processing, and co-simulation in a real-time environment.


2020 ◽  
Author(s):  
Van Hoan Do ◽  
Francisca Rojas Ringeling ◽  
Stefan Canzar

A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultra-large scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose Specter, a method that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter’s speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and that is sensitive to rare cell types. Its linear time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measure gene and protein marker expression, we demonstrate that Specter is able to utilize multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells. Specter is open source and available at https://github.com/canzarlab/Specter.


2019 ◽  
Author(s):  
Yuchen Yang ◽  
Gang Li ◽  
Huijun Qian ◽  
Kirk C. Wilhelmsen ◽  
Yin Shen ◽  
...  

Batch effect correction has been recognized to be indispensable when integrating single-cell RNA sequencing (scRNA-seq) data from multiple batches. State-of-the-art methods ignore single-cell cluster label information, but such information can improve effectiveness of batch effect correction, particularly under realistic scenarios where biological differences are not orthogonal to batch effects. To address this issue, we propose SMNN for batch effect correction of scRNA-seq data via supervised mutual nearest neighbor detection. Our extensive evaluations in simulated and real datasets show that SMNN provides improved merging within the corresponding cell types across batches, leading to reduced differentiation across batches over MNN, Seurat v3, and LIGER. Furthermore, SMNN retains more cell type-specific features, partially manifested by differentially expressed genes identified between cell types after SMNN correction being biologically more relevant, with precision improving by up to 841%.

Key Points
Batch effect correction has been recognized to be critical when integrating scRNA-seq data from multiple batches due to systematic differences in time points, generating laboratory and/or handling technician(s), experimental protocol, and/or sequencing platform.
Existing batch effect correction methods that leverage information from mutual nearest neighbors across batches (for example, implemented in SC3 or Seurat) ignore cell type information and suffer from potentially mismatching single cells from different cell types across batches, which would lead to undesired correction results, especially under the scenario where variation from batch effects is non-negligible compared with biological effects.
To address this critical issue, here we present SMNN, a supervised machine learning method that first takes cluster/cell-type label information from users or inferred from scRNA-seq clustering, and then searches mutual nearest neighbors within each cell type instead of global searching.
Our SMNN method shows clear advantages over three state-of-the-art batch effect correction methods and can better mix cells of the same cell type across batches and more effectively recover cell-type specific features, in both simulations and real datasets.
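The difference between a global and a label-restricted mutual nearest neighbor search can be sketched on toy data. This is a simplified illustration assuming Euclidean distance and k = 1; it covers only the neighbor search, not the subsequent correction step, and it is not the SMNN implementation. All function names and coordinates are hypothetical.

```python
import math


def nearest(source, targets, k):
    """Indices of the k targets closest to source by Euclidean distance."""
    order = sorted(range(len(targets)), key=lambda j: math.dist(source, targets[j]))
    return set(order[:k])


def mutual_nearest_pairs(batch1, batch2, k=1):
    """Pairs (i, j) where batch2[j] is among the k nearest of batch1[i] and vice versa."""
    pairs = []
    for i, cell in enumerate(batch1):
        for j in nearest(cell, batch2, k):
            if i in nearest(batch2[j], batch1, k):
                pairs.append((i, j))
    return pairs


def supervised_mnn(cells1, labels1, cells2, labels2, k=1):
    """Search mutual nearest neighbors separately within each shared cell-type label."""
    pairs = []
    for label in sorted(set(labels1) & set(labels2)):
        idx1 = [i for i, name in enumerate(labels1) if name == label]
        idx2 = [j for j, name in enumerate(labels2) if name == label]
        sub1 = [cells1[i] for i in idx1]
        sub2 = [cells2[j] for j in idx2]
        for i, j in mutual_nearest_pairs(sub1, sub2, k):
            pairs.append((idx1[i], idx2[j]))  # map back to original indices
    return pairs


# Toy data where the batch shift is large relative to the biological difference.
cells1, labels1 = [(0.0, 0.0), (5.0, 5.0)], ["A", "B"]
cells2, labels2 = [(4.0, 4.0), (1.0, 1.0)], ["A", "B"]

global_pairs = mutual_nearest_pairs(cells1, cells2)
supervised_pairs = supervised_mnn(cells1, labels1, cells2, labels2)
print(global_pairs, supervised_pairs)
```

In this toy case the global search pairs each cell with a cell of the other type, while the label-restricted search pairs cells of the same type, which is the mismatch scenario the key points describe.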


2018 ◽  
Author(s):  
Nikos Konstantinides ◽  
Katarina Kapuralin ◽  
Chaimaa Fadil ◽  
Luendreo Barboza ◽  
Rahul Satija ◽  
...  

Transcription factors regulate the molecular, morphological, and physiological characters of neurons and generate their impressive cell type diversity. To gain insight into general principles that govern how transcription factors regulate cell type diversity, we used large-scale single-cell mRNA sequencing to characterize the extensive cellular diversity in the Drosophila optic lobes. We sequenced 55,000 single optic lobe neurons and glia and assigned them to 52 clusters of transcriptionally distinct single cells. We validated the clustering and annotated many of the clusters using RNA sequencing of characterized FACS-sorted single cell types, as well as marker genes specific to given clusters. To identify transcription factors responsible for inducing specific terminal differentiation features, we used machine learning to generate a ‘random forest’ model. The predictive power of the model was confirmed by showing that two transcription factors expressed specifically in cholinergic (apterous) and glutamatergic (traffic-jam) neurons are necessary for the expression of ChAT and VGlut in many, but not all, cholinergic or glutamatergic neurons, respectively. We used a transcriptome-wide approach to show that the same terminal characters, including but not restricted to neurotransmitter identity, can be regulated by different transcription factors in different cell types, arguing for extensive phenotypic convergence. Our data provide a deep understanding of the developmental and functional specification of a complex brain structure.

