Approximate distance correlation for selecting highly interrelated genes across datasets

2021 ◽  
Vol 17 (11) ◽  
pp. e1009548
Author(s):  
Qunlun Shen ◽  
Shihua Zhang

With the rapid accumulation of biological omics datasets, decoding the underlying relationships of genes across datasets has become an important issue. Previous studies have attempted to identify differentially expressed genes across datasets, but such methods struggle to detect interrelated ones. Moreover, existing correlation-based algorithms can only measure the relationship between genes within a single dataset or between two multi-modal datasets from the same samples. It is still unclear how to quantify the strength of association of the same gene across two biological datasets with different samples. To this end, we propose Approximate Distance Correlation (ADC) to select interrelated genes with statistical significance across two different biological datasets. ADC first obtains the k most correlated genes for each target gene as its approximate observations, and then calculates the distance correlation (DC) for the target gene across two datasets. ADC repeats this process for all genes and then performs the Benjamini-Hochberg adjustment to control the false discovery rate. We demonstrate the effectiveness of ADC with simulation data and four real applications to select highly interrelated genes across two datasets. These four applications include 21 cancer RNA-seq datasets of different tissues; six single-cell RNA-seq (scRNA-seq) datasets of mouse hematopoietic cells across six different cell types along the hematopoietic cell lineage; five scRNA-seq datasets of pancreatic islet cells across five different technologies; and coupled single-cell ATAC-seq (scATAC-seq) and scRNA-seq data of peripheral blood mononuclear cells (PBMC). Extensive results demonstrate that ADC is a powerful tool to uncover interrelated genes with strong biological implications and is scalable to large-scale datasets. Moreover, the number of such genes can serve as a metric to measure the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.
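The two generic statistical building blocks named in this abstract, sample distance correlation and the Benjamini-Hochberg adjustment, can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' implementation: the function names are hypothetical, and the way ADC builds its "approximate observations" from the k most correlated genes and derives p-values is not described in enough detail here to reproduce, so it is left out.

```python
import math

def _centered_distances(values):
    """Double-centered pairwise Euclidean distance matrix for a list of vectors."""
    n = len(values)
    d = [[math.dist(values[i], values[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]   # equals the column mean (d is symmetric)
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation between paired observations x[i], y[i].

    x and y are equal-length lists of tuples (vectors); returns a value in [0, 1].
    """
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    # Convention: a constant input has zero distance variance, so report 0.
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy usage with made-up numbers (not data from the paper).
x = [(1.0,), (2.0,), (3.0,), (4.0,)]
y = [(2.0,), (4.0,), (6.0,), (8.0,)]
dc_linear = distance_correlation(x, y)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In a per-gene workflow of the kind the abstract describes, one p-value per gene would be collected and passed to `benjamini_hochberg` once, so that the false discovery rate is controlled across all genes together.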

2019 ◽  
Author(s):  
Ralph Patrick ◽  
David T. Humphreys ◽  
Vaibhao Janbandhu ◽  
Alicia Oshlack ◽  
Joshua W.K. Ho ◽  
...  

High-throughput single-cell RNA-seq (scRNA-seq) is a powerful tool for studying gene expression in single cells. Most current scRNA-seq bioinformatics tools focus on analysing overall expression levels, largely ignoring alternative mRNA isoform expression. We present a computational pipeline, Sierra, that readily detects differential transcript usage from data generated by commonly used polyA-captured scRNA-seq technology. We validate Sierra by comparing cardiac scRNA-seq cell types to bulk RNA-seq of matched populations, finding significant overlap in differential transcripts. Sierra detects differential transcript usage across human peripheral blood mononuclear cells and the Tabula Muris, and 3′ UTR shortening in cardiac fibroblasts. Sierra is available at https://github.com/VCCRI/Sierra.


2019 ◽  
Author(s):  
Florian Wagner

Clustering of cells by cell type is arguably the most common and repetitive task encountered during the analysis of single-cell RNA-Seq data. However, as popular clustering methods operate largely independently of visualization techniques, the fine-tuning of clustering parameters can be unintuitive and time-consuming. Here, I propose Galapagos, a simple and effective clustering workflow based on t-SNE and DBSCAN that does not require a gene selection step. In practice, Galapagos only involves the fine-tuning of two parameters, which is straightforward, as clustering is performed directly on the t-SNE visualization results. Using peripheral blood mononuclear cells as a model tissue, I validate the effectiveness of Galapagos in different ways. First, I show that Galapagos generates clusters corresponding to all main cell types present. Then, I demonstrate that the t-SNE results are robust to parameter choices and initialization points. Next, I employ a simulation approach to show that clustering with Galapagos is accurate and robust to the high levels of technical noise present. Finally, to demonstrate Galapagos’ accuracy on real data, I compare clustering results to true cell type identities established using CITE-Seq data. In this context, I also provide an example of the primary limitation of Galapagos, namely the difficulty of resolving related cell types in cases where t-SNE fails to clearly separate the cells. Galapagos helps to make clustering scRNA-Seq data more intuitive and reproducible, and can be implemented in most programming languages with only a few lines of code.
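The idea of clustering directly on two-dimensional embedding coordinates with DBSCAN can be sketched as follows. This is a minimal standard-library illustration, not the Galapagos code: the t-SNE step is assumed to have been run already, the points are made-up 2-D coordinates, and `eps` and `min_pts` are the two classical DBSCAN parameters (whether they are the two parameters the abstract refers to is an assumption).

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    points  : list of coordinate tuples (e.g. 2-D embedding positions)
    eps     : neighborhood radius
    min_pts : minimum neighborhood size (including the point itself)
              for a point to count as a core point
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Toy usage: two tight groups and one isolated point (made-up coordinates).
embedding = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(embedding, eps=1.5, min_pts=3)
```

Because the neighborhood search here is a plain linear scan, the sketch is quadratic in the number of cells; a spatial index would be needed for realistic dataset sizes.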


2021 ◽  
Author(s):  
Zhibin Li ◽  
Chengcheng Sun ◽  
Fei Wang ◽  
Xiran Wang ◽  
Jiacheng Zhu ◽  
...  

Background: Immune cells play important roles in mediating immune response and host defense against invading pathogens. However, insights into the molecular mechanisms governing circulating immune cell diversity among multiple species are limited. Methods: In this study, we compared the single-cell transcriptomes of 77 957 immune cells from 12 species using single-cell RNA-sequencing (scRNA-seq). Distinct molecular profiles were characterized for different immune cell types, including T cells, B cells, natural killer cells, monocytes, and dendritic cells. Results: The results revealed the heterogeneity and compositions of circulating immune cells among 12 different species. Additionally, we explored the conserved and divergent cellular cross-talks and genetic regulatory networks among vertebrate immune cells. Notably, the ligand and receptor pair VIM-CD44 was highly conserved among the immune cells. Conclusions: This study is the first to provide a comprehensive analysis of the cross-species single-cell atlas for peripheral blood mononuclear cells (PBMCs). This research should advance our understanding of the cellular taxonomy and fundamental functions of PBMCs, with important implications in evolutionary biology, developmental biology, and immune system disorders.


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Kaikun Xie ◽  
Yu Huang ◽  
Feng Zeng ◽  
Zehua Liu ◽  
Ting Chen

Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types on global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed to provide clustering and post-analysis of assigning putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE would provide a more in-depth understanding of cell development and diseases.


2018 ◽  
Author(s):  
Neo Christopher Chung

Single cell RNA sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts, and environmental stimuli. Cell identities of samples derived from heterogeneous subpopulations are routinely determined by clustering of scRNA-seq data. Computational cell identities are then used in downstream analysis, feature selection, and visualization. However, how can we examine if cell identities are accurately inferred? To this end, we introduce non-parametric methods to evaluate cell identities by testing cluster memberships of single cell samples in an unsupervised manner. We propose posterior inclusion probabilities for cluster memberships to select and visualize samples relevant to subpopulations. Beyond simulation studies, we examined two scRNA-seq datasets: a mixture of Jurkat and 293T cells and a large family of peripheral blood mononuclear cells. We demonstrated probabilistic feature selection and improved t-SNE visualization. By learning uncertainty in clustering, the proposed methods enable rigorous testing of cell identities in scRNA-seq.


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhe Cui ◽  
Ya Cui ◽  
Yan Gao ◽  
Tao Jiang ◽  
Tianyi Zang ◽  
...  

Single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) has been widely used in profiling genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq, the peaks of scATAC-seq are much sparser due to the lower copy numbers (diploid in humans) and the inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC is free of literature-based markers and robust across datasets in different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.


2020 ◽  
Author(s):  
Van Hoan Do ◽  
Francisca Rojas Ringeling ◽  
Stefan Canzar

A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultra-large scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose Specter, a method that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks that are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter’s speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and that is sensitive to rare cell types. Its linear time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measures gene and protein marker expression, we demonstrate that Specter is able to utilize multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells. Specter is open source and available at https://github.com/canzarlab/Specter.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Jiping Liu ◽  
Junbang Wang ◽  
Jinfang Xu ◽  
Han Xia ◽  
Yue Wang ◽  
...  

Large-scale COVID-19 vaccinations are currently underway in many countries in response to the COVID-19 pandemic. Here, we report, besides generation of neutralizing antibodies, consistent alterations in hemoglobin A1c, serum sodium and potassium levels, coagulation profiles, and renal functions in healthy volunteers after vaccination with an inactivated SARS-CoV-2 vaccine. Similar changes had also been reported in COVID-19 patients, suggesting that vaccination mimicked an infection. Single-cell mRNA sequencing (scRNA-seq) of peripheral blood mononuclear cells (PBMCs) before and 28 days after the first inoculation also revealed consistent alterations in gene expression of many different immune cell types. Reduction of CD8+ T cells and increase in classic monocyte contents were exemplary. Moreover, scRNA-seq revealed increased NF-κB signaling and reduced type I interferon responses, which were confirmed by biological assays and also had been reported to occur after SARS-CoV-2 infection with aggravating symptoms. Altogether, our study recommends additional caution when vaccinating people with pre-existing clinical conditions, including diabetes, electrolyte imbalances, renal dysfunction, and coagulation disorders.


2018 ◽  
Vol 115 (52) ◽  
pp. E12363-E12369 ◽  
Author(s):  
Fabio Zanini ◽  
Makeda L. Robinson ◽  
Derek Croote ◽  
Malaya Kumar Sahoo ◽  
Ana Maria Sanz ◽  
...  

Dengue virus (DENV) infection can result in severe complications. However, the understanding of the molecular correlates of severity is limited, partly due to difficulties in defining the peripheral blood mononuclear cells (PBMCs) that contain DENV RNA in vivo. Accordingly, there are currently no biomarkers predictive of progression to severe dengue (SD). Bulk transcriptomics data are difficult to interpret because blood consists of multiple cell types that may react differently to infection. Here, we applied a virus-inclusive single-cell RNA-seq approach (viscRNA-Seq) to profile transcriptomes of thousands of single PBMCs derived early in the course of disease from six dengue patients and four healthy controls and to characterize distinct leukocyte subtypes that harbor viral RNA (vRNA). Multiple IFN response genes, particularly MX2 in naive B cells and CD163 in CD14+ CD16+ monocytes, were up-regulated in a cell-specific manner before progression to SD. The majority of vRNA-containing cells in the blood of two patients who progressed to SD were naive IgM B cells expressing the CD69 and CXCR4 receptors and various antiviral genes, followed by monocytes. Bystander, non-vRNA–containing B cells also demonstrated immune activation, and IgG1 plasmablasts from two patients exhibited clonal expansions. Lastly, assembly of the DENV genome sequence revealed diversity at unexpected sites. This study presents a multifaceted molecular elucidation of natural dengue infection in humans with implications for any tissue and viral infection and proposes candidate biomarkers for prediction of SD.

