Robust partial reference-free cell composition estimation from tissue expression

2020 ◽  
Vol 36 (11) ◽  
pp. 3431-3438
Author(s):  
Ziyi Li ◽  
Zhenxing Guo ◽  
Ying Cheng ◽  
Peng Jin ◽  
Hao Wu

Abstract Motivation: In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy. Results: We introduce TOols for the Analysis of heterogeneouS Tissues TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods. Availability and implementation: The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. Supplementary information: Supplementary data are available at Bioinformatics online.

Author(s):  
Alma Andersson ◽  
Joakim Lundeberg

Abstract Motivation: Collection of spatial signals in large numbers has become a routine task in multiple omics fields, but parsing these rich datasets still poses certain challenges. In whole or near-full transcriptome spatial techniques, spurious expression profiles are intermixed with those exhibiting an organized structure. To distinguish profiles with spatial patterns from the background noise, a metric that enables quantification of spatial structure is desirable. Current methods designed for similar purposes tend to be built around a framework of statistical hypothesis testing, so we were compelled to explore a fundamentally different strategy. Results: We propose an unexplored approach to analyze spatial transcriptomics data, simulating diffusion of individual transcripts to extract genes with spatial patterns. The method performed as expected when presented with synthetic data. When applied to real data, it identified genes with distinct spatial profiles, involved in key biological processes or characteristic of certain cell types. Compared to existing methods, ours seemed to be less informed by the genes’ expression levels and showed better time performance when run with multiple cores. Availability and implementation: Open-source Python package with a command line interface (CLI), freely available at https://github.com/almaan/sepal under an MIT licence. A mirror of the GitHub repository can be found at Zenodo, doi: 10.5281/zenodo.4573237. Supplementary information: Supplementary data are available at Bioinformatics online.
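The idea of scoring spatial structure by simulated diffusion can be illustrated with a minimal sketch. It assumes a square grid, an explicit finite-difference update with reflecting borders, and a score equal to the number of steps needed for the profile to become nearly uniform. The function name `diffusion_steps`, its parameters and the toy grids are hypothetical, and this is not the sepal implementation.

```python
def diffusion_steps(grid, rate=0.2, tol=1e-3, max_steps=10_000):
    """Count explicit diffusion steps until the grid is nearly uniform.

    Assumption for illustration: a longer time to homogenize is read as
    stronger spatial structure. `rate` must stay <= 0.25 for stability.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    u = [row[:] for row in grid]
    for step in range(max_steps):
        flat = [v for row in u for v in row]
        if max(flat) - min(flat) < tol:
            return step
        new = [row[:] for row in u]
        for i in range(n_rows):
            for j in range(n_cols):
                # Exchange with the four neighbours that exist
                # (reflecting border, total signal is conserved).
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < n_rows and 0 <= nj < n_cols:
                        new[i][j] += rate * (u[ni][nj] - u[i][j])
        u = new
    return max_steps


size = 8
# Toy "structured" profile: expressed in the left half only.
block = [[1.0 if j < size // 2 else 0.0 for j in range(size)] for _ in range(size)]
# Toy "unstructured" profile: alternating salt-and-pepper signal.
checker = [[float((i + j) % 2) for j in range(size)] for i in range(size)]

steps_block = diffusion_steps(block)
steps_checker = diffusion_steps(checker)
print(steps_block, steps_checker)
```

In this toy setting the half-grid profile needs more steps to homogenize than the alternating one, which is the kind of contrast such a score relies on.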


Author(s):  
Ingrid M. Lönnstedt ◽  
Sven Nelander

Abstract The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells it offers many opportunities for biomedical data mining, but also data normalization challenges of new dimensions. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big data setting, we propose an estimation procedure, in which an underlying RUV model is tuned by feedback through dataset-specific statistical measures, reflecting


Genes ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 448
Author(s):  
Aayan N. Patel ◽  
Dennis Mathew

Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease that causes compromised function of motor neurons and neuronal death. However, oculomotor neurons are largely spared from disease symptoms. The underlying causes for sporadic ALS as well as for the resistance of oculomotor neurons to disease symptoms remain poorly understood. In this bioinformatic analysis, we compared the gene expression profiles of spinal and oculomotor tissue samples from control individuals and sporadic ALS patients. We show that the genes GAD2 and GABRE (involved in GABA signaling), and CALB1 (involved in intracellular Ca2+ ion buffering) are downregulated in the spinal tissues of ALS patients, but their endogenous levels are higher in oculomotor tissues relative to the spinal tissues. Our results suggest that the downregulation of these genes and processes in spinal tissues is related to sporadic ALS disease progression and that their upregulation in oculomotor neurons confers upon them resistance to ALS symptoms. These results build upon prevailing models of excitotoxicity that are relevant to sporadic ALS disease progression and point out unique opportunities for better understanding the progression of neurodegenerative properties associated with sporadic ALS.


Author(s):  
Bong-Hyun Kim ◽  
Kijin Yu ◽  
Peter C W Lee

Abstract Motivation: Cancer classification based on gene expression profiles has provided insight into the causes of cancer and cancer treatment. Recently, machine learning-based approaches have been attempted in downstream cancer analysis to address the large differences in gene expression values, as determined by single-cell RNA sequencing (scRNA-seq). Results: We designed cancer classifiers that can identify 21 types of cancers and normal tissues based on bulk RNA-seq as well as scRNA-seq data. Training was performed with 7398 cancer samples and 640 normal samples from 21 tumors and normal tissues in TCGA based on the 300 most significant genes expressed in each cancer. Then, we compared neural network (NN), support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF) methods. The NN performed consistently better than other methods. We further applied our approach to scRNA-seq transformed by kNN smoothing and found that our model successfully classified cancer types and normal samples. Availability and implementation: Cancer classification by neural network. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Daiwei Tang ◽  
Seyoung Park ◽  
Hongyu Zhao

Abstract Motivation: A number of computational methods have been proposed recently to profile tumor microenvironment (TME) from bulk RNA data, and they have proved useful for understanding microenvironment differences among therapeutic response groups. However, these methods cannot account for either tumor proportion or variable mRNA levels across cell types. Results: In this article, we propose a Nonnegative Matrix Factorization-based Immune-TUmor MIcroenvironment Deconvolution (NITUMID) framework for TME profiling that addresses these limitations. It is designed to provide robust estimates of tumor and immune cell proportions simultaneously, while accommodating mRNA level differences across cell types. Through comprehensive simulations and real data analyses, we demonstrate that NITUMID not only can accurately estimate tumor fractions and cell types’ mRNA levels, which are currently unavailable in other methods; it also outperforms most existing deconvolution methods in regular cell type profiling accuracy. Moreover, we show that NITUMID can more effectively detect clinical and prognostic signals from gene expression profiles in tumor than other methods. Availability and implementation: The algorithm is implemented in R. The source code can be downloaded at https://github.com/tdw1221/NITUMID. Supplementary information: Supplementary data are available at Bioinformatics online.
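Why mRNA level differences matter for deconvolution can be shown with a short arithmetic sketch. It assumes that each cell type's share of bulk RNA is proportional to its cell fraction times its mRNA amount per cell, so cell fractions are recovered by dividing RNA fractions by per-cell mRNA levels and renormalizing. The function `rna_to_cell_fractions` and the numbers are hypothetical illustrations, not part of NITUMID.

```python
def rna_to_cell_fractions(rna_fractions, mrna_per_cell):
    """Convert RNA-level fractions into cell-count fractions.

    Assumed model: rna_fraction_k is proportional to
    cell_fraction_k * mrna_per_cell_k.
    """
    weights = [f / m for f, m in zip(rna_fractions, mrna_per_cell)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: two cell types contribute equal RNA, but the first type
# carries twice as much mRNA per cell, so it must be the rarer cell type.
cell_fractions = rna_to_cell_fractions([0.5, 0.5], [2.0, 1.0])
print(cell_fractions)
```

Ignoring the per-cell mRNA term would report a 50:50 mixture here, whereas the assumed model gives one third versus two thirds.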


2020 ◽  
Author(s):  
Ana I. Hernández Cordero ◽  
Xuan Li ◽  
Chen Xi Yang ◽  
Stephen Milne ◽  
Yohan Bossé ◽  
...  

Abstract Background: Cell entry of SARS-CoV-2, the novel coronavirus causing COVID-19, is facilitated by host cell angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2). We aimed to identify and characterize genes that are co-expressed with ACE2 and TMPRSS2, and to further explore their biological functions and potential as druggable targets. Methods: Using the gene expression profiles of 1,038 lung tissue samples, we performed a weighted gene correlation network analysis (WGCNA) to identify modules of co-expressed genes. We explored the biology of co-expressed genes using bioinformatics databases, and identified known drug-gene interactions. Results: ACE2 was in a module of 681 co-expressed genes; 12 genes with moderate-high correlation with ACE2 (r>0.3, FDR<0.05) had known interactions with existing drug compounds. TMPRSS2 was in a module of 1,086 co-expressed genes; 15 of these genes were enriched in the gene ontology biologic process ‘Entry into host cell’, and 53 TMPRSS2-correlated genes had known interactions with drug compounds. Conclusion: Dozens of genes are co-expressed with ACE2 and TMPRSS2, many of which have plausible links to COVID-19 pathophysiology. Many of the co-expressed genes are potentially targetable with existing drugs, which may help to fast-track the development of COVID-19 therapeutics.
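The correlation filter mentioned in the results (r>0.3) can be sketched in a few lines. The sketch assumes Pearson correlation across samples and adds the power-law ("soft threshold") adjacency commonly used in weighted co-expression networks. The gene names, the expression values and the exponent `beta=6` are made up for illustration, and the FDR step and module detection of WGCNA are omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpressed_with(target, expression, min_r=0.3):
    """Return genes whose correlation with `target` exceeds `min_r`."""
    ref = expression[target]
    hits = {}
    for gene, values in expression.items():
        if gene == target:
            continue
        r = pearson(ref, values)
        if r > min_r:
            hits[gene] = r
    return hits


def soft_adjacency(r, beta=6):
    """Unsigned weighted-network adjacency |r|**beta (beta is an assumption)."""
    return abs(r) ** beta


# Hypothetical expression values for five samples.
expression = {
    "target": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_up": [2.0, 4.0, 5.0, 8.0, 10.0],
    "gene_down": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_flat": [3.0, 1.0, 4.0, 1.0, 3.0],
}
hits = coexpressed_with("target", expression)
print(hits)
```

Only the positively correlated toy gene passes the filter. An unsigned adjacency would also give the anti-correlated gene a high weight, which is one reason the choice between signed and unsigned networks matters.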


2020 ◽  
Vol 38 (5_suppl) ◽  
pp. 39-39
Author(s):  
Reinhard Dummer ◽  
Daniel Gusenleitner ◽  
Catarina D. Campbell ◽  
Celeste Lebbe ◽  
Victoria Atkinson ◽  
...  

39 Background: Although pts with both low tumor mutation burden (TMB) and T-cell–inflamed gene expression profiles (TI-GEPs) usually have poor outcomes with anti–PD-1 therapy, an analysis in the adjuvant melanoma setting suggested that these pts benefited from adjuvant D+T therapy. Here we analyze TMB/TI-GEPs and other biomarkers in pts receiving a combination of anti–PD-1 and D+T therapy. Methods: The phase 3 COMBI-i study (NCT02967692) is evaluating S in combination with D+T in previously untreated pts with BRAF V600–mutant unresectable/metastatic melanoma. In the safety run-in (part [p] 1) and biomarker (p2) cohorts, blood/tissue samples were collected at baseline (BL), after 2-3 and 8-12 wk of treatment, and at disease progression. TMB/circulating tumor DNA (ctDNA) and TI-GEPs were examined by targeted DNA-seq and RNA-seq, respectively. Results: At data cutoff, 6 of 22 pts with DNA- and RNA-seq data available had a PFS event. At BL, these pts had low TMB, low TI-GEPs (4 of 6), or high levels of immunosuppressive TME signatures (eg, fibroblast, M2 macrophages) vs pts without a PFS event. Elevated BL ctDNA was significantly associated with PFS events (P < .001). Pts with a complete response (CR) on S+D+T had significantly lower levels of BL immunosuppressive TME signatures (eg, M2 macrophages; P < .01) than pts without a CR. We observed a consistent increase in TI-GEPs and decrease in MAPK pathway activity score (MPAS) from BL to biopsy at 2-3 wk in all pts regardless of subsequent progression. Pts with a PFS event and available longitudinal biomarker data were characterized by a subsequent decrease in TI-GEPs and an increase in MPAS per the 8- to 12-wk biopsy sample. Conclusions: These results suggest that S+D+T had an early impact on tumor cells and the TME, potentially promoting antitumor activity. The majority of PFS events occurred in the TMB-low/TI-GEP-low subgroup. An immunosuppressive TME might preclude early CRs.
The predictive implications of coupling TMB/GEP subgroups with other TME marker subgroups need further validation. The randomized placebo-controlled p3 of COMBI-i is ongoing. Clinical trial information: NCT02967692.


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i573-i582
Author(s):  
Ayse B Dincer ◽  
Joseph D Janizek ◽  
Su-In Lee

Abstract Motivation: The increasing number of gene expression profiles has enabled the use of complex models, such as deep unsupervised neural networks, to extract a latent space from these profiles. However, expression profiles, especially when collected in large numbers, inherently contain variations introduced by technical artifacts (e.g. batch effects) and uninteresting biological variables (e.g. age) in addition to the true signals of interest. These sources of variation, called confounders, produce embeddings that fail to transfer to different domains, i.e. an embedding learned from one dataset with a specific confounder distribution does not generalize to different distributions. To remedy this problem, we attempt to disentangle confounders from true signals to generate biologically informative embeddings. Results: In this article, we introduce the Adversarial Deconfounding AutoEncoder (AD-AE) approach to deconfounding gene expression latent spaces. The AD-AE model consists of two neural networks: (i) an autoencoder to generate an embedding that can reconstruct original measurements, and (ii) an adversary trained to predict the confounder from that embedding. We jointly train the networks to generate embeddings that can encode as much information as possible without encoding any confounding signal. By applying AD-AE to two distinct gene expression datasets, we show that our model can (i) generate embeddings that do not encode confounder information, (ii) conserve the biological signals present in the original space and (iii) generalize successfully across different confounder domains. We demonstrate that AD-AE outperforms standard autoencoder and other deconfounding approaches. Availability and implementation: Our code and data are available at https://gitlab.cs.washington.edu/abdincer/ad-ae. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 12 (S8) ◽  
Author(s):  
Yen-Jung Chiu ◽  
Yi-Hsuan Hsieh ◽  
Yen-Hua Huang

Abstract Background: To facilitate the investigation of the pathogenic roles played by various immune cells in complex tissues such as tumors, a few computational methods for deconvoluting bulk gene expression profiles to predict cell composition have been created. However, available methods were usually developed along with a set of reference gene expression profiles consisting of imbalanced replicates across different cell types. Therefore, the objective of this study was to create a new deconvolution method equipped with a new set of reference gene expression profiles that incorporate more microarray replicates of the immune cells that have been frequently implicated in the poor prognosis of cancers, such as T helper cells, regulatory T cells and macrophage M1/M2 cells. Methods: Our deconvolution method was developed by choosing ε-support vector regression (ε-SVR) as the core algorithm assigned with a loss function subject to the L1-norm penalty. To construct the reference gene expression signature matrix for regression, a subset of differentially expressed genes was chosen from 148 microarray-based gene expression profiles for 9 types of immune cells by using ANOVA and minimizing condition number. Agreement analyses including mean absolute percentage errors and Bland-Altman plots were carried out to compare the performances of our method and CIBERSORT. Results: In silico cell mixtures, simulated bulk tissues, and real human samples with known immune-cell fractions were used as the test datasets for benchmarking. Our method outperformed CIBERSORT in the benchmarks using in silico breast tissue-immune cell mixtures in the proportions of 30:70 and 50:50, and in the benchmark using 164 human PBMC samples. Our results suggest that the performance of our method was at least comparable to that of a state-of-the-art tool, CIBERSORT.
Conclusions: We developed a new cell composition deconvolution method, and the implementation was entirely based on publicly available R and Python packages. In addition, we compiled a new set of reference gene expression profiles, which might allow for a more robust prediction of the immune cell fractions from the expression profiles of cell mixtures. The source code of our method can be downloaded from https://github.com/holiday01/deconvolution-to-estimate-immune-cell-subsets.
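The two agreement measures named in the methods can be computed as follows. This is a minimal sketch using the standard definitions: mean absolute percentage error, and Bland-Altman bias with 95% limits of agreement at the mean difference plus or minus 1.96 standard deviations. The function names and the toy fractions are hypothetical and do not come from the study.

```python
import statistics


def mape(known, estimated):
    """Mean absolute percentage error, in percent (known values must be nonzero)."""
    errors = [abs(e - k) / abs(k) for k, e in zip(known, estimated)]
    return 100.0 * sum(errors) / len(errors)


def bland_altman(estimated, known):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [e - k for e, k in zip(estimated, known)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy cell fractions, made up for illustration only.
known = [0.10, 0.20, 0.30, 0.40]
estimated = [0.12, 0.18, 0.33, 0.36]

error_pct = mape(known, estimated)
bias, lower, upper = bland_altman(estimated, known)
print(error_pct, bias, lower, upper)
```

A Bland-Altman plot then places the per-sample differences against the per-sample means, with horizontal lines at the bias and the two limits.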


2005 ◽  
Vol 86 (2) ◽  
pp. 127-138 ◽  
Author(s):  
SERGEY V. ANISIMOV

Mammalian mitochondrial genomes are organized in a conserved and extremely compact manner, encoding molecules that play a vital role in oxidative phosphorylation (OXPHOS) and carry out a number of other important biological functions. A large-scale screening of the normalized mitochondrial gene expression profiles generated from publicly available mammalian serial analysis of gene expression (SAGE) datasets (over 17.7 million tags) was performed in this study. Acquired SAGE libraries represent an extensive range of human, mouse, rat, bovine and swine cell and tissue samples (normal and pathological) in a variety of conditions. Using a straightforward in silico algorithm, variations in total mitochondrial gene expression, as well as in the expression of individual genes encoded by mitochondrial genomes, are addressed, and common patterns in the species- and tissue-specific mitochondrial gene expression profiles are discussed.

