Isolating salient variations of interest in single-cell transcriptomic data with contrastiveVI

2021 ◽  
Author(s):  
Ethan Weinberger ◽  
Chris Lin ◽  
Su-In Lee

Single-cell RNA sequencing (scRNA-seq) technologies enable a better understanding of previously unexplored biological diversity. Oftentimes, researchers are specifically interested in modeling the latent structures and variations enriched in one target scRNA-seq dataset as compared to another background dataset generated from sources of variation irrelevant to the task at hand. For example, we may wish to isolate factors of variation only present in measurements from patients with a given disease as opposed to those shared with data from healthy control subjects. Here we introduce Contrastive Variational Inference (contrastiveVI; https://github.com/suinleelab/contrastiveVI), a framework for end-to-end analysis of target scRNA-seq datasets that decomposes the variations into shared and target-specific factors of variation. On three target-background dataset pairs we demonstrate that contrastiveVI learns latent representations that recover known subgroups of target data points better than previous methods and finds differentially expressed genes that agree with known ground truths.

1995 ◽  
Vol 31 (12) ◽  
pp. 267-273 ◽  
Author(s):  
B. S. O. Ceballos ◽  
A. Konig ◽  
B. Lomans ◽  
A. B. Athayde ◽  
H. W. Pearson

A single full-scale primary facultative pond in Sapé, north-east Brazil was monitored for performance and efficiency. The pond had a hydraulic retention time of 61 days and achieved a 95% BOD5 removal efficiency and had no helminth eggs in the effluent. The effluent failed to meet the WHO faecal coliform guideline for unrestricted irrigation. The pond was dominated by the cyanobacterium Microcystis and gave better than predicted orthophosphate removal. Details of how the system could be simply upgraded utilizing the same land are discussed.


Pythagoras ◽  
2017 ◽  
Vol 38 (1) ◽  
Author(s):  
Michael Murray

Over half of all students enrolling at a particular university in KwaZulu-Natal fail to complete a degree. This article seeks to determine to what extent the marks they obtain for English and Mathematics at school affect their probability of graduating from this university. In addressing this problem, other student-specific factors associated with gender, race and the type of school attended also need to be properly accounted for. To provide answers, the performance of 24 392 students enrolling at the university over the period 2004 to 2012 was followed until they graduated or dropped out of their studies. A structural equation model was fitted because it allows one to separate a direct effect from an indirect effect. Gender, race and school background were found to be very significant, with males, Black Africans and students coming from a less privileged school background having a smaller probability of eventually graduating from this university. Males tend to perform better than females in Mathematics, with females performing better than males in English. More importantly, however, a single percentage point increase in one’s mark for English increases the probability of graduating from this university far more than a single percentage point increase in one’s Mathematics mark. In the light of these mediated results, perhaps this university should be directing its efforts more towards improving the English (rather than mathematical) literacy of students entering the university.


2017 ◽  
Vol 33 (3) ◽  
pp. 233-236 ◽  
Author(s):  
Kevin D. Dames ◽  
Jeremy D. Smith ◽  
Gary D. Heise

Gait data are commonly presented as an average of many trials or as an average across participants. Discrete data points (eg, maxima or minima) are identified and used as dependent variables in subsequent statistical analyses. However, the approach used for obtaining average data from multiple trials is inconsistent and unclear in the biomechanics literature. This study compared the statistical outcomes of averaging peaks from multiple trials versus identifying a single peak from an average profile. A series of paired-samples t tests were used to determine whether there were differences in average dependent variables from these 2 methods. Identifying a peak value from the average profile resulted in significantly smaller magnitudes of dependent variables than when peaks from multiple trials were averaged. Disagreement between the 2 methods was due to temporal differences in trial peak locations. Sine curves generated in MATLAB confirmed this misrepresentation of trial peaks in the average profile when a phase shift was introduced. Based on these results, averaging individual trial peaks represents the actual data better than choosing a peak from an average trial profile.
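The phase-shift effect described above can be illustrated with a short sketch. This is not the authors' MATLAB code: the phase values, the sample count, and the helper name `sine_trial` are assumptions chosen only for illustration. It builds a few sine "trials" that differ only in timing, then compares the mean of the individual trial peaks with the peak of the sample-by-sample average profile.

```python
import math


def sine_trial(phase, n_samples=101):
    """One synthetic 'trial': a single sine cycle shifted by `phase` radians."""
    return [
        math.sin(2 * math.pi * i / (n_samples - 1) + phase)
        for i in range(n_samples)
    ]


# Hypothetical phase shifts (radians) standing in for trial-to-trial
# differences in when the peak occurs.
phases = [-0.4, -0.2, 0.0, 0.2, 0.4]
trials = [sine_trial(p) for p in phases]

# Method 1: find the peak of each trial, then average those peaks.
mean_of_peaks = sum(max(t) for t in trials) / len(trials)

# Method 2: average the trials sample by sample, then take the peak
# of the resulting average profile.
average_profile = [sum(samples) / len(trials) for samples in zip(*trials)]
peak_of_mean = max(average_profile)

print(f"mean of trial peaks:     {mean_of_peaks:.3f}")
print(f"peak of average profile: {peak_of_mean:.3f}")
```

With these assumed shifts, every trial peaks near 1.0, but the average profile peaks at roughly 0.96, because the misaligned peaks partially cancel when the curves are averaged. Setting all phases to zero makes the two methods agree.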


Cells ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 3126
Author(s):  
Dominik Saul ◽  
Robyn Laura Kosinsky

The human aging process is associated with molecular changes and cellular degeneration, resulting in a significant increase in cancer incidence with age. Despite their potential correlation, the relationship between cancer- and ageing-related transcriptional changes is largely unknown. In this study, we aimed to analyze aging-associated transcriptional patterns in publicly available bulk mRNA-seq and single-cell RNA-seq (scRNA-seq) datasets for chronic myelogenous leukemia (CML), colorectal cancer (CRC), hepatocellular carcinoma (HCC), lung cancer (LC), and pancreatic ductal adenocarcinoma (PDAC). Indeed, we detected that various aging/senescence-induced genes (ASIGs) were upregulated in malignant diseases compared to healthy control samples. To elucidate the importance of ASIGs during cell development, pseudotime analyses were performed, which revealed a late enrichment of distinct cancer-specific ASIG signatures. Notably, we were able to demonstrate that all cancer entities analyzed in this study comprised cell populations expressing ASIGs. While only minor correlations were detected between ASIGs and transcriptome-wide changes in PDAC, a high proportion of ASIGs was induced in CML, CRC, HCC, and LC samples. These unique cellular subpopulations could serve as a basis for future studies on the role of aging and senescence in human malignancies.


2021 ◽  
Author(s):  
Rory Donovan-Maiye ◽  
Jackson Brown ◽  
Caleb Chan ◽  
Liya Ding ◽  
Calysta Yan ◽  
...  

We introduce a framework for end-to-end integrative modeling of 3D single-cell multi-channel fluorescent image data of diverse subcellular structures. We employ stacked conditional β-variational autoencoders to first learn a latent representation of cell morphology, and then learn a latent representation of subcellular structure localization which is conditioned on the learned cell morphology. Our model is flexible and can be trained on images of arbitrary subcellular structures and at varying degrees of sparsity and reconstruction fidelity. We train our full model on 3D cell image data and explore design trade-offs in the 2D setting. Once trained, our model can be used to impute structures in cells where they were not imaged and to quantify the variation in the location of all subcellular structures by generating plausible instantiations of each structure in arbitrary cell geometries. We apply our trained model to a small drug perturbation screen to demonstrate its applicability to new data. We show how the latent representations of drugged cells differ from unperturbed cells as expected by on-target effects of the drugs.


2009 ◽  
Vol 133 (8) ◽  
pp. 1262-1267 ◽  
Author(s):  
Sarah E. Coupland ◽  
Valerie A. White ◽  
Jack Rootman ◽  
Bertil Damato ◽  
Paul T. Finger

Context.—The ocular adnexal lymphomas (OAL) arise in the conjunctiva, orbit, lacrimal gland, and eyelids. To date, they have been clinically staged using the Ann Arbor staging system, first designed for Hodgkin and later for nodal, non–Hodgkin lymphoma. The Ann Arbor system has several shortcomings, particularly when staging extranodal non–Hodgkin lymphomas, such as OAL, which show different dissemination patterns from nodal lymphomas. Objective.—To describe the first TNM-based clinical staging system for OAL. Design.—Retrospective literature review. Results.—We have developed, to our knowledge, the first American Joint Committee on Cancer–International Union Against Cancer TNM-based staging system for OAL to overcome the limitations of the Ann Arbor system. Our staging system defines disease extent more precisely within the various anatomic compartments of the ocular adnexa and allows for analysis of site-specific factors not addressed previously. It aims to facilitate future studies by identifying clinical and histomorphologic features of prognostic significance. This system is for primary OAL only and is not intended for intraocular lymphomas. Conclusions.—Our TNM-based staging system for OAL is a user-friendly, anatomic documentation of disease extent, which creates a common language for multicenter and international collaboration. Data points will be collected with the aim of identifying biomarkers to be incorporated into the staging system.


Author(s):  
Samuel Melton ◽  
Sharad Ramanathan

Motivation: Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results: We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation: https://github.com/smelton/SMD. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 48 (11) ◽  
pp. e62-e62 ◽  
Author(s):  
Qi Song ◽  
Jiyoung Lee ◽  
Shamima Akter ◽  
Matthew Rogers ◽  
Ruth Grene ◽  
...  

Recent advances in genomic technologies have generated data on large-scale protein–DNA interactions and open chromatin regions for many eukaryotic species. How to identify condition-specific functions of transcription factors using these data has become a major challenge in genomic research. To solve this problem, we have developed a method called ConSReg, which provides a novel approach to integrate regulatory genomic data into predictive machine learning models of key regulatory genes. Using Arabidopsis as a model system, we tested our approach to identify regulatory genes in data sets from single cell gene expression and from abiotic stress treatments. Our results showed that ConSReg accurately predicted transcription factors that regulate differentially expressed genes with an average auROC of 0.84, which is 23.5–25% better than enrichment-based approaches. To further validate the performance of ConSReg, we analyzed an independent data set related to plant nitrogen responses. ConSReg provided better rankings of the correct transcription factors in 61.7% of cases, which is three times better than other plant tools. We applied ConSReg to Arabidopsis single cell RNA-seq data, successfully identifying candidate regulatory genes that control cell wall formation. Our methods provide a new approach to define candidate regulatory genes using integrated genomic data in plants.
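For readers unfamiliar with the auROC metric quoted above, the following is a minimal sketch of how such a score can be computed from predicted scores and true labels. It is not ConSReg code: the function name `auroc` and the toy scores and labels are hypothetical, and the pairwise definition used here is simply the standard interpretation of the area under the ROC curve.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: a score per candidate and whether it is a true hit.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
print(f"auROC = {auroc(toy_scores, toy_labels):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a sense of scale for the reported average of 0.84.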


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
David DeTomaso ◽  
Matthew G. Jones ◽  
Meena Subramaniam ◽  
Tal Ashuach ◽  
Chun J. Ye ◽  
...  

We present Vision, a tool for annotating the sources of variation in single cell RNA-seq data in an automated and scalable manner. Vision operates directly on the manifold of cell-cell similarity and employs a flexible annotation approach that can operate either with or without preconceived stratification of the cells into groups or along a continuum. We demonstrate the utility of Vision in several case studies and show that it can derive important sources of cellular variation and link them to experimental meta-data even with relatively homogeneous sets of cells. Vision produces an interactive, low latency and feature rich web-based report that can be easily shared among researchers, thus facilitating data dissemination and collaboration.


2019 ◽  
Vol 78 (10) ◽  
pp. 1379-1387 ◽  
Author(s):  
Eleanor Valenzi ◽  
Melissa Bulik ◽  
Tracy Tabib ◽  
Christina Morse ◽  
John Sembrat ◽  
...  

Objectives: Myofibroblasts are key effector cells in the extracellular matrix remodelling of systemic sclerosis-associated interstitial lung disease (SSc-ILD); however, the diversity of fibroblast populations present in the healthy and SSc-ILD lung is unknown and has prevented the specific study of the myofibroblast transcriptome. We sought to identify and define the transcriptomes of myofibroblasts and other mesenchymal cell populations in human healthy and SSc-ILD lungs to understand how alterations in fibroblast phenotypes lead to SSc-ILD fibrosis. Methods: We performed droplet-based, single-cell RNA-sequencing with integrated canonical correlation analysis of 13 explanted lung tissue specimens (56 196 cells) from four healthy controls and four patients with SSc-ILD, with findings confirmed by cellular indexing of transcriptomes and epitopes by sequencing in additional samples. Results: Examination of gene expression in mesenchymal cells identified two major, SPINT2hi and MFAP5hi, and one minor, WIF1hi, fibroblast populations in the healthy control lung. Combined analysis of control and SSc-ILD mesenchymal cells identified SPINT2hi, MFAP5hi, few WIF1hi fibroblasts and a new large myofibroblast population with evidence of actively proliferating myofibroblasts. We compared differential gene expression between all SSc-ILD and control mesenchymal cell populations, as well as among the fibroblast subpopulations, showing that myofibroblasts undergo the greatest phenotypic changes in SSc-ILD and strongly upregulate expression of collagens and other profibrotic genes. Conclusions: Our results demonstrate previously unrecognised fibroblast heterogeneity in SSc-ILD and healthy lungs, and define multimodal transcriptome-phenotypes associated with these populations. Our data indicate that myofibroblast differentiation and proliferation are key pathological mechanisms driving fibrosis in SSc-ILD.

