Single-cell normalization and association testing unifying CRISPR screen and gene co-expression analyses with Normalisr

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Lingfei Wang

Abstract: Single-cell RNA sequencing (scRNA-seq) provides unprecedented technical and statistical potential to study gene regulation but is subject to technical variations and sparsity. Furthermore, statistical association testing remains difficult for scRNA-seq. Here we present Normalisr, a normalization and statistical association testing framework that unifies single-cell differential expression, co-expression, and CRISPR screen analyses with linear models. By systematically detecting and removing nonlinear confounders arising from library size at mean and variance levels, Normalisr achieves high sensitivity, specificity, speed, and generalizability across multiple scRNA-seq protocols and experimental conditions with unbiased p-value estimation. The superior scalability allows us to reconstruct robust gene regulatory networks from trans-effects of guide RNAs in large-scale single-cell CRISPRi screens. On conventional scRNA-seq, Normalisr recovers gene-level co-expression networks that recapitulate known gene functions.

2021 ◽  
Author(s):  
Lingfei Wang

Abstract: Single-cell RNA sequencing (scRNA-seq) provides unprecedented technical and statistical potential to study gene regulation but is subject to technical variations and sparsity. Here we present Normalisr, a linear-model-based normalization and statistical hypothesis testing framework that unifies single-cell differential expression, co-expression, and CRISPR scRNA-seq screen analyses. By systematically detecting and removing nonlinear confounding from library size, Normalisr achieves high sensitivity, specificity, speed, and generalizability across multiple scRNA-seq protocols and experimental conditions with unbiased P-value estimation. We use Normalisr to reconstruct robust gene regulatory networks from trans-effects of gRNAs in large-scale CRISPRi scRNA-seq screens and gene-level co-expression networks from conventional scRNA-seq.
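
The idea of testing association with a linear model after removing a technical covariate can be illustrated with a toy partial correlation. This is a minimal sketch under stated assumptions, not Normalisr's implementation: it removes only a linear effect of a single covariate, whereas the abstract describes nonlinear confounders at both mean and variance levels. All function names and the simulated data are hypothetical.

```python
import math
import random


def residuals(y, x):
    """Residuals of y after least-squares regression on an intercept and x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def partial_correlation(x, y, covariate):
    """Correlation of x and y after regressing the covariate out of both."""
    return pearson(residuals(x, covariate), residuals(y, covariate))


# Toy data (assumption): two hypothetical genes whose only shared signal
# is a technical covariate standing in for library size.
rng = random.Random(0)
depth = [rng.gauss(0, 1) for _ in range(2000)]
gene_a = [d + rng.gauss(0, 1) for d in depth]
gene_b = [d + rng.gauss(0, 1) for d in depth]

raw_r = pearson(gene_a, gene_b)                     # inflated by the covariate
adj_r = partial_correlation(gene_a, gene_b, depth)  # close to zero
```

In this toy setting the raw correlation is driven entirely by the shared covariate, and it largely disappears once the covariate is regressed out of both genes.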


2019 ◽  
Author(s):  
Ning Wang ◽  
Andrew E. Teschendorff

Abstract: Inferring the activity of transcription factors in single cells is a key task to improve our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively large dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can be subsequently obtained. We show that SCIRA can correctly infer regulatory activity of transcription factors affected by high technical dropouts. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell fate in tissue development, even for cell types that only make up 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single-cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.


2019 ◽  
Vol 51 (10) ◽  
pp. 506-515 ◽  
Author(s):  
Lydia Coulter Kwee ◽  
Megan L. Neely ◽  
Elizabeth Grass ◽  
Simon G. Gregory ◽  
Matthew T. Roe ◽  
...  

The genomic regulatory networks underlying the pathogenesis of non-ST-segment elevation acute coronary syndrome (NSTE-ACS) are incompletely understood. As intermediate traits, protein biomarkers report on underlying disease severity and prognosis in NSTE-ACS. We hypothesized that integration of dense microRNA (miRNA) profiling with biomarker measurements would highlight potential regulatory pathways that underlie the relationships between prognostic biomarkers, miRNAs, and cardiovascular phenotypes. We performed miRNA sequencing using whole blood from 186 patients from the TRILOGY-ACS trial. Seven circulating prognostic biomarkers were measured: NH2-terminal pro-B-type natriuretic peptide (NT-proBNP), high-sensitivity C-reactive protein, osteopontin (OPN), myeloperoxidase, growth differentiation factor 15, monocyte chemoattractant protein, and neopterin. We tested miRNAs for association with each biomarker with generalized linear models and controlled the false discovery rate at 0.05. Ten miRNAs, including known cardiac-related miRNAs 25-3p and 423-3p, were associated with NT-proBNP levels (min. P = 7.5 × 10⁻⁴) and 48 miRNAs, including cardiac-related miRNAs 378a-3p, 20b-5p and 320a, -b, and -d, were associated with OPN levels (min. P = 1.6 × 10⁻⁶). NT-proBNP and OPN were also associated with time to cardiovascular death, myocardial infarction (MI), or stroke in the sample. By integrating large-scale miRNA profiling with circulating biomarkers as intermediate traits, we identified associations of known cardiac-related and novel miRNAs with two prognostic biomarkers and identified potential genomic networks regulating these biomarkers. These results, highlighting plausible biological pathways connecting miRNAs with biomarkers and outcomes, may inform future studies seeking to delineate genomic pathways underlying NSTE-ACS outcomes.


2020 ◽  
Author(s):  
Emma Dann ◽  
Neil C. Henderson ◽  
Sarah A. Teichmann ◽  
Michael D. Morgan ◽  
John C. Marioni

Abstract: Single-cell omic protocols applied to disease, development or mechanistic studies can reveal the emergence of aberrant cell states or changes in differentiation. These perturbations can manifest as a shift in the abundance of cells associated with a biological condition. Current computational workflows for comparative analyses typically use discrete clusters as input when testing for differential abundance between experimental conditions. However, clusters are not always an optimal representation of the biological manifold on which cells lie, especially in the context of continuous differentiation trajectories. To overcome these barriers to discovery, we present Milo, a flexible and scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighbourhoods on a k-nearest neighbour graph. Our method samples and refines neighbourhoods across the graph and leverages the flexibility of generalized linear models, making it applicable to a wide range of experimental settings. Using simulations, we show that Milo is both robust and sensitive, and can reveal subtle but important cell state perturbations that are obscured by discretizing cells into clusters. We illustrate the power of Milo by identifying the perturbed differentiation during ageing of a lineage-biased thymic epithelial precursor state and by uncovering extensive perturbation to multiple lineages in human cirrhotic liver. Milo is provided as an open-source R software package with documentation and tutorials at https://github.com/MarioniLab/miloR.


Author(s):  
Eugene Katsevich ◽  
Timothy Barry ◽  
Kathryn Roeder

Single-cell CRISPR screens are an emerging biotechnology promising unprecedented insights into gene regulation. However, the analysis of these screens presents significant statistical challenges. For example, technical factors like sequencing depth impact not only expression measurement but also perturbation detection, creating a confounding effect. We demonstrate on two recent large-scale single-cell CRISPR screens how these challenges cause calibration issues among existing analysis methods. To address these challenges, we propose SCEPTRE: analysis of single-cell perturbation screens via conditional resampling. This methodology, designed to avoid calibration issues due to technical confounders and expression model misspecification, infers associations between perturbations and expression by resampling the former according to a working model for perturbation detection probability in each cell. SCEPTRE demonstrates excellent calibration and sensitivity on the CRISPR screen data and yields 200 new regulatory relationships, many of which are supported by existing functional data.
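
The conditional resampling idea described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the SCEPTRE implementation: the per-cell detection probabilities are taken as given rather than fitted, the test statistic is a plain difference in means (an assumption chosen for brevity), and the function name and simulated data are hypothetical.

```python
import random


def conditional_resampling_pvalue(expression, perturbed, detect_prob,
                                  n_resamples=500, seed=0):
    """Two-sided p-value from resampling perturbation indicators per cell.

    Each resample draws an indicator for every cell from a Bernoulli
    distribution with that cell's detection probability, holding the
    expression values fixed.
    """
    rng = random.Random(seed)

    def statistic(indicator):
        # Difference in mean expression between perturbed and other cells.
        on = [e for e, z in zip(expression, indicator) if z]
        off = [e for e, z in zip(expression, indicator) if not z]
        if not on or not off:
            return 0.0
        return sum(on) / len(on) - sum(off) / len(off)

    observed = abs(statistic(perturbed))
    hits = 0
    for _ in range(n_resamples):
        resampled = [rng.random() < p for p in detect_prob]
        if abs(statistic(resampled)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + hits) / (1 + n_resamples)


# Toy data (assumption): 400 cells, each with detection probability 0.3,
# and a perturbation that lowers expression by 2 units.
rng = random.Random(1)
n_cells = 400
detect_prob = [0.3] * n_cells
perturbed = [rng.random() < p for p in detect_prob]
expression = [5.0 - 2.0 * z + rng.gauss(0, 1) for z in perturbed]

p_value = conditional_resampling_pvalue(expression, perturbed, detect_prob)
```

Because the null distribution is built by resampling the perturbation indicators rather than the expression values, the calibration of such a test rests on the working model for detection probability and not on a correct expression model, which is the design choice the abstract emphasizes.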


2017 ◽  
Author(s):  
F. Alexander Wolf ◽  
Philipp Angerer ◽  
Fabian J. Theis

We present Scanpy, a scalable toolkit for analyzing single-cell gene expression data. It includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The Python-based implementation efficiently deals with datasets of more than one million cells and enables easy interfacing of advanced machine learning packages. Code is available from https://github.com/theislab/scanpy.


2018 ◽  
Author(s):  
Anamarija Jurisic ◽  
Chloe Robin ◽  
Pavel Tarlykov ◽  
Lee Siggens ◽  
Brigitte Schoell ◽  
...  

Abstract: Analysis of large-scale interphase genome positioning with reference to a nuclear landmark has recently been studied using sequencing-based single-cell approaches. However, these approaches are dependent upon technically challenging, time-consuming and costly high-throughput sequencing technologies, requiring specialized bioinformatics tools and expertise. Here, we propose a novel, affordable and robust microscopy-based single-cell approach, termed Topokaryotyping, to analyze and reconstruct the interphase positioning of genomic loci relative to a given nuclear landmark, detectable as a banding pattern on mitotic chromosomes. This is accomplished by proximity-dependent histone labeling, where biotin ligase BirA fused to the nuclear envelope marker Emerin was coexpressed together with a Biotin Acceptor Peptide (BAP)-histone fusion, followed by (i) biotin labeling, (ii) generation of mitotic spreads, (iii) detection of the biotin label on mitotic chromosomes and (iv) their identification by karyotyping. Using Topokaryotyping, we identified both cooperativity and stochasticity in the positioning of emerin-associated chromatin domains in individual cells. Furthermore, the chromosome-banding pattern showed dynamic changes in emerin-associated domains upon physical and radiological stress. In summary, Topokaryotyping is a sensitive and reliable technique to quantitatively analyze spatial positioning of genomic regions interacting with a given nuclear landmark at the single-cell level in various experimental conditions.


2020 ◽  
Vol 49 (D1) ◽  
pp. D97-D103
Author(s):  
Li Fang ◽  
Yunjin Li ◽  
Lu Ma ◽  
Qiyue Xu ◽  
Fei Tan ◽  
...  

Abstract: Gene regulatory networks (GRNs) formed by transcription factors (TFs) and their downstream target genes play essential roles in gene expression regulation. Moreover, GRNs can change dynamically across different conditions, and these changes are crucial for understanding the underlying mechanisms of disease pathogenesis. However, no existing database provides comprehensive GRN information for various human and mouse normal tissues and diseases at the single-cell level. Based on the known TF-target relationships and the large-scale single-cell RNA-seq data collected from public databases as well as the bulk data of The Cancer Genome Atlas and the Genotype-Tissue Expression project, we systematically predicted the GRNs of 184 different physiological and pathological conditions of human and mouse involving >633 000 cells and >27 700 bulk samples. We further developed GRNdb, a freely accessible and user-friendly database (http://www.grndb.com/) for searching, comparing, browsing, visualizing, and downloading the predicted information of 77 746 GRNs, 19 687 841 TF-target pairs, and related binding motifs at single-cell/bulk resolution. GRNdb also allows users to explore the gene expression profile, correlations, and the associations between expression levels and the patient survival of diverse cancers. Overall, GRNdb provides a valuable and timely resource to the scientific community to elucidate the functions and mechanisms of gene expression regulation in various conditions.


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Andrea Brovelli

Granger causality analysis is becoming central for the analysis of interactions between neural populations and oscillatory networks. However, it is currently unclear whether single-trial estimates of Granger causality spectra can be used reliably to assess directional influence. We addressed this issue by combining single-trial Granger causality spectra with statistical inference based on general linear models. The approach was assessed on synthetic and neurophysiological data. Synthetic bivariate data were generated using two autoregressive processes with unidirectional coupling. We simulated two hypothetical experimental conditions: the first mimicked a constant and unidirectional coupling, whereas the second modelled a linear increase in coupling across trials. The statistical analysis of single-trial Granger causality spectra, based on t-tests and linear regression, successfully recovered the underlying pattern of directional influence. In addition, we characterised the minimum number of trials and coupling strengths required for significant detection of directionality. Finally, we demonstrated the relevance for neurophysiology by analysing two local field potentials (LFPs) simultaneously recorded from the prefrontal and premotor cortices of a macaque monkey performing a conditional visuomotor task. Our results suggest that the combination of single-trial Granger causality spectra and statistical inference provides a valuable tool for the analysis of large-scale cortical networks and brain connectivity.
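
The synthetic setup of two autoregressive processes with unidirectional coupling can be illustrated with a small simulation. This is a hedged sketch, not the paper's procedure: it computes a time-domain Granger causality measure with lag-1 models instead of the spectral, single-trial estimates the abstract describes, and the model order, coefficients, and function names are all assumptions made for illustration.

```python
import math
import random


def simulate(n, coupling, rng):
    """Two AR(1) processes; x drives y with the given coupling, never the reverse."""
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        x_new = 0.5 * x[-1] + rng.gauss(0, 1)
        y_new = 0.5 * y[-1] + coupling * x[-1] + rng.gauss(0, 1)
        x.append(x_new)
        y.append(y_new)
    return x, y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def granger(source, target):
    """Log ratio of residual sums of squares: target's own past vs. both pasts."""
    now, own_lag, src_lag = target[1:], target[:-1], source[:-1]

    # Restricted model: target predicted from its own previous value only.
    a = dot(now, own_lag) / dot(own_lag, own_lag)
    rss_restricted = sum((t - a * o) ** 2 for t, o in zip(now, own_lag))

    # Full model: add the source's previous value (2x2 normal equations).
    s_oo, s_ss, s_os = dot(own_lag, own_lag), dot(src_lag, src_lag), dot(own_lag, src_lag)
    b_o, b_s = dot(now, own_lag), dot(now, src_lag)
    det = s_oo * s_ss - s_os ** 2
    a_full = (b_o * s_ss - b_s * s_os) / det
    c_full = (b_s * s_oo - b_o * s_os) / det
    rss_full = sum((t - a_full * o - c_full * s) ** 2
                   for t, o, s in zip(now, own_lag, src_lag))

    return math.log(rss_restricted / rss_full)


rng = random.Random(0)
x, y = simulate(2000, coupling=0.5, rng=rng)
g_xy = granger(x, y)  # direction with true coupling
g_yx = granger(y, x)  # direction without coupling, expected near zero
```

Repeating the simulation across many trials would yield one estimate per trial, which could then be passed to t-tests or a regression on trial index, in the spirit of the inference step the abstract outlines.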


Biomolecules ◽  
2018 ◽  
Vol 8 (4) ◽  
pp. 158
Author(s):  
Ludwig Lausser ◽  
Lea Siegle ◽  
Wolfgang Rottbauer ◽  
Derk Frank ◽  
Steffen Just ◽  
...  

Genetic model organisms have the potential of removing blind spots from the underlying gene regulatory networks of human diseases. By allowing analyses under experimental conditions, they complement the insights gained from observational data. An inevitable requirement for a successful trans-species transfer is an abstract but precise high-level characterization of experimental findings. In this work, we provide a large-scale analysis of seven weak contractility/heart failure genotypes of the model organism zebrafish, which all share a weak contractility phenotype. In supervised classification experiments, we screen for discriminative patterns that distinguish between observable phenotypes (homozygous mutant individuals) as well as wild-type (homozygous wild-types) and carriers (heterozygous individuals). As the method of choice we use semantic multi-classifier systems, a knowledge-based approach which constructs hypotheses from a predefined vocabulary of high-level terms (e.g., Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways or Gene Ontology (GO) terms). Evaluating these models leads to a compact description of the underlying processes and guides the screening for new molecular markers of heart failure. Furthermore, we were able to independently corroborate the identified processes in Wistar rats.

