PyBoost: A parallelized Python implementation of 2D boosting with hierarchies

2017 ◽  
Author(s):  
Peyton G. Greenside ◽  
Nadine Hussami ◽  
Jessica Chang ◽  
Anshul Kundaje

Abstract Motivation: Gene expression is controlled by networks of transcription factors that bind specific sequence motifs in regulatory DNA elements such as promoters and enhancers. GeneClass is a boosting-based algorithm that learns gene regulatory networks from complementary paired feature sets such as transcription factor expression levels and binding motifs across conditions. This algorithm can be used to predict functional genomics measures of cell state, such as gene expression and chromatin accessibility, in different cellular conditions. We present a parallelized, Python-based implementation of GeneClass, called PyBoost, along with a novel hierarchical implementation of the algorithm, called HiBoost. HiBoost allows regulatory logic to be constrained to a hierarchical group of conditions or cell types. The software can be used to dissect differentiation cascades, time courses or other perturbation data that naturally form a hierarchy or trajectory. We demonstrate the application of PyBoost and HiBoost to learn regulators of tadpole tail regeneration and hematopoietic stem cell differentiation and validate learned regulators through an inducible CRISPR system. Availability: The implementation is publicly available here: https://github.com/kundajelab/boosting2D/.


2021 ◽  
Author(s):  
Vinay K Kartha ◽  
Fabiana M Duarte ◽  
Yan Hu ◽  
Sai Ma ◽  
Jennifer G Chew ◽  
...  

Cells require coordinated control over gene expression when responding to environmental stimuli. Here, we apply scATAC-seq and scRNA-seq in resting and stimulated human blood cells. Collectively, we generate ~91,000 single-cell profiles, allowing us to probe the cis-regulatory landscape of immunological response across cell types, stimuli and time. Advancing tools to integrate multi-omic data, we develop FigR, a framework to computationally pair scATAC-seq with scRNA-seq cells, connect distal cis-regulatory elements to genes, and infer gene regulatory networks (GRNs) to identify candidate TF regulators. Utilizing these paired multi-omic data, we define Domains of Regulatory Chromatin (DORCs) of immune stimulation and find that cells alter chromatin accessibility prior to production of gene expression at time scales of minutes. Further, the construction of the stimulation GRN elucidates TF activity at disease-associated DORCs. Overall, FigR enables the elucidation of regulatory interactions across single-cell data, providing new opportunities to understand the function of cells within tissues.



Author(s):  
Ana M. Sotoca ◽  
Michael Weber ◽  
Everardus J. J. van Zoelen

Human mesenchymal stem cells have a high potential in regenerative medicine. They can be isolated from a variety of adult tissues, including bone marrow, and can be differentiated into multiple cell types of the mesodermal lineage, including adipocytes, osteocytes, and chondrocytes. Stem cell differentiation is controlled by a process of interacting lineage-specific and multipotent genes. In this chapter, the authors use full genome microarrays to explore gene expression profiles in the process of osteo-, adipo-, and chondrogenic lineage commitment of human mesenchymal stem cells.



2019 ◽  
Vol 36 (1) ◽  
pp. 197-204 ◽  
Author(s):  
Xin Zhou ◽  
Xiaodong Cai

Abstract Motivation: Gene regulatory networks (GRNs) of the same organism can be different under different conditions, although the overall network structure may be similar. Understanding the difference in GRNs under different conditions is important to understand condition-specific gene regulation. When gene expression and other relevant data under two different conditions are available, they can be used by an existing network inference algorithm to estimate two GRNs separately, and then to identify the difference between the two GRNs. However, such an approach does not exploit the similarity in two GRNs, and may sacrifice inference accuracy. Results: In this paper, we model GRNs with the structural equation model (SEM) that can integrate gene expression and genetic perturbation data, and develop an algorithm named fused sparse SEM (FSSEM), to jointly infer GRNs under two conditions, and then to identify the differences between the two GRNs. Computer simulations demonstrate that the FSSEM algorithm outperforms the approaches that estimate two GRNs separately. Analysis of a dataset of lung cancer and another dataset of gastric cancer with FSSEM inferred differential GRNs in cancer versus normal tissues, whose genes with largest network degrees have been reported to be implicated in tumorigenesis. The FSSEM algorithm provides a valuable tool for joint inference of two GRNs and identification of the differential GRN under two conditions. Availability and implementation: The R package fssemR implementing the FSSEM algorithm is available at https://github.com/Ivis4ml/fssemR.git. It is also available on CRAN. Supplementary information: Supplementary data are available at Bioinformatics online.
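
As a rough illustration of what a jointly estimated, fused sparse SEM could look like, the block below writes one generic formulation. All symbols ($Y^{(k)}$ for expression, $X^{(k)}$ for perturbation data, $B^{(k)}$ for the network, $F^{(k)}$ for perturbation effects, $\lambda$ and $\rho$ for penalty weights) are assumptions made for this sketch; the abstract does not give the objective, and the published algorithm may use a different loss or weighting.

```latex
% Structural equation model under each condition k = 1, 2 (assumed notation)
Y^{(k)} = B^{(k)} Y^{(k)} + F^{(k)} X^{(k)} + E^{(k)}, \qquad k = 1, 2

% A generic fused sparse objective: fit both networks jointly,
% encourage sparsity in each B^{(k)} and similarity between them
\min_{B^{(1)}, B^{(2)}, F^{(1)}, F^{(2)}}
  \sum_{k=1}^{2} \left\| Y^{(k)} - B^{(k)} Y^{(k)} - F^{(k)} X^{(k)} \right\|_F^2
  + \lambda \sum_{k=1}^{2} \left\| B^{(k)} \right\|_1
  + \rho \left\| B^{(2)} - B^{(1)} \right\|_1
```

In this form the differential GRN corresponds to the nonzero entries of $B^{(2)} - B^{(1)}$; setting $\rho = 0$ recovers the separate-estimation approach that the abstract compares against.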



F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1477
Author(s):  
Guangdun Peng ◽  
Jing-Dong J. Han

Embryonic development and stem cell differentiation, during which coordinated cell fate specification takes place in a spatial and temporal context, serve as a paradigm for studying the orderly assembly of gene regulatory networks (GRNs) and the fundamental mechanism of GRNs in driving lineage determination. However, knowledge of reliable GRN annotation for dynamic development regulation, particularly for unveiling the complex temporal and spatial architecture of tissue stem cells, remains inadequate. With the advent of single-cell RNA sequencing technology, elucidating GRNs in development and stem cell processes poses both new challenges and unprecedented opportunities. This review takes a snapshot of some of this work and its implication in the regulative nature of early mammalian development and specification of the distinct cell types during embryogenesis.



2020 ◽  
Vol 21 (23) ◽  
pp. 9052
Author(s):  
Indrek Teino ◽  
Antti Matvere ◽  
Martin Pook ◽  
Inge Varik ◽  
Laura Pajusaar ◽  
...  

Aryl hydrocarbon receptor (AHR) is a ligand-activated transcription factor, which mediates the effects of a variety of environmental stimuli in multiple tissues. Recent advances in AHR biology have underlined its importance in cells with high developmental potency, including pluripotent stem cells. Nonetheless, there is little data on AHR expression and its role during the initial stages of stem cell differentiation. The purpose of this study was to investigate the temporal pattern of AHR expression during directed differentiation of human embryonic stem cells (hESC) into neural progenitor, early mesoderm and definitive endoderm cells. Additionally, we investigated the effect of the AHR agonist 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) on the gene expression profile in hESCs and differentiated cells by RNA-seq, accompanied by identification of AHR binding sites by ChIP-seq and epigenetic landscape analysis by ATAC-seq. We showed that AHR is differentially regulated in distinct lineages. We provided evidence that TCDD alters gene expression patterns in hESCs and during early differentiation. Additionally, we identified novel potential AHR target genes, which expand our understanding of the role of this protein in different cell types.



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Giancarlo Bonora ◽  
Vijay Ramani ◽  
Ritambhara Singh ◽  
He Fang ◽  
Dana L. Jackson ◽  
...  

Abstract Background: Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. Results: Allele-specific contact decay profiles obtained by single-cell Hi-C clearly show that the inactive X chromosome has a unique profile in differentiated cells that have undergone X inactivation. Loss of this inactive X-specific structure at mitosis is followed by its reappearance during the cell cycle, suggesting a “bookmark” mechanism. Differentiation of embryonic stem cells to follow the onset of X inactivation is associated with changes in contact decay profiles that occur in parallel on both the X chromosomes and autosomes. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Single-cell Hi-C highlights evidence of discrete changes in nuclear structure characterized by the acquisition of very long-range contacts throughout the nucleus. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. Conclusions: Based on trajectory analyses, three distinct nuclear structure states are detected reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes, but also to that of autosomes during differentiation. Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.
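
For readers unfamiliar with the term, a contact decay profile summarizes how Hi-C contact frequency falls off with genomic separation. The following is a minimal sketch, assuming a square, symmetric matrix of binned contact counts for one chromosome; the function name `contact_decay_profile` and the toy matrix are hypothetical and are not taken from the study's pipeline, which also handles allele assignment and single-cell sparsity.

```python
import numpy as np


def contact_decay_profile(contacts: np.ndarray) -> np.ndarray:
    """Mean contact count at each genomic separation, measured in bins.

    `contacts` is assumed to be a square, symmetric matrix of binned
    contact counts. Entry d of the result is the mean of the d-th
    diagonal, i.e. the average contact between bins that are d apart.
    """
    n_bins = contacts.shape[0]
    if contacts.ndim != 2 or contacts.shape[1] != n_bins:
        raise ValueError("contacts must be a square 2D matrix")
    return np.array(
        [np.diagonal(contacts, offset=d).mean() for d in range(n_bins)]
    )


# Toy example (made-up counts): contacts drop as bins get farther apart.
toy = np.array([[4.0, 2.0, 1.0],
                [2.0, 4.0, 2.0],
                [1.0, 2.0, 4.0]])
profile = contact_decay_profile(toy)
```

Comparing such profiles between alleles or cells is one simple way to see whether a chromosome has a distinct long-range structure.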



2020 ◽  
Author(s):  
Pallavi Singh ◽  
Sean R. Stevenson ◽  
Ivan Reyna-Llorens ◽  
Gregory Reeves ◽  
Tina B. Schreier ◽  
...  

Abstract The efficient C4 pathway is based on strong up-regulation of genes found in C3 plants, but also compartmentation of their expression into distinct cell-types such as the mesophyll and bundle sheath. Transcription factors associated with these phenomena have not been identified. To address this, we undertook genome-wide analysis of transcript accumulation, chromatin accessibility and transcription factor binding in C4 Gynandropsis gynandra. From these data, two models relating to the molecular evolution of C4 photosynthesis are proposed. First, increased expression of C4 genes is associated with increased binding by MYB-related transcription factors. Second, mesophyll-specific expression is associated with binding of homeodomain transcription factors. Overall, we conclude that during evolution of the complex C4 trait, C4 cycle genes gain cis-elements that operate in the C3 leaf such that they become integrated into existing gene regulatory networks associated with cell specificity and photosynthesis.



2020 ◽  
Author(s):  
Giancarlo Bonora ◽  
Vijay Ramani ◽  
Ritambhara Singh ◽  
He Fang ◽  
Dana Jackson ◽  
...  

Abstract Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. In differentiated cells, contact decay profiles, which clearly distinguish the active and inactive X chromosomes, reveal loss of the inactive X-specific structure at mitosis followed by a rapid reappearance, suggesting a ‘bookkeeping’ mechanism. In differentiating embryonic stem cells, changes in contact decay profiles are detected in parallel on both the X chromosomes and autosomes, suggesting profound simultaneous reorganization. The onset of the inactive X-specific structure in single cells is notably delayed relative to that of gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Novel computational approaches to effectively align single-cell gene expression, chromatin accessibility, and 3D chromosome structure reveal that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.



2018 ◽  
Author(s):  
Tim Stuart ◽  
Andrew Butler ◽  
Paul Hoffman ◽  
Christoph Hafemeister ◽  
Efthymia Papalexi ◽  
...  

Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to “anchor” diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but across different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat



PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0244864
Author(s):  
Carlos Mora-Martinez

Large amounts of effort have been invested in trying to understand how a single genome is able to specify the identity of hundreds of cell types. Inspired by some aspects of Caenorhabditis elegans biology, we implemented an in silico evolutionary strategy to produce gene regulatory networks (GRNs) that drive cell-specific gene expression patterns, mimicking the process of terminal cell differentiation. Dynamics of the gene regulatory networks are governed by a thermodynamic model of gene expression, which uses DNA sequences and transcription factor degenerate position weight matrices as input. In one version of the model, we included chromatin accessibility. Experimentally, it has been determined that cell-specific and broadly expressed genes are regulated differently. In our in silico evolved GRNs, broadly expressed genes are regulated very redundantly and the architecture of their cis-regulatory modules is different, in accordance with what has been found in C. elegans and also in other systems. Finally, we found differences in topological positions in GRNs between these two classes of genes, which help to explain why broadly expressed genes are so resilient to mutations. Overall, our results offer an explanatory hypothesis on why broadly expressed genes are regulated so redundantly compared to cell-specific genes, which can be extrapolated to phenomena such as ChIP-seq HOT regions.
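
To make the input to such a model concrete, the sketch below scores a DNA sequence against a position weight matrix using log-odds against a uniform background. It illustrates only the PWM-scanning step, not the thermodynamic model or the evolutionary simulation described above; the names `log_odds_pwm` and `scan_sequence`, the uniform background of 0.25, and the two-position toy motif are all assumptions made for this example.

```python
import numpy as np

BASES = "ACGT"  # column order assumed for the matrices below


def log_odds_pwm(probabilities: np.ndarray, background: float = 0.25) -> np.ndarray:
    """Convert per-position base probabilities (rows sum to 1, all > 0)
    into log2 odds against a uniform background frequency."""
    return np.log2(probabilities / background)


def scan_sequence(sequence: str, pwm: np.ndarray) -> np.ndarray:
    """Return the PWM score of every window of the motif's width.

    The score of a window is the sum, over motif positions, of the
    log-odds entry for the base observed at that position.
    """
    width = pwm.shape[0]
    base_index = [BASES.index(base) for base in sequence.upper()]
    scores = []
    for start in range(len(base_index) - width + 1):
        window = base_index[start:start + width]
        scores.append(sum(pwm[pos, base] for pos, base in enumerate(window)))
    return np.array(scores)


# Toy, degenerate motif that prefers "AG" (made-up probabilities).
toy_pwm = log_odds_pwm(np.array([[0.7, 0.1, 0.1, 0.1],
                                 [0.1, 0.1, 0.7, 0.1]]))
window_scores = scan_sequence("TAGC", toy_pwm)
best_start = int(np.argmax(window_scores))  # index of the best-matching window
```

In thermodynamic models, scores of this kind are typically mapped to binding energies or occupancies; that mapping is omitted here because the abstract does not specify it.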


