Defining data-driven primary transcript annotations with primaryTranscriptAnnotation in R

2019 ◽  
Author(s):  
Warren D. Anderson ◽  
Fabiana M. Duarte ◽  
Mete Civelek ◽  
Michael J. Guertin

Nascent transcript measurements derived from run-on sequencing experiments are critical for the investigation of transcriptional mechanisms and regulatory networks. However, conventional gene annotations specify the boundaries of mRNAs, which significantly differ from the boundaries of primary transcripts. Moreover, transcript isoforms with distinct transcription start and end coordinates can vary between cell types. Therefore, new primary transcript annotations are needed to accurately interpret run-on data. We developed the primaryTranscriptAnnotation R package to infer the transcriptional start and termination sites of annotated genes from genomic run-on data. We then used these inferred coordinates to annotate transcriptional units identified de novo. Hence, this package provides the novel utility to integrate data-driven primary transcript annotations with transcriptional unit coordinates identified in an unbiased manner. Our analyses demonstrated that this new methodology increases the sensitivity for detecting differentially expressed transcripts and provides more accurate quantification of RNA polymerase pause indices, consistent with the importance of using accurate primary transcript coordinates for interpreting genomic nascent transcription data. Availability: https://github.com/WarrenDavidAnderson/genomicsRpackage/tree/master/primaryTranscriptAnnotation

2020 ◽  
Vol 36 (9) ◽  
pp. 2926-2928 ◽  
Author(s):  
Warren D Anderson ◽  
Fabiana M Duarte ◽  
Mete Civelek ◽  
Michael J Guertin

Summary: Nascent transcript measurements derived from run-on sequencing experiments are critical for the investigation of transcriptional mechanisms and regulatory networks. However, conventional mRNA gene annotations significantly differ from the boundaries of primary transcripts. New primary transcript annotations are needed to accurately interpret run-on data. We developed the primaryTranscriptAnnotation R package to infer the transcriptional start and termination sites of primary transcripts from genomic run-on data. We then used these inferred coordinates to annotate transcriptional units identified de novo. This package provides the novel utility to integrate data-driven primary transcript annotations with transcriptional unit coordinates identified in an unbiased manner. Highlighting the importance of using accurate primary transcript coordinates, we demonstrate that this new methodology increases the detection of differentially expressed transcripts and provides more accurate quantification of RNA polymerase pause indices. Availability and implementation: https://github.com/WarrenDavidAnderson/genomicsRpackage/tree/master/primaryTranscriptAnnotation. Supplementary information: Supplementary data are available at Bioinformatics online.
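To make the point about pause indices concrete, the following is a minimal sketch of why start-site coordinates matter. It assumes the common definition of a pause index as promoter-proximal read density divided by gene-body density. It is written in Python rather than R and does not use the primaryTranscriptAnnotation API; the function names, the 100-base pause window, the 300-base body offset, and the toy coverage vector are all illustrative assumptions.

```python
from typing import Sequence


def mean_density(coverage: Sequence[float], start: int, end: int) -> float:
    """Mean read count per base over the half-open interval [start, end)."""
    if not 0 <= start < end <= len(coverage):
        raise ValueError("interval must lie within the coverage vector")
    return sum(coverage[start:end]) / (end - start)


def pause_index(
    coverage: Sequence[float],
    tss: int,
    tts: int,
    pause_width: int = 100,   # assumed pause-window size, not a package default
    body_offset: int = 300,   # assumed distance from the TSS to the gene-body start
) -> float:
    """Ratio of promoter-proximal density to gene-body density (plus strand)."""
    pause = mean_density(coverage, tss, tss + pause_width)
    body = mean_density(coverage, tss + body_offset, tts)
    if body == 0:
        raise ValueError("gene body has no reads; pause index is undefined")
    return pause / body


# Toy plus-strand gene: the true TSS is at position 500, followed by a pause
# peak of 20 reads per base for 100 bases, then a gene body at 2 reads per base.
coverage = [0.0] * 500 + [20.0] * 100 + [2.0] * 2400 + [0.0] * 500

# Pause index using the true start site.
accurate = pause_index(coverage, tss=500, tts=3000)

# Pause index when the annotated start site lies 50 bases upstream of the true one,
# so that half of the pause window falls on empty sequence.
shifted = pause_index(coverage, tss=450, tts=3000)

print(accurate, shifted)
```

In this toy case a 50-base error in the annotated start site halves the estimated pause index, because half of the pause window no longer overlaps the peak.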


2019 ◽  
Author(s):  
Allen W Zhang ◽  
Kieran R Campbell ◽  
Sohrab P Shah

Assigning cells to known or de novo cell types is an important step in the analysis of single-cell RNA-sequencing (scRNA-seq) data. This protocol outlines how to use the CellAssign R package to accomplish this.


2019 ◽  
Author(s):  
Xi Chen

BICORN is an R package developed to integrate prior transcription factor binding information and gene expression data for cis-regulatory module (CRM) inference. BICORN searches for a list of candidate CRMs from binary bindings on potential target genes. Applying Gibbs sampling, BICORN samples CRMs for each gene using the fitting performance of transcription factor activities and regulation strengths of TFs in each CRM on gene expression. Consequently, sparse regulatory networks are inferred as functional CRMs regulating target genes. The BICORN package is implemented in R and is available at https://cran.r-project.org/web/packages/BICORN/index.html.


2021 ◽  
Author(s):  
April R Kriebel ◽  
Joshua D Welch

Single-cell genomic technologies provide an unprecedented opportunity to define molecular cell types in a data-driven fashion, but present unique data integration challenges. Integration analyses often involve datasets with partially overlapping features, including both shared features that occur in all datasets and features exclusive to a single experiment. Previous computational integration approaches require that the input matrices share the same number of either genes or cells, and thus can use only shared features. To address this limitation, we derive a novel nonnegative matrix factorization algorithm for integrating single-cell datasets containing both shared and unshared features. The key advance is incorporating an additional metagene matrix that allows unshared features to inform the factorization. We demonstrate that incorporating unshared features significantly improves integration of single-cell RNA-seq, spatial transcriptomic, SHARE-seq, and cross-species datasets. We have incorporated the UINMF algorithm into the open-source LIGER R package (https://github.com/welch-lab/liger).


Author(s):  
Günter P. Wagner

Homology—a similar trait shared by different species and derived from common ancestry, such as a seal's fin and a bird's wing—is one of the most fundamental yet challenging concepts in evolutionary biology. This book provides the first mechanistically based theory of what homology is and how it arises in evolution. The book argues that homology, or character identity, can be explained through the historical continuity of character identity networks—that is, the gene regulatory networks that enable differential gene expression. It shows how character identity is independent of the form and function of the character itself because the same network can activate different effector genes and thus control the development of different shapes, sizes, and qualities of the character. Demonstrating how this theoretical model can provide a foundation for understanding the evolutionary origin of novel characters, the book applies it to the origin and evolution of specific systems, such as cell types; skin, hair, and feathers; limbs and digits; and flowers. The first major synthesis of homology to be published in decades, this book reveals how a mechanistically based theory can serve as a unifying concept for any branch of science concerned with the structure and development of organisms, and how it can help explain major transitions in evolution and broad patterns of biological diversity.


2020 ◽  
Author(s):  
Xin Yi See ◽  
Benjamin Reiner ◽  
Xuelan Wen ◽  
T. Alexander Wheeler ◽  
Channing Klein ◽  
...  

Herein, we describe the use of iterative supervised principal component analysis (ISPCA) in de novo catalyst design. The regioselective synthesis of 2,5-dimethyl-1,3,4-triphenyl-1H-pyrrole (C) via Ti-catalyzed formal [2+2+1] cycloaddition of phenyl propyne and azobenzene was targeted as a proof of principle. The initial reaction conditions led to an unselective mixture of all possible pyrrole regioisomers. ISPCA was conducted on a training set of catalysts, and their performance was regressed against the scores from the top three principal components. Component loadings from this PCA space along with k-means clustering were used to inform the design of new test catalysts. The selectivity of a prospective test set was predicted in silico using the ISPCA model, and only optimal candidates were synthesized and tested experimentally. This data-driven predictive-modeling workflow was iterated, and after only three generations the catalytic selectivity was improved from 0.5 (statistical mixture of products) to over 11 (> 90% C) by incorporating 2,6-dimethyl-4-(pyrrolidin-1-yl)pyridine as a ligand. The successful development of a highly selective catalyst without resorting to long, stochastic screening processes demonstrates the inherent power of ISPCA in de novo catalyst design and should motivate the general use of ISPCA in reaction development.
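The selectivity figures quoted above can be related to product fractions with a short calculation. This sketch assumes that selectivity is reported as the ratio of the target product C to all other regioisomers combined; the abstract does not state the definition, but this reading is consistent with both quoted pairs (0.5 with a statistical mixture, and 11 with more than 90% C). The function name is hypothetical.

```python
def selectivity_to_fraction(selectivity: float) -> float:
    """Convert a ratio target : (all other products) into the target's fraction.

    Assumes selectivity = [C] / [all other regioisomers combined].
    """
    if selectivity < 0:
        raise ValueError("selectivity must be non-negative")
    return selectivity / (1.0 + selectivity)


# Starting point quoted in the abstract: selectivity 0.5.
statistical = selectivity_to_fraction(0.5)

# Optimized catalyst quoted in the abstract: selectivity of 11.
optimized = selectivity_to_fraction(11.0)

print(f"{statistical:.1%}")  # 33.3%
print(f"{optimized:.1%}")    # 91.7%
```

Under this assumption, a selectivity of 0.5 corresponds to one third of the product being C, which is what an equal mixture of three products would give, and a selectivity of 11 corresponds to roughly 92% C.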


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruizhu Huang ◽  
Charlotte Soneson ◽  
Pierre-Luc Germain ◽  
Thomas S.B. Schmidt ◽  
Christian Von Mering ◽  
...  

treeclimbR is for analyzing hierarchical trees of entities, such as phylogenies or cell types, at different resolutions. It proposes multiple candidates that capture the latent signal and pinpoints branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single-cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.
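One way to picture "different resolutions" on a tree is to aggregate leaf-level measurements up to every internal node, so that each branch can be examined as a single entity. The sketch below shows only that aggregation step on a hypothetical cell-type tree. It is not the treeclimbR algorithm, which additionally proposes and selects candidate resolutions; the function name, tree, and counts are illustrative assumptions.

```python
from typing import Dict, List


def aggregate_up(
    children: Dict[str, List[str]],
    leaf_values: Dict[str, float],
    root: str,
) -> Dict[str, float]:
    """Return the summed leaf values beneath every node of a rooted tree."""
    totals: Dict[str, float] = {}

    def visit(node: str) -> float:
        kids = children.get(node, [])
        if not kids:
            # A leaf carries its own measured value.
            totals[node] = leaf_values[node]
        else:
            # An internal node carries the sum over its subtree.
            totals[node] = sum(visit(kid) for kid in kids)
        return totals[node]

    visit(root)
    return totals


# Hypothetical cell-type hierarchy with cell counts at the leaves.
children = {
    "all cells": ["lymphoid", "myeloid"],
    "lymphoid": ["T cell", "B cell"],
    "myeloid": ["monocyte", "neutrophil"],
}
leaf_counts = {"T cell": 120, "B cell": 80, "monocyte": 50, "neutrophil": 30}

totals = aggregate_up(children, leaf_counts, root="all cells")
print(totals["lymphoid"], totals["myeloid"], totals["all cells"])
```

With values available at every node, a signal can be tested at the leaf level, at the branch level, or at the root, depending on where it is most coherent.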


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Luis F. Iglesias-Martinez ◽  
Barbara De Kegel ◽  
Walter Kolch

Reconstructing gene regulatory networks is crucial to understand biological processes and holds potential for developing personalized treatment. Yet, it is still an open problem as state-of-the-art algorithms are often not able to process large amounts of data within reasonable time. Furthermore, many of the existing methods predict numerous false positives and have limited capabilities to integrate other sources of information, such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. We have benchmarked KBoost against other high performing algorithms using three different datasets. The results show that our method compares favorably to other methods across datasets. We have also applied KBoost to a large cohort of close to 2000 breast cancer patients and 24,000 genes in less than 2 h on standard hardware. Our results show that molecularly defined breast cancer subtypes also feature differences in their GRNs. An implementation of KBoost in the form of an R package is available at: https://github.com/Luisiglm/KBoost and as a Bioconductor software package.


2018 ◽  
Vol 34 (1) ◽  
pp. 289-310 ◽  
Author(s):  
Edith Pierre-Jerome ◽  
Colleen Drapek ◽  
Philip N. Benfey

A major challenge in developmental biology is unraveling the precise regulation of plant stem cell maintenance and the transition to a fully differentiated cell. In this review, we highlight major themes coordinating the acquisition of cell identity and subsequent differentiation in plants. Plant cells are immobile and establish position-dependent cell lineages that rely heavily on external cues. Central players are the hormones auxin and cytokinin, which balance cell division and differentiation during organogenesis. Transcription factors and miRNAs, many of which are mobile in plants, establish gene regulatory networks that communicate cell position and fate. Small peptide signaling also provides positional cues as new cell types emerge from stem cell division and progress through differentiation. These pathways recruit similar players for patterning different organs, emphasizing the modular nature of gene regulatory networks. Finally, we speculate on the outstanding questions in the field and discuss how they may be addressed by emerging technologies.

