metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA

Abstract Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA–protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. Better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP- and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data. Based on the metaseq-enabled analysis presented here, we propose a model where Shep associates with chromatin cotranscriptionally, then is recruited to insulator complexes in trans where it plays a negative role in insulator activity.


Shep RNA-Binding Capacity Is Required for Antagonism of gypsy Chromatin Insulator Activity

G3 Genes|Genome|Genetics ◽

10.1534/g3.118.200923 ◽

2019 ◽

pp. g3.200923.2018 ◽

Cited By ~ 1

Author(s):

Dahong Chen ◽

Margarita Brovkina ◽

Leah H. Matzat ◽

Elissa P. Lei

Keyword(s):

Rna Binding ◽

Binding Capacity ◽

Chromatin Insulator ◽

Insulator Activity


Integrated analysis of RNA-binding protein complexes using in vitro selection and high-throughput sequencing and sequence specificity landscapes (SEQRS)

Methods ◽

10.1016/j.ymeth.2016.10.001 ◽

2017 ◽

Vol 118-119 ◽

pp. 171-181 ◽

Cited By ~ 16

Author(s):

Tzu-Fang Lou ◽

Chase A. Weidmann ◽

Jordan Killingsworth ◽

Traci M. Tanaka Hall ◽

Aaron C. Goldstrohm ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

In Vitro Selection ◽

Rna Binding ◽

Protein Complexes ◽

Binding Protein ◽

Rna Binding Protein ◽

Integrated Analysis ◽

Sequence Specificity


RNA-Centric Approaches to Profile the RNA–Protein Interaction Landscape on Selected RNAs

Non-Coding RNA ◽

10.3390/ncrna7010011 ◽

2021 ◽

Vol 7 (1) ◽

pp. 11 ◽

Cited By ~ 1

Author(s):

André P. Gerber

Keyword(s):

Mass Spectrometry ◽

Protein Interactions ◽

Regulatory Networks ◽

Rna Binding ◽

Rna Binding Proteins ◽

Protein Complexes ◽

Cell Protein ◽

Transcriptional Regulatory Networks ◽

Technological Advances

RNA–protein interactions frame post-transcriptional regulatory networks and modulate transcription and epigenetics. While the technological advances in RNA sequencing have significantly expanded the repertoire of RNAs, recently developed biochemical approaches combined with sensitive mass-spectrometry have revealed hundreds of previously unrecognized and potentially novel RNA-binding proteins. Nevertheless, a major challenge remains to understand how the thousands of RNA molecules and their interacting proteins assemble and control the fate of each individual RNA in a cell. Here, I review recent methodological advances to approach this problem through systematic identification of proteins that interact with particular RNAs in living cells. Thereby, a specific focus is given to in vivo approaches that involve crosslinking of RNA–protein interactions through ultraviolet irradiation or treatment of cells with chemicals, followed by capture of the RNA under study with antisense-oligonucleotides and identification of bound proteins with mass-spectrometry. Several recent studies defining interactomes of long non-coding RNAs, viral RNAs, as well as mRNAs are highlighted, and short reference is given to recent in-cell protein labeling techniques. These recent experimental improvements could open the door for broader applications and to study the remodeling of RNA–protein complexes upon different environmental cues and in disease.


Advantages of using graph databases to explore chromatin conformation capture experiments

BMC Bioinformatics ◽

10.1186/s12859-020-03937-0 ◽

2021 ◽

Vol 22 (S2) ◽

Author(s):

Daniele D’Agostino ◽

Pietro Liò ◽

Marco Aldinucci ◽

Ivan Merelli

Keyword(s):

Web Application ◽

High Throughput Sequencing ◽

Cell Types ◽

Graph Database ◽

Graph Databases ◽

Sources Of Information ◽

Chromosome Conformation ◽

Wide Scale ◽

User Friendly ◽

Different Cell Types

Abstract Background High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells. Methods Here we discuss the use of a graph database for storing and analysing data achieved by performing Hi-C experiments. The main issue is the size of the produced data and, working with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represents the sources of information. For this, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for re-mapping omics data in a space-aware context efficiently. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions allows highlighting similarities and differences among different Hi-C experiments, in different cell conditions or different cell types. Results These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5). Conclusion With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.


The landscape of alternative polyadenylation in single cells of the developing mouse embryo

Nature Communications ◽

10.1038/s41467-021-25388-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Vikram Agarwal ◽

Sereno Lopez-Darwin ◽

David R. Kelley ◽

Jay Shendure

Keyword(s):

Rna Binding ◽

Developmental Stages ◽

Single Cells ◽

Neuronal Cell ◽

Alternative Polyadenylation ◽

Cell Types ◽

Untranslated Regions ◽

Mammalian Development ◽

Translation Rate ◽

Dynamic Landscape

Abstract 3′ untranslated regions (3′ UTRs) post-transcriptionally regulate mRNA stability, localization, and translation rate. While 3′-UTR isoforms have been globally quantified in limited cell types using bulk measurements, their differential usage among cell types during mammalian development remains poorly characterized. In this study, we examine a dataset comprising ~2 million nuclei spanning E9.5–E13.5 of mouse embryonic development to quantify transcriptome-wide changes in alternative polyadenylation (APA). We observe a global lengthening of 3′ UTRs across embryonic stages in all cell types, although we detect shorter 3′ UTRs in hematopoietic lineages and longer 3′ UTRs in neuronal cell types within each stage. An analysis of RNA-binding protein (RBP) dynamics identifies ELAV-like family members, which are concomitantly induced in neuronal lineages and developmental stages experiencing 3′-UTR lengthening, as putative regulators of APA. By measuring 3′-UTR isoforms in an expansive single cell dataset, our work provides a transcriptome-wide and organism-wide map of the dynamic landscape of alternative polyadenylation during mammalian organogenesis.


Identification of residue inversions in large phylogenies of duplicated proteins

10.1101/2021.11.04.467263 ◽

2021 ◽

Author(s):

Stefano Pascarelli ◽

Paola Laurino

Keyword(s):

Gene Duplication ◽

Protein Sequence ◽

Protein Function ◽

High Throughput Sequencing ◽

Functional Divergence ◽

Growth Factor Receptor ◽

Sequence Evolution ◽

Protein Database ◽

Sequencing Studies ◽

Homology Relationship

Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. Protein database annotation helps to bridge this gap; however, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify "residue inversions", i.e., sites where the relationship between the ancestry and the functional signal is decoupled. Residues in these sites play a role in functional divergence and could indicate a shift in protein function. We develop a method to recognize residue inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 positions that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain.


Current Understanding of Circular RNAs in Systemic Lupus Erythematosus

Frontiers in Immunology ◽

10.3389/fimmu.2021.628872 ◽

2021 ◽

Vol 12 ◽

Author(s):

Hongjiang Liu ◽

Yundong Zou ◽

Chen Chen ◽

Yundi Tang ◽

Jianping Guo

Keyword(s):

Systemic Lupus Erythematosus ◽

Lupus Erythematosus ◽

Rna Binding ◽

Rna Binding Proteins ◽

Protein Complexes ◽

Expression Patterns ◽

Regulation Of Gene Expression ◽

Current Understanding ◽

Circular Rnas ◽

Systemic Lupus

Systemic lupus erythematosus (SLE) is a common and potentially fatal autoimmune disease that affects multiple organs. To date, its etiology and pathogenesis remain elusive. Circular RNAs (circRNAs) are a novel class of endogenous non-coding RNAs with a covalently closed loop structure. Growing evidence has demonstrated that circRNAs may play an essential role in the regulation of gene expression and transcription by acting as microRNA (miRNA) sponges, impacting cell survival and proliferation by interacting with RNA binding proteins (RBPs), and strengthening mRNA stability by forming RNA-protein complexes duplex structures. The expression patterns of circRNAs are tissue-specific and pathogenesis-related. CircRNAs have been implicated in the development of multiple autoimmune diseases, including SLE. In this review, we summarize the characteristics, biogenesis, and potential functions of circRNAs and their impact on immune responses, and highlight current understanding of circRNAs in the pathogenesis of SLE.


Systematic discovery of endogenous human ribonucleoprotein complexes

10.1101/480061 ◽

2018 ◽

Cited By ~ 2

Author(s):

Anna L. Mallam ◽

Wisath Sae-Lee ◽

Jeffrey M. Schaub ◽

Fan Tu ◽

Anna Battenhouse ◽

...

Keyword(s):

Rna Binding ◽

Rna Binding Proteins ◽

Protein Complexes ◽

Embryonic Stem ◽

Human Protein ◽

Disease States ◽

Ribonucleoprotein Complexes ◽

Associated Proteins ◽

Domains Of Life ◽

Factor C

Abstract RNA-binding proteins (RBPs) play essential roles in biology and are frequently associated with human disease. While recent studies have systematically identified individual RBPs, their higher order assembly into Ribonucleoprotein (RNP) complexes has not been systematically investigated. Here, we describe a proteomics method for systematic identification of RNP complexes in human cells. We identify 1,428 protein complexes that associate with RNA, indicating that over 20% of known human protein complexes contain RNA. To explore the role of RNA in the assembly of each complex, we identify complexes that dissociate, change composition, or form stable protein-only complexes in the absence of RNA. Importantly, these data also provide specific novel insights into the function of well-studied protein complexes not previously known to associate with RNA, including replication factor C (RFC) and cytokinetic centralspindlin complex. Finally, we use our method to systematically identify cell-type specific RNA-associated proteins in mouse embryonic stem cells. We distribute these data as a resource, rna.MAP (rna.proteincomplexes.org) which provides a comprehensive dataset for the study of RNA-associated protein complexes. Our system thus provides a novel methodology for further explorations across human tissues and disease states, as well as throughout all domains of life. Summary An exploration of human protein complexes in the presence and absence of RNA reveals endogenous ribonucleoprotein complexes.


Assessing and Maximizing Cultivated Diversity with Plate-Wash PCR and High Throughput Sequencing

10.1101/2020.11.19.390864 ◽

2020 ◽

Author(s):

Emily N. Junkins ◽

Bradley S. Stevenson

Keyword(s):

High Throughput ◽

Drug Targets ◽

High Throughput Sequencing ◽

Multidrug Resistant ◽

Molecular Techniques ◽

Bioactive Metabolites ◽

Plate Count ◽

Molecular Tools ◽

Vast Number ◽

Sequencing Studies

Abstract Molecular techniques continue to reveal a growing disparity between the immense diversity of microbial life and the small proportion that is in pure culture. The disparity, originally dubbed “the great plate count anomaly” by Staley and Konopka, has become even more vexing given our increased understanding of the importance of microbiomes to a host and the role of microorganisms in the vital biogeochemical functions of our biosphere. Searching for novel antimicrobial drug targets often focuses on screening a broad diversity of microorganisms. If diverse microorganisms are to be screened, they need to be cultivated. Recent innovative research has used molecular techniques to assess the efficacy of cultivation efforts, providing invaluable feedback to cultivation strategies for isolating targeted and/or novel microorganisms. Here, we aimed to determine the efficiency of cultivating representative microorganisms from a non-human, mammalian microbiome, identify those microorganisms, and determine the bioactivity of isolates. Molecular methods indicated that around 57% of the ASVs detected in the original inoculum were cultivated in our experiments, but nearly 53% of the total ASVs that were present in our cultivation experiments were not detected in the original inoculum. In light of our controls, our data suggests that when molecular tools were used to characterize our cultivation efforts, they provided a more complete, albeit more complex, understanding of which organisms were present compared to what was eventually cultivated. Lastly, about 3% of the isolates collected from our cultivation experiments showed inhibitory bioactivity against a multidrug-resistant pathogen panel, further highlighting the importance of informing and directing future cultivation efforts with molecular tools. Importance Cultivation is the definitive tool to understand a microorganism’s physiology, metabolism, and ecological role(s). Despite continuous efforts to hone this skill, researchers are still observing yet-to-be cultivated organisms through high-throughput sequencing studies. Here, we use the very same tool that highlights biodiversity to assess cultivation efficiency. When applied to drug discovery, where screening a vast number of isolates for bioactive metabolites is common, cultivating redundant organisms is a hindrance. However, we observed that cultivating in combination with molecular tools can expand the observed diversity of an environment and its community, potentially increasing the number of microorganisms to be screened for natural products.


Functional annotation of human long noncoding RNAs using chromatin conformation data

10.1101/2021.01.13.426305 ◽

2021 ◽

Author(s):

Saumya Agrawal ◽

Tanvir Alam ◽

Masaru Koido ◽

Ivan V. Kulakovskiy ◽

Jessica Severin ◽

...

Keyword(s):

Functional Annotation ◽

Rna Binding ◽

Functional Characterization ◽

Cell Types ◽

Chromatin Interaction ◽

Spatial Proximity ◽

Chromatin Conformation ◽

Cell Type ◽

Cell Type Specific ◽

Rna Domains

Abstract Transcription of the human genome yields mostly long non-coding RNAs (lncRNAs). Systematic functional annotation of lncRNAs is challenging due to their low expression level, cell type-specific occurrence, poor sequence conservation between orthologs, and lack of information about RNA domains. Currently, 95% of human lncRNAs have no functional characterization. Using chromatin conformation and Cap Analysis of Gene Expression (CAGE) data in 18 human cell types, we systematically located genomic regions in spatial proximity to lncRNA genes and identified functional clusters of interacting protein-coding genes, lncRNAs and enhancers. Using these clusters we provide a cell type-specific functional annotation for 7,651 out of 14,198 (53.88%) lncRNAs. LncRNAs tend to have specialized roles in the cell type in which it is first expressed, and to incorporate more general functions as its expression is acquired by multiple cell types during evolution. By analyzing RNA-binding protein and RNA-chromatin interaction data in the context of the spatial genomic interaction map, we explored mechanisms by which these lncRNAs can act.
