Improving Bacterial Ribosome Profiling Data Quality

Abstract: Ribosome profiling (RIBO-seq) in prokaryotes has the potential to facilitate accurate detection of translation initiation sites and to increase understanding of translational dynamics, and it has already allowed detection of many unannotated genes. However, protocols for ribosome profiling and the corresponding data analysis are not yet standardized. To better understand the influencing factors, we analysed 48 ribosome profiling samples from 9 studies on E. coli K12 grown in LB medium. We particularly investigated the size selection step in each experiment, since selection for ribosome-protected footprints (RPFs) has been performed at various read lengths. We suggest choosing a size range of 22 to 30 nucleotides in order to obtain protein-coding fragments. In order to use RIBO-seq data for improving gene annotation of weakly expressed genes, the total number of reads mapping to protein-coding sequences, rather than to rRNA or tRNA, is important, but no consensus about the appropriate sequencing depth has been reached. Again, this causes significant variation between studies. Our analysis suggests that 20 million reads not mapping to rRNA/tRNA are required for global detection of translated annotated genes. Further, we highlight the influence of drug-induced ribosome stalling, which causes bias at translation start sites. Drug-induced stalling may be especially useful for detecting weakly expressed genes. These suggestions should improve both gene detection and the comparability of resulting ribosome profiling datasets.

Recommendations for bacterial ribosome profiling experiments based on bioinformatic evaluation of published data

Journal of Biological Chemistry ◽

10.1074/jbc.ra119.012161 ◽

2020 ◽

Vol 295 (27) ◽

pp. 8999-9011 ◽

Cited By ~ 2

Author(s):

Alina Glaub ◽

Christopher Huptas ◽

Klaus Neuhaus ◽

Zachary Ardern

Keyword(s):

Ribosome Profiling ◽

Published Data ◽

Data Sets ◽

Drug Induced ◽

Data Set ◽

Protein Coding ◽

Bacterial Ribosome ◽

Translation Start ◽

Selection Step ◽

Basic Characteristics

Ribosome profiling (RIBO-Seq) has improved our understanding of bacterial translation, including finding many unannotated genes. However, protocols for RIBO-Seq and corresponding data analysis are not yet standardized. Here, we analyzed 48 RIBO-Seq samples from nine studies of Escherichia coli K12 grown in lysogeny broth medium and particularly focused on the size-selection step. We show that for conventional expression analysis, a size range between 22 and 30 nucleotides is sufficient to obtain protein-coding fragments, which has the advantage of removing many unwanted rRNA and tRNA reads. More specific analyses may require longer reads and a corresponding improvement in rRNA/tRNA depletion. There is no consensus about the appropriate sequencing depth for RIBO-Seq experiments in prokaryotes, and studies vary significantly in total read number. Our analysis suggests that 20 million reads that are not mapping to rRNA/tRNA are required for global detection of translated annotated genes. We also highlight the influence of drug-induced ribosome stalling, which causes bias at translation start sites. The resulting accumulation of reads at the start site may be especially useful for detecting weakly expressed genes. As different methods suit different questions, it may not be possible to produce a “one-size-fits-all” ribosome profiling data set. Therefore, experiments should be carefully designed in light of the scientific questions of interest. We propose some basic characteristics that should be reported with any new RIBO-Seq data sets. Careful attention to the factors discussed should improve prokaryotic gene detection and the comparability of ribosome profiling data sets.
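The two numeric recommendations in this abstract (a footprint size window of 22 to 30 nucleotides and roughly 20 million reads not mapping to rRNA/tRNA) can be expressed as a simple check. The following is a minimal sketch, not the authors' pipeline: the function name `summarize_footprints`, the `(length, feature)` input format, and the feature labels are assumptions made for illustration.

```python
MIN_LEN, MAX_LEN = 22, 30        # footprint size window suggested in the abstract
MIN_USABLE_READS = 20_000_000    # suggested number of reads not mapping to rRNA/tRNA


def summarize_footprints(reads, min_usable=MIN_USABLE_READS):
    """Summarize mapped reads given as (length, feature) pairs.

    `feature` is a hypothetical label such as "CDS", "rRNA", "tRNA" or "other".
    """
    reads = list(reads)
    # Keep only reads whose length falls inside the size-selection window.
    in_window = [(n, f) for n, f in reads if MIN_LEN <= n <= MAX_LEN]
    # Count reads in the window that are not rRNA or tRNA.
    usable = sum(1 for _, f in in_window if f not in ("rRNA", "tRNA"))
    return {
        "total_reads": len(reads),
        "in_size_window": len(in_window),
        "usable_reads": usable,
        "deep_enough": usable >= min_usable,
    }
```

In practice the `(length, feature)` pairs would come from an alignment file and an annotation; how they are derived is left open here.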

A community-driven roadmap to advance research on translated open reading frames detected by Ribo-seq

10.1101/2021.06.10.447896 ◽

2021 ◽

Author(s):

Jonathan M Mudge ◽

Jorge Ruiz-Orera ◽

John R Prensner ◽

Marie A Brunet ◽

Jose Manuel Gonzalez ◽

...

Keyword(s):

Gene Annotation ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Untranslated Regions ◽

Biological Databases ◽

Protein Coding ◽

Circular Problem ◽

Advance Research ◽

Non Coding Rnas ◽

Reading Frames

Ribosome profiling (Ribo-seq) has catalyzed a paradigm shift in our understanding of the translational vocabulary of the human genome, discovering thousands of translated open reading frames (ORFs) within long non-coding RNAs and presumed untranslated regions of protein-coding genes. However, reference gene annotation projects have been circumspect in their incorporation of these ORFs due to uncertainties about their experimental reproducibility and physiological roles. Yet, it is indisputable that certain Ribo-seq ORFs make stable proteins, others mediate gene regulation, and many have medical implications. Ultimately, the absence of standardized ORF annotation has created a circular problem: while Ribo-seq ORFs remain unannotated by reference biological databases, this lack of characterisation will thwart research efforts examining their roles. Here, we outline the initial stages of a community-led effort supported by GENCODE / Ensembl, HGNC and UniProt to produce a consolidated catalog of human Ribo-seq ORFs.

Translational landscape in tomato revealed by transcriptome assembly and ribosome profiling

10.1101/534677 ◽

2019 ◽

Author(s):

Hsin-Yen Larry Wu ◽

Gaoyuan Song ◽

Justin W. Walley ◽

Polly Yingshan Hsu

Keyword(s):

De Novo ◽

Transcriptome Assembly ◽

Mrna Translation ◽

Ribosome Profiling ◽

In Planta ◽

Protein Coding ◽

Tomato Roots ◽

Translation Start ◽

High Throughput Method ◽

And Control

mRNA translation is a critical step in gene expression, but our understanding of the landscape and control of translation in diverse crops remains lacking. Here, we combined de novo transcriptome assembly and ribosome profiling to study global mRNA translation in tomato roots. Taking advantage of the 3-nucleotide periodicity displayed by translating ribosomes, we identified 354 novel small ORFs (sORFs) translated from previously unannotated transcripts, as well as 1329 upstream ORFs (uORFs) translated within the 5′ UTRs of annotated protein-coding genes. Proteomic analysis confirmed that some of these novel uORFs and sORFs generate stable proteins in planta. Compared with the annotated ORFs, the uORFs use more flexible Kozak sequences around translation start sites. Interestingly, uORF-containing genes are enriched for protein phosphorylation/dephosphorylation and signal transduction pathways, suggesting a regulatory role for uORFs in these processes. We also demonstrated that ribosome profiling is useful for facilitating the annotation of translated ORFs and noncanonical translation initiation sites. In addition to defining the translatome, our results revealed the global control of mRNA translation by uORFs and microRNAs in tomato. In summary, our approach provides a high-throughput method to discover unannotated ORFs, elucidates evolutionarily conserved translational features, and identifies new regulatory mechanisms hidden in a crop genome.
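The 3-nucleotide periodicity mentioned above is commonly summarized as the fraction of footprints falling in each reading frame of a candidate ORF. The sketch below illustrates that idea only and is not the method used in the study: the function name `frame_fractions` and the input (P-site positions measured in nucleotides from the first base of the candidate start codon) are assumptions.

```python
def frame_fractions(psite_positions):
    """Return the fraction of P-site positions in frames 0, 1 and 2.

    Positions are nucleotide offsets from the first base of the start codon,
    so frame 0 corresponds to the reading frame of the candidate ORF.
    """
    counts = [0, 0, 0]
    for pos in psite_positions:
        counts[pos % 3] += 1  # Python's % yields 0..2 even for negative offsets
    total = sum(counts)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(c / total for c in counts)
```

A strong enrichment in frame 0 relative to frames 1 and 2 is the kind of signal that periodicity-based ORF callers look for; any threshold for calling an ORF translated would need to be chosen and justified separately.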

Disrupting upstream translation in mRNAs is associated with human disease

Nature Communications ◽

10.1038/s41467-021-21812-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

David S. M. Lee ◽

Joseph Park ◽

Andrew Kromer ◽

Aris Baras ◽

Daniel J. Rader ◽

...

Keyword(s):

Protein Expression ◽

Biological Significance ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Protein Coding ◽

Stop Codons ◽

Human Genes ◽

Strong Negative Selection ◽

Disease Associations ◽

Reading Frames

Abstract: Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5′ UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.

Genotyping and antimicrobial resistance in Escherichia coli from pig carcasses

Pesquisa Veterinária Brasileira ◽

10.1590/s0100-736x2017001100010 ◽

2017 ◽

Vol 37 (11) ◽

pp. 1253-1260 ◽

Cited By ~ 1

Author(s):

Caroline Pissetti ◽

Gabriela Orosco Werlang ◽

Jalusa Deon Kich ◽

Marisa Cardoso

Keyword(s):

Escherichia Coli ◽

Antimicrobial Resistance ◽

Antimicrobial Agents ◽

Commensal Bacteria ◽

Pfge Analysis ◽

Gene Detection ◽

E Coli ◽

Antimicrobial Resistance Profile ◽

Resistant Strains ◽

Pig Carcasses

ABSTRACT: The increasing antimicrobial resistance observed worldwide in bacteria isolated from humans and animals is a matter of extreme concern and has led to the monitoring of antimicrobial resistance in pathogenic and commensal bacteria. The aim of this study was to evaluate the antimicrobial resistance profile of Escherichia coli isolated from pig carcasses and to assess the occurrence of relevant resistance genes. A total of 319 E. coli isolates were tested for susceptibility to different antimicrobial agents. Moreover, the presence of extended-spectrum β-lactamase (ESBL) and inducible ampC-β-lactamase producers was investigated. Eighteen multi-resistant strains were chosen for resistance gene detection and PFGE characterization. The study showed that resistance to antimicrobials is widespread in E. coli isolated from pig carcasses, since 86.2% of the strains were resistant to at least one antimicrobial and 71.5% displayed multi-resistance profiles. No ampC-producing isolates were detected and only one ESBL-producing E. coli was identified. Genes strA (n=15), floR (n=14), aac(3)IVa (n=13), tetB (n=13), sul2 (n=12), tetA (n=11), aph(3)Ia (n=8) and sul3 (n=5) were detected by PCR. PFGE analysis of these multi-resistant E. coli strains showed less than 80% similarity among them. We conclude that antimicrobial multi-resistant E. coli strains are common on pig carcasses and present highly diverse genotypes as well as diverse resistance phenotypes and genotypes.

EnTAP: Bringing Faster and Smarter Functional Annotation to Non-Model Eukaryotic Transcriptomes

10.1101/307868 ◽

2018 ◽

Cited By ~ 5

Author(s):

Alexander J. Hart ◽

Samuel Ginzburg ◽

Muyang (Sam) Xu ◽

Cera R. Fisher ◽

Nasim Rahmatpour ◽

...

Keyword(s):

Similarity Search ◽

De Novo ◽

Gene Annotation ◽

Enrichment Analysis ◽

Orthologous Gene ◽

Protein Domain ◽

Family Assessment ◽

Ontology Term ◽

Protein Coding ◽

Functional Gene Annotation

ABSTRACT: EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates, while focusing primarily on protein-coding transcripts. Following filters applied through assessment of true expression and frame selection, open-source tools are leveraged to functionally annotate the translated proteins. Downstream features include fast similarity search across three repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology term assignment. The final annotation integrates across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness. Researchers have the option to include additional filters to identify and remove contaminants, identify associated pathways, and prepare the transcripts for enrichment analysis. This fully featured pipeline is easy to install and configure, and runs significantly faster than comparable annotation packages. EnTAP is optimized to generate extensive functional information for the gene space of organisms with limited or poorly characterized genomic resources.

The genome sequence of the European peacock butterfly, Aglais io (Linnaeus, 1758)

Wellcome Open Research ◽

10.12688/wellcomeopenres.17204.1 ◽

2021 ◽

Vol 6 ◽

pp. 258

Author(s):

Konrad Lohse ◽

Alexander Mackintosh ◽

Roger Vila ◽

...

Keyword(s):

Genome Sequence ◽

Genome Assembly ◽

Sex Chromosome ◽

Gene Annotation ◽

Protein Coding ◽

Individual Male ◽

Protein Coding Genes ◽

A Genome ◽

Inachis Io

We present a genome assembly from an individual male Aglais io (also known as Inachis io and Nymphalis io) (the European peacock; Arthropoda; Insecta; Lepidoptera; Nymphalidae). The genome sequence is 384 megabases in span. The majority (99.91%) of the assembly is scaffolded into 31 chromosomal pseudomolecules, with the Z sex chromosome assembled. Gene annotation of this assembly on Ensembl has identified 11,420 protein coding genes.

A standard knockout procedure alters expression of adjacent loci at the translational level

Nucleic Acids Research ◽

10.1093/nar/gkab872 ◽

2021 ◽

Author(s):

Artyom A Egorov ◽

Alexander I Alexandrov ◽

Valery N Urakov ◽

Desislava S Makeeva ◽

Roman O Edakin ◽

...

Keyword(s):

Genetic Interaction ◽

Gene Annotation ◽

Alternative Polyadenylation ◽

Mrna Translation ◽

Regulatory Elements ◽

Ribosome Profiling ◽

Head Orientation ◽

Neighboring Gene ◽

Knockout Mutants ◽

Severe Impairment

Abstract: The Saccharomyces cerevisiae gene deletion collection is widely used for functional gene annotation and genetic interaction analyses. However, the standard G418-resistance cassette used to produce knockout mutants delivers strong regulatory elements into the target genetic loci. To date, its side effects on the expression of neighboring genes have never been systematically assessed. Here, using ribosome profiling data, RT-qPCR, and reporter expression, we investigated perturbations induced by the KanMX module. Our analysis revealed significant alterations in the transcription efficiency of neighboring genes and, more importantly, severe impairment of their mRNA translation, leading to changes in protein abundance. In the ‘head-to-head’ orientation of the deleted and neighboring genes, knockout often led to a shift of the transcription start site of the latter, introducing new uAUG codon(s) into the expanded 5′ untranslated region (5′ UTR). In the ‘tail-to-tail’ arrangement, knockout led to activation of alternative polyadenylation signals in the neighboring gene, thus altering its 3′ UTR. These events may explain the so-called neighboring gene effect (NGE), i.e. false genetic interactions of the deleted genes. We estimate that in as many as ∼1/5 of knockout strains the expression of neighboring genes may be substantially (>2-fold) deregulated at the level of translation.

Loss of critical developmental and human disease-causing genes in 58 mammals

10.1101/819169 ◽

2019 ◽

Author(s):

Yatish Turakhia ◽

Heidi I. Chen ◽

Amir Marcovitz ◽

Gill Bejerano

Keyword(s):

Evolutionary Biology ◽

Large Scale ◽

Gene Annotation ◽

Synonymous Substitution ◽

Specific Gene ◽

High Confidence ◽

Protein Coding ◽

Congenital Diseases ◽

Manual Curation ◽

Human Genes

Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools and protein databases focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (deletion and non-synonymous substitution) as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence protein-coding gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using the hg38 human assembly as a reference, we discovered over 500 unique human genes affected by such high-confidence erosion events in different clades across 58 mammals. While most of these events likely have benign consequences, we also found dozens of clade-specific gene losses that result in early lethality in outgroup mammals or are associated with severe congenital diseases in humans. Our discoveries yield intriguing potential for translational medical genetics and for evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.

A reference translatome map reveals two modes of protein evolution

10.1101/2021.07.17.452746 ◽

2021 ◽

Author(s):

Aaron Wacholder ◽

Omer Acar ◽

Anne-Ruxandra Carvunis

Keyword(s):

Model Organism ◽

Great Majority ◽

High Sensitivity ◽

Ribosome Profiling ◽

Computational Framework ◽

Protein Coding ◽

Biologically Relevant ◽

Protein Coding Genes ◽

Representative Subset ◽

Evolutionarily Conserved

Ribosome profiling experiments demonstrate widespread translation of eukaryotic genomes outside of annotated protein-coding genes. However, it is unclear how much of this "noncanonical" translation contributes biologically relevant microproteins rather than insignificant translational noise. Here, we developed an integrative computational framework (iRibo) that leverages hundreds of ribosome profiling experiments to detect signatures of translation with high sensitivity and specificity. We deployed iRibo to construct a reference translatome in the model organism S. cerevisiae. We identified ~19,000 noncanonical translated elements outside of the ~5,400 canonical yeast protein-coding genes. Most (65%) of these non-canonical translated elements were located on transcripts annotated as non-coding, or entirely unannotated, while the remainder were located on the 5' and 3' ends of mRNA transcripts. Only 14 non-canonical translated elements were evolutionarily conserved. In stark contrast with canonical protein-coding genes, the great majority of the yeast noncanonical translatome appeared evolutionarily transient and showed no signatures of selection. Yet, we uncovered phenotypes for 53% of a representative subset of evolutionarily transient translated elements. The iRibo framework and reference translatome described here provide a foundation for further investigation of a largely unexplored, but biologically significant, evolutionarily transient translatome.
