utr.annotation: a tool for annotating genomic variants that could influence post-transcriptional regulation

Author(s):  
Yating Liu ◽  
Joseph D Dougherty

Abstract. Summary: Whole genome sequencing of patient populations is identifying thousands of new variants in untranslated regions (UTRs). While the consequences of UTR mutations are not as easily predicted from primary sequence as those of coding mutations, some known features of UTRs modulate their function. utr.annotation is an R package for annotating potentially deleterious variants in the UTRs of human and mouse genes. Given a variant file in CSV or VCF format, utr.annotation reports for each variant whether and how it alters known translational regulators, including upstream Open Reading Frames (uORFs), upstream Kozak sequences, polyA signals, Kozak sequences at the annotated translation start site, start codons, and stop codons; the conservation scores at the variant position; and whether and how it changes ribosome loading, based on a model derived from empirical data. Availability: utr.annotation is freely available on Bitbucket (https://bitbucket.org/jdlabteam/utr.annotation/src/master/) and CRAN (https://cran.r-project.org/web/packages/utr.annotation/index.html). Supplementary information: Supplementary data are available at https://wustl.box.com/s/yye99bryfin89nav45gv91l5k35fxo7z.
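To make two of the features named in this abstract concrete, the Python sketch below scans a 5′UTR for upstream AUGs, reports the uORF each one opens, and grades its Kozak context. It is an illustration only and not the utr.annotation implementation (which is an R package). The function names `find_uorfs` and `kozak_strength` are hypothetical, and the strong/moderate/weak rule (purine at -3, G at +4) is a commonly used convention assumed here, not a detail taken from the abstract.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kozak_strength(seq, atg_pos):
    """Grade the Kozak context of the ATG that starts at atg_pos (0-based).

    Assumed convention: 'strong' if there is a purine at -3 and a G at +4,
    'moderate' if only one of the two holds, 'weak' if neither does.
    """
    minus3 = seq[atg_pos - 3] if atg_pos >= 3 else ""
    plus4 = seq[atg_pos + 3] if atg_pos + 3 < len(seq) else ""
    hits = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "moderate", "strong")[hits]


def find_uorfs(utr5):
    """Return one record per upstream ATG found in a 5'UTR DNA sequence.

    'end' is the 0-based exclusive end of the in-frame stop codon, or None
    when no in-frame stop occurs inside the UTR (the uORF may then run
    into the main coding sequence, which this sketch does not see).
    """
    seq = utr5.upper()
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        end = None
        # Walk codon by codon from the first codon after the ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        uorfs.append({"start": start, "end": end,
                      "kozak": kozak_strength(seq, start)})
    return uorfs


if __name__ == "__main__":
    for record in find_uorfs("GCCACCATGGCTTAAGG"):
        print(record)
```

Running the same scan on the reference and on the variant sequence, then comparing the two result lists, is one simple way to see whether a variant adds, removes, or re-grades an upstream start site.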

2021 ◽  
Author(s):  
Yating Liu ◽  
Joseph Dougherty

Whole genome sequencing of patient populations is identifying thousands of new variants in untranslated regions (UTRs). While the consequences of UTR mutations are not as easily predicted from primary sequence as those of coding mutations, some known features of UTRs modulate their function. utR.annotation is an R package for annotating potentially deleterious variants in the UTRs of human and mouse genes. Given a variant file in CSV or VCF format, utR.annotation reports for each variant whether and how it alters known translational regulators, including upstream Open Reading Frames (uORFs), upstream Kozak sequences, polyA signals, the Kozak sequence at the annotated translation initiation site, the start codon, and the stop codon; the conservation scores at the variant position; and whether and how it changes ribosome loading, based on a model derived from empirical data.


Author(s):  
Xiaolei Zhang ◽  
Matthew Wakeling ◽  
James Ware ◽  
Nicola Whiffin

Abstract. Summary: Current tools to annotate the predicted effect of genetic variants are heavily biased towards protein-coding sequence. Variants outside of these regions may have a large impact on protein expression and/or structure and can lead to disease, but this effect can be challenging to predict. Consequently, these variants are poorly annotated using standard tools. We have developed a plugin to the Ensembl Variant Effect Predictor, the UTRannotator, that annotates variants in 5′ untranslated regions (5′UTR) that create or disrupt upstream open reading frames (uORFs). We investigate the utility of this tool using the ClinVar database, providing an annotation for 30.8% of all 5′UTR (likely) pathogenic variants, and highlighting 31 variants of uncertain significance as candidates for further follow-up. We will continue to update the UTRannotator as we gain new knowledge on the impact of variants in UTRs. Availability and implementation: UTRannotator is freely available on GitHub: https://github.com/ImperialCardioGenetics/UTRannotator. Supplementary information: Supplementary data are available at bioRxiv.


Author(s):  
Xiaolei Zhang ◽  
Matthew Wakeling ◽  
James Ware ◽  
Nicola Whiffin

Abstract. Summary: Current tools to annotate the predicted effect of genetic variants are heavily biased towards protein-coding sequence. Variants outside of these regions may have a large impact on protein expression and/or structure and can lead to disease, but this effect can be challenging to predict. Consequently, these variants are poorly annotated using standard tools. We have developed a plugin to the Ensembl Variant Effect Predictor, the UTRannotator, that annotates variants in 5′ untranslated regions (5′UTR) that create or disrupt upstream open reading frames. We investigate the utility of this tool using the ClinVar database, providing an annotation for 31.9% of all 5′UTR (likely) pathogenic variants, and highlighting 31 variants of uncertain significance as candidates for further follow-up. We will continue to update the UTRannotator as we gain new knowledge on the impact of variants in UTRs. Availability and implementation: UTRannotator is freely available on GitHub: https://github.com/ImperialCardioGenetics/UTRannotator. Supplementary information: Supplementary data are available at Bioinformatics online.
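The idea of a variant that "creates or disrupts" an upstream open reading frame can be sketched in a few lines of Python. The example below handles only single-nucleotide substitutions and only upstream start codons. It is not the UTRannotator plugin, which runs inside the Ensembl Variant Effect Predictor, and the names `atg_positions` and `uaug_effect` and the `gained`/`lost` keys are hypothetical labels chosen for this illustration.

```python
def atg_positions(seq):
    """Return the set of 0-based positions at which an ATG starts."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"}


def uaug_effect(utr5, pos, ref, alt):
    """Report upstream ATGs gained or lost by a substitution in a 5'UTR.

    pos is the 0-based position of the substituted base within utr5;
    ref and alt are single bases. Indels are outside this sketch.
    """
    seq = utr5.upper()
    if seq[pos] != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    mutated = seq[:pos] + alt.upper() + seq[pos + 1:]
    before, after = atg_positions(seq), atg_positions(mutated)
    return {"gained": sorted(after - before), "lost": sorted(before - after)}


if __name__ == "__main__":
    # C>T at position 3 turns ACG into ATG, creating an upstream start codon.
    print(uaug_effect("CCACGGCC", 3, "C", "T"))
```

A fuller treatment would also follow each gained or lost start codon to its in-frame stop and check whether the resulting uORF overlaps the main coding sequence, which requires the transcript structure and is left out here.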


2006 ◽  
Vol 3 (2) ◽  
pp. 109-122 ◽  
Author(s):  
◽  
Christopher H. Bryant ◽  
Graham J.L. Kemp ◽  
Marija Cvijovic

Summary We have taken a first step towards learning which upstream Open Reading Frames (uORFs) regulate gene expression (i.e., which uORFs are functional) in the yeast Saccharomyces cerevisiae. We do this by integrating data from several resources and combining a bioinformatics tool, ORF Finder, with a machine learning technique, inductive logic programming (ILP). Here, we report the challenge of using ILP as part of this integrative system, in order to automatically generate a model that identifies functional uORFs. Our method makes searching for novel functional uORFs more efficient than random sampling. An attempt has been made to predict novel functional uORFs using our method. Some preliminary evidence that our model may be biologically meaningful is presented.


Biomedicines ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 911
Author(s):  
Joana Silva ◽  
Pedro Nina ◽  
Luísa Romão

ATP-binding cassette subfamily E member 1 (ABCE1) belongs to the ABC protein family of transporters; however, it does not behave as a drug transporter. Instead, ABCE1 actively participates in different stages of translation and is also associated with oncogenic functions. Ribosome profiling analysis in colorectal cancer cells has revealed a high ribosome occupancy in the human ABCE1 mRNA 5′-leader sequence, indicating the presence of translatable upstream open reading frames (uORFs). These cis-acting translational regulatory elements usually act as repressors of translation of the main coding sequence. In the present study, we dissect the regulatory function of the five AUG and five non-AUG uORFs identified in the human ABCE1 mRNA 5′-leader sequence. We show that the expression of the main coding sequence is tightly regulated by the ABCE1 AUG uORFs in colorectal cells. Our results are consistent with a model wherein uORF1 is efficiently translated, behaving as a barrier to downstream uORF translation. The few ribosomes that can bypass uORF1 (and/or uORF2) most likely initiate at the inhibitory uORF3 or uORF5, which efficiently repress translation of the main ORF. This inhibitory property is slightly overcome in conditions of endoplasmic reticulum stress. In addition, we observed that these potent translation-inhibitory AUG uORFs function equally in cancer and in non-tumorigenic colorectal cells, which is consistent with a lack of oncogenic function. In conclusion, we establish human ABCE1 as an additional example of uORF-mediated translational regulation and show that this tight regulation contributes to controlling ABCE1 protein levels in different cell environments.


2015 ◽  
Author(s):  
David E Weinberg ◽  
Premal Shah ◽  
Stephen W Eichhorn ◽  
Jeffrey A Hussmann ◽  
Joshua B Plotkin ◽  
...  

Ribosome-footprint profiling provides genome-wide snapshots of translation, but technical challenges can confound its analysis. Here, we use improved methods to obtain ribosome-footprint profiles and mRNA abundances that more faithfully reflect gene expression in Saccharomyces cerevisiae. Our results support proposals that both the beginning of coding regions and codons matching rare tRNAs are more slowly translated. They also indicate that emergent polypeptides with as few as three basic residues within a 10-residue window tend to slow translation. With the improved mRNA measurements, the variation attributable to translational control in exponentially growing yeast was less than previously reported, and most of this variation could be predicted with a simple model that considered mRNA abundance, upstream open reading frames, cap-proximal structure and nucleotide composition, and lengths of the coding and 5′-untranslated regions. Collectively, our results reveal key features of translational control in yeast and provide a framework for executing and interpreting ribosome-profiling studies.


2021 ◽  
Author(s):  
Vasily V. Grinev ◽  
Mikalai M. Yatskou ◽  
Victor V. Skakun ◽  
Maryna K. Chepeleva ◽  
Petr V. Nazarov

Abstract. Motivation: Modern methods of whole transcriptome sequencing accurately recover nucleotide sequences of RNA molecules present in cells and allow for determining their quantitative abundances. The coding potential of such molecules can be estimated using open reading frame (ORF)-finding algorithms, implemented in a number of software packages. However, these algorithms show somewhat limited accuracy, are intended for single-molecule analysis and do not allow selecting proper ORFs in the case of long mRNAs containing multiple ORF candidates. Results: We developed a computational approach, a corresponding machine learning model and a package dedicated to automatic identification of the ORFs in large sets of human mRNA molecules. It is based on vectorization of nucleotide sequences into features, followed by classification using a random forest. The predictive model was validated on sets of human mRNA molecules from the NCBI RefSeq and Ensembl databases and demonstrated almost 95% accuracy in detecting true ORFs. The developed methods and pre-trained classification model were implemented in a powerful ORFhunteR computational tool that performs an automatic identification of true ORFs among large sets of human mRNA molecules. Availability and implementation: The developed open-source R package ORFhunteR is available for the community at GitHub repository (https://github.com/rfctbio-bsu/ORFhunteR), from Bioconductor (https://bioconductor.org/packages/devel/bioc/html/ORFhunteR.html) and as a web application (http://orfhunter.bsu.by).
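The problem this abstract describes, several ORF candidates in one long mRNA, can be seen with a short Python sketch that enumerates every ATG-to-stop candidate and then applies the naive "longest ORF" rule. This is only the candidate-generation step plus a simple baseline. It does not reproduce ORFhunteR's feature vectorization or its random forest classifier, and the function names `orf_candidates` and `longest_orf` are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orf_candidates(mrna):
    """List (start, end) pairs for every ATG that reaches an in-frame stop.

    Coordinates are 0-based with an exclusive end that includes the stop
    codon. RNA input is accepted by converting U to T.
    """
    seq = mrna.upper().replace("U", "T")
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                orfs.append((start, i + 3))
                break
    return orfs


def longest_orf(mrna):
    """Naive baseline: pick the longest candidate, or None if there is none."""
    candidates = orf_candidates(mrna)
    if not candidates:
        return None
    return max(candidates, key=lambda orf: orf[1] - orf[0])


if __name__ == "__main__":
    transcript = "ATGTAACATGCCCCCCTGA"
    print(orf_candidates(transcript))
    print(longest_orf(transcript))
```

A classifier-based approach such as the one described above would replace the length rule with a score computed from sequence features of each candidate.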


1988 ◽  
Vol 8 (12) ◽  
pp. 5439-5447
Author(s):  
P P Mueller ◽  
B M Jackson ◽  
P F Miller ◽  
A G Hinnebusch

The third and fourth AUG codons in GCN4 mRNA efficiently repress translation of the GCN4-coding sequences under normal growth conditions. The first AUG codon is approximately 30-fold less inhibitory and is required under amino acid starvation conditions to override the repressing effects of AUG codons 3 and 4. lacZ fusions constructed to functional, elongated versions of the first and fourth upstream open reading frames (URFs) were used to show that AUG codons 1 and 4 function similarly as efficient translational start sites in vivo, raising the possibility that steps following initiation distinguish the regulatory properties of URFs 1 and 4. In accord with this idea, we observed different consequences of changing the length and termination site of URF1 versus changing those of URFs 3 and 4. The latter were lengthened considerably, with little or no effect on regulation. In fact, the function of URFs 3 and 4 was partially reconstituted with a completely heterologous URF. By contrast, certain mutations that lengthen URF1 impaired its positive regulatory function nearly as much as removing its AUG codon did. The same mutations also made URF1 a much more inhibitory element when it was present alone in the mRNA leader. These results strongly suggest that URFs 1 and 4 both function in regulation as translated coding sequences. To account for the phenotypes of the URF1 mutations, we suggest that most ribosomes normally translate URF1 and that the mutations reduce the number of ribosomes that are able to complete URF1 translation and resume scanning downstream. This effect would impair URF1 positive regulatory function if ribosomes must first translate URF1 in order to overcome the strong translational block at the 3'-proximal URFs. Because URF1-lacZ fusions were translated at the same rate under repressing and derepressing conditions, it appears that modulating initiation at URF1 is not the means that is used to restrict the regulatory consequences of URF1 translation to starvation conditions.

