scholarly journals Chroniques génomiques

2020 ◽  
Vol 36 (6-7) ◽  
pp. 675-677
Author(s):  
Bertrand Jordan

A systematic search for non-conventional open reading frames in human DNA reveals a large number of small ORFs encoding peptides generally smaller than 100 amino acids. These ORFs are transcribed and translated into small proteins, which are shown to be functionally significant by bulk CRISPR inactivation. Evidence is also found for bicistronic mRNAs that carry such a small ORF upstream of a canonical coding sequence. These findings add a new facet to our understanding of biological processes.
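As a rough illustration of what a sequence-level scan for small ORFs can look like, the sketch below walks all six reading frames and reports ATG-to-stop ORFs of at most 100 codons. It is not the method of the study summarized above, which relied on experimental evidence of translation and on non-conventional ORFs. The function name `small_orfs`, the restriction to ATG starts, and the 0-based half-open coordinates are all assumptions made for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def small_orfs(seq, max_codons=100, min_codons=2):
    """Yield (strand, start, end, n_codons) for ATG..stop ORFs.

    Coordinates are 0-based, half-open, and relative to the scanned
    strand; n_codons counts the start codon but not the stop codon.
    Only the first ATG in each open stretch is used as the start
    (hypothetical simplification; alternative starts are ignored).
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        yield strand, start, i + 3, n_codons
                    start = None


hits = list(small_orfs("CCATGAAATTTTAGCC"))
```

On the toy sequence above, the only hit is the three-codon ORF ATG AAA TTT followed by the stop codon TAG on the forward strand. A scan of this kind only proposes candidates; it says nothing about whether they are transcribed or translated.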

2019 ◽  
Author(s):  
Jeremy Weaver ◽  
Fuad Mohammad ◽  
Allen R. Buskirk ◽  
Gisela Storz

ABSTRACT Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the true prevalence of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins, taking advantage of recent advances in ribosome profiling in which antibiotics are used to trap newly initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. The corresponding genes are not only intergenic, but are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs), some impacting the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context. IMPORTANCE Proteins comprised of 50 or fewer amino acids have been shown to interact with and modulate the function of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown as the small size of the proteins places serious limitations on their identification, purification and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.


mBio ◽  
2019 ◽  
Vol 10 (2) ◽  
Author(s):  
Jeremy Weaver ◽  
Fuad Mohammad ◽  
Allen R. Buskirk ◽  
Gisela Storz

ABSTRACT Small proteins consisting of 50 or fewer amino acids have been identified as regulators of larger proteins in bacteria and eukaryotes. Despite the importance of these molecules, the total number of small proteins remains unknown because conventional annotation pipelines usually exclude small open reading frames (smORFs). We previously identified several dozen small proteins in the model organism Escherichia coli using theoretical bioinformatic approaches based on sequence conservation and matches to canonical ribosome binding sites. Here, we present an empirical approach for discovering new proteins, taking advantage of recent advances in ribosome profiling in which antibiotics are used to trap newly initiated 70S ribosomes at start codons. This approach led to the identification of many novel initiation sites in intergenic regions in E. coli. We tagged 41 smORFs on the chromosome and detected protein synthesis for all but three. Not only are the corresponding genes intergenic but they are also found antisense to other genes, in operons, and overlapping other open reading frames (ORFs), some impacting the translation of larger downstream genes. These results demonstrate the utility of this method for identifying new genes, regardless of their genomic context. IMPORTANCE Proteins comprised of 50 or fewer amino acids have been shown to interact with and modulate the functions of larger proteins in a range of organisms. Despite the possible importance of small proteins, the true prevalence and capabilities of these regulators remain unknown as the small size of the proteins places serious limitations on their identification, purification, and characterization. Here, we present a ribosome profiling approach with stalled initiation complexes that led to the identification of 38 new small proteins.


Biomedicines ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 911
Author(s):  
Joana Silva ◽  
Pedro Nina ◽  
Luísa Romão

ATP-binding cassette subfamily E member 1 (ABCE1) belongs to the ABC protein family of transporters; however, it does not behave as a drug transporter. Instead, ABCE1 actively participates in different stages of translation and is also associated with oncogenic functions. Ribosome profiling analysis in colorectal cancer cells has revealed a high ribosome occupancy in the human ABCE1 mRNA 5′-leader sequence, indicating the presence of translatable upstream open reading frames (uORFs). These cis-acting translational regulatory elements usually act as repressors of translation of the main coding sequence. In the present study, we dissect the regulatory function of the five AUG and five non-AUG uORFs identified in the human ABCE1 mRNA 5′-leader sequence. We show that the expression of the main coding sequence is tightly regulated by the ABCE1 AUG uORFs in colorectal cells. Our results are consistent with a model wherein uORF1 is efficiently translated, behaving as a barrier to downstream uORF translation. The few ribosomes that can bypass uORF1 (and/or uORF2) most likely initiate at the inhibitory uORF3 or uORF5, which efficiently repress translation of the main ORF. This inhibitory property is slightly overcome in conditions of endoplasmic reticulum stress. In addition, we observed that these potent translation-inhibitory AUG uORFs function equally in cancer and in non-tumorigenic colorectal cells, which is consistent with a lack of oncogenic function. In conclusion, we establish human ABCE1 as an additional example of uORF-mediated translational regulation and show that this tight regulation contributes to the control of ABCE1 protein levels in different cell environments.


2019 ◽  
Vol 8 (43) ◽  
Author(s):  
T. O. C. Faleye ◽  
O. M. Adewumi ◽  
D. Klapsa ◽  
M. Majumdar ◽  
J. Martin ◽  
...  

Here, we describe nearly complete genome sequences (7,361 nucleotides [nt] and 6,893 nt) of two echovirus 20 (E20) isolates from Nigeria that were simultaneously typed as CVB and E20 (dual serotype) by neutralization assay. Both include two overlapping open reading frames (ORFs) of 67 and 2,183 amino acids that encoded a recently described gut infection-facilitating protein and the classic enterovirus proteins, respectively.


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Fabio R. Cerqueira ◽  
Ana Tereza Ribeiro Vasconcelos

Abstract Small open reading frames (ORFs) have been systematically disregarded by automatic genome annotation. The difficulty of finding patterns in tiny sequences is the main reason small ORFs are overlooked by computational procedures. However, advances in experimental methods show that small proteins can play vital roles in cellular activities. Hence, it is urgent to make progress in the development of computational approaches to speed up the identification of potential small ORFs. In this work, our focus is on bacterial genomes. We improve a previous approach to identify small ORFs in bacteria. Our method uses machine learning techniques and decoy subject sequences to filter out spurious ORF alignments. We show that an advanced multivariate analysis can be more effective in terms of sensitivity than applying the simplistic and widely used e-value cutoff. This is particularly important in the case of small ORFs, for which alignments present higher e-values than usual. Experiments with control datasets show that the machine learning algorithms used in our method to curate significant alignments can achieve average sensitivity and specificity of 97.06% and 99.61%, respectively. Therefore, an important step is provided here toward the construction of more accurate computational tools for the identification of small ORFs in bacteria.


2020 ◽  
Vol 40 (6) ◽  
Author(s):  
Corrine Corrina R. Hartford ◽  
Ashish Lal

ABSTRACT Recent advancements in genetic and proteomic technologies have revealed that more of the genome encodes proteins than originally thought possible. Specifically, some putative long noncoding RNAs (lncRNAs) have been misannotated as noncoding. Numerous lncRNAs have been found to contain short open reading frames (sORFs) which have been overlooked because of their small size. Many of these sORFs encode small proteins or micropeptides with fundamental biological importance. These micropeptides can aid in diverse processes, including cell division, transcription regulation, and cell signaling. Here we discuss strategies for establishing the coding potential of putative lncRNAs and describe various functions of known micropeptides.


2009 ◽  
Vol 53 (5) ◽  
pp. 1907-1911 ◽  
Author(s):  
Esther Izquierdo ◽  
Yimin Cai ◽  
Eric Marchioni ◽  
Saïd Ennahar

ABSTRACT Enterococcus faecium IT62, a strain isolated from ryegrass in Japan, produces three bacteriocins (enterocins L50A, L50B, and IT) that have been previously purified and the primary structures of which have been determined by amino acid sequencing (E. Izquierdo, A. Bednarczyk, C. Schaeffer, Y. Cai, E. Marchioni, A. Van Dorsselaer, and S. Ennahar, Antimicrob. Agents Chemother., 52:1917-1923, 2008). Genetic analysis showed that the bacteriocins of E. faecium IT62 are plasmid encoded, but with the structural genes specifying enterocin L50A and enterocin L50B being carried by a plasmid (pTAB1) that is separate from the one (pTIT1) carrying the structural gene of enterocin IT. Sequencing analysis of a 1,475-bp region from pTAB1 identified two consecutive open reading frames corresponding, with the exception of 2 bp, to the genes entL50A and entL50B, encoding EntL50A and EntL50B, respectively. Both bacteriocins are synthesized without N-terminal leader sequences. Genetic analysis of a sequenced 1,380-bp pTIT1 fragment showed that the genes entIT and entIM, encoding enterocin IT and its immunity protein, respectively, were both found in E. faecium VRE200 for bacteriocin 32. Enterocin IT, a 6,390-Da peptide made up of 54 amino acids, has been previously shown to be identical to the C-terminal part of bacteriocin 32, a 7,998-Da bacteriocin produced by E. faecium VRE200 whose structure was deduced from its structural gene (T. Inoue, H. Tomita, and Y. Ike, Antimicrob. Agents Chemother., 50:1202-1212, 2006). By combining the biochemical and genetic data on enterocin IT, it was concluded that bacteriocin 32 is in fact identical to enterocin IT, both being encoded by the same plasmid-borne gene, and that the N-terminal leader peptide for this bacteriocin is 35 amino acids long and not 19 amino acids long as previously reported.


2017 ◽  
Vol 5 (16) ◽  
Author(s):  
Adriana N. Souza ◽  
Fábio N. Silva ◽  
Claudine M. Carvalho

ABSTRACT A novel satellite virus of 1,228 bp in length was found in a single cassava plant. Bioinformatic analyses show that it has two open reading frames (ORFs) in its genome, probably encoding a coat protein of 156 and a putative protein of 90 amino acids.


1991 ◽  
Vol 11 (9) ◽  
pp. 4306-4313 ◽  
Author(s):  
B A Arrick ◽  
A L Lee ◽  
R L Grendell ◽  
R Derynck

We have cloned and sequenced the 5' untranslated region of the transforming growth factor-beta 3 (TGF-beta 3) mRNA as well as the adjacent genomic sequence. S1 nuclease analysis identified a single transcription start site. We have thus determined that the 5' untranslated region is about 1.1 kb long and contains 11 open reading frames. In vitro translation of the TGF-beta 3 precursor coding sequence was markedly inhibited by the presence of the 5' untranslated region. Similarly, when the 5' untranslated region of TGF-beta 3 was introduced upstream of the coding sequence of chloramphenicol acetyltransferase, in vitro translation was inhibited. Furthermore, upon transfection into 293 cells, chloramphenicol acetyltransferase expression was inhibited by the 5' untranslated region of TGF-beta 3. The degree of translational inhibition was inversely proportional to the amount of transfected DNA. Mutation analysis implicated multiple segments of the 5' untranslated region as contributing to the inhibitory effect. Deletion of much of the 5'-most 640 nucleotides, including 8 of the 11 upstream ATGs, relieved much but not all of the inhibitory influence of the 5' untranslated region of TGF-beta 3 mRNA. The two upstream open reading frames closest to the initiator codon for the TGF-beta 3 coding sequence also decreased translational efficiency, since mutation of either ATG resulted in increased translation. Transfection results with T47-D cells, a cell line which expresses TGF-beta 3 mRNA, were similar to those obtained with the 293 cell line. Thus, TGF-beta 3 mRNA is a recent example of an expanding group of growth-related mRNAs in which the 5' untranslated region contains upstream open reading frames and other sequences which inhibit translation.


2002 ◽  
Vol 184 (1) ◽  
pp. 216-223 ◽  
Author(s):  
Markus Göbel ◽  
Kerstin Kassel-Cati ◽  
Eberhard Schmidt ◽  
Walter Reineke

ABSTRACT 3-Oxoadipate:succinyl-coenzyme A (CoA) transferase and 3-oxoadipyl-CoA thiolase carry out the ultimate steps in the conversion of benzoate and 3-chlorobenzoate to tricarboxylic acid cycle intermediates in bacteria utilizing the 3-oxoadipate pathway. This report describes the characterization of DNA fragments with the overall length of 5.9 kb from Pseudomonas sp. strain B13 that encode these enzymes. DNA sequence analysis revealed five open reading frames (ORFs) plus an incomplete one. ORF1, of unknown function, has a length of 414 bp. ORF2 (catI) encodes a polypeptide of 282 amino acids and starts at nucleotide 813. ORF3 (catJ) encodes a polypeptide of 260 amino acids and begins at nucleotide 1661. CatI and CatJ are the subunits of the 3-oxoadipate:succinyl-CoA transferase, whose activity was demonstrated when both genes were ligated into expression vector pET11a. ORF4, termed catF, codes for a protein of 401 amino acid residues with a predicted mass of 41,678 Da with 3-oxoadipyl-CoA thiolase activity. The last three ORFs seem to form an operon since they are oriented in the same direction and showed an overlapping of 1 bp between catI and catJ and of 4 bp between catJ and catF. Conserved functional groups important for the catalytic activity of CoA transferases and thiolases were identified in CatI, CatJ, and CatF. ORF5 (catD) encodes the 3-oxoadipate enol-lactone hydrolase. An incomplete ORF6 of 1,183 bp downstream of ORF5 and oriented in the opposite direction was found. The protein sequence deduced from ORF6 showed a putative AMP-binding domain signature.
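The coordinates quoted in this abstract can be cross-checked with a little arithmetic. The sketch below assumes 1-based inclusive coordinates, that each quoted start is the first nucleotide of the start codon, and that the amino acid counts exclude the stop codon; none of these conventions is stated in the abstract. The helper name `orf_end` is hypothetical.

```python
def orf_end(start, n_amino_acids):
    """Last nucleotide (1-based, inclusive) of an ORF, stop codon included."""
    return start + 3 * (n_amino_acids + 1) - 1


# catI (ORF2): 282 amino acids, starting at nucleotide 813.
cat_i_end = orf_end(813, 282)

# catJ (ORF3): 260 amino acids, beginning at nucleotide 1661.
cat_j_start = 1661
overlap_i_j = cat_i_end - cat_j_start + 1

# The abstract reports a 4-bp overlap between catJ and catF but gives
# no start for catF; the value below is inferred, not quoted.
cat_j_end = orf_end(cat_j_start, 260)
inferred_cat_f_start = cat_j_end - 4 + 1
```

Under these assumptions, catI ends at nucleotide 1661, the same position at which catJ begins, which reproduces the reported 1-bp overlap. The same reasoning puts the end of catJ at nucleotide 2443 and would place the start of catF at nucleotide 2440, although that last figure is an inference rather than something the abstract states.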

