[34] GATEWAY recombinational cloning: Application to the cloning of large numbers of open reading frames or ORFeomes

Author(s):  
Albertha J.M. Walhout ◽  
Gary F. Temple ◽  
Michael A. Brasch ◽  
James L. Hartley ◽  
Monique A. Lorson ◽  
...  
2021 ◽  
Author(s):  
John Anders ◽  
Hannes Petruschke ◽  
Nico Jehmlich ◽  
Sven-Bastiaan Haange ◽  
Martin von Bergen ◽  
...  

Abstract Background: Small Proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically this results in large numbers of putative novel small proteins fraught with large fractions of false-positive predictions. Results: We observe that the number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for the purpose of distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false-positive, i.e., most likely untranslated, ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that have previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic level. Half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins.
Conclusions: The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration with transcriptomics data and other sources of genome-level information.


2020 ◽  
Author(s):  
Philip P. Adams ◽  
Gabriele Baniulyte ◽  
Caroline Esnault ◽  
Kavya Chegireddy ◽  
Navjot Singh ◽  
...  

Abstract Many bacterial genes are regulated by RNA elements in their 5′ untranslated regions (UTRs). However, the full complement of these elements is not known even in the model bacterium Escherichia coli. Using complementary RNA-sequencing approaches, we detected large numbers of 3′ ends in 5′ UTRs and open reading frames (ORFs), suggesting extensive regulation by premature transcription termination. We documented regulation for multiple transcripts, including spermidine induction involving Rho and translation of an upstream ORF for an mRNA encoding a spermidine efflux pump. In addition to discovering novel sites of regulation, we detected short, stable RNA fragments derived from 5′ UTRs and sequences internal to ORFs. Characterization of three of these transcripts, including an RNA internal to an essential cell division gene, revealed that they have independent functions as sRNA sponges. Thus, these data uncover an abundance of cis- and trans-acting RNA regulators in bacterial 5′ UTRs and internal to ORFs.


2007 ◽  
Vol 15 (3) ◽  
pp. 433-438 ◽  
Author(s):  
Imran H. Khan ◽  
Resmi Ravindran ◽  
JoAnn Yee ◽  
Melanie Ziman ◽  
David M. Lewinsohn ◽  
...  

ABSTRACT Tuberculosis (TB) is a serious global disease. The fatality rate attributed to TB is among the highest of infectious diseases, with approximately 2 million deaths occurring per year worldwide. Identification of individuals infected with Mycobacterium tuberculosis and screening of their immediate contacts is crucial for controlling the spread of TB. Current methods for detection of M. tuberculosis infection are not efficient, in particular, for testing large numbers of samples. We report a novel and efficient multiplex microbead immunoassay (MMIA), based on Luminex technology, for profiling antibodies to M. tuberculosis. Microbead sets identifiable by unique fluorescence were individually coated with each of several M. tuberculosis antigens and tested in multiplex format for antibody detection in the experimental nonhuman primate model of TB. Certain M. tuberculosis antigens, e.g., ESAT-6, CFP-10, and HspX, were included to enhance the specificity of the MMIA, because these antigens are absent in nontuberculous mycobacteria and the vaccine strain Mycobacterium bovis bacillus Calmette-Guérin. The MMIA enabled simultaneous detection of multiple M. tuberculosis plasma antibodies in several cohorts of macaques representing different stages of infection and/or disease. Antibody profiles were defined in early and latent/chronic infection. These proof-of-concept findings demonstrate the potential clinical use of the MMIA. In addition, the MMIA serodetection system has a potential for mining M. tuberculosis open reading frames (about 4,000) to discover novel target proteins for the development of more-comprehensive TB serodiagnostic tests.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
John Anders ◽  
Hannes Petruschke ◽  
Nico Jehmlich ◽  
Sven-Bastiaan Haange ◽  
Martin von Bergen ◽  
...  

Abstract Background: Small Proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically this results in large numbers of putative novel small proteins fraught with large fractions of false-positive predictions. Results: We observe that number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for the purpose of distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false-positive, i.e., most likely untranslated ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that have previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic level. Half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins.
Conclusions: The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration of transcriptomics data and other sources of genome-level information.
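
The idea of aggregating PSM information per candidate ORF can be illustrated with a minimal Python sketch. It is not the authors' workflow: the `PSM` record, the `score` field (assumed here to be a quality value where higher is better), the descriptor names, and the thresholds in `likely_proteins` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PSM:
    """Hypothetical record for one peptide-spectrum match."""
    orf_id: str    # candidate ORF the peptide maps to
    peptide: str   # matched peptide sequence
    score: float   # assumed quality score, higher is better


def summarize_orfs(psms):
    """Group PSMs by ORF and compute simple per-ORF descriptors."""
    by_orf = defaultdict(list)
    for psm in psms:
        by_orf[psm.orf_id].append(psm)

    summary = {}
    for orf_id, hits in by_orf.items():
        scores = [h.score for h in hits]
        summary[orf_id] = {
            "n_psms": len(hits),                          # number of PSMs
            "n_peptides": len({h.peptide for h in hits}),  # distinct peptides
            "best_score": max(scores),
            "mean_score": mean(scores),
        }
    return summary


def likely_proteins(summary, min_psms=2, min_mean_score=0.5):
    """Keep ORFs passing illustrative (assumed) count and quality thresholds."""
    return sorted(
        orf_id
        for orf_id, d in summary.items()
        if d["n_psms"] >= min_psms and d["mean_score"] >= min_mean_score
    )


# Toy data: orf_A has several good PSMs, orf_B a single weak one.
psms = [
    PSM("orf_A", "MKTAYIAK", 0.9),
    PSM("orf_A", "QRQISFVK", 0.8),
    PSM("orf_A", "MKTAYIAK", 0.7),
    PSM("orf_B", "GLSDGEWQ", 0.3),
]
summary = summarize_orfs(psms)
print(likely_proteins(summary))
```

In this toy input, the ORF supported by several higher-scoring PSMs passes the filter and the ORF with a single weak match does not. A real pipeline would also take the local genomic context into account, as the abstract notes.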


Archaea ◽  
2006 ◽  
Vol 2 (1) ◽  
pp. 11-30 ◽  
Author(s):  
Mark K. Ashby

The publicly available annotated archaeal genome sequences (23 complete and three partial annotations, October 2005) were searched for the presence of potential two-component open reading frames (ORFs) using gene category lists and BLASTP. A total of 489 potential two-component genes were identified from the gene category lists and BLASTP. Two-component genes were found in 14 of the 21 Euryarchaeal sequences (October 2005) and in neither the Crenarchaeota nor the Nanoarchaeota. A total of 20 predicted protein domains were identified in the putative two-component ORFs that, in addition to the histidine kinase and receiver domains, also include sensor and signalling domains. The detailed structure of these putative proteins is shown, as is the distribution of each class of two-component genes in each species. Potential members of orthologous groups have been identified, as have any potential operons containing two or more two-component genes. The number of two-component genes in those Euryarchaeal species which have them seems to be linked more to lifestyle and habitat than to genome complexity, with most examples being found in Methanospirillum hungatei, Haloarcula marismortui, Methanococcoides burtonii and the mesophilic Methanosarcinales group. The large numbers of two-component genes in these species may reflect a greater requirement for internal regulation. Phylogenetic analysis of orthologous groups of five different protein classes, three probably involved in regulating taxis, suggests that most of these ORFs have been inherited vertically from an ancestral Euryarchaeal species and points to a limited number of key horizontal gene transfer events.

