Proteogenomic Analysis of Polymorphisms and Gene Annotation Divergences in Prokaryotes using a Clustered Mass Spectrometry-Friendly Database

2010 ◽  
Vol 10 (1) ◽  
pp. M110.002527 ◽  
Author(s):  
Gustavo A. de Souza ◽  
Magnus Ø. Arntzen ◽  
Suereta Fortuin ◽  
Anita C. Schürch ◽  
Hiwa Målen ◽  
...  
2020 ◽  
Author(s):  
Michal Levin ◽  
Marion Scheibe ◽  
Falk Butter

Abstract Background The process of identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-Seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through great efforts of big consortia, for most systems this kind of data is either absent or not adequately precise. Results Combining in-depth transcriptome sequencing and high resolution mass spectrometry, we here use proteotranscriptomics to improve gene annotation of protein-coding genes in the Bombyx mori cell line BmN4 which is an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach we provide the exact coding sequence and evidence for more than 6,200 genes on the protein level. Furthermore using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage. Conclusions We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de-novo transcriptome assembly, it provides the possibility to study any species also in the absence of genome sequence information for which proteogenomics would be impossible.


2007 ◽  
Vol 17 (2) ◽  
pp. 231-239 ◽  
Author(s):  
S. Tanner ◽  
Z. Shen ◽  
J. Ng ◽  
L. Florea ◽  
R. Guigo ◽  
...  

Cells ◽  
2019 ◽  
Vol 8 (7) ◽  
pp. 744 ◽  
Author(s):  
Xiaolan Yu ◽  
Yongsheng Wang ◽  
Markus V. Kohnen ◽  
Mingxin Piao ◽  
Min Tu ◽  
...  

Moso bamboo is an important forest species with a variety of ecological, economic, and cultural values. However, the gene annotation of moso bamboo is based only on transcriptome sequencing and lacks proteome-level evidence. Lignification and fiber content in moso bamboo make protein extraction with conventional methods difficult, which seriously hinders research on the proteomics of moso bamboo. The purpose of this study is to establish efficient methods for extracting total proteins from moso bamboo for subsequent mass spectrometry-based quantitative proteome identification. Here, we have successfully established a set of efficient methods for extracting total proteins of moso bamboo followed by mass spectrometry-based label-free quantitative proteome identification, which further improved the protein annotation of moso bamboo genes. In this study, 10,376 predicted coding genes were confirmed by quantitative proteomics, accounting for 35.8% of all annotated protein-coding genes. Proteome analysis also revealed the protein-coding potential of 1015 predicted long noncoding RNAs (lncRNAs), accounting for 51.03% of annotated lncRNAs. Thus, mass spectrometry-based proteomics provides a reliable method for gene annotation. In particular, quantitative proteomics revealed the translation patterns of proteins in moso bamboo. In addition, the 3284 transcript isoforms from 2663 genes identified by Pacific BioSciences (PacBio) single-molecule real-time long-read isoform sequencing (Iso-Seq) were confirmed on the protein level by mass spectrometry. Furthermore, domain analysis of mass spectrometry-identified proteins encoded in the same genomic locus revealed variations in domain composition pointing towards a functional diversification of protein isoforms. Finally, we found that some transcripts targeted by nonsense-mediated mRNA decay (NMD) could also be translated into proteins. In summary, proteomic analysis in this study improves the proteomics-assisted genome annotation of moso bamboo and is valuable to the large-scale research of functional genomics in moso bamboo. It also provides a theoretical basis and technical support for directional gene function analysis at the proteomics level in moso bamboo.


2020 ◽  
Vol 48 (14) ◽  
pp. 7864-7882 ◽  
Author(s):  
Tristan Cardon ◽  
Julien Franck ◽  
Etienne Coyaud ◽  
Estelle M N Laurent ◽  
Marina Damato ◽  
...  

Abstract It has been recently shown that many proteins are lacking from reference databases used in mass spectrometry analysis, due to their translation templated on alternative open reading frames. This questions our current understanding of gene annotation and drastically expands the theoretical proteome complexity. The functions of these alternative proteins (AltProts) still remain largely unknown. We have developed a large-scale and unsupervised approach based on cross-linking mass spectrometry (XL-MS) followed by shotgun proteomics to gather information on the functional role of AltProts by mapping them back into known signalling pathways through the identification of their reference protein (RefProt) interactors. We have identified and profiled AltProts in a cancer cell reprogramming system: NCH82 human glioma cells after 0, 16, 24 and 48 h Forskolin stimulation. Forskolin is a protein kinase A activator inducing cell differentiation and epithelial–mesenchymal transition. Our data show that AltMAP2, AltTRNAU1AP and AltEPHA5 interactions with tropomyosin 4 are downregulated under Forskolin treatment. In a wider perspective, Gene Ontology and pathway enrichment analysis (STRING) revealed that RefProts associated with AltProts are enriched in cellular mobility and transfer RNA regulation. This study strongly suggests novel roles of AltProts in multiple essential cellular functions and supports the importance of considering them in future biological studies.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Michal Levin ◽  
Marion Scheibe ◽  
Falk Butter

Abstract Background The process of identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through great efforts of big consortia, for most systems this kind of data is either absent or not adequately precise. Results Combining in-depth transcriptome sequencing and high resolution mass spectrometry, we here use proteotranscriptomics to improve gene annotation of protein-coding genes in the Bombyx mori cell line BmN4 which is an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach we provide the exact coding sequence and evidence for more than 6200 genes on the protein level. Furthermore using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage. Conclusions We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de-novo transcriptome assembly, it provides the possibility to study any species also in the absence of genome sequence information for which proteogenomics would be impossible.


PROTEOMICS ◽  
2007 ◽  
Vol 7 (22) ◽  
pp. 4053-4065 ◽  
Author(s):  
Yoko Ishino ◽  
Hitomi Okada ◽  
Masahiko Ikeuchi ◽  
Hisaaki Taniguchi



2018 ◽  
Author(s):  
K.C.T. Machado ◽  
S. Fortuin ◽  
G.G. Tomazella ◽  
A.F. Fonseca ◽  
R. Warren ◽  
...  

Abstract In proteomics, peptide information within mass spectrometry data from a specific organism sample is routinely challenged against a protein sequence database that best represents such an organism. However, if the species/strain in the sample is unknown or poorly genetically characterized, it becomes challenging to determine a database that can represent such a sample. Building customized protein sequence databases merging multiple strains for a given species has become a strategy to overcome such restrictions. However, as more genetic information is publicly available and interesting genetic features such as the existence of pan- and core genes within a species are revealed, we questioned how efficient such merging strategies are to report relevant information. To test this assumption, we constructed databases containing conserved and unique sequences for ten different species. Features that are relevant for probabilistic-based protein identification by proteomics were then monitored. As expected, increase in database complexity correlates with pangenomic complexity. However, Mycobacterium tuberculosis and Bordetella pertussis generated very complex databases despite having low pangenomic complexity or no pangenome at all. This suggests that discrepancies in gene annotation are higher than average between strains of those species. We further tested database performance by using mass spectrometry data from eight clinical strains from Mycobacterium tuberculosis, and from two published datasets from Staphylococcus aureus. We show that by using an approach where database size is controlled by removing repeated identical tryptic sequences across strains/species, computational time can be reduced drastically as database complexity increases.
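
The idea of controlling database size by removing repeated identical tryptic sequences across strains can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`tryptic_digest`, `build_nonredundant_peptide_db`), the toy sequences, the length limits, and the simplified cleavage rule (cut after K or R unless followed by P, no missed cleavages) are all assumptions made for illustration. It requires Python 3.7 or later, where `re.split` accepts zero-width matches.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_len=7, max_len=30):
    """In-silico digest with a simplified trypsin rule (assumption):
    cleave after K or R unless the next residue is P, no missed cleavages."""
    peptides = re.split(r"(?<=[KR])(?!P)", sequence)
    # Drop empty fragments and peptides outside the chosen length window.
    return [p for p in peptides if min_len <= len(p) <= max_len]


def build_nonredundant_peptide_db(proteomes, min_len=7, max_len=30):
    """Map each distinct tryptic peptide to every (strain, protein) it came from.

    `proteomes` is a hypothetical structure: {strain: {protein_id: sequence}}.
    Identical peptides shared across strains are stored only once.
    """
    peptide_sources = defaultdict(set)
    for strain, proteins in proteomes.items():
        for protein_id, seq in proteins.items():
            for pep in tryptic_digest(seq, min_len, max_len):
                peptide_sources[pep].add((strain, protein_id))
    return dict(peptide_sources)


# Toy example: two strains whose proteins differ by one substitution (C -> T).
proteomes = {
    "strain_A": {"p1": "AAAAAAAKCCCCCCCRDDDDDDDK"},
    "strain_B": {"p1": "AAAAAAAKCCCCTCCRDDDDDDDK"},
}
db = build_nonredundant_peptide_db(proteomes)

# Count peptides before deduplication for comparison.
total_peptides = sum(
    len(tryptic_digest(seq))
    for proteins in proteomes.values()
    for seq in proteins.values()
)
print(f"{total_peptides} peptides in total, {len(db)} distinct entries")
```

In this toy case only the peptide carrying the substitution is stored separately for each strain, while the conserved peptides collapse into single entries that still record both strains of origin.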


BMC Genomics ◽  
2008 ◽  
Vol 9 (1) ◽  
pp. 316 ◽  
Author(s):  
Gustavo A de Souza ◽  
Hiwa Målen ◽  
Tina Søfteland ◽  
Gisle Sælensminde ◽  
Swati Prasad ◽  
...  

2018 ◽  
Author(s):  
Guadalupe Gómez-Baena ◽  
Stuart D. Armstrong ◽  
Josiah O. Halstead ◽  
Mark Prescott ◽  
Sarah A. Roberts ◽  
...  

ABSTRACT Major urinary proteins (MUP) are the major component of the urinary protein fraction in house mice (Mus spp.) and rats (Rattus spp.). The structure, polymorphism and functions of these lipocalins have been well described in the western European house mouse (Mus musculus domesticus), clarifying their role in semiochemical communication. The complexity of these roles in the mouse raises the question of similar functions in other rodents, including the Norway rat, Rattus norvegicus. Norway rats express MUPs in urine but information about specific MUP isoform sequences and functions is limited. In this study, we present a detailed molecular characterization of the MUP proteoforms expressed in the urine of two laboratory strains, Wistar Han and Brown Norway, and wild caught animals, using a combination of manual gene annotation, intact protein mass spectrometry and bottom-up mass spectrometry-based proteomic approaches. Detailed sequencing of the proteins reveals a less complex pattern of primary sequence polymorphism than the mouse. However, unlike the mouse, rat MUPs exhibit added complexity in the form of post-translational modifications including phosphorylation and exoproteolytic trimming of specific isoforms. The possibility that urinary MUPs may have different roles in rat chemical communication than those they play in the house mouse is also discussed.

