sequence database search
Recently Published Documents


TOTAL DOCUMENTS: 16 (FIVE YEARS: 1)

H-INDEX: 5 (FIVE YEARS: 0)

2021 ◽  
Author(s):  
Genet Abay Shiferaw ◽  
Ralf Gabriels ◽  
Elien Vandermarliere ◽  
Lennart Martens ◽  
Pieter-Jan Volders

Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we therefore investigate the effects of integrating the machine learning-based post-processor Percolator into our spectral library searching tool COSS. To evaluate the effects of this post-processing, we have used twenty data sets from two different projects and have matched these against the NIST spectral library. The matching is carried out using two performant spectral library search engines (COSS and MsPepSearch), both with and without Percolator post-processing, and using sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step was particularly effective for COSS, resulting in a substantial improvement in sensitivity and specificity of the identifications. Importantly, the false discovery rate was especially strongly affected, resulting in much more reliable results. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are found at https://github.com/compomics/COSS .



2020 ◽  
Vol 11 ◽  
Author(s):  
Marek Schwarz ◽  
Jiří Vohradský ◽  
Martin Modrák ◽  
Josef Pánek


Author(s):  
David P. Cavanaugh ◽  
Krishnan K. Chittur

Motivation: Sequence database search and matching algorithms are an important tool when trying to understand the structure (and so the function) of proteins. Proteins with similar structure and function often have very similar primary structure. There are, however, many cases where proteins with similar structure have very different primary structures. Substitution matrices (PAM, BLOSUM, Gonnet) can be used to identify proteins of similar structure, but they fail when the sequence similarity falls below about 25%. Results: We have described a new algorithm for examining the primary structure of proteins against a database of known proteins with a new hydrophobicity index. In this paper, we examine the ability of TMATCH to identify proteins of similar structure using sequence matching with the hydrophobicity index. We compare results from TMATCH with those obtained using FASTA and PSI-BLAST. We show that by using similarity patterns spread across the entire length of two proteins, we get a more robust indicator of remote relatedness than by relying upon high-scoring similar pair regions. Availability: The program TMATCH is available on [email protected]



2019 ◽  
Vol 20 (23) ◽  
pp. 5932 ◽  
Author(s):  
Yusuke Kawashima ◽  
Eiichiro Watanabe ◽  
Taichi Umeyama ◽  
Daisuke Nakajima ◽  
Masahira Hattori ◽  
...  

Data-independent acquisition (DIA)-mass spectrometry (MS)-based proteomic analysis surpasses the existing data-dependent acquisition (DDA)-MS-based proteomic analysis, enabling deep proteome coverage and precise relative quantitative analysis in single-shot liquid chromatography (LC)-MS/MS. However, DIA-MS-based proteomic analysis has not yet been optimized in terms of system robustness and throughput, particularly for its practical applications. We established a single-shot LC-MS/MS system with an MS measurement time of 90 min for a highly sensitive and deep proteomic analysis by optimizing the conditions of DIA and nanoLC. We identified 7020 and 4068 proteins from 200 ng and 10 ng, respectively, of a tryptic digest of floating human embryonic kidney 293 (HEK293F) cells by applying the constructed LC-MS method with a protein sequence database search. The numbers of identified proteins from 200 ng and 10 ng of tryptic HEK293F digest increased to 8509 and 5706, respectively, by searching the chromatogram library created by gas-phase fractionated DIA. Moreover, DIA protein quantification was highly reproducible, with median coefficients of variation of 4.3% in eight replicate analyses. We demonstrated the power of this system by applying the proteomic analysis to detect subtle changes in protein profiles between cerebrums in germ-free and specific pathogen-free mice, which successfully showed that >40 proteins were differentially produced between the cerebrums in the presence or absence of bacteria.



2019 ◽  
Vol 48 (D1) ◽  
pp. D9-D16 ◽  
Author(s):  
Eric W Sayers ◽  
Jeff Beck ◽  
J Rodney Brister ◽  
Evan E Bolton ◽  
Kathi Canese ◽  
...  

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
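
As a small illustration of the E-utilities interface mentioned above, the sketch below only builds an ESearch request URL and does not contact NCBI. The helper name `build_esearch_url` and the default `retmax` value are assumptions made for this example and are not taken from the abstract.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for an Entrez database and a query term.

    Hypothetical helper for illustration: it only assembles the URL,
    it does not send the request.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = build_esearch_url("pubmed", "sequence database search")
print(url)
```

The resulting URL can be passed to any HTTP client; NCBI asks heavy users to add an API key and to respect its rate limits.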



2019 ◽  
Author(s):  
David D Shteynberg ◽  
Eric W Deutsch ◽  
David S Campbell ◽  
Michael R Hoopmann ◽  
Ulrike Kusebauch ◽  
...  

Spectral matching sequence database search engines commonly used on mass spectrometry-based proteomics experiments excel at identifying peptide sequence ions, and in addition, possible sequence ions carrying post-translational modifications (PTMs), but most do not provide confidence metrics for the exact localization of those PTMs when several possible sites are available. Localization is absolutely required for downstream molecular cell biology analysis of PTM function in vitro and in vivo. Therefore, we developed PTMProphet, a free and open-source software tool integrated into the Trans-Proteomic Pipeline, which reanalyzes identified spectra from any search engine for which pepXML output is available to provide localization confidence to enable appropriate further characterization of biologic events. Localization of any type of mass modification (e.g., phosphorylation) is supported. PTMProphet applies Bayesian mixture models to compute probabilities for each site/peptide spectrum match where a PTM has been identified. These probabilities can be combined to compute a global false localization rate at any threshold to guide downstream analysis. We describe the PTMProphet tool, its underlying algorithms and demonstrate its performance on ground-truth synthetic peptide reference datasets, one previously published small dataset, one new larger dataset, and also on a previously published phospho-enriched dataset where the correct sites of modification are unknown. Data have been deposited to ProteomeXchange with identifier PXD013210.
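
To make the step from per-site probabilities to a global false localization rate concrete, here is a minimal sketch of the generic model-based estimate: among localizations accepted at a probability threshold, the expected fraction of wrong ones is the mean of (1 - p). This is an assumption about the general approach, not a confirmed description of PTMProphet's exact computation, and the function name and example probabilities are invented for illustration.

```python
def false_localization_rate(probabilities, threshold):
    """Estimate the false localization rate among accepted sites.

    probabilities: localization probabilities, one per modified site.
    threshold: minimum probability for a localization to be accepted.
    Returns the mean of (1 - p) over accepted sites, or 0.0 if none pass.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Hypothetical localization probabilities, not taken from any dataset.
site_probabilities = [0.99, 0.95, 0.80, 0.60, 0.30]

for cutoff in (0.9, 0.5):
    flr = false_localization_rate(site_probabilities, cutoff)
    print(f"threshold={cutoff:.2f} estimated FLR={flr:.3f}")
```

Lowering the threshold accepts more sites at the cost of a higher estimated error rate, which is the trade-off a global false localization rate is meant to expose.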



2019 ◽  
Author(s):  
Genet Abay Shiferaw ◽  
Elien Vandermarliere ◽  
Niels Hulstaert ◽  
Ralf Gabriels ◽  
Lennart Martens ◽  
...  

Spectral similarity searching to identify peptide-derived MS/MS spectra is a promising technique, and different spectrum similarity search tools have therefore been developed. Each of these tools, however, comes with some limitations, mainly due to low processing speed and issues with handling large databases. Furthermore, the number of spectral data formats supported is typically limited, which also creates a barrier to adoption. We have therefore developed COSS (CompOmics Spectral Searching), a new and user-friendly spectral library search tool supporting two scoring functions. COSS also includes decoy spectra generation for result validation. We have benchmarked COSS on three different spectral libraries and compared the results with those of established spectral search and sequence database search tools. Our comparison showed that COSS more reliably identifies spectra and is faster than other spectral library searching tools. COSS binaries and source code can be freely downloaded from https://github.com/compomics/COSS.
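
Decoy spectra are typically used for result validation through a target-decoy estimate of the false discovery rate. The sketch below shows the common decoys-over-targets form of that estimate; it is a generic illustration and an assumption, not COSS's own implementation, and the function name and example matches are hypothetical.

```python
def target_decoy_fdr(matches, score_threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) above a score threshold.

    matches: iterable of (score, is_decoy) pairs, one per spectrum match.
    Returns 0.0 when no target match passes the threshold.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score < score_threshold:
            continue  # match rejected at this threshold
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0
    return decoys / targets


# Hypothetical (score, is_decoy) pairs, not taken from any dataset.
example_matches = [
    (0.95, False), (0.90, False), (0.85, True),
    (0.80, False), (0.70, False), (0.40, True),
]

fdr = target_decoy_fdr(example_matches, score_threshold=0.75)
print(f"estimated FDR at 0.75: {fdr:.3f}")
```

In practice the score threshold is chosen as the most permissive value whose estimated FDR stays below a target such as 1%.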



2018 ◽  
Vol 18 (2) ◽  
pp. 652-663 ◽  
Author(s):  
Kristian E. Swearingen ◽  
Jimmy K. Eng ◽  
David Shteynberg ◽  
Vladimir Vigdorovich ◽  
Timothy A. Springer ◽  
...  

