SPIDER: software for protein identification from sequence tags with de novo sequencing error

Author(s):  
Yonghua Han ◽  
Bin Ma ◽  
Kaizhong Zhang
2005 ◽  
Vol 03 (03) ◽  
pp. 697-716 ◽  

For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched, allowing for amino acid mutations, against the sequences in a protein database. If de novo sequencing gives correct tags, homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
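The error described in this abstract, one segment replaced by another of nearly equal mass, can be illustrated with a small sketch. This is not the SPIDER algorithm itself, which the abstract does not detail. It is a simple dynamic program under assumed settings: the function names, the 0.01 Da tolerance, and the block-size limit of three residues are all hypothetical choices made for illustration. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def segment_mass(segment):
    """Sum of the residue masses in a segment of amino acids."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def min_swapped_residues(tag, peptide, tol=0.01, max_block=3):
    """Hypothetical helper: align a de novo tag with a database peptide.

    Both strings are cut into consecutive blocks of at most max_block
    residues, and paired blocks must have equal mass within tol.
    Identical blocks cost 0; a differing block costs its length in the
    tag. Returns the minimum total cost, or None if no alignment exists.
    """
    inf = float("inf")
    n, m = len(tag), len(peptide)
    # best[i][j] = minimum cost of aligning tag[:i] with peptide[:j]
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for a in range(1, min(max_block, i) + 1):
                t_seg = tag[i - a:i]
                for b in range(1, min(max_block, j) + 1):
                    if best[i - a][j - b] == inf:
                        continue
                    p_seg = peptide[j - b:j]
                    if abs(segment_mass(t_seg) - segment_mass(p_seg)) > tol:
                        continue
                    cost = 0 if t_seg == p_seg else a
                    best[i][j] = min(best[i][j], best[i - a][j - b] + cost)
    return None if best[n][m] == inf else best[n][m]


# "GG" and "N" have nearly identical mass, so a tag reading LGGK is
# consistent with the database peptide LNK at a cost of two residues.
print(min_swapped_residues("LGGK", "LNK"))
```

A real tool would also have to weigh such sequencing errors against genuine homology mutations; this sketch only checks mass consistency between the two strings.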


2021 ◽  
Author(s):  
Cassidy Moeke

The greenshell mussel Perna canaliculus is considered to be a suitable biomonitor for heavy metal pollution. This is due to its ability to accumulate and tolerate heavy metals in its tissues. These characteristics make it useful for identifying protein biomarkers of heavy metal pollution, as well as proteins associated with heavy metal detoxification and homeostasis. However, the identification of such proteins is restricted by the greenshell mussel being poorly represented in sequence databases. Several strategies have previously been used to identify proteins in unsequenced species, but only one of these strategies has been applied to the greenshell mussel. The objective of this thesis was to examine different protein identification strategies using a combined two-dimensional gel electrophoresis and MALDI-TOF/TOF mass spectrometry approach. The protein identification strategies used include a Mascot database search, as well as de novo sequencing approaches using PEAKS DB and SPIDER homology searches. In total, 155 protein spots were excised and 68 were identified. Fifty-six proteins were identified using a Mascot search against the Mollusca, NCBInr and Invertebrate EST databases, with seven single-peptide identifications. De novo sequencing strategies identified additional proteins, with two from a PEAKS DB search and 10 from an error-tolerant SPIDER homology search. The most noticeable protein groups identified were cytoskeletal proteins, stress response proteins and those involved in protein biosynthesis. Actin and tubulin made up the bulk of the identifications, accounting for 39% of all proteins identified. This multifaceted approach was shown to be useful for identifying proteins in the greenshell mussel Perna canaliculus. Mascot and PEAKS DB performed equally well, while the error-tolerant functionality of SPIDER was useful for identifying additional proteins. A subsequent search against the Invertebrate EST database was also found to be useful for identifying additional proteins. Despite this, more than half of all proteins remained unidentified. Most of these proteins either failed to produce good quality MS spectra or did not match a sequence in the database. Future research should first focus on obtaining quality MS spectra for all proteins concerned and then examine other strategies that may be more suitable for identifying proteins in species with poor representation in sequence databases.


2005 ◽  
Vol 4 (1) ◽  
pp. 83-90 ◽  
Author(s):  
Wenqing Shui ◽  
Yinkun Liu ◽  
Huizhi Fan ◽  
Huimin Bao ◽  
Shufang Liang ◽  
...  

2018 ◽  
Vol 15 (2) ◽  
pp. 259-265 ◽  
Author(s):  
Nguyễn Tiến Dũng ◽  
Đỗ Thị Vân Anh ◽  
Nguyễn Thị Minh Phương ◽  
Bùi Thị Huyền ◽  
Phạm Đình Minh ◽  
...  

Wasp venoms are complex mixtures of various types of compounds, of which proteins and peptides are major components. Besides its toxicity, wasp venom has potential for the treatment of diseases. Characterization of venom proteins and peptides is the first and most important step toward its application in medicine. Vietnam possesses many valuable materials, among them venoms that could be used in medicine. In the present work, we aim to use proteomic techniques to identify proteins and peptides in the venom of Vespa velutina (V. velutina), a social wasp species indigenous to Southeast Asia, including Vietnam. The venom isolated from V. velutina by manual extraction was digested with trypsin via the FASP (Filter Aided Sample Preparation) method and analyzed with liquid chromatography-tandem mass spectrometry (LC-MS/MS). Subsequent protein identification, protein validation, and peptide de novo sequencing were carried out using the PEAKS software. In total, we detected 36 proteins from V. velutina venom, many of which had previously been reported as venom-specific proteins. According to Gene Ontology Annotation (GOA), V. velutina venom proteins were functionally classified into five categories: binding proteins (53%), catalytic proteins (33%), structural proteins (8%), antioxidants (4%), and proteins with other functions (2%). In addition, 81 peptides were detected in the venom of V. velutina by de novo sequencing, of which 34 peptides (42%) are potential venom peptides. We present for the first time a collection of proteins and peptides from V. velutina venom, providing a basis for its further application in medicine.


2002 ◽  
Vol 74 (22) ◽  
pp. 5774-5785 ◽  
Author(s):  
Sheng Gu ◽  
Songqin Pan ◽  
E. Morton Bradbury ◽  
Xian Chen



2005 ◽  
Vol 16 (03) ◽  
pp. 487-497
Author(s):  
YONGHUA HAN ◽  
BIN MA ◽  
KAIZHONG ZHANG

In biochemistry, tandem mass spectrometry (MS/MS) is the most common method for peptide and protein identification. One computational method for obtaining a peptide sequence from MS/MS data is de novo sequencing, which is becoming increasingly important in this area. However, de novo sequencing can usually determine only partial sequences with confidence, while the undetermined parts are represented by "mass gaps". We call such a partially determined sequence a gapped sequence tag. When a gapped sequence tag is searched in a database for protein identification, the determined parts should match the database sequence exactly, while each mass gap should match a substring of amino acids whose masses add up to the value of the mass gap. In such a case, standard string matching algorithms no longer apply. In this paper, we present a new efficient algorithm to find the matches of gapped sequence tags in a protein database.
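The matching rule stated in this abstract can be made concrete with a naive scan. This is a baseline sketch, not the efficient algorithm the paper presents, which the abstract does not describe. It assumes a gapped tag is given as a list mixing strings (determined parts) and floats (mass gaps in daltons); that representation, the function names, and the 0.02 Da tolerance are all hypothetical. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_at(tag, protein, start, tol=0.02):
    """Return the end index if the gapped tag matches at start, else None.

    Assumes the protein contains only the 20 standard residues.
    """
    pos = start
    for part in tag:
        if isinstance(part, str):
            # Determined part: must match the database sequence exactly.
            if not protein.startswith(part, pos):
                return None
            pos += len(part)
        else:
            # Mass gap: consume residues until their total reaches the gap.
            total = 0.0
            while pos < len(protein) and total < part - tol:
                total += RESIDUE_MASS[protein[pos]]
                pos += 1
            if abs(total - part) > tol:
                return None
    return pos


def find_gapped_tag(tag, protein, tol=0.02):
    """Try every start position; return (start, end) pairs of matches."""
    hits = []
    for start in range(len(protein)):
        end = match_at(tag, protein, start, tol)
        if end is not None:
            hits.append((start, end))
    return hits


# The gap of 186.064 Da is filled by "GE" in one place and "AD" in another.
print(find_gapped_tag(["LS", 186.064, "K"], "MALSGEKLSADKT"))
```

Because every residue mass is far larger than the tolerance, at most one substring length can fill a gap from a given position, so the greedy extension suffices here. The scan still revisits each start position independently, which is the cost an efficient algorithm would aim to avoid.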

