The structure of the Drosophila melanogaster sex peptide: Identification of hydroxylated isoleucine and a strain variation in the pattern of amino acid hydroxylation

2020 ◽  
Vol 124 ◽  
pp. 103414
Author(s):  
Sebastian Sturm ◽  
Adam Dowle ◽  
Neil Audsley ◽  
R. Elwyn Isaac
2021 ◽  
Author(s):  
Samaneh Azari

De novo peptide sequencing algorithms have been developed for peptide identification in proteomics from tandem mass spectra (MS/MS). They can be used to identify and discover novel peptides and proteins for which no database is available. Despite improvements in MS instrumentation and de novo sequencing methods, a significant number of CID MS/MS spectra remain unassigned by current algorithms, often leading to low confidence in the peptide assignments to the spectra. Moreover, current algorithms often fail to construct completely matched sequences and produce only partial matches, so identification of full-length peptides remains challenging. Another major challenge is the existence of noise in MS/MS spectra, which makes the data highly imbalanced. Missing peaks, caused by incomplete MS fragmentation, also make it more difficult to infer a full-length peptide sequence. In addition, the large search space of all possible amino acid sequences for each spectrum leads to a high false discovery rate.
This thesis focuses on improving the performance of current methods by developing new algorithms corresponding to the three steps of preprocessing, sequence optimisation and post-processing, using machine learning for more comprehensive interrogation of MS/MS datasets. From the machine learning point of view, the three steps can be addressed by solving different tasks such as classification, optimisation, and symbolic regression. Since Evolutionary Algorithms (EAs), as effective global search techniques, have shown promising results in solving these problems, this thesis investigates the capability of EAs in improving de novo peptide sequencing.
In the preprocessing step, this thesis proposes an effective GP-based method for classification of signal and noise peaks in highly imbalanced MS/MS spectra, with the purpose of having a positive influence on the reliability of peptide identification. The results show that the proposed algorithm is the most stable classification method across various noise ratios, outperforming six other benchmark classification algorithms. The experimental results show a significant improvement in high-confidence peptide assignments to MS/MS spectra when the data is preprocessed by the proposed GP method. Moreover, this thesis also proposes the first multi-objective GP approach for classification of peaks in MS/MS data, aiming at maximising both the accuracy of the minority class (signal peaks) and the accuracy of the majority class (noise peaks). The results show that the multi-objective GP method outperforms the single-objective GP algorithm and a popular multi-objective approach in terms of retaining more signal peaks and removing more noise peaks. The multi-objective GP approach significantly improved the reliability of peptide identification.
This thesis proposes a GA-based method to solve the complex optimisation task of de novo peptide sequencing, aiming at constructing full-length sequences. The proposed method benefits from the GA's capability to search a large space of potential amino acid sequences to find the most likely full-length sequence. The experimental results show that the proposed method outperforms the most commonly used de novo sequencing method at both the amino acid level and the peptide level.
This thesis also proposes a novel method for re-scoring and re-ranking the peptide spectrum matches (PSMs) from the result of de novo peptide sequencing, aiming at minimising the false discovery rate as a post-processing approach. The proposed GP method evolves computer programs to perform regression and classification simultaneously in order to generate an effective scoring function for finding the correct PSMs among many incorrect ones. The results show that the new GP-based PSM scoring function significantly improves the identification of full-length peptides when it is used to post-process the de novo sequencing results.
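The two objectives named in the abstract, accuracy on the minority class (signal peaks) and accuracy on the majority class (noise peaks), can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the label encoding, the function names `per_class_accuracy` and `dominates`, and the use of plain Pareto dominance to compare classifiers are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical label encoding (an assumption, not taken from the thesis).
SIGNAL = 1  # minority class: genuine fragment-ion peaks
NOISE = 0   # majority class: noise peaks


def per_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (signal accuracy, noise accuracy) for a peak classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    signal_total = sum(1 for t in y_true if t == SIGNAL)
    noise_total = len(y_true) - signal_total
    signal_hits = sum(1 for t, p in zip(y_true, y_pred) if t == SIGNAL and p == SIGNAL)
    noise_hits = sum(1 for t, p in zip(y_true, y_pred) if t == NOISE and p == NOISE)
    # An empty class contributes an accuracy of 0.0 rather than dividing by zero.
    signal_acc = signal_hits / signal_total if signal_total else 0.0
    noise_acc = noise_hits / noise_total if noise_total else 0.0
    return signal_acc, noise_acc


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance for maximisation: a is no worse on every
    objective and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


if __name__ == "__main__":
    # Two signal peaks among ten: an imbalanced toy spectrum.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    all_noise = [0] * 10                       # trivial "everything is noise" classifier
    mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]     # finds one signal peak, mislabels one noise peak
    print(per_class_accuracy(y_true, all_noise))
    print(per_class_accuracy(y_true, mixed))
```

On imbalanced data, a classifier that labels every peak as noise scores perfectly on the majority class and zero on the minority class, which is why treating the two accuracies as separate objectives is informative where overall accuracy alone is not.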


1980 ◽  
Vol 187 (3) ◽  
pp. 875-883 ◽  
Author(s):  
D R Thatcher

The sequence of three alcohol dehydrogenase alleloenzymes from the fruitfly Drosophila melanogaster has been determined by the sequencing of peptides produced by trypsin, chymotrypsin, thermolysin, pepsin and Staphylococcus aureus-V8-proteinase digestion. The amino acid sequence shows no obvious homology with the published sequences of the horse liver and yeast enzymes, and secondary structure prediction suggests that the nucleotide-binding domain is located in the N-terminal half of the molecule. The amino acid substitutions between AdhN-11 (a point mutation of AdhF), AdhS and AdhUF alleloenzymes were identified. AdhN-11 alcohol dehydrogenase differed from the other two by a glycine-14-(AdhS and AdhUF)-to-aspartic acid substitution, the AdhS enzyme from AdhN-11 and AdhUF enzymes by a threonine-192-(AdhN-11 and AdhUF)-to-lysine (AdhS) substitution and the AdhUF enzyme was found to differ by an alanine-45-(AdhS and AdhN-11)-to-aspartic acid (AdhUF) charge substitution and a ‘silent’ asparagine-8-(AdhS and AdhN-11)-to-alanine (AdhUF) substitution. Detailed sequence evidence has been deposited as Supplementary Publication SUP 50107 (36 pages) at the British Library Lending Division, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1978) 169, 5.


Genetics ◽  
1995 ◽  
Vol 141 (4) ◽  
pp. 1425-1438 ◽  
Author(s):  
P J Merriman ◽  
C D Grimes ◽  
J Ambroziak ◽  
D A Hackett ◽  
P Skinner ◽  
...  

The S elements form a diverse family of long-inverted-repeat transposons within the genome of Drosophila melanogaster. These elements vary in size and sequence, the longest consisting of 1736 bp with 234-bp inverted terminal repeats. The longest open reading frame in an intact S element could encode a 345-amino acid polypeptide. This polypeptide is homologous to the transposases of the mariner-Tc1 superfamily of transposable elements. S elements are ubiquitous in D. melanogaster populations and also appear to be present in the genomes of two sibling species; however, they seem to be absent from 17 other Drosophila species that were examined. Within D. melanogaster strains, there are, on average, 37.4 cytologically detectable S elements per diploid genome. These elements are scattered throughout the chromosomes, but several sites in both the euchromatin and beta heterochromatin are consistently occupied. The discovery of an S-element-insertion mutation and a reversion of this mutation indicates that S elements are at least occasionally mobile in the D. melanogaster genome. These elements seem to insert at an AT dinucleotide within a short palindrome and apparently duplicate that dinucleotide upon insertion.


2008 ◽  
Vol 105 (42) ◽  
pp. 16207-16211 ◽  
Author(s):  
P. S. Schmidt ◽  
C.-T. Zhu ◽  
J. Das ◽  
M. Batavia ◽  
L. Yang ◽  
...  

2004 ◽  
Vol 3 (4) ◽  
pp. 813-820 ◽  
Author(s):  
Brian D. Halligan ◽  
Edward A. Dratz ◽  
Xin Feng ◽  
Simon N. Twigger ◽  
Peter J. Tonellato ◽  
...  
