Conserved Peptides Recognition by Ensemble of Neural Networks for Mining Protein Data – LPMO Case Study

Author(s):  
G.S. Dotsenko ◽  
A.S. Dotsenko

Mining protein data is a recent and promising area of bioinformatics. In this work, we propose a novel approach for mining protein data: conserved peptides recognition by ensemble of neural networks (CPRENN). The approach was applied to mining lytic polysaccharide monooxygenases (LPMOs) in 19 ascomycete, 18 basidiomycete, and 18 bacterial proteomes. LPMOs are recently discovered enzymes, and mining them is highly relevant to the biotechnology of lignocellulosic materials. CPRENN was compared with two conventional bioinformatic methods for mining protein data: profile hidden Markov model (HMM) search (HMMER program) and peptide pattern recognition (PPR program combined with the Hotpep application). HMMER discovered the largest number of hypothetical LPMO amino acid sequences, and profile HMM search proved to be a more sensitive method for mining LPMOs than conserved peptides recognition. In total, CPRENN found 76 %, 67 %, and 65 % of the hypothetical ascomycete, basidiomycete, and bacterial LPMOs discovered by HMMER, respectively. For the AA9, AA10, and AA11 families, which contain the major part of all LPMOs in the carbohydrate-active enzymes database (CAZy), CPRENN and PPR + Hotpep found 69–98 % and 62–95 % of the amino acid sequences discovered by HMMER, respectively. In contrast to PPR + Hotpep, CPRENN achieved perfect precision and provided more complete mining of basidiomycete and bacterial LPMOs.
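The percentages above express how many of the sequences found by HMMER were also found by another method. A minimal sketch of that arithmetic follows, assuming each method's hits are available as a set of sequence identifiers. The function names and the identifiers are hypothetical and are not taken from the CPRENN, HMMER, or Hotpep software.

```python
def fraction_of_reference_found(found, reference):
    """Fraction of reference hits (e.g. HMMER hits) that another method also found."""
    found, reference = set(found), set(reference)
    if not reference:
        raise ValueError("reference set is empty")
    return len(found & reference) / len(reference)


def precision(found, confirmed):
    """Fraction of a method's hits that are confirmed true members of the family."""
    found = set(found)
    if not found:
        raise ValueError("method reported no hits")
    return len(found & set(confirmed)) / len(found)


# Hypothetical identifiers, for illustration only
hmmer_hits = {"seq1", "seq2", "seq3", "seq4"}
other_hits = {"seq1", "seq2", "seq3"}

print(fraction_of_reference_found(other_hits, hmmer_hits))  # 0.75
print(precision(other_hits, hmmer_hits))                    # 1.0
```

In this toy case the second method recovers three of the four reference hits while reporting no hit outside the reference set, which is the kind of trade-off the abstract describes between sensitivity and precision.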

2019 ◽  
Author(s):  
Ranjani Murali ◽  
James Hemp ◽  
Victoria Orphan ◽  
Yonatan Bisk

Abstract: The ability to correctly predict the functional role of proteins from their amino acid sequences would significantly advance biological studies at the molecular level by improving our ability to understand the biochemical capability of biological organisms from their genomic sequence. Existing methods geared towards protein function prediction or annotation mostly use alignment-based approaches and probabilistic models such as hidden Markov models. In this work we introduce a deep learning architecture (Function Identification with Neural Descriptions, or FIND) which performs protein annotation from primary sequence. The accuracy of our method matches state-of-the-art techniques, such as protein classifiers based on hidden Markov models. Further, our approach allows for model introspection via a neural attention mechanism, which weights parts of the amino acid sequence proportionally to their relevance for functional assignment. In this way, the attention weights automatically uncover structurally and functionally relevant features of the classified protein and find novel functional motifs in previously uncharacterized proteins. While this model is applicable to any database of proteins, we chose to apply it to superfamilies of homologous proteins, with the aim of extracting features inherent to divergent protein families within a larger superfamily. This provided insight into the functional diversification of an enzyme superfamily and its adaptation to different physiological contexts. We tested our approach on three families (nitrogenases, cytochrome bd-type oxygen reductases and heme-copper oxygen reductases) and present a detailed analysis of the sequence characteristics identified in previously characterized proteins in the heme-copper oxygen reductase (HCO) superfamily. These are correlated with their catalytic relevance and evolutionary history.
FIND was then applied to discover features in previously uncharacterized members of the HCO superfamily, providing insight into their unique sequence features. This modeling approach demonstrates the power of neural networks to recognize patterns in large datasets and can be utilized to discover biochemically and structurally important features in proteins from their amino acid sequences.
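The attention mechanism described in this abstract assigns each residue a weight reflecting its relevance to the functional assignment. The sketch below illustrates only the general idea, assuming the weights come from a softmax over per-residue scores. The abstract does not specify the FIND architecture, so the function names, the sequence, and the scores here are hypothetical.

```python
import math


def attention_weights(scores):
    """Softmax over per-residue scores: weights are positive and sum to 1."""
    if not scores:
        raise ValueError("scores must not be empty")
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_residues(sequence, scores, k=3):
    """Return the k (position, residue, weight) triples with the largest weights."""
    if len(sequence) != len(scores):
        raise ValueError("one score per residue is required")
    weights = attention_weights(scores)
    ranked = sorted(range(len(sequence)), key=lambda i: weights[i], reverse=True)
    return [(i, sequence[i], weights[i]) for i in ranked[:k]]


# Hypothetical sequence and scores; a trained model would produce the scores
sequence = "MHKEG"
scores = [0.1, 2.0, 0.3, 1.5, 0.2]
for pos, aa, w in top_residues(sequence, scores, k=2):
    print(pos, aa, round(w, 3))
```

Ranking residues by weight in this way is one simple route to the kind of model introspection the abstract mentions, since the highest-weighted positions are candidates for functionally relevant features.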


2004 ◽  
Vol 43 (01) ◽  
pp. 102-105 ◽  
Author(s):  
S. Cerutti ◽  
L. Pattini

Summary Objectives: A wavelet-based approach to the hydrophobicity analysis of protein primary structures is proposed to predict the presence of alpha helices in the secondary structure. Methods: Information about the periodicity content of the hydropathy profile, together with a score for the probability of occurrence of each single amino acid, allows the localization of alpha helices. Results: The accuracy is comparable to that of other established predictors based on different techniques (e.g., neural networks and hidden Markov models). Conclusion: This method is particularly suitable for capturing the amphiphilic character of helical structures.
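The periodicity of a hydropathy profile can be illustrated with a much simpler measure than the wavelet analysis proposed in this abstract. The sketch below computes the hydrophobic moment at 100 degrees per residue, the approximate angular spacing of residues in an alpha helix, using the Kyte-Doolittle hydropathy scale. It is a stand-in chosen for brevity, not the authors' method, and the window width, function names, and test peptide are assumptions.

```python
import math

# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(window, angle_deg=100.0):
    """Mean hydrophobic moment of a window at a fixed angle per residue."""
    if not window:
        raise ValueError("window must not be empty")
    delta = math.radians(angle_deg)
    s = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(window))
    c = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(window))
    return math.hypot(s, c) / len(window)


def moment_profile(sequence, width=18):
    """Hydrophobic moment of every full-width window along the sequence."""
    return [
        hydrophobic_moment(sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    ]


# Hypothetical peptides: one with a helical repeat pattern, one uniform
amphipathic = "LKKLLKK" * 2 + "LKKL"
uniform = "L" * 18
print(hydrophobic_moment(amphipathic) > hydrophobic_moment(uniform))
```

A window whose hydrophobic residues recur roughly every 3.6 positions gives a large moment, which reflects the amphiphilic character the abstract refers to. A wavelet transform additionally localizes that periodicity along the sequence at several scales.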


1993 ◽  
Vol 69 (04) ◽  
pp. 351-360 ◽  
Author(s):  
Masahiro Murakawa ◽  
Takashi Okamura ◽  
Takumi Kamura ◽  
Tsunefumi Shibuya ◽  
Mine Harada ◽  
...  

Summary: The partial amino acid sequences of fibrinogen Aα-chains from five mammalian species have been inferred by means of the polymerase chain reaction (PCR). From the genomic DNA of the rhesus monkey, pig, dog, mouse and Syrian hamster, the DNA fragments coding for α-C domains in the Aα-chains were amplified and sequenced. In all species examined, four cysteine residues were conserved at the homologous positions. The carboxy- and amino-terminal portions of the α-C domains showed considerable homology among the species. However, the sizes of the middle portions, which corresponded to the internal repeat structures, showed apparent variability because of several insertions and/or deletions. In the rhesus monkey, pig, mouse and Syrian hamster, 13-amino-acid tandem repeats fundamentally similar to those in humans and the rat were identified. In the dog, however, tandem repeats were found to consist of 18 amino acids, suggesting an independent multiplication of the canine repeats. The sites of the α-chain cross-linking acceptor and α2-plasmin inhibitor cross-linking donor were not always evolutionarily conserved. The arginyl-glycyl-aspartic acid (RGD) sequence was not found in the amplified region of either the rhesus monkey or the pig. In the canine α-C domain, two RGD sequences were identified at positions homologous to both the rat and human RGDS sequences. In the Syrian hamster, a single RGD sequence was found at the same position as in the rat. Triplication of the RGD sequences was seen in the murine fibrinogen α-C domain around the site homologous to the rat RGDS sequence. These findings are of some interest from the point of view of structure-function and evolutionary relationships in the mammalian fibrinogen Aα-chains.
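Locating RGD sequences in an inferred amino acid sequence, as reported above, amounts to a simple motif scan. The following is a minimal sketch assuming sequences are given as one-letter-code strings. The function name and the example sequence are hypothetical and are not fibrinogen data from the study.

```python
def find_motif(sequence, motif="RGD"):
    """Return the 0-based start positions of every occurrence of motif."""
    if not motif:
        raise ValueError("motif must not be empty")
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one residue later so overlapping occurrences are also reported
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical sequence, for illustration only
example = "AARGDSKKRGD"
print(find_motif(example))          # [2, 8]
print(find_motif(example, "RGDS"))  # [2]
```

Comparing the positions returned for aligned sequences from different species would show whether a motif sits at a homologous site, which is the comparison the abstract draws between the dog, hamster, mouse, rat and human sequences.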


1979 ◽  
Author(s):  
Takashi Morita ◽  
Craig Jackson

Bovine Factor X is eluted in two forms (X1 and X2) from anion-exchange chromatographic columns. These two forms have indistinguishable amino acid compositions, molecular weights and specific activities. The amino acid sequences containing the γ-carboxyglutamic acid residues have been shown to be identical in X1 and X2 (H. Morris, personal communication). An activation peptide is released from the N-terminal region of the heavy chain of Factor X by an activator from Russell’s viper venom. This peptide can be isolated after activation by gel filtration on Sephadex G-100 under nondenaturing conditions. The activation peptides from a mixture of Factors X1 and X2 were separated into two forms by anion-exchange chromatography. The activation peptide that eluted first (AP1) was shown to be derived from Factor X1, while the activation peptide that eluted second (AP2) was shown to be derived from X2, on the basis of chromatographic separations carried out on Factors X1 and X2 separately. Factor Xa was eluted as a single symmetrical peak. On the basis of these and other data characterizing these products, we conclude that the difference between X1 and X2 is a property of the structures of the activation peptides. (Supported by grant HL 12820 from the National Heart, Lung and Blood Institute. C.M.J. is an Established Investigator of the American Heart Association.)

