Correction: The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function

Author(s):  
Marco Punta ◽  
Yanay Ofran
Author(s):  
Amelia Villegas-Morcillo ◽  
Stavros Makrodimitris ◽  
Roeland C H J van Ham ◽  
Angel M Gomez ◽  
Victoria Sanchez ◽  
...  

Abstract

Motivation: Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require large amounts of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels is available.

Results: We applied an existing deep sequence model that had been pretrained in an unsupervised setting on the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. It also partly removes the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not improve performance, hinting that 3D structure may also be learned during the unsupervised pretraining.

Availability and implementation: Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function.

Supplementary information: Supplementary data are available at Bioinformatics online.
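For readers unfamiliar with the hand-crafted baselines named in the abstract, the following is a minimal sketch of one-hot encoding and k-mer counting for a protein sequence. It is an illustration only, not the authors' implementation: the function names `one_hot` and `kmer_counts`, the 20-letter alphabet ordering, and the choice to encode unknown residues as all-zero rows are assumptions made here.

```python
from collections import Counter
from itertools import product

# Assumed alphabet: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Return one 20-dimensional indicator row per residue.

    Residues outside the assumed alphabet (e.g. 'X') become all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        rows.append(row)
    return rows


def kmer_counts(seq, k=2):
    """Return a fixed-length vector of overlapping k-mer counts.

    The vector has 20**k entries, ordered as itertools.product
    enumerates the alphabet (AA, AC, AD, ...).
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocab]
```

The one-hot representation grows with sequence length, whereas the k-mer vector has a fixed size regardless of length, which is one reason the two are typically fed to different kinds of models.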


2021 ◽  
Author(s):  
Boqiao Lai ◽  
Jinbo Xu

Experimental protein function annotation does not scale with the fast-growing sequence databases. Only a tiny fraction (<0.1%) of protein sequences in UniProtKB has experimentally determined functional annotations. Computational methods can predict protein function in a high-throughput way, but their accuracy is not very satisfactory. Building on recent breakthroughs in protein structure prediction and protein language models, we develop GAT-GO, a graph attention network (GAT) method that may substantially improve protein function prediction by leveraging predicted inter-residue contact graphs and protein sequence embeddings. Our experimental results show that GAT-GO greatly outperforms the latest sequence- and structure-based deep learning methods. On the PDB-mmseqs test set, where the training and test proteins share <15% sequence identity, GAT-GO yields Fmax (maximum F-score) of 0.508, 0.416, 0.501 and AUPRC (area under the precision-recall curve) of 0.427, 0.253, 0.411 for the MFO, BPO, CCO ontology domains, respectively, much better than the homology-based method BLAST (Fmax 0.117, 0.121, 0.207 and AUPRC 0.120, 0.120, 0.163). On the PDB-cdhit test set, where the training and test proteins share higher sequence identity, GAT-GO obtains Fmax of 0.637, 0.501, 0.542 and AUPRC of 0.662, 0.384, 0.481 for the MFO, BPO, CCO ontology domains, respectively, significantly exceeding the just-published graph convolution method DeepFRI, which has Fmax 0.542, 0.425, 0.424 and AUPRC 0.313, 0.159, 0.193.
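The Fmax metric reported above can be sketched as follows, assuming the protein-centric definition commonly used in function prediction benchmarks: at each score threshold, precision is averaged over proteins with at least one prediction, recall over proteins with at least one true term, and Fmax is the best resulting F-score. The function name `fmax`, the input layout, the threshold grid, and the term labels in the example are hypothetical; the exact evaluation protocol used by GAT-GO may differ.

```python
def fmax(true_terms_per_protein, scores_per_protein, thresholds=None):
    """Protein-centric maximum F-score over a grid of thresholds.

    true_terms_per_protein: list of sets of true term labels.
    scores_per_protein: list of dicts mapping term label -> score in [0, 1].
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 0.99.
        thresholds = [i / 100 for i in range(1, 100)]

    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(true_terms_per_protein, scores_per_protein):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            # Precision only counts proteins with at least one prediction.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall counts every protein that has true annotations.
            if truth:
                recalls.append(hits / len(truth))
        if not precisions or not recalls:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy example with made-up term labels and scores.
truth = [{"term_a", "term_b"}, {"term_c"}]
scores = [
    {"term_a": 0.9, "term_b": 0.4, "term_c": 0.1},
    {"term_c": 0.8, "term_a": 0.3},
]
result = fmax(truth, scores)
```

In the toy example, any threshold strictly between 0.3 and 0.4 separates every true term from every false one, so the best F-score is reached there.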


2007 ◽  
Vol 30 (4) ◽  
pp. 84
Author(s):  
Michael D. Jain ◽  
Hisao Nagaya ◽  
Annalyn Gilchrist ◽  
Miroslaw Cygler ◽  
John J.M. Bergeron

Protein synthesis, folding and degradation functions are spatially segregated in the endoplasmic reticulum (ER) with respect to the membrane and the ribosome (rough and smooth ER). Interrogation of a proteomics resource characterizing rough and smooth ER membranes subfractionated into cytosolic, membrane, and soluble fractions gives a spatial map of known proteins involved in ER function. The spatial localization of 224 identified unknown proteins in the ER is predicted to give insight into their function. Here we provide evidence that the proteomics resource accurately predicts the function of new proteins involved in protein synthesis (nudilin), protein translocation across the ER membrane (nicalin), co-translational protein folding (stexin), and distal protein folding in the lumen of the ER (erlin-1, TMX2). Proteomics provides the spatial localization of proteins and can be used to accurately predict protein function.

