query protein
Recently Published Documents


TOTAL DOCUMENTS

36
(FIVE YEARS 12)

H-INDEX

8
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Yuan Zhang ◽  
Arunima Mandal ◽  
Kevin Cui ◽  
Xiuwen Liu ◽  
Jinfeng Zhang

We present ProDCoNN-server, a web server for protein sequence design and prediction from a given protein structure. The server is based on a previously developed deep learning model for protein design, ProDCoNN, which achieved state-of-the-art performance when tested on large numbers of test proteins and benchmark datasets. The prediction is very fast compared with other protein sequence prediction servers: on average, it takes only a few minutes for a query protein. Two models can be selected for different purposes: BBO for full sequence prediction, extendable to multiple sequence generation, and BBS for single position prediction when the types of the other residues are known. ProDCoNN-server outputs the predicted sequence and the probability matrix for each amino acid at each predicted residue. The probability matrix can also be visualized as a sequence logo figure (BBO) or a probability distribution plot (BBS). The server is available at: https://prodconn.stat.fsu.edu/.
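
The relationship between a per-position probability matrix and a predicted sequence can be illustrated with a minimal sketch. It assumes the matrix has one row per residue position and one column per amino acid, and that the predicted residue is simply the most probable one; the column order, the `decode_sequence` name, and the toy data are hypothetical and are not taken from the server's actual output format.

```python
# Standard 20 amino acids in one-letter code. The column order is an
# assumption made for this sketch, not the server's documented layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def decode_sequence(prob_matrix):
    """Return the most probable amino acid at each position."""
    residues = []
    for row in prob_matrix:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each row needs one probability per amino acid")
        best = max(range(len(row)), key=row.__getitem__)
        residues.append(AMINO_ACIDS[best])
    return "".join(residues)


def peaked_row(letter, peak=0.81):
    """Build a toy probability row that favours one amino acid."""
    rest = (1.0 - peak) / (len(AMINO_ACIDS) - 1)
    return [peak if aa == letter else rest for aa in AMINO_ACIDS]


toy_matrix = [peaked_row(c) for c in "MKV"]
print(decode_sequence(toy_matrix))  # prints "MKV"
```

Taking the per-position maximum is only the simplest way to read such a matrix; sampling from each row would instead give multiple candidate sequences.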


2021 ◽  
Vol 17 (5) ◽  
pp. e1008954
Author(s):  
Fandi Wu ◽  
Jinbo Xu

Motivation: Protein structure prediction has been greatly improved by deep learning, but most efforts have been devoted to template-free modeling, and very few deep learning methods have been developed for TBM (template-based modeling), a popular technique for protein structure prediction. TBM has been studied extensively in the past, but its accuracy is not satisfactory when highly similar templates are not available. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence coevolution information into a deep ResNet to predict inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results show that NDThreader greatly outperforms existing methods such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as part of the RaptorX server, which obtained the best average GDT score among all CASP14 servers on the 58 TBM targets.


2021 ◽  
Author(s):  
Chloe Hsu ◽  
Hunter Nisonoff ◽  
Clara Fannjiang ◽  
Jennifer Listgarten

Predictive modelling of protein properties has become increasingly important to the field of machine-learning guided protein engineering. In one of the two existing approaches, evolutionarily-related sequences to a query protein drive the modelling process, without any property measurements from the laboratory. In the other, a set of protein variants of interest are assayed, and then a supervised regression model is estimated with the assay-labelled data. Although a handful of recent methods have shown promise in combining the evolutionary and supervised approaches, this hybrid problem has not been examined in depth, leaving it unclear how practitioners should proceed, and how method developers should build on existing work. Herein, we present a systematic assessment of methods for protein fitness prediction when evolutionary and assay-labelled data are available. We find that a simple baseline approach we introduce is competitive with and often outperforms more sophisticated methods. Moreover, our simple baseline is plug-and-play with a wide variety of established methods, and does not add any substantial computational burden. Our analysis highlights the importance of systematic evaluations and sufficient baselines.


2021 ◽  
Author(s):  
Deeksha Pandey ◽  
Bandana Kumari ◽  
Neelja Singhal ◽  
Manish Kumar

This protocol describes a method for detecting bacterial proteins involved in efflux-mediated antibiotic resistance (ARE) and their sub-families, as described in the research paper entitled "BacEffluxPred: A two-tier system to predict and categorize bacterial efflux mediated antibiotic resistance proteins", published in Scientific Reports. BacEffluxPred is a support vector machine based two-tier prediction method that can be used to detect efflux proteins responsible for antibiotic resistance in bacteria and to identify the families to which they belong. The overall prediction cycle includes three steps: 1) The query protein is presented to the prediction algorithm. 2) If the query protein is predicted to be a non-ARE protein, the prediction stops at tier-I. 3) If the query protein is predicted to be an ARE protein at tier-I, it is forwarded to tier-II for ARE family prediction. Using these steps, it is possible to generate models that can be applied to proteomic data to predict whether the data contain potential ARE proteins and, if so, to classify them into their respective families. This is the first in-silico tool for predicting bacterial ARE proteins and their families, and it is freely available as both a web server and a standalone version at http://proteininformatics.org/mkumar/baceffluxpred/
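
The two-tier control flow described in this protocol can be sketched as follows. This is only an illustration of the gating logic: the function name `two_tier_predict`, the callables `is_are` and `family_of`, and the toy rules standing in for the trained support vector machines are all hypothetical and do not reflect the tool's real models or interface.

```python
from typing import Callable, Optional, Tuple


def two_tier_predict(
    sequence: str,
    is_are: Callable[[str], bool],
    family_of: Callable[[str], str],
) -> Tuple[bool, Optional[str]]:
    """Run tier-I, and run tier-II only for proteins predicted as ARE."""
    # Tier-I: ARE versus non-ARE. A non-ARE call ends the cycle here.
    if not is_are(sequence):
        return (False, None)
    # Tier-II: assign the ARE protein to a family.
    return (True, family_of(sequence))


# Toy stand-ins for the two classifiers (placeholders, not real models).
def toy_tier1(seq: str) -> bool:
    return seq.count("L") >= 3


def toy_tier2(seq: str) -> str:
    return "family-A" if seq.startswith("M") else "family-B"


print(two_tier_predict("MLLLKV", toy_tier1, toy_tier2))  # (True, 'family-A')
print(two_tier_predict("MKVAAG", toy_tier1, toy_tier2))  # (False, None)
```

Passing the classifiers in as callables keeps the gating logic separate from whatever models are actually used at each tier.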


2020 ◽  
Author(s):  
Lupeng Kong ◽  
Fusong Ju ◽  
Wei-Mou Zheng ◽  
Shiwei Sun ◽  
Jinbo Xu ◽  
...  

Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence under prediction and the templates with solved structures. However, it is still very challenging to build the optimal sequence-template alignment, especially when only distantly-related templates are available. Here we report a novel deep learning approach, ProALIGN, that can predict much more accurate sequence-template alignments. Like protein sequences consisting of sequence motifs, protein alignments are also composed of frequently-occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific, as their characteristic patterns are tightly related to the sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces inaccuracies in the handcrafted scoring functions widely used by current threading approaches. For a query protein and a template, we apply the neural network to directly infer likelihoods of all possible residue pairs in their entirety, which could effectively consider the correlations among multiple residues. We further construct the alignment with maximum likelihood, and finally build a structure model according to the alignment. Tested on three independent datasets with a total of 6,688 protein alignment targets and 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than existing methods including HHpred, CNFpred, CEthreader and DeepThreader. These results clearly demonstrate the effectiveness of exploiting the context-specific alignment motifs by deep learning for protein threading.
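
The binary-matrix representation of an alignment mentioned above can be made concrete with a small sketch. It assumes the alignment is given as two equal-length gapped strings with `-` as the gap character; the function name `alignment_to_matrix` and the toy sequences are hypothetical, and the neural network that predicts such matrices is not shown.

```python
def alignment_to_matrix(query_aln, template_aln, gap="-"):
    """Encode a pairwise alignment as a binary matrix.

    Rows index query residues and columns index template residues;
    an entry is 1 where the two residues are aligned, else 0.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    q_len = sum(c != gap for c in query_aln)
    t_len = sum(c != gap for c in template_aln)
    matrix = [[0] * t_len for _ in range(q_len)]
    qi = ti = 0  # current residue index in query and template
    for q, t in zip(query_aln, template_aln):
        if q != gap and t != gap:
            matrix[qi][ti] = 1
        if q != gap:
            qi += 1
        if t != gap:
            ti += 1
    return matrix


for row in alignment_to_matrix("AC-D", "A-GD"):
    print(row)
# [1, 0, 0]
# [0, 0, 0]
# [0, 0, 1]
```

In this encoding a query residue aligned to a gap yields an all-zero row, and a template residue aligned to a gap yields an all-zero column.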


2020 ◽  
Author(s):  
Fandi Wu ◽  
Jinbo Xu

Motivation: TBM (template-based modeling) is a popular method for protein structure prediction. When very good templates are not available, it is challenging to identify the best templates, build accurate sequence-template alignments and construct 3D models from alignments. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), which is an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. Then NDThreader uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence co-evolution information into a deep ResNet to predict inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results on the CASP13 and CAMEO data show that our methods outperform existing ones such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as part of the RaptorX server, which obtained the best GDT score among all CASP14 servers on the 58 TBM targets. Availability and Implementation: Available as part of a web server at http://[email protected] Supplementary Information: Supplementary data are available online.


2020 ◽  
Author(s):  
Fusong Ju ◽  
Jianwei Zhu ◽  
Bin Shao ◽  
Lupeng Kong ◽  
Tie-Yan Liu ◽  
...  

Protein functions are largely determined by the final details of their tertiary structures, and the structures could be accurately reconstructed based on inter-residue distances. Residue co-evolution has become the primary principle for estimating inter-residue distances since the residues in close spatial proximity tend to co-evolve. The widely-used approaches infer residue co-evolution using an indirect strategy, i.e., they first extract from the multiple sequence alignment (MSA) of query protein some handcrafted features, say, co-variance matrix, and then infer residue co-evolution using these features rather than the raw information carried by MSA. This indirect strategy always leads to considerable information loss and inaccurate estimation of inter-residue distances. Here, we report a deep neural network framework (called CopulaNet) to learn residue co-evolution directly from MSA without any handcrafted features. The CopulaNet consists of two key elements: i) an encoder to model context-specific mutation for each residue, and ii) an aggregator to model correlations among residues and thereafter infer residue co-evolutions. Using the CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrated the successful application of CopulaNet for estimating inter-residue distances and further predicting protein tertiary structure with improved accuracy and efficiency. Head-to-head comparison suggested that for 24 out of the 31 free modeling CASP13 domains, ProFOLD outperformed AlphaFold, one of the state-of-the-art prediction approaches.
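
As an illustration of the kind of handcrafted feature that the indirect strategy extracts from an MSA, the sketch below computes a covariance between two alignment columns as the joint frequency of a residue pair minus the product of the single-column frequencies. This is a generic textbook-style formulation assumed for illustration, not the feature definition of any specific tool, and it is not part of CopulaNet, which learns from the MSA directly; the name `pair_covariance` and the toy MSA are hypothetical.

```python
from collections import Counter
from itertools import product


def pair_covariance(msa, i, j):
    """Covariance of residue pairs between MSA columns i and j.

    Returns {(a, b): f_ij(a, b) - f_i(a) * f_j(b)} for every residue a
    observed in column i and every residue b observed in column j.
    """
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)
    f_j = Counter(seq[j] for seq in msa)
    f_ij = Counter((seq[i], seq[j]) for seq in msa)
    return {
        (a, b): f_ij[(a, b)] / n - (f_i[a] / n) * (f_j[b] / n)
        for a, b in product(f_i, f_j)
    }


# Toy MSA in which the two columns vary together perfectly.
toy_msa = ["AC", "AC", "GT", "GT"]
cov = pair_covariance(toy_msa, 0, 1)
print(cov[("A", "C")])  # 0.25
print(cov[("A", "T")])  # -0.25
```

Summaries like this reduce the alignment to pairwise statistics, which is the step the abstract describes as discarding information carried by the raw MSA.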


2020 ◽  
Vol 15 (4) ◽  
pp. 318-327
Author(s):  
Najmul Ikram ◽  
Muhammad Abdul Qadir ◽  
Muhammad Tanvir Afzal

Background: The rapidly growing protein and annotation databases necessitate the development of efficient tools to process this valuable information. Biologists frequently need to find proteins similar to a given protein, for which BLAST tools are commonly used. With the development of biomedical ontologies, e.g. Gene Ontology, methods were designed to measure function (semantic) similarity between two proteins. These methods work well on protein pairs, but are not suitable for protein query processing. Objective: Our aim is to facilitate searching of similar proteins in an acceptable time. Methods: A novel method SimExact for high speed searching of functionally similar proteins has been proposed. Results: The experiments of this study show that SimExact gives correct results required for protein searching. A fully functional prototype of an online tool (www.datafurnish.com/protsem.php) has been provided that generates a ranked list of the proteins similar to a query protein, with a response time of less than 20 seconds in our setup. SimExact was used to search for protein pairs having high disparity between function similarity and sequence similarity. Conclusion: SimExact makes such searches practical, which would not be possible in a reasonable time otherwise.


2020 ◽  
Vol 48 (W1) ◽  
pp. W65-W71 ◽  
Author(s):  
Dmitry Suplatov ◽  
Yana Sharapova ◽  
Elizaveta Geraseva ◽  
Vytas Švedas

Zebra2 is a highly automated web-tool to search for subfamily-specific and conserved positions (i.e. the determinants of functional diversity as well as the key catalytic and structural residues) in protein superfamilies. The bioinformatic analysis is facilitated by Mustguseal, a companion web-server that automatically collects and superimposes a large representative set of functionally diverse homologs with high structure similarity but low sequence identity to the selected query protein. The results are automatically prioritized and provided at four information levels to facilitate the knowledge-driven expert selection of the most promising positions on-line: as a sequence similarity network; interfaces to sequence-based and 3D-structure-based analysis of conservation and variability; and accompanied by the detailed annotation of proteins accumulated from the integrated databases with links to the external resources. The integration of the Zebra2 and Mustguseal web-tools provides the first of its kind out-of-the-box open-access solution to conduct a systematic analysis of evolutionarily related proteins implementing different functions within a shared 3D-structure of the superfamily, determine common and specific patterns of function-associated local structural elements, help select hot-spots for rational design, and prepare focused libraries for directed evolution. The web-servers are free and open to all users at https://biokinet.belozersky.msu.ru/zebra2, no login required.


2020 ◽  
Author(s):  
Sutanu Bhattacharya ◽  
Rahmatullah Roche ◽  
Debswapna Bhattacharya

Motivation: Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact- or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. Results: We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as hard targets from the Continuous Automated Model Evaluation (CAMEO) experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches; and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. Availability: https://github.com/Bhattacharya-Lab/DisCovER

