query protein
Recently Published Documents

ProDCoNN-server: a web server for protein sequence prediction and design from a three-dimensional structure

We present ProDCoNN-server, a web server for protein sequence design and prediction from a given protein structure. The server is based on ProDCoNN, a previously developed deep learning model for protein design that achieved state-of-the-art performance when tested on large numbers of test proteins and benchmark datasets. Prediction is fast compared with other protein sequence prediction servers: on average it takes only a few minutes for a query protein. Two models can be selected for different purposes: BBO for full sequence prediction, extendable to multiple sequence generation, and BBS for single-position prediction when the types of the other residues are known. ProDCoNN-server outputs the predicted sequence and the probability matrix for each amino acid at each predicted residue. The probability matrix can also be visualized as a sequence logo figure (BBO) or a probability distribution plot (BBS). The server is available at: https://prodconn.stat.fsu.edu/.
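The abstract above says the server returns a probability matrix with one row per predicted residue. As a rough illustration of how such a matrix could be turned back into a sequence, here is a minimal sketch. The column order `AMINO_ACIDS`, the helper names, and the toy matrix are all assumptions made for this example; they are not the server's documented output format.

```python
# Hypothetical post-processing of a per-residue probability matrix.
# Assumed column order: the 20 standard amino acids, alphabetical by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def most_likely_sequence(prob_matrix):
    """Pick the highest-probability amino acid at each position."""
    sequence = []
    for row in prob_matrix:
        if len(row) != len(AMINO_ACIDS):
            raise ValueError("each row must hold 20 probabilities")
        best_index = max(range(len(row)), key=lambda i: row[i])
        sequence.append(AMINO_ACIDS[best_index])
    return "".join(sequence)


def toy_row(favoured, p=0.81):
    """Build an illustrative row that puts probability p on one residue."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return [p if aa == favoured else rest for aa in AMINO_ACIDS]


# Toy three-residue matrix (made-up numbers, not server output).
toy_matrix = [toy_row("M"), toy_row("K"), toy_row("V")]
predicted = most_likely_sequence(toy_matrix)
print(predicted)  # MKV
```

Taking the per-position maximum is only one way to read such a matrix; sampling from each row instead would give multiple candidate sequences.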


Deep template-based protein structure prediction

PLoS Computational Biology ◽ 10.1371/journal.pcbi.1008954 ◽ 2021 ◽ Vol 17 (5) ◽ pp. e1008954

Author(s): Fandi Wu ◽ Jinbo Xu

Keyword(s): Deep Learning ◽ Protein Structure ◽ Protein Structure Prediction ◽ Structure Prediction ◽ Conditional Random Fields ◽ 3D Models ◽ Query Protein ◽ Distance Information ◽ Alternating Direction ◽ Template Free

Motivation: Protein structure prediction has been greatly improved by deep learning, but most efforts are devoted to template-free modeling, and very few deep learning methods have been developed for TBM (template-based modeling), a popular technique for protein structure prediction. TBM has been studied extensively in the past, but its accuracy is not satisfactory when highly similar templates are not available. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. NDThreader then uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence coevolution information into a deep ResNet to predict the inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results show that NDThreader greatly outperforms existing methods such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as a part of the RaptorX server, which obtained the best average GDT score among all CASP14 servers on the 58 TBM targets.


Combining evolutionary and assay-labelled data for protein fitness prediction

10.1101/2021.03.28.437402 ◽ 2021

Author(s): Chloe Hsu ◽ Hunter Nisonoff ◽ Clara Fannjiang ◽ Jennifer Listgarten

Keyword(s): Machine Learning ◽ Predictive Modelling ◽ Query Protein ◽ The Other ◽ Computational Burden ◽ Systematic Assessment ◽ Protein Properties ◽ Protein Variants ◽ Modelling Process ◽ Baseline Approach

Predictive modelling of protein properties has become increasingly important to the field of machine-learning guided protein engineering. In one of the two existing approaches, evolutionarily-related sequences to a query protein drive the modelling process, without any property measurements from the laboratory. In the other, a set of protein variants of interest are assayed, and then a supervised regression model is estimated with the assay-labelled data. Although a handful of recent methods have shown promise in combining the evolutionary and supervised approaches, this hybrid problem has not been examined in depth, leaving it unclear how practitioners should proceed, and how method developers should build on existing work. Herein, we present a systematic assessment of methods for protein fitness prediction when evolutionary and assay-labelled data are available. We find that a simple baseline approach we introduce is competitive with and often outperforms more sophisticated methods. Moreover, our simple baseline is plug-and-play with a wide variety of established methods, and does not add any substantial computational burden. Our analysis highlights the importance of systematic evaluations and sufficient baselines.


Protocol for detection of bacterial proteins involved in efflux mediated antibiotic resistance (ARE) and their sub-families

10.21203/rs.3.pex-1371/v1 ◽ 2021

Author(s): Deeksha Pandey ◽ Bandana Kumari ◽ Neelja Singhal ◽ Manish Kumar

Keyword(s): Antibiotic Resistance ◽ Prediction Method ◽ Query Protein ◽ Prediction Algorithm ◽ Support Vector ◽ Resistance Proteins ◽ Bacterial Proteins ◽ Proteomic Data ◽ Tier System ◽ Tier I

This protocol describes a method for detecting bacterial proteins involved in efflux-mediated antibiotic resistance (ARE) and their sub-families, as described in the research paper "BacEffluxPred: A two-tier system to predict and categorize bacterial efflux mediated antibiotic resistance proteins", published in Scientific Reports. BacEffluxPred is a support vector machine based two-tier prediction method that can be used to detect efflux proteins responsible for antibiotic resistance in bacteria and to identify the families to which they belong. The overall prediction cycle includes three steps: 1) The query protein is presented to the prediction algorithm. 2) If the query protein is predicted to be a non-ARE protein, the prediction stops at tier-I. 3) If the query protein is predicted to be an ARE protein at tier-I, it is forwarded to tier-II for ARE family prediction. Following these steps, it is possible to generate models that can be applied to proteomic data to predict whether the data contain potential ARE proteins and, if so, to classify them into their respective families. This is the first in-silico tool for predicting bacterial ARE proteins and their families, and it is freely available as both web-server and standalone versions at http://proteininformatics.org/mkumar/baceffluxpred/
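The three-step prediction cycle described above can be sketched as plain control flow. This is only an illustration of the two-tier idea: the function `two_tier_predict` is hypothetical, and `toy_tier1` and `toy_tier2` are made-up stand-ins for the trained support vector machine models, which are not reproduced here. The family labels are placeholders as well.

```python
from typing import Callable, Optional, Tuple


def two_tier_predict(
    sequence: str,
    is_are_protein: Callable[[str], bool],
    predict_family: Callable[[str], str],
) -> Tuple[bool, Optional[str]]:
    """Run tier-I, and run tier-II only when tier-I predicts an ARE protein."""
    if not is_are_protein(sequence):
        # Tier-I predicts non-ARE: the prediction stops here.
        return (False, None)
    # Tier-I predicts ARE: forward the query to tier-II for family prediction.
    return (True, predict_family(sequence))


# Toy stand-ins for the real classifiers (assumptions for illustration only).
def toy_tier1(seq: str) -> bool:
    return "W" in seq


def toy_tier2(seq: str) -> str:
    return "family-A" if seq.startswith("M") else "family-B"


print(two_tier_predict("AAAA", toy_tier1, toy_tier2))  # (False, None)
print(two_tier_predict("MWAA", toy_tier1, toy_tier2))  # (True, 'family-A')
```

Passing the two classifiers in as callables keeps the control flow separate from whichever models are actually used at each tier.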


ProALIGN: Directly learning alignments for protein structure prediction via exploiting context-specific alignment motifs

10.1101/2020.12.28.424539 ◽ 2020

Author(s): Lupeng Kong ◽ Fusong Ju ◽ Wei-Mou Zheng ◽ Shiwei Sun ◽ Jinbo Xu ◽ ...

Keyword(s): Neural Network ◽ Deep Learning ◽ Protein Structure ◽ Protein Structure Prediction ◽ Structure Prediction ◽ Query Sequence ◽ Query Protein ◽ Protein Alignment ◽ Protein Threading ◽ Context Specific

Template-based modeling (TBM), including homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts protein structure by building an alignment between the query sequence under prediction and the templates with solved structures. However, it is still very challenging to build the optimal sequence-template alignment, especially when only distantly-related templates are available. Here we report a novel deep learning approach, ProALIGN, that can predict much more accurate sequence-template alignments. Just as protein sequences consist of sequence motifs, protein alignments are composed of frequently-occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific, as their characteristic patterns are tightly related to the sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and then use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces inaccuracies in the handcrafted scoring functions widely used by current threading approaches. For a query protein and a template, we apply the neural network to directly infer likelihoods of all possible residue pairs in their entirety, which can effectively consider the correlations among multiple residues. We further construct the alignment with maximum likelihood, and finally build a structure model according to the alignment. Tested on three independent datasets with a total of 6,688 protein alignment targets and 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than existing methods including HHpred, CNFpred, CEthreader and DeepThreader. These results clearly demonstrate the effectiveness of exploiting the context-specific alignment motifs by deep learning for protein threading.
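The binary-matrix representation of an alignment mentioned above can be made concrete with a small example. The sketch below converts a gapped pairwise alignment into a query-by-template 0/1 matrix. The function names and the toy sequences are assumptions for illustration, and the code covers only the representation, not the neural network that predicts it.

```python
def pairs_from_gapped(query_aln, template_aln):
    """List (query_index, template_index) for every aligned residue pair."""
    pairs = []
    q = t = 0
    for q_char, t_char in zip(query_aln, template_aln):
        if q_char != "-" and t_char != "-":
            pairs.append((q, t))
        # Advance each index only when that sequence has a residue in this column.
        if q_char != "-":
            q += 1
        if t_char != "-":
            t += 1
    return pairs


def alignment_to_matrix(pairs, query_len, template_len):
    """Build a binary matrix in which 1 marks an aligned residue pair."""
    matrix = [[0] * template_len for _ in range(query_len)]
    for q, t in pairs:
        matrix[q][t] = 1
    return matrix


# Toy alignment: the query has a gap opposite template residue G.
pairs = pairs_from_gapped("AC-D", "ACGD")
matrix = alignment_to_matrix(pairs, query_len=3, template_len=4)
for row in matrix:
    print(row)
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 0, 1]
```

A gap shows up as a template column (or query row) containing no 1, which is why the third template column above is all zeros.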


Deep Template-based Protein Structure Prediction

10.1101/2020.12.26.424433 ◽ 2020

Author(s): Fandi Wu ◽ Jinbo Xu

Keyword(s): Protein Structure ◽ Protein Structure Prediction ◽ Random Fields ◽ Structure Prediction ◽ Conditional Random Fields ◽ 3D Models ◽ Query Protein ◽ Supplementary Information ◽ Distance Information ◽ Alternating Direction

Motivation: TBM (template-based modeling) is a popular method for protein structure prediction. When very good templates are not available, it is challenging to identify the best templates, build accurate sequence-template alignments and construct 3D models from alignments. Results: This paper presents a new method, NDThreader (New Deep-learning Threader), to address the challenges of TBM. NDThreader first employs DRNF (deep convolutional residual neural fields), an integration of deep ResNet (convolutional residual neural networks) and CRF (conditional random fields), to align a query protein to templates without using any distance information. NDThreader then uses ADMM (alternating direction method of multipliers) and DRNF to further improve sequence-template alignments by making use of predicted distance potential. Finally, NDThreader builds 3D models from a sequence-template alignment by feeding it and sequence co-evolution information into a deep ResNet to predict the inter-atom distance distribution, which is then fed into PyRosetta for 3D model construction. Our experimental results on the CASP13 and CAMEO data show that our methods outperform existing ones such as CNFpred, HHpred, DeepThreader and CEthreader. NDThreader was blindly tested in CASP14 as a part of the RaptorX server, which obtained the best GDT score among all CASP14 servers on the 58 TBM targets. Availability and Implementation: Available as a part of the web server at http://[email protected] Supplementary Information: Supplementary data are available online.


CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction

10.1101/2020.10.06.327585 ◽ 2020

Author(s): Fusong Ju ◽ Jianwei Zhu ◽ Bin Shao ◽ Lupeng Kong ◽ Tie-Yan Liu ◽ ...

Keyword(s): Protein Structure ◽ Protein Structure Prediction ◽ Sequence Alignment ◽ Multiple Sequence Alignment ◽ Structure Prediction ◽ Tertiary Structure ◽ Query Protein ◽ Spatial Proximity ◽ Multiple Sequence ◽ Variance Matrix

Protein functions are largely determined by the final details of their tertiary structures, and the structures can be accurately reconstructed from inter-residue distances. Residue co-evolution has become the primary principle for estimating inter-residue distances, since residues in close spatial proximity tend to co-evolve. The widely used approaches infer residue co-evolution using an indirect strategy: they first extract some handcrafted features, such as the covariance matrix, from the multiple sequence alignment (MSA) of the query protein, and then infer residue co-evolution using these features rather than the raw information carried by the MSA. This indirect strategy always leads to considerable information loss and inaccurate estimation of inter-residue distances. Here, we report a deep neural network framework (called CopulaNet) to learn residue co-evolution directly from the MSA without any handcrafted features. CopulaNet consists of two key elements: i) an encoder to model context-specific mutation for each residue, and ii) an aggregator to model correlations among residues and thereafter infer residue co-evolution. Using the CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrated the successful application of CopulaNet for estimating inter-residue distances and further predicting protein tertiary structure with improved accuracy and efficiency. Head-to-head comparison suggested that for 24 out of the 31 free modeling CASP13 domains, ProFOLD outperformed AlphaFold, one of the state-of-the-art prediction approaches.
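To make the "handcrafted feature" route concrete, the sketch below computes one entry of a covariance matrix from a toy MSA: the joint frequency of two residues at two columns minus the product of their single-column frequencies. This illustrates the indirect strategy that the abstract contrasts with CopulaNet; it is not CopulaNet itself. The function name and toy alignment are assumptions, and the sequence weighting and pseudocounts that practical pipelines typically apply are omitted.

```python
def pair_covariance(msa, i, j, a, b):
    """Covariance of residue a at column i with residue b at column j.

    Computed as f_ij(a, b) - f_i(a) * f_j(b) from raw, unweighted counts.
    """
    n = len(msa)
    f_i = sum(1 for seq in msa if seq[i] == a) / n
    f_j = sum(1 for seq in msa if seq[j] == b) / n
    f_ij = sum(1 for seq in msa if seq[i] == a and seq[j] == b) / n
    return f_ij - f_i * f_j


# Toy MSA: columns 0 and 1 vary together, column 2 varies independently.
toy_msa = ["AKL", "AKV", "GEL", "GEV"]

print(pair_covariance(toy_msa, 0, 1, "A", "K"))  # 0.25
print(pair_covariance(toy_msa, 0, 2, "A", "L"))  # 0.0
```

Columns 0 and 1 give a positive value because A and K always appear together, while column 2 is statistically independent of column 0 and gives zero.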


SimExact – An Efficient Method to Compute Function Similarity Between Proteins Using Gene Ontology

Current Bioinformatics ◽ 10.2174/1574893614666191017092842 ◽ 2020 ◽ Vol 15 (4) ◽ pp. 318-327

Author(s): Najmul Ikram ◽ Muhammad Abdul Qadir ◽ Muhammad Tanvir Afzal

Keyword(s): Gene Ontology ◽ High Speed ◽ Sequence Similarity ◽ Query Protein ◽ Online Tool ◽ Compute Function ◽ Novel Method ◽ Function Similarity ◽ Functional Prototype ◽ Ranked List

Background: The rapidly growing protein and annotation databases necessitate the development of efficient tools to process this valuable information. Biologists frequently need to find proteins similar to a given protein, for which BLAST tools are commonly used. With the development of biomedical ontologies, e.g. Gene Ontology, methods were designed to measure function (semantic) similarity between two proteins. These methods work well on protein pairs, but are not suitable for protein query processing. Objective: Our aim is to facilitate searching of similar proteins in an acceptable time. Methods: A novel method SimExact for high speed searching of functionally similar proteins has been proposed. Results: The experiments of this study show that SimExact gives correct results required for protein searching. A fully functional prototype of an online tool (www.datafurnish.com/protsem.php) has been provided that generates a ranked list of the proteins similar to a query protein, with a response time of less than 20 seconds in our setup. SimExact was used to search for protein pairs having high disparity between function similarity and sequence similarity. Conclusion: SimExact makes such searches practical, which would not be possible in a reasonable time otherwise.


Zebra2: advanced and easy-to-use web-server for bioinformatic analysis of subfamily-specific and conserved positions in diverse protein superfamilies

Nucleic Acids Research ◽ 10.1093/nar/gkaa276 ◽ 2020 ◽ Vol 48 (W1) ◽ pp. W65-W71 ◽ Cited By ~ 1

Author(s): Dmitry Suplatov ◽ Yana Sharapova ◽ Elizaveta Geraseva ◽ Vytas Švedas

Keyword(s): Rational Design ◽ Sequence Similarity ◽ 3D Structure ◽ Web Server ◽ Bioinformatic Analysis ◽ Query Protein ◽ Systematic Analysis ◽ External Resources ◽ Protein Superfamilies ◽ Network Interfaces

Zebra2 is a highly automated web-tool to search for subfamily-specific and conserved positions (i.e. the determinants of functional diversity as well as the key catalytic and structural residues) in protein superfamilies. The bioinformatic analysis is facilitated by Mustguseal, a companion web-server that automatically collects and superimposes a large representative set of functionally diverse homologs with high structure similarity but low sequence identity to the selected query protein. The results are automatically prioritized and provided at four information levels to facilitate the knowledge-driven expert selection of the most promising positions on-line: as a sequence similarity network; interfaces to sequence-based and 3D-structure-based analysis of conservation and variability; and accompanied by the detailed annotation of proteins accumulated from the integrated databases with links to the external resources. The integration of the Zebra2 and Mustguseal web-tools provides the first out-of-the-box open-access solution of its kind to conduct a systematic analysis of evolutionarily related proteins implementing different functions within a shared 3D-structure of the superfamily, determine common and specific patterns of function-associated local structural elements, and assist in selecting hot-spots for rational design and in preparing focused libraries for directed evolution. The web-servers are free and open to all users at https://biokinet.belozersky.msu.ru/zebra2, no login required.


DisCovER: distance- and orientation-based covariational threading for weakly homologous proteins

10.1101/2020.01.31.923409 ◽ 2020

Author(s): Sutanu Bhattacharya ◽ Rahmatullah Roche ◽ Debswapna Bhattacharya

Keyword(s): Large Scale ◽ Query Protein ◽ Spatial Proximity ◽ Neighborhood Effect ◽ Homologous Proteins ◽ Distance Information ◽ Topological Network ◽ Network Similarity ◽ Improved Performance ◽ Standard Profile

Motivation: Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence-based predicted contact or distance information is used. Contact- or distance-assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query-template alignment. Results: We present a new distance- and orientation-based covariational threading method called DisCovER by effectively integrating information from inter-residue distance and orientation along with the topological network neighborhood of a query-template alignment. Our method first selects a subset of templates using standard profile-based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance- and orientation-based query-template alignment using an iterative double dynamic programming framework. Multiple large-scale benchmarking results on query proteins classified as hard targets from the Continuous Automated Model Evaluation (CAMEO) experiment and from the current literature show that our method outperforms several existing state-of-the-art threading approaches; and that the integration of the neighborhood effect with the inter-residue distance and orientation information synergistically contributes to the improved performance of DisCovER. Availability: https://github.com/Bhattacharya-Lab/DisCovER

