Construct a variable-length fragment library for de novo protein structure prediction

2022 ◽  
Author(s):  
Qiongqiong Feng ◽  
Minghua Hou ◽  
Jun Liu ◽  
Kailong Zhao ◽  
Guijun Zhang

Although remarkable achievements, such as AlphaFold2, have been made in end-to-end structure prediction, fragment libraries remain essential for de novo protein structure prediction and can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated by HHsuite, and the secondary structure of each protein was calculated by DSSP. For a query sequence, the HMM profile was constructed first. Then, variable-length fragments were retrieved from the master structure database through dynamic variable-length profile-profile comparison. A complete method for chopping the query HMM profile during this process was proposed to obtain fragments with greater diversity. Finally, secondary structure information was used to further screen the retrieved fragments and generate the final fragment library for the specific query sequence. Experimental results on a set of 120 nonredundant proteins showed that the global precision and coverage of the fragment library generated by VFlib were 55.04% and 94.95%, respectively, at an RMSD cutoff of 1.5 Å. Compared with the benchmark method NNMake, the global precision of our fragment library increased by 62.89% at equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly. Controlled experiments demonstrated that the average TM-score of models built with VFlib fragments was 16.00% higher than that of models built with NNMake fragments.
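The abstract reports precision and coverage at an RMSD cutoff of 1.5 Å without defining them. The sketch below assumes the definitions commonly used for fragment libraries: precision is the fraction of fragments whose RMSD to the native structure is at or below the cutoff, and coverage is the fraction of query residues spanned by at least one such fragment. "Global" values are assumed here to pool counts over all proteins in the benchmark set. The `Fragment` class and function names are hypothetical, not part of VFlib.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    start: int     # 0-based query position where the fragment begins
    length: int    # number of residues (fragments are variable-length)
    rmsd: float    # RMSD in Å to the native structure over the same window


def _good(fragments, cutoff):
    # Assumption: a fragment counts as "good" when its RMSD <= cutoff.
    return [f for f in fragments if f.rmsd <= cutoff]


def _covered(good, query_length):
    # Query positions spanned by at least one good fragment.
    positions = set()
    for f in good:
        positions.update(range(f.start, min(f.start + f.length, query_length)))
    return positions


def precision_and_coverage(fragments, query_length, cutoff=1.5):
    """Per-protein (precision, coverage) under the assumed definitions."""
    if not fragments or query_length <= 0:
        return 0.0, 0.0
    good = _good(fragments, cutoff)
    precision = len(good) / len(fragments)
    coverage = len(_covered(good, query_length)) / query_length
    return precision, coverage


def global_metrics(libraries, cutoff=1.5):
    """Pool counts over many (fragments, query_length) pairs, e.g. a benchmark set."""
    n_frag = n_good = n_cov = n_res = 0
    for fragments, query_length in libraries:
        good = _good(fragments, cutoff)
        n_frag += len(fragments)
        n_good += len(good)
        n_cov += len(_covered(good, query_length))
        n_res += query_length
    return (n_good / n_frag if n_frag else 0.0,
            n_cov / n_res if n_res else 0.0)
```

For example, with `frags = [Fragment(0, 9, 0.8), Fragment(3, 7, 2.1), Fragment(5, 12, 1.2), Fragment(15, 5, 3.0)]` on a 20-residue query, two of four fragments pass the cutoff and together span positions 0 to 16, so `precision_and_coverage(frags, 20)` gives a precision of 0.5 and a coverage of 0.85. Pooling counts is one choice; averaging per-protein values would weight small proteins more heavily.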

PLoS ONE ◽  
2015 ◽  
Vol 10 (4) ◽  
pp. e0123998 ◽  
Author(s):  
Saulo H. P. de Oliveira ◽  
Jiye Shi ◽  
Charlotte M. Deane

Proteins are essential and present in all life forms, and determining their structures is cumbersome, laborious and time-consuming. Hence, for three to four decades, researchers have used computational techniques, both template-based and template-free, to predict protein structure from sequence. This research focuses on developing a conceptual basis for establishing an invariant fragment library that can be used for protein structure prediction. Built from the 20 amino acids, fragments can be classified by length, from 3 to 41 residues, and further classified by the number of identical amino acids present in the fragment. This enumeration covers the theoretical number of fragments that can exist and in no way represents the fragments that actually occur in nature. Invariant fragments are those that are rigid in three-dimensional structure and do not change. A formula was derived to determine all possible permutations that can exist for lengths 3 to 41 based on the 20 amino acids. One hundred proteins were downloaded from the Protein Data Bank and broken into fragments of 3 to 41 residues using Asynchronous Distributed Processing, yielding a total of 6,102,102 fragments. Fragments with identical sequences were then superimposed and Root Mean Square Deviation (RMSD) values were obtained, yielding roughly 3.2% of the original fragments. t-scores and z-scores were computed, from which skewness, kurtosis and excess kurtosis were determined. For invariance, the skewness cutoff was set at ±0.1, and, using the excess kurtosis, fragments whose distributions were either leptokurtic or platykurtic and lay within ±1 standard deviation of the mean were considered invariant, i.e., when there were no outliers in the distribution and most of the t-score or z-score values were centered around the average value. Using these cutoff values, fragments were classified and deposited into an invariant fragment library.
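The invariance test above combines superposition RMSDs with distribution shape statistics. A minimal stdlib-only sketch follows, under several assumptions: coordinate pairs are taken to be already optimally superimposed (for example with the Kabsch algorithm, which is not implemented here); skewness and excess kurtosis use population-moment estimators, since the abstract does not name one; and only the skewness cutoff of 0.1 comes from the text. The thresholds `min_within_1sd` and `outlier_z`, and all function names, are hypothetical stand-ins for "most values near the mean" and "no outliers".

```python
import math
from statistics import fmean


def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the pair has ALREADY been optimally superimposed; no fitting is done.
    """
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def shape_stats(values):
    """Return (z_scores, skewness, excess_kurtosis) from population moments."""
    mu = fmean(values)
    sd = math.sqrt(fmean((x - mu) ** 2 for x in values))
    if sd == 0:
        raise ValueError("zero spread: skewness and kurtosis are undefined")
    z = [(x - mu) / sd for x in values]
    skewness = fmean(t ** 3 for t in z)
    excess_kurtosis = fmean(t ** 4 for t in z) - 3.0  # 0 for a normal distribution
    return z, skewness, excess_kurtosis


def kurtosis_type(excess_kurtosis):
    # Sign of the excess kurtosis decides the shape class.
    if excess_kurtosis > 0:
        return "leptokurtic"
    if excess_kurtosis < 0:
        return "platykurtic"
    return "mesokurtic"


def is_invariant(rmsds, skew_cutoff=0.1, min_within_1sd=0.5, outlier_z=3.0):
    """Hypothetical reading of the invariance rule for one fragment sequence.

    skew_cutoff comes from the abstract; the other two thresholds are assumptions.
    """
    if len(rmsds) < 3:
        return False                  # assumption: too few observations to judge
    if len(set(rmsds)) == 1:
        return True                   # identical RMSDs: no variation at all
    z, skewness, _ = shape_stats(rmsds)
    within = sum(1 for t in z if abs(t) <= 1.0) / len(z)
    return (abs(skewness) <= skew_cutoff
            and within >= min_within_1sd
            and all(abs(t) <= outlier_z for t in z))
```

A symmetric RMSD sample such as `[1, 2, 3, 4, 5]` has zero skewness and an excess kurtosis of -1.3 (platykurtic), so it passes; `[1, 1, 1, 1, 10]` has a skewness of 1.5 and fails. Note that for standardized data not every z-score can lie within ±1 unless all have magnitude exactly 1, which is why the sketch checks a fraction rather than all values.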
Roughly 381,799 invariant fragments were obtained, about 6.3% of the initial fragments. This is far fewer than the number of fragments one would otherwise use in homology or de novo modelling, thereby reducing the design space. Further work is underway to build the complete invariant fragment library, which can then be used to predict protein structure with a template-based approach.
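The abstract refers to a formula for all possible sequences of lengths 3 to 41 and to the number of fragments cut from real chains. A minimal sketch of both counts follows, assuming the standard count of 20**L sequences per length (the abstract does not give the authors' own formula, which may group sequences differently) and overlapping windows taken with a step of one residue. The function names are hypothetical.

```python
def theoretical_sequences(min_len=3, max_len=41, alphabet=20):
    """Number of distinct sequences of each length over the amino-acid alphabet.

    Assumption: the standard alphabet**L count per length, summed over lengths.
    Equivalent closed form: (20**42 - 20**3) // 19 for the default arguments.
    """
    return sum(alphabet ** length for length in range(min_len, max_len + 1))


def windows_in_chain(n_residues, min_len=3, max_len=41):
    """Overlapping fragments of length min_len..max_len cut from one chain."""
    # A chain of N residues holds N - L + 1 windows of length L (none if L > N).
    return sum(max(0, n_residues - length + 1)
               for length in range(min_len, max_len + 1))
```

For example, `windows_in_chain(100)` returns 3081, while `theoretical_sequences(3, 3)` alone is 8000. The reported share of invariant fragments can be checked the same way: `round(100 * 381_799 / 6_102_102, 1)` gives 6.3.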


2009 ◽  
Vol 393 (1) ◽  
pp. 249-260 ◽  
Author(s):  
David E. Kim ◽  
Ben Blum ◽  
Philip Bradley ◽  
David Baker

2019 ◽  
Author(s):  
Rebecca F. Alford ◽  
Patrick J. Fleming ◽  
Karen G. Fleming ◽  
Jeffrey J. Gray

Protein design is a powerful tool for elucidating mechanisms of function and engineering new therapeutics and nanotechnologies. While soluble protein design has advanced, membrane protein design remains challenging due to difficulties in modeling the lipid bilayer. In this work, we developed an implicit approach that captures the anisotropic structure, shape of water-filled pores, and nanoscale dimensions of membranes with different lipid compositions. The model improves performance in computational benchmarks against experimental targets, including prediction of protein orientations in the bilayer, ΔΔG calculations, native structure discrimination, and native sequence recovery. When applied to de novo protein design, this approach designs sequences with an amino acid distribution near the native amino acid distribution in membrane proteins, overcoming a critical flaw in previous membrane models that were prone to generating leucine-rich designs. Further, the proteins designed in the new membrane model exhibit native-like features including interfacial aromatic side chains, hydrophobic lengths compatible with bilayer thickness, and polar pores. Our method advances high-resolution membrane protein structure prediction and design toward tackling key biological questions and engineering challenges.

Significance Statement
Membrane proteins participate in many life processes including transport, signaling, and catalysis. They constitute over 30% of all proteins and are targets for over 60% of pharmaceuticals. Computational design tools for membrane proteins will transform the interrogation of basic science questions such as membrane protein thermodynamics and the pipeline for engineering new therapeutics and nanotechnologies. Existing tools are either too expensive to compute or rely on manual design strategies. In this work, we developed a fast and accurate method for membrane protein design.
The tool is available to the public and will accelerate the experimental design pipeline for membrane proteins.


ChemPhysChem ◽  
2014 ◽  
Vol 15 (15) ◽  
pp. 3378-3390 ◽  
Author(s):  
Falk Hoffmann ◽  
Ioan Vancea ◽  
Sanjay G. Kamat ◽  
Birgit Strodel

2016 ◽  
Vol 11 (3) ◽  
pp. 149-155 ◽  
Author(s):  
Sandhya P.N. Dubey ◽  
N. Gopalakrishna Kini ◽  
M. Sathish Kumar ◽  
S. Balaji ◽  
M.P. Sumana Bha ◽  
...  
