fold prediction
Recently Published Documents


Total documents: 56 (five years: 10)

H-index: 17 (five years: 2)

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Rahil Taujale ◽  
Zhongliang Zhou ◽  
Wayland Yeung ◽  
Kelley W. Moremen ◽  
Sheng Li ◽  
...  

Abstract: Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.
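As a rough illustration of what a "simple secondary structure representation" could look like as model input, the sketch below one-hot encodes a secondary-structure string into a fixed-length matrix. The three-state alphabet (H, E, C), the zero padding, and the function name are assumptions made for illustration; the abstract does not specify the encoding the authors used.

```python
from typing import List

# Assumed three-state alphabet: H = helix, E = strand, C = coil.
SS_STATES = "HEC"


def one_hot_secondary_structure(ss: str, length: int) -> List[List[int]]:
    """Encode a secondary-structure string as a fixed-length one-hot matrix.

    Strings longer than `length` are truncated; shorter ones are padded
    with all-zero rows. Characters outside SS_STATES map to an all-zero row.
    """
    rows = [
        [1 if ch == state else 0 for state in SS_STATES]
        for ch in ss[:length]
    ]
    # Pad with zero rows so every sequence has the same shape.
    rows.extend([0] * len(SS_STATES) for _ in range(length - len(rows)))
    return rows


if __name__ == "__main__":
    # Hypothetical predicted secondary structure for a short sequence.
    for row in one_hot_secondary_structure("CHHHEEC", 10):
        print(row)
```

A matrix of this shape (positions by states) is the kind of input a one-dimensional convolutional layer can consume without any sequence alignment.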


Antibodies ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 35
Author(s):  
Iftekhar Mahmood

There is currently considerable emphasis on predicting human clearance (CL) of monoclonal antibodies (mAbs) from a single species. Many studies indicate that monkey is the most suitable species for predicting human clearance of mAbs. However, it is not well established whether rodents (mouse or rat) can also be used to predict human CL of mAbs. The objectives of this study were to predict and compare human CL as well as the first-in-human dose of mAbs from mouse, rat, or monkey. Four methods were used to predict human CL of mAbs: four allometric exponents (0.75, 0.80, 0.85, and 0.90), a minimal physiologically based pharmacokinetic (mPBPK) method, lymph flow rate, and liver blood flow rate. Based on the predicted CL, the first-in-human dose of mAbs was projected using either exponent 1.0 (linear scaling) or exponent 0.85, and the human-equivalent dose (HED) from each of these species. The results indicated that rat or mouse could provide a reasonably accurate prediction of human CL as well as the first-in-human dose of mAbs. When exponent 0.85 was used for CL prediction, 78%, 95%, and 92% of observations were within a 2-fold prediction error for mouse, rat, and monkey, respectively. The predicted human dose fell within the observed human dose range (administered to humans) for 10 out of 13 mAbs for mouse, 11 out of 12 mAbs for rat, and 12 out of 15 mAbs for monkey. Overall, the clearance and first-in-human dose of mAbs were predicted reasonably well from each of the three species used alone. On average, monkey may be the best species for predicting human clearance and human dose, but mouse or rat, especially rat, can be a very useful species for such studies.
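The single-species, fixed-exponent approach described above can be sketched as follows. The scaling relation CL_human = CL_animal × (W_human / W_animal)^b is standard simple allometry; the 70 kg human body weight, the function names, and the example numbers are assumptions for illustration and are not taken from the study.

```python
def scale_clearance(cl_animal: float, weight_animal_kg: float,
                    weight_human_kg: float = 70.0,
                    exponent: float = 0.85) -> float:
    """Scale whole-body clearance from one animal species to human.

    Uses CL_human = CL_animal * (W_human / W_animal) ** exponent.
    The result has the same units as `cl_animal` (e.g. mL/day).
    """
    return cl_animal * (weight_human_kg / weight_animal_kg) ** exponent


def within_fold_error(predicted: float, observed: float,
                      fold: float = 2.0) -> bool:
    """Return True if predicted/observed lies within [1/fold, fold]."""
    ratio = predicted / observed
    return 1.0 / fold <= ratio <= fold


if __name__ == "__main__":
    # Hypothetical monkey data: 3.5 kg body weight, CL = 20 mL/day.
    for b in (0.75, 0.80, 0.85, 0.90):
        predicted = scale_clearance(20.0, 3.5, exponent=b)
        print(f"exponent {b:.2f}: predicted human CL = {predicted:.0f} mL/day")
```

Counting how many antibodies satisfy `within_fold_error` for each species gives percentages of the kind reported in the abstract.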


2021 ◽  
Author(s):  
Rahil Taujale ◽  
Zhongliang Zhou ◽  
Wayland Yeung ◽  
Kelley W Moremen ◽  
Sheng Li ◽  
...  

Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learned distinguishing features free of primary sequence alignment constraints and, unlike other models, is highly interpretable and helped identify common secondary structural features shared by divergent families. The model delineated sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies identify targets for future structural studies and expand the GT fold landscape.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jason Shumake ◽  
Travis T. Mallard ◽  
John E. McGeary ◽  
Christopher G. Beevers

Abstract: Identifying in advance who is unlikely to respond to a specific antidepressant treatment is crucial to precision medicine efforts. The current work leverages genome-wide genetic variation and machine learning to predict response to the antidepressant citalopram using data from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial (n = 1257 with both valid genomic and outcome data). A confirmatory approach selected 11 SNPs previously reported to predict response to escitalopram in a sample different from the current study. A novel exploratory approach selected SNPs from across the genome using nested cross-validation with elastic net logistic regression with a predominantly lasso penalty (alpha = 0.99). SNPs from each approach were combined with baseline clinical predictors and treatment response outcomes were predicted using a stacked ensemble of gradient boosting decision trees. Using pre-treatment clinical and symptom predictors only, out-of-fold prediction of a novel treatment response definition based on STAR*D treatment guidelines was acceptable, AUC = .659, 95% CI [0.629, 0.689]. The inclusion of SNPs using confirmatory or exploratory selection methods did not improve the out-of-fold prediction of treatment response (AUCs were .662, 95% CI [0.632, 0.692] and .655, 95% CI [0.625, 0.685], respectively). A similar pattern of results was observed for the secondary outcomes of the presence or absence of distressing side effects regardless of treatment response and achieving remission or satisfactory partial response, assuming medication tolerance. In the current study, incorporating SNP variation into prognostic models did not enhance the prediction of citalopram response in the STAR*D sample.
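To make "out-of-fold prediction" concrete, the sketch below scores every sample with a model fitted only on the other folds and then computes AUC from those held-out scores. The toy one-feature centroid scorer is a stand-in chosen so the example stays dependency-free; it is not the elastic net or gradient boosting ensemble used in the study, and all names and data here are hypothetical.

```python
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Assign sample i to fold i % k (deterministic, no shuffling)."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]


def fit_centroid_scorer(xs: Sequence[float],
                        ys: Sequence[int]) -> Callable[[float], float]:
    """Toy model: higher score means closer to the positive-class mean."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: abs(x - mu_neg) - abs(x - mu_pos)


def out_of_fold_scores(xs: Sequence[float], ys: Sequence[int],
                       k: int) -> List[float]:
    """Score each sample with a model that never saw it during fitting."""
    scores = [0.0] * len(xs)
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        scorer = fit_centroid_scorer(train_x, train_y)
        for i in held_out:
            scores[i] = scorer(xs[i])
    return scores


def auc(scores: Sequence[float], ys: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]  # hypothetical predictor
    ys = [0, 1, 0, 1, 0, 1, 0, 1]                  # hypothetical response
    print("out-of-fold AUC:", auc(out_of_fold_scores(xs, ys, k=4), ys))
```

Because no sample contributes to the model that scores it, the resulting AUC estimates performance on unseen patients rather than fit to the training data.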


Antibodies ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 1
Author(s):  
Iftekhar Mahmood

Allometric scaling is a useful tool for the extrapolation of pharmacokinetic parameters from animals to humans. The objective of this study was to predict human clearance of antibody–drug conjugates (ADCs) allometrically from one to three animal species and compare the predicted human clearance with the observed human clearance. For three-species allometric scaling, the “Rule of Exponents” (ROE) was used. The results indicated that three-species allometric scaling in association with the ROE provides acceptable prediction (within 0.5–2-fold prediction error) of human clearance. Two-species allometric scaling resulted in substantial prediction error. One-species scaling using a fixed exponent of 1.0 provided acceptable prediction error (within 0.5–2-fold) from monkey, rat, and mouse, with monkey and rat being comparable. Overall, the human clearance values of ADCs predicted from animal data were good. The allometric method proposed in this article can be used to predict human clearance from animal data and subsequently to select the first-in-human dose of ADCs.
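Multi-species simple allometry fits CL = a × W^b by linear regression on log-transformed data and then extrapolates to human body weight. The sketch below shows only that fitting step, under assumed body weights and made-up clearance values; the Rule of Exponents correction applied in the study is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_allometric(weights_kg: Sequence[float],
                   clearances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(CL) = log(a) + b * log(W).

    Returns the coefficient `a` and the exponent `b` of CL = a * W ** b.
    """
    xs = [math.log(w) for w in weights_kg]
    ys = [math.log(c) for c in clearances]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_clearance(a: float, b: float, weight_kg: float = 70.0) -> float:
    """Extrapolate clearance to the given body weight (70 kg assumed human)."""
    return a * weight_kg ** b


if __name__ == "__main__":
    # Hypothetical mouse, rat, and monkey data (body weight in kg, CL in mL/day).
    weights = [0.02, 0.25, 3.5]
    clearances = [0.45, 3.2, 27.0]
    a, b = fit_allometric(weights, clearances)
    print(f"a = {a:.2f}, b = {b:.2f}, "
          f"predicted human CL = {predict_clearance(a, b):.0f} mL/day")
```

With only two species the fitted line passes exactly through both points, so measurement noise in either one shifts the exponent directly, which is one plausible reason two-species scaling can extrapolate poorly.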


2020 ◽  
Vol 10 (5) ◽  
pp. 6306-6316

Protein fold prediction is a milestone step towards predicting protein tertiary structure from protein sequence. It is considered one of the most researched topics in computational biology and has applications in structural biology and medicine. Extracting sensitive features is a key step in protein fold prediction. Here, actionable features are extracted from keywords in the sequence header and from secondary structure representations of the protein sequence. Keywords holding species information are used as features after verification against the uniref100 dataset using TaxId. Prominent patterns are identified experimentally based on the nature of the protein structural class and protein fold, and global and native features capturing these patterns are extracted. Keyword-based features are found to correlate positively with protein folds. Keywords indicating species are important for observing functional differences, which help guide the prediction process. The SCOPe 2.07 and EDD datasets are used: EDD is a benchmark dataset, and SCOPe 2.07 is the latest and largest dataset holding astral protein sequences. A Random Forest classifier is trained on the SCOPe 2.07 training set using 93-dimensional feature vectors. The accuracy on the SCOPe 2.07 test set is better than 95%. The accuracy achieved on the benchmark dataset EDD is better than 93%, which is, to our knowledge, the best reported.


Author(s):  
John J. Ferrie ◽  
E. James Petersson

Abstract: As recognition of the abundance and relevance of intrinsically disordered proteins (IDPs) continues to grow, demand increases for methods that can rapidly predict the conformational ensembles populated by these proteins. To date, IDP simulations have largely been dominated by molecular dynamics (MD) simulations, which require significant compute times and/or complex hardware. Recent developments in MD have afforded methods capable of simulating both ordered and disordered proteins, yet to date accurate fold prediction from sequence has been dominated by Monte-Carlo (MC) based methods such as Rosetta. To overcome the limitations of current approaches in IDP simulation using Rosetta while maintaining its utility for modeling folded domains, we developed PyRosetta-based algorithms that allow for the accurate de novo prediction of proteins across all degrees of foldedness along with structural ensembles of disordered proteins. Our simulations have an accuracy comparable to state-of-the-art MD with vastly reduced computational demands.


2020 ◽  
Vol 118 (2) ◽  
pp. 366-375 ◽  
Author(s):  
Diego del Alamo ◽  
Maxx H. Tessmer ◽  
Richard A. Stein ◽  
Jimmy B. Feix ◽  
Hassane S. Mchaourab ◽  
...  

2019 ◽  
Vol 14 (8) ◽  
pp. 688-697 ◽  
Author(s):  
Komal Patil ◽  
Usha Chouhan

Background: Protein fold prediction is a fundamental step in structural bioinformatics. The tertiary structure of a protein determines its function, and fold prediction plays an important role in predicting that tertiary structure. A protein fold is simply the arrangement of the secondary structure elements relative to each other in space. To date, research groups worldwide have carried out a number of studies in this field using combinations of different benchmark datasets, descriptors, features, and classification techniques. Objective: In this study, we bring these contributions together, analyze them, and compare the techniques they use. Methods: Features are derived from the protein sequence, its secondary structure, physicochemical properties of amino acids, domain composition, the Position Specific Scoring Matrix, and profile and threading techniques. Conclusion: Combining these features can improve classification accuracy to a large extent. This survey helps identify the most suitable feature/attribute set and classification technique for the multi-class protein fold classification problem.

