DSResSol: A Sequence-Based Solubility Predictor Created with Dilated Squeeze Excitation Residual Networks

2021 ◽  
Vol 22 (24) ◽  
pp. 13555
Author(s):  
Mohammad Madani ◽  
Kaixiang Lin ◽  
Anna Tarakanova

Protein solubility is an important thermodynamic parameter that is critical for the characterization of a protein’s function, and a key determinant for the production yield of a protein in both the research setting and within industrial (e.g., pharmaceutical) applications. Experimental approaches to predict protein solubility are costly, time-consuming, and frequently offer only low success rates. To reduce cost and expedite the development of therapeutic and industrially relevant proteins, a highly accurate computational tool for predicting protein solubility from protein sequence is sought. While a number of in silico prediction tools exist, they suffer from relatively low prediction accuracy, bias toward the soluble proteins, and limited applicability for various classes of proteins. In this study, we developed a novel deep learning sequence-based solubility predictor, DSResSol, that takes advantage of the integration of squeeze excitation residual networks with dilated convolutional neural networks and outperforms all existing protein solubility prediction models. This model captures the frequently occurring amino acid k-mers and their local and global interactions and highlights the importance of identifying long-range interaction information between amino acid k-mers to achieve improved accuracy, using only protein sequence as input. DSResSol outperforms all available sequence-based solubility predictors by at least 5% in terms of accuracy when evaluated by two different independent test sets. Compared to existing predictors, DSResSol not only reduces prediction bias for insoluble proteins but also predicts soluble proteins within the test sets with an accuracy that is at least 13% higher than existing models. We derive the key amino acids, dipeptides, and tripeptides contributing to protein solubility, identifying glutamic acid and serine as critical amino acids for protein solubility prediction. 
Overall, DSResSol can be used for the fast, reliable, and inexpensive prediction of a protein’s solubility to guide experimental design.
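
The abstract above refers to frequently occurring amino acid k-mers. As a minimal sketch of what counting k-mers in a sequence means, the following uses only the Python standard library. The function name `kmer_counts` and the toy sequence are assumptions made for illustration; this is not the DSResSol feature pipeline.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer (window of k residues) in a protein sequence."""
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty range, hence an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical toy sequence, not taken from the paper's data sets.
toy_sequence = "MSEESEESK"
for k in (1, 2, 3):
    # Show the three most frequent single residues, dipeptides, and tripeptides.
    print(k, kmer_counts(toy_sequence, k).most_common(3))
```

Counting with k = 1, 2, and 3 corresponds to the single amino acids, dipeptides, and tripeptides mentioned in the abstract, although the authors derive their key k-mers from the trained model rather than from raw counts.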

2021 ◽  
Author(s):  
Mohammad Madani ◽  
Kaixiang Lin ◽  
Anna Tarakanova

Protein solubility is an important thermodynamic parameter critical for the characterization of a protein's function, and a key determinant for the production yield of a protein in both the research setting and within industrial applications. Thus, a highly accurate in silico bioinformatics tool for predicting protein solubility from protein sequence is sought. In this study, we developed a deep learning sequence-based solubility predictor, DSResSol, that takes advantage of the integration of squeeze excitation residual networks with dilated convolutional neural networks. The model captures the frequently occurring amino acid k-mers and their local and global interactions, and highlights the importance of identifying long-range interaction information between amino acid k-mers to achieve higher performance in comparison to existing deep learning-based models. DSResSol uses protein sequence as input, outperforming all available sequence-based solubility predictors by at least 5 percent in accuracy when the performance is evaluated by two different independent test sets. Compared to existing predictors, DSResSol not only reduces prediction bias for insoluble proteins but also predicts soluble proteins within the test sets with an accuracy that is at least 13 percent higher. We derive the key amino acids, dipeptides, and tripeptides contributing to protein solubility, identifying glutamic acid and serine as critical amino acids for protein solubility prediction. Overall, DSResSol can be used for fast, reliable, and inexpensive prediction of a protein's solubility to guide experimental design.
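
The two building blocks named in this abstract, dilated convolution and squeeze excitation, can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the function names, the toy signal, and the single linear gating layer (standing in for the usual two-layer bottleneck) are hypothetical and do not reproduce the DSResSol architecture.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D convolution (cross-correlation form) with taps spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation + 1  # positions covered by one output value
    return [
        sum(w * signal[start + j * dilation] for j, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def squeeze_excite(channels, weights):
    """Rescale each channel by a sigmoid gate computed from all channel means.

    weights[c][d] is a hypothetical gating weight from channel d's mean to channel c's gate.
    """
    means = [sum(ch) / len(ch) for ch in channels]  # squeeze: global average pooling
    gates = [
        1 / (1 + math.exp(-sum(w * m for w, m in zip(row, means))))  # excitation
        for row in weights
    ]
    return [[g * x for x in ch] for g, ch in zip(gates, channels)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]
local = dilated_conv1d(signal, kernel, dilation=1)  # each output covers 3 adjacent positions
wide = dilated_conv1d(signal, kernel, dilation=3)   # each output covers 7 positions
print(local, wide)
```

With the same three-tap kernel, raising the dilation from 1 to 3 widens the span of one output from 3 to 7 positions without adding parameters, which is the general reason dilation is used to reach long-range interactions along a sequence.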


Author(s):  
D. Filimonov ◽  
A. Lagunin

To construct classification models that predict the pathogenic effects of amino acid (AMA) substitutions from MNA descriptors, it is advisable to use the chemical structures of peptides carrying the AMA substitution together with the corresponding sections of the protein sequence without the mutation.


2018 ◽  
Author(s):  
Jeffrey I. Boucher ◽  
Troy W. Whitfield ◽  
Ann Dauphin ◽  
Gily Nachum ◽  
Carl Hollins ◽  
...  

The evolution of HIV-1 protein sequences should be governed by a combination of factors including nucleotide mutational probabilities, the genetic code, and fitness. The impacts of these factors on protein sequence evolution are interdependent, making it challenging to infer the individual contribution of each factor from phylogenetic analyses alone. We investigated the protein sequence evolution of HIV-1 by determining an experimental fitness landscape of all individual amino acid changes in protease. We compared our experimental results to the frequency of protease variants in a publicly available dataset of 32,163 sequenced isolates from drug-naïve individuals. The most common amino acids in sequenced isolates supported robust experimental fitness, indicating that the experimental fitness landscape captured key features of selection acting on protease during viral infections of hosts. Amino acid changes requiring multiple mutations from the likely ancestor were slightly less likely to support robust experimental fitness than single mutations, consistent with the genetic code favoring chemically conservative amino acid changes. Amino acids that were common in sequenced isolates were predominantly accessible by single mutations from the likely protease ancestor. Multiple mutations commonly observed in isolates were accessible by mutational walks with highly fit single mutation intermediates. Our results indicate that the prevalence of multiple base mutations in HIV-1 protease is strongly influenced by mutational sampling.
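
Whether an amino acid change is accessible by a single mutation depends on the genetic code. As a hedged sketch of that idea, the following computes the fewest nucleotide changes needed to move from an ancestral codon to any codon for a target amino acid under the standard genetic code. The function name `min_mutations` and the example codons are assumptions made for illustration; this is not the authors' analysis code.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def min_mutations(ancestor_codon, target_aa):
    """Fewest nucleotide changes turning ancestor_codon into any codon for target_aa."""
    return min(
        sum(a != b for a, b in zip(ancestor_codon, codon))  # Hamming distance
        for codon, aa in CODON_TABLE.items()
        if aa == target_aa
    )


# Hypothetical example: starting from ATG (Met), which residues need one change, which two?
for target in "LW":
    print(target, min_mutations("ATG", target))
```

A result of 1 marks a change reachable by a single point mutation, while 2 or 3 marks a change that needs multiple mutations from that ancestral codon.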


2021 ◽  
Vol 8 (6) ◽  
pp. 201852
Author(s):  
Yi Qian ◽  
Rui Zhang ◽  
Xinglu Jiang ◽  
Guoqiu Wu

Four nucleotides (A, U, C and G) combine freely into 64 codons, but these 64 codons are unequally assigned to 21 items (20 amino acids plus one stop signal). About 500 amino acids are known, but only 20 are selected to make up proteins. However, the relationships between amino acids and codons, and among the 20 amino acids themselves, have been unclear. In this paper, we studied the relationships among the 20 amino acids in 33 species and found three constraints: a relatively stable mean carbon-to-hydrogen (C : H) ratio (0.50), similarity interactions between the constituent ratios of amino acids, and amino acid frequencies that follow a Poisson distribution under certain conditions. We demonstrated that the unequal distribution of the 64 codons and the choice of amino acids in molecular evolution are constrained so as to maintain stable C : H ratios. The constituent ratios and frequencies of the 20 amino acids in a species or a protein are two determinants of protein sequence evolution, so this finding shows that the constraints among the 20 amino acids play an important role in protein sequence evolution.
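
The opening arithmetic of this abstract (4 nucleotides, 64 codons, 21 items) can be checked with a short sketch that enumerates the standard genetic code and counts how many codons map to each item. The variable names are assumptions made for illustration, and the sketch covers only the codon assignment, not the C : H ratio or Poisson analyses.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in RNA letters: codons enumerated in UCAG order, '*' is stop.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons assigned to each of the 21 items (20 amino acids plus stop).
codons_per_item = Counter(CODON_TABLE.values())

print(len(CODON_TABLE), "codons,", len(codons_per_item), "items")
for item, n in sorted(codons_per_item.items(), key=lambda pair: -pair[1]):
    print(item, n)
```

The counts range from six codons (Leu, Ser, Arg) down to one (Met, Trp), which is the unequal assignment the abstract describes.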


2019 ◽  
Vol 15 (4) ◽  
pp. 367-375
Author(s):  
Martin A. Mune Mune ◽  
Christian B. Bassogog ◽  
Pierre A. Bayiga ◽  
Carine E. Nyobe ◽  
Samuel R. Minka

Background: There is a constant search for new plant proteins with adequate nutritional and functional properties, as well as bioactive properties and low cost, for use in various food formulations. Objective: The aim of this work was to assess the nutritional and functional potential of protein from Irvingia gabonensis for use as an ingredient or supplement in food. Methods: Proximate composition and amino acid composition were analyzed. Nutritional parameters were calculated from amino acid composition. Physicochemical properties and secondary structure of the protein were determined. Finally, the effects of oil to water ratio (OWR), pH and concentration on emulsifying properties were analyzed. Results: The flour contained 22.26% protein, 5.30% ash and 60% carbohydrates. Proteins contained all essential amino acids, with a high content of Leu, Ile, Val, Thr and sulfur-containing amino acids. Essential amino acid index (69%), protein efficiency ratio (2.39-2.63) and biological value (79.91%) were determined. The maximum protein solubility (61%) was observed at pH 8, while high hydrophobicity was observed at pH 2. A transition from an irregular secondary structure to a more ordered structure was found from pH 2-4 to pH 6-10. pH, OWR and concentration significantly affected the emulsifying properties of Irvingia gabonensis almonds. The maximum emulsifying capacity (EC) was observed under acidic pH and high flour concentration. EC increased with increasing OWR and concentration, while it decreased with increasing pH. High emulsion stability (ES; 25-35%) was observed at pH 4-8 and an OWR of 1/3 to 1/2 (v/v), at a flour concentration of 3-4% (w/v). Conclusion: Irvingia gabonensis showed good potential as a food ingredient or supplement.


2018 ◽  
Vol 143 (1) ◽  
pp. 45-55 ◽  
Author(s):  
Jinyu Wang ◽  
Bo Yuan ◽  
Yi Xu ◽  
Bingru Huang

Amino acid and protein metabolism are interrelated and both play important roles in plant adaptation to heat stress. The objective of this study was to identify amino acids and soluble proteins associated with genetic variation in heat tolerance of hard fescue (Festuca trachyphylla). According to a previous screening experiment, the hard fescue cultivars Reliant IV and Predator were selected as heat-tolerant and heat-sensitive cultivars, respectively. Plants of these two hard fescue cultivars were exposed to heat stress at 38/33 °C (day/night) or optimal temperature at 21/18 °C in growth chambers. Each cultivar had four replications under each temperature, and the experimental design was a split-plot design, with temperature as the main plots and cultivars as the subplots. Under heat stress, ‘Reliant IV’ exhibited higher turf quality (TQ) and greater membrane stability than ‘Predator’. In response to heat stress, total amino acid content increased, whereas total soluble protein content decreased in both cultivars. The greater accumulation of amino acids in ‘Reliant IV’ was attributable to the greater increase in proteins involved in glycolysis and the tricarboxylic acid (TCA) cycle, which provided carbon skeletons for amino acid synthesis. ‘Reliant IV’ leaves exhibited greater increases in the content of six individual amino acids (histidine, glutamine, proline, threonine, aspartate, and tryptophan) than ‘Predator’ under heat stress. Several soluble proteins were upregulated in response to heat stress, to a greater extent in ‘Reliant IV’ than ‘Predator’, including proteins involved in photosynthesis, protein folding, redox homeostasis, stress signaling, stress defense, cell organization, and metabolism. These differentially accumulated free amino acids and soluble proteins could be associated with the genetic variation in heat tolerance of hard fescue.


2018 ◽  
Author(s):  
Antara Sengupta ◽  
Pabitra Pal Choudhury

The aim of this paper is to quantify which properties are carried from a DNA sequence, through its primary protein sequence, to the properties of the protein structure. The theory presented is applicable to any arbitrary DNA sequence, whether it comes from various species, is mutated data, or is a group of genes responsible for a given function. Irrespective of gene family, species, or wild-type or mutated status, the paper gives a standard model that defines a mapping between the physicochemical properties of any arbitrary DNA sequence and the physicochemical properties of its amino acid sequence. Experiments carried out with the PPCA protein family and its four homologs PPC(B-E) establish that a DNA sequence keeps its signature even after its translation into the corresponding amino acid sequence.


2008 ◽  
Vol 2 (1) ◽  
pp. 37-49 ◽  
Author(s):  
Kevin Campbell ◽  
Lukasz Kurgan

Development of accurate β-turn (beta-turn) type prediction methods would contribute towards the prediction of tertiary protein structure and would provide useful insights and inputs for fold recognition and drug design. Only one existing sequence-only method is available for the prediction of beta-turn types (for types I and II) for entire protein chains, while the proposed method allows for prediction of type I, II, IV, VII, and non-specific (NS) beta-turns, filling in the gap. The proposed predictor, which is based solely on protein sequence, is shown to provide similar performance to other sequence-only methods for prediction of beta-turns and beta-turn types. The main advantage of the proposed method is the simplicity and interpretability of the underlying model. We developed novel sequence-based features that allow identifying beta-turn types and differentiating them from non-beta-turns. The features, which are based on tetrapeptides (entire beta-turns) rather than a window centered over the predicted residues as in the case of recent competing methods, provide a more biologically sound model. They include 12 features based on collocation of amino acid pairs, focusing on amino acids (Gly, Asp, and Asn) that are known to be predisposed to form beta-turns. At the same time, our model also includes features that are geared towards exclusion of non-beta-turns, which are based on amino acids known to be strongly detrimental to formation of beta-turns (Met, Ile, Leu, and Val).
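
The idea of describing whole tetrapeptides rather than a window centered on one residue can be illustrated with a small sketch. The two residue sets come from the abstract above; the function name `tetrapeptide_features`, the simple counts, and the toy sequence are assumptions made for illustration and are not the authors' 12 collocation features.

```python
TURN_FAVORING = set("GDN")      # Gly, Asp, Asn: named in the abstract as predisposed to beta-turns
TURN_DISFAVORING = set("MILV")  # Met, Ile, Leu, Val: named as detrimental to beta-turns


def tetrapeptide_features(sequence):
    """Yield (start, tetrapeptide, favoring_count, disfavoring_count) per 4-residue window."""
    for start in range(len(sequence) - 3):
        window = sequence[start:start + 4]
        yield (
            start,
            window,
            sum(r in TURN_FAVORING for r in window),
            sum(r in TURN_DISFAVORING for r in window),
        )


# Hypothetical toy sequence, chosen only to show windows with differing counts.
for row in tetrapeptide_features("ALNGDKVI"):
    print(row)
```

Each row describes one candidate tetrapeptide as a unit, so a downstream classifier would score the whole four-residue stretch instead of a single central residue.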


1969 ◽  
Vol 47 (12) ◽  
pp. 1857-1863 ◽  
Author(s):  
Oluf L. Gamborg ◽  
A. J. Finlayson

Amino acid analyses were performed on the soluble and total proteins from plant cells grown in suspension culture. The cell cultures originated from 12 different plant species, and the explants were taken from different organs of the plants. Relatively small differences in the amino acid composition existed between the soluble proteins from different species, between cells originating from different organs of the same species, and between the same cultures grown on different media under the same environmental conditions. There was some variation in the proportion of basic, aromatic, and sulfur-containing amino acids which constituted about 17%, 9.5%, and 3% of the protein amino acids, respectively. The amino acid composition of the soluble proteins of wheat coleoptile and soybean hypocotyl resembled that of the soluble proteins from cultured cells of these plants. Essential amino acids, particularly the basic ones and methionine, were proportionally higher in the cell proteins than those reported for seed proteins.


2019 ◽  
Vol 116 (39) ◽  
pp. 19274-19281 ◽  
Author(s):  
Baofu Qiao ◽  
Felipe Jiménez-Ángeles ◽  
Trung Dac Nguyen ◽  
Monica Olvera de la Cruz

The conformation of water around proteins is of paramount importance, as it determines protein interactions. Although the average water properties around the surface of proteins have been characterized experimentally and computationally, protein surfaces are highly heterogeneous. Therefore, it is crucial to determine the correlations of water to the local distributions of polar and nonpolar protein surface domains to understand functions such as aggregation, mutations, and delivery. By using atomistic simulations, we investigate the orientation and dynamics of water molecules next to 4 types of protein surface domains: negatively charged, positively charged, and charge-neutral polar and nonpolar amino acids. The negatively charged amino acids orient around 98% of the neighboring water dipoles toward the protein surface, and such correlation persists up to around 16 Å from the protein surface. The positively charged amino acids orient around 94% of the nearest water dipoles against the protein surface, and the correlation persists up to around 12 Å. The charge-neutral polar and nonpolar amino acids also orient the neighboring water molecules, but in a quantitatively weaker manner. A similar trend was observed in the residence time of the nearest water neighbors. These findings hold true for 3 technically important enzymes (PETase, cytochrome P450, and organophosphorus hydrolase). Our results demonstrate that the water−amino acid degree of correlation follows the same trend as the amino acid contribution to protein solubility: the negatively charged amino acids are the most beneficial for protein solubility, followed by the positively charged amino acids, and finally the charge-neutral amino acids.

