solubility prediction
Recently Published Documents


Total documents: 188 (five years: 59)
H-index: 29 (five years: 4)

2021, Vol 22 (24), pp. 13555
Author(s): Mohammad Madani, Kaixiang Lin, Anna Tarakanova

Protein solubility is an important thermodynamic parameter that is critical for the characterization of a protein’s function, and a key determinant for the production yield of a protein in both the research setting and within industrial (e.g., pharmaceutical) applications. Experimental approaches to predict protein solubility are costly, time-consuming, and frequently offer only low success rates. To reduce cost and expedite the development of therapeutic and industrially relevant proteins, a highly accurate computational tool for predicting protein solubility from protein sequence is sought. While a number of in silico prediction tools exist, they suffer from relatively low prediction accuracy, bias toward the soluble proteins, and limited applicability for various classes of proteins. In this study, we developed a novel deep learning sequence-based solubility predictor, DSResSol, that takes advantage of the integration of squeeze excitation residual networks with dilated convolutional neural networks and outperforms all existing protein solubility prediction models. This model captures the frequently occurring amino acid k-mers and their local and global interactions and highlights the importance of identifying long-range interaction information between amino acid k-mers to achieve improved accuracy, using only protein sequence as input. DSResSol outperforms all available sequence-based solubility predictors by at least 5% in terms of accuracy when evaluated by two different independent test sets. Compared to existing predictors, DSResSol not only reduces prediction bias for insoluble proteins but also predicts soluble proteins within the test sets with an accuracy that is at least 13% higher than existing models. We derive the key amino acids, dipeptides, and tripeptides contributing to protein solubility, identifying glutamic acid and serine as critical amino acids for protein solubility prediction. 
Overall, DSResSol can be used for the fast, reliable, and inexpensive prediction of a protein’s solubility to guide experimental design.
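
The abstract above refers to amino acid k-mers (single residues, dipeptides, tripeptides) as sequence features. As a rough illustration only, and not the DSResSol model itself, the sketch below counts overlapping k-mers in a sequence. The function names and the toy sequence are hypothetical and do not come from the paper.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers of length k in an amino acid sequence."""
    seq = sequence.upper()
    # range() is empty when the sequence is shorter than k, giving an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(sequence: str, k: int) -> dict:
    """Convert k-mer counts to relative frequencies (empty dict if no k-mers)."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    toy_sequence = "MSEESEEK"  # hypothetical sequence, for illustration only
    print(kmer_counts(toy_sequence, 1).most_common(2))
    print(kmer_counts(toy_sequence, 2).most_common(2))
```

Frequencies like these could serve as simple baseline features; the model described in the abstract learns k-mer representations and their long-range interactions with convolutional layers instead.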


2021, Vol 13 (1)
Author(s): Zhuyifan Ye, Defang Ouyang

Rapid solvent selection is of great significance in chemistry. However, solubility prediction remains a crucial challenge. This study aimed to develop machine learning models that can accurately predict compound solubility in organic solvents. A dataset containing 5081 experimental temperature and solubility data points for compounds in organic solvents was extracted and standardized. Molecular fingerprints were selected to characterize structural features. lightGBM was compared with deep learning and traditional machine learning methods (PLS, Ridge regression, kNN, DT, ET, RF, SVM) to develop models for predicting solubility in organic solvents at different temperatures. Compared to the other models, lightGBM exhibited significantly better overall generalization (logS ± 0.20). For unseen solutes, our model gave a prediction accuracy (logS ± 0.59) close to the expected noise level of experimental solubility data. lightGBM revealed the physicochemical relationship between solubility and structural features. Our method enables rapid solvent screening in chemistry and may be applied to solubility prediction in other solvents.


Molecules, 2021, Vol 26 (20), pp. 6185
Author(s): Oliver Wieder, Mélaine Kuenemann, Marcus Wieder, Thomas Seidel, Christophe Meyer, ...

The accurate prediction of molecular properties, such as lipophilicity and aqueous solubility, is of great importance and poses challenges in several stages of the drug discovery pipeline. Machine learning methods, such as graph-based neural networks (GNNs), have shown exceptionally good performance in predicting these properties. In this work, we introduce a novel GNN architecture, called directed edge graph isomorphism network (D-GIN). It is composed of two distinct sub-architectures (D-MPNN, GIN) and achieves an improvement in accuracy over its sub-architectures when employing various learning and featurization strategies. We argue that combining models with different key aspects helps make graph neural networks deeper and simultaneously increases their predictive power. Furthermore, we address current limitations in the assessment of deep-learning models, namely, the comparison of single-training-run performance metrics, and offer a more robust solution.
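
The abstract above criticizes comparing models on the metric of a single training run. One common alternative, shown here as a minimal sketch and not necessarily the procedure the authors propose, is to repeat training with several random seeds and report the mean and sample standard deviation of the metric. The function name and the RMSE values are hypothetical.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of a metric over repeated runs."""
    if not scores:
        raise ValueError("at least one score is required")
    mean = statistics.mean(scores)
    # The sample standard deviation is undefined for a single run; report 0.0.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


if __name__ == "__main__":
    # Hypothetical RMSE values from five training runs with different seeds.
    rmse_per_seed = [0.61, 0.58, 0.64, 0.60, 0.57]
    mean, std = summarize_runs(rmse_per_seed)
    print(f"RMSE = {mean:.3f} +/- {std:.3f} over {len(rmse_per_seed)} runs")
```

Reporting the spread makes it easier to judge whether a difference between two architectures exceeds run-to-run variation.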


Author(s): Run Guo, Xiang Bai, Yang Lu, Lin-Zhou Zhang, Xing-Ying Lan, ...

2021, Vol 66 (10), pp. 1549-1553
Author(s): Yubo Xing, Zhigan Deng, Fuxian Yang, Chang Wei, Xingbin Li, ...

2021
Author(s): Wu Han Toh, Chuang-Wei Wang, Wen-Hung Chung

Background: Common warts and flat warts are caused by the human papillomavirus (HPV). Peak incidence of wart infection occurs in schoolchildren aged 12-16, where prevalence can be as high as 20%. Traditional treatments aimed at destruction of wart tissue have low clearance rates and high recurrence rates. Occasional reports have even shown warts becoming malignant and progressing into verrucous carcinoma. Current licensed HPV vaccines largely target higher-risk oncogenic HPV types, but do not provide coverage of low-risk types associated with warts. To date, little attention has been given to the development of effective, anti-viral wart treatments. Objective: This study aims to identify immunodominant T-lymphocyte epitopes from the L1 major capsid protein of HPV 1, 2 and 3, a foundational step in bioengineering a peptide-based vaccine for warts. Methods: Cytotoxic T-cell and helper T-cell epitopes were predicted using an array of immunoinformatic tools against a reference panel of frequently observed MHC-I and MHC-II alleles. Predicted peptides were ranked based on IC50 and IFN-γ Inducer Scores, respectively, and top-performing epitopes were synthesized and subjected to in vitro screening by IFN-γ enzyme-linked immunosorbent spot assay (ELISpot). Independent trials were conducted using PBMCs of healthy volunteers. The final chosen peptides were fused with flexible GS linkers in silico to design a novel polypeptide vaccine. Results: Seven immunodominant peptides screened from 44 predicted peptides were included in the vaccine design, selected to elicit specific immune responses across MHC class I and class II, and across HPV types. Evaluation of the vaccine's properties suggests that the vaccine is stable, non-allergenic, and provides near-complete global population coverage (>99%). Solubility prediction and rare codon analysis indicate that the DNA sequence encoding the vaccine is suitable for high-level expression in Escherichia coli.
Conclusions: In sum, this study demonstrates the potential and lays the framework for the development of a peptide-based vaccine against warts.
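
The Methods above mention fusing the selected peptides with flexible GS linkers in silico. A minimal sketch of that step follows. The abstract does not state the linker sequence or the peptide sequences, so the `GGGGS` default and the example peptides are assumptions made purely for illustration, and `fuse_peptides` is a hypothetical helper name.

```python
def fuse_peptides(peptides, linker="GGGGS"):
    """Join peptide sequences into one polypeptide with a linker between each pair.

    The default linker "GGGGS" is an assumed, commonly used flexible
    glycine-serine motif; the abstract does not specify which one was used.
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    return linker.join(peptides)


if __name__ == "__main__":
    # Placeholder peptides, not the epitopes identified in the study.
    example_peptides = ["AAAK", "LLDE", "VVTR"]
    print(fuse_peptides(example_peptides))
```

The fused sequence could then be passed to downstream checks such as the stability, allergenicity, and solubility predictions the abstract describes.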

