Computational aqueous solubility prediction for drug-like compounds in congeneric series

2008 ◽  
Vol 43 (3) ◽  
pp. 501-512 ◽  
Author(s):  
Lei Du-Cuny ◽  
Jörg Huwyler ◽  
Michael Wiese ◽  
Manfred Kansy
ChemInform ◽  
2004 ◽  
Vol 35 (39) ◽  
Author(s):  
Christel A. S. Bergstroem ◽  
Carola M. Wassvik ◽  
Ulf Norinder ◽  
Kristina Luthman ◽  
Per Artursson

2020 ◽  
Author(s):  
Murat Sorkun ◽  
J. M. Koelman ◽  
Süleyman Er

Abstract Accurate prediction of the solubility of chemical substances in solvents remains a challenge. The sparsity of high-quality solubility data is recognized as the biggest hurdle in developing robust data-driven methods for practical use. Nonetheless, the effects of data quality and quantity on aqueous solubility predictions have not yet been scrutinized. In this study, the roles of dataset size and quality in the performance of solubility prediction models are examined, and the concepts of actual and observed performance are introduced. To narrow the gap between actual and observed performance, a quality-oriented data selection method is designed that evaluates the quality of the data and extracts its most accurate part through statistical validation. Applying this method to the largest publicly available solubility database and using a consensus machine learning approach yields a top-performing solubility prediction model.
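The abstract does not spell out the selection criterion, so the following is only a minimal sketch of the general idea under stated assumptions. It is not the authors' method. It assumes that each compound may have several reported logS values from different sources, treats agreement between replicates (population standard deviation at or below a hypothetical `max_sd` threshold) as the "statistical validation," and represents a consensus model as the plain average of several trained predictors. The function names `select_consistent` and `consensus_predict` are hypothetical.

```python
from statistics import mean, pstdev


def select_consistent(measurements, max_sd=0.5):
    """Keep compounds whose replicate logS values agree within max_sd (log units).

    measurements: dict mapping a compound ID to a list of reported logS values.
    Returns a dict mapping each retained compound ID to its mean logS.
    Assumption: compounds with a single measurement cannot be cross-checked,
    so this sketch drops them.
    """
    selected = {}
    for cid, values in measurements.items():
        if len(values) < 2:
            continue  # nothing to validate against
        if pstdev(values) <= max_sd:
            selected[cid] = mean(values)
    return selected


def consensus_predict(models, x):
    """Average the predictions of several independently trained models (callables)."""
    return mean(model(x) for model in models)
```

For example, `select_consistent({"A": [-2.1, -2.3], "B": [-1.0, -3.0], "C": [-4.0]})` keeps only compound A: B's two sources disagree by 2 log units, and C has no replicate. Averaging several models is one simple way to build a consensus; weighted or stacked ensembles are common alternatives.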


2021 ◽  
Author(s):  
Elif Sorkun ◽  
Qi Zhang ◽  
Abhishek Khetan ◽  
Murat Cihan Sorkun ◽
Süleyman Er

An increasing number of electroactive compounds have recently been explored for use in high-performance redox flow batteries for grid-scale energy storage. Given the vast and highly diverse chemical space of the candidate compounds, it is desirable to assess their physicochemical properties rapidly. High-throughput virtual screening approaches, which use combinatorial techniques to systematically enumerate large virtual chemical libraries and evaluate their properties, are indispensable tools for agile exploration of the designated chemical space. Herein, RedDB, a computational database containing 31,677 molecules from two prominent classes of organic electroactive compounds (quinones and aza-aromatics), is presented. RedDB includes a range of physicochemical properties of the compounds that could potentially serve as battery performance descriptors. RedDB's development steps are described: (i) chemical library generation, (ii) molecular property prediction based on quantum chemical calculations, (iii) aqueous solubility prediction using machine learning, and (iv) data processing and database creation.
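The sketch below illustrates what "systematic enumeration" of a virtual library means in practice. It is a toy example, not RedDB's actual generation procedure. It assumes a single scaffold (1,4-benzoquinone, `O=C1C=CC(=O)C=C1`) and a hypothetical set of substituent SMILES fragments, and it forms every combination over the four ring C–H positions with `itertools.product`. Symmetry-equivalent patterns are deliberately left unmerged, because deduplicating them requires a cheminformatics toolkit for canonicalization.

```python
from itertools import product

# Hypothetical substituent set as SMILES fragments; "" stands for hydrogen.
SUBSTITUENTS = ["", "C", "O", "N", "F"]


def branch(fragment):
    """Wrap a substituent as a SMILES branch; hydrogen needs no explicit atom."""
    return f"({fragment})" if fragment else ""


def enumerate_benzoquinones(substituents=SUBSTITUENTS):
    """Yield SMILES for every substitution pattern on the four ring C-H
    positions of 1,4-benzoquinone (O=C1C=CC(=O)C=C1).

    Symmetry-equivalent patterns are NOT merged here.
    """
    for r1, r2, r3, r4 in product(substituents, repeat=4):
        yield f"O=C1C{branch(r1)}=C{branch(r2)}C(=O)C{branch(r3)}=C1{branch(r4)}"


if __name__ == "__main__":
    library = list(enumerate_benzoquinones())
    print(len(library))  # 5 substituents over 4 positions: 5**4 = 625 strings
    print(library[0])    # the unsubstituted parent quinone
```

The library grows as (number of substituents)^(number of positions). This exponential growth is why high-throughput screening pipelines pair enumeration with fast property estimators, such as the machine learning solubility step the abstract mentions.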


Author(s):  
Jen-Hao Chen ◽  
Yufeng Jane Tseng

Abstract Aqueous solubility is a key property driving many chemical and biological phenomena, and it affects both experimental and computational attempts to assess those phenomena. Accurate prediction of solubility is essential yet challenging, even with modern computational algorithms. Fingerprint-based, feature-based and molecular graph-based representations have all been used with different deep learning methods for aqueous solubility prediction, and it has been clearly demonstrated that the choice of molecular representation affects both model predictions and explainability. In this work, we reviewed different representations and focused on using graph and line notations for modeling. In general, one canonical chemical structure is used to represent a molecule when computing its properties. We carefully examined the common practice of representing a molecule by a single simplified molecular-input line-entry specification (SMILES) string and proposed using the full enumeration of its SMILES strings to achieve better accuracy. A convolutional neural network (CNN) was used. Full SMILES enumeration can improve the representation of a molecule by describing it from all possible angles. Because no additional explicit chemistry knowledge is needed to predict solubility, this CNN model can be very robust when dealing with large datasets. Traditionally, it has also been hard to use a neural network to explain how chemical substructures contribute to a single property. We demonstrated the use of attention in the decoding network to detect the parts of a molecule that are relevant to solubility, which can be used to explain the CNN's predictions.
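The minimal, dependency-free sketch below shows two ingredients the abstract relies on. It is not the authors' network. First, a regex tokenizer splits SMILES into atom-level tokens, so that equivalent strings such as `CCO` and `OCC` (both ethanol) become different token sequences; SMILES enumeration exposes a model to exactly this kind of variation. Second, softmax attention pooling turns per-token relevance scores into weights that sum to 1, and those weights can be read back per token (that is, per atom) as an explanation. The regex, the function names, and the idea of feeding in externally supplied scores are assumptions for illustration. In a trained model, the scores and features would come from learned layers.

```python
import math
import re

# Assumed SMILES token pattern: bracket atoms, two-letter halogens,
# two-digit ring labels (%nn), organic-subset atoms, bonds, branches, digits.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|\d)"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; raise if any characters are left over."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles}")
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Pool per-token feature vectors with softmax attention weights.

    features: one equal-length feature vector per token.
    scores: one (hypothetical, normally learned) relevance score per token.
    Returns (pooled_vector, weights); weights[i] is token i's contribution.
    """
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights
```

Zipping `tokenize(smiles)` with the returned `weights` lists how much each atom or bond token contributed to the pooled representation. This is the kind of per-substructure readout the abstract describes for attention-based explanation.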


2004 ◽  
Vol 44 (4) ◽  
pp. 1477-1488 ◽  
Author(s):  
Christel A. S. Bergström ◽  
Carola M. Wassvik ◽  
Ulf Norinder ◽  
Kristina Luthman ◽  
Per Artursson

2007 ◽  
Vol 4 (4) ◽  
pp. 489-497 ◽  
Author(s):  
Hongzhou Zhang ◽  
Howard Y. Ando ◽  
Linna Chen ◽  
Pil H. Lee
