Pushing the limits of solubility prediction via quality-oriented data selection

Mapping Intimacies ◽

10.21203/rs.3.rs-84771/v1 ◽

2020 ◽

Author(s):

Murat Sorkun ◽

J. M. Koelman ◽

Süleyman Er

Keyword(s):

Prediction Models ◽

Aqueous Solubility ◽

Data Selection ◽

Data Driven ◽

Solubility Prediction ◽

Quality Of Data ◽

Statistical Validation ◽

Solubility Predictions ◽

Machine Learning Approach

Accurate prediction of the solubility of chemical substances in solvents remains a challenge. The sparsity of high-quality solubility data is recognized as the biggest hurdle in the development of robust data-driven methods for practical use. Nonetheless, the effects of the quality and quantity of data on aqueous solubility predictions have not yet been scrutinized. In this study, the roles of dataset size and quality in the performance of solubility prediction models are unraveled, and the concepts of actual and observed performance are introduced. To narrow the gap between actual and observed performance, a quality-oriented data selection method is designed, which evaluates the quality of data and extracts the most accurate part of it through statistical validation. By applying this method to the largest publicly available solubility database and using a consensus machine learning approach, a top-performing solubility prediction model is achieved.

The Impact of Data Quantity and Source on the Quality of Data-Driven Hints for Programming

Lecture Notes in Computer Science - Artificial Intelligence in Education ◽

10.1007/978-3-319-93843-1_35 ◽

2018 ◽

pp. 476-490 ◽

Cited By ~ 2

Author(s):

Thomas W. Price ◽

Rui Zhi ◽

Yihuan Dong ◽

Nicholas Lytle ◽

Tiffany Barnes

Keyword(s):

Data Driven ◽

Quality Of Data ◽

The Impact

Optimizing the Playback Quality of Data-Driven Peer-to-Peer Streaming

2012 Fourth International Conference on Computational and Information Sciences ◽

10.1109/iccis.2012.200 ◽

2012 ◽

Author(s):

Guowei Huang ◽

Liang He

Keyword(s):

Peer To Peer ◽

Data Driven ◽

Quality Of Data

Quality of data driven simulation workflows

2012 IEEE 8th International Conference on E-Science ◽

10.1109/escience.2012.6404417 ◽

2012 ◽

Cited By ~ 3

Author(s):

Michael Reiter ◽

Uwe Breitenbucher ◽

Oliver Kopp ◽

Dimka Karastoyanova

Keyword(s):

Data Driven ◽

Quality Of Data

Quality of Data Driven Simulation Workflows

Journal of Systems Integration ◽

10.20470/jsi.v5i1.189 ◽

2014 ◽

pp. 3-29 ◽

Cited By ~ 1

Author(s):

Michael Reiter ◽

Uwe Breitenbucher ◽

Oliver Kopp ◽

Dimka Karastoyanova

Keyword(s):

Data Driven ◽

Quality Of Data

Development and validation of multivariable prediction models of remission, recovery, and quality of life outcomes in people with first episode psychosis: a machine learning approach

The Lancet Digital Health ◽

10.1016/s2589-7500(19)30121-9 ◽

2019 ◽

Vol 1 (6) ◽

pp. e261-e270 ◽

Cited By ~ 8

Author(s):

Samuel P Leighton ◽

Rachel Upthegrove ◽

Rajeev Krishnadas ◽

Michael E Benros ◽

Matthew R Broome ◽

...

Keyword(s):

Quality Of Life ◽

Prediction Models ◽

First Episode Psychosis ◽

First Episode ◽

Learning Approach ◽

Life Outcomes ◽

Episode Psychosis ◽

Machine Learning Approach ◽

Development And Validation

Introducing C-SPAN's Resources for Teaching

News for Teachers of Political Science ◽

10.1017/s0197901900000428 ◽

1987 ◽

Vol 54 ◽

pp. 3-3

Author(s):

Stephen E. Frantzich

Keyword(s):

Subject Matter ◽

Data Selection ◽

Quality Of Data ◽

Negative Side ◽

Traditional Course ◽

The Subject ◽

Written Sources

Integrating C-SPAN coverage into a traditional course provides some unique opportunities and burdens. On the opportunity side, the ability to see the subject matter relatively directly sparks interest, verifies class material and allows for some creative activities not possible using traditional resources. On the more negative side, the approaches outlined in this paper do not necessarily make teaching easier. Since faculty seldom have the opportunity to become C-SPAN “junkies” watching all the coverage, students will bring questions and examples to class which challenge the instructor more than the material stimulated by contact with traditional written sources. In evaluating many of the exercises, the instructor will have to rely on the student's interpretation and the quality of data selection and analysis. Grading will more often be based on how well the student makes his case, rather than the instructor knowing the contours of what the student should conclude ahead of time.

A Comparison of the Quality of Data-Driven Programming Hint Generation Algorithms

International Journal of Artificial Intelligence in Education ◽

10.1007/s40593-019-00177-z ◽

2019 ◽

Vol 29 (3) ◽

pp. 368-395 ◽

Cited By ~ 3

Author(s):

Thomas W. Price ◽

Yihuan Dong ◽

Rui Zhi ◽

Benjamin Paaßen ◽

Nicholas Lytle ◽

...

Keyword(s):

Data Driven ◽

Quality Of Data

Application on sensory prediction of Chinese Moutai-flavour liquor based on ATR-FTIR

E3S Web of Conferences ◽

10.1051/e3sconf/20197903001 ◽

2019 ◽

Vol 79 ◽

pp. 03001

Author(s):

Fan Wang ◽

Chunfu Shao ◽

Qi Chen ◽

Tianyi Meng ◽

Changwen Li

Keyword(s):

Sensory Quality ◽

Prediction Models ◽

Data Selection ◽

Quality Analysis ◽

Classification Models ◽

Verification Of Models ◽

Sensory Prediction ◽

Selection Of

ATR-FTIR combined with chemometrics was applied to establish SVM classification models for evaluating the sensory quality of Chinese Moutai-flavour liquor. Transformation of the ATR-FTIR data, selection of effective wavenumbers, and determination of c and gamma were performed in succession, and the models were then verified on unknown samples. The taste-prediction models for raw grain and cleanliness reach an accuracy of 90%. The after-taste model has an accuracy of 80%, and the others are below 70%. For some flavours, ATR-FTIR combined with chemometrics provided an effective method for quality analysis of Chinese Moutai-flavour liquor.

ADME Prediction with KNIME: In silico aqueous solubility models based on supervised recursive machine learning approaches

ADMET & DMPK ◽

10.5599/admet.852 ◽

2020 ◽

Author(s):

Gabriela Falcón-Cano ◽

Christophe Molina ◽

Miguel Angel Cabrera-Pérez

Keyword(s):

Machine Learning ◽

Experimental Data ◽

In Silico ◽

Aqueous Solubility ◽

Learning Approaches ◽

Consensus Model ◽

Solubility Prediction ◽

Development Processes ◽

Pharmaceutical Molecules

In-silico prediction of aqueous solubility plays an important role during the drug discovery and development processes. For many years, the limited performance of in-silico solubility models has been attributed to the lack of high-quality solubility data for pharmaceutical molecules. However, some studies suggest that the poor accuracy of solubility prediction is not related to the quality of the experimental data and that more precise methodologies (algorithms and/or sets of descriptors) are required for predicting the aqueous solubility of pharmaceutical molecules. In this study, a large and diverse database was generated with aqueous solubility values collected from two public sources; two new recursive machine-learning approaches were developed for data cleaning and variable selection, and a consensus model based on regression and classification algorithms was created. The modeling protocol, which includes the curation of chemical and experimental data, was implemented in KNIME, with the aim of obtaining an automated workflow for the prediction of new databases. Finally, we compared several methods or models available in the literature with our consensus model, showing results comparable to or even outperforming previously published models.

Methods to Monitor and Improve the Performance of Specimen Holders for Transmission Electron Cryomicroscopy

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100164738 ◽

1996 ◽

Vol 54 ◽

pp. 454-455

Author(s):

B. L. Armbruster ◽

B. Kraus ◽

M. Pan

Keyword(s):

Electron Microscopy ◽

Electron Beam ◽

Electron Beam Irradiation ◽

Beam Irradiation ◽

Quality Of Data ◽

Electron Cryomicroscopy ◽

Thermal Equilibration ◽

Transmission Electron ◽

Electron Microscopes

One goal in electron microscopy of biological specimens is to improve the quality of data to equal the resolution capabilities of modern transmission electron microscopes. Radiation damage and beam-induced movement caused by charging of the sample, low image contrast at high resolution, and sensitivity to external vibration and drift in side-entry specimen holders limit the effective resolution one can achieve. Several methods have been developed to address these limitations: cryomethods are widely employed to preserve and stabilize specimens against some of the adverse effects of the vacuum and electron beam irradiation, spot-scan imaging reduces charging and associated beam-induced movement, and energy-filtered imaging removes the “fog” caused by inelastic scattering of electrons, which is particularly pronounced in thick specimens. Although most cryoholders can easily achieve a 3.4 Å resolution specification, information perpendicular to the goniometer axis may be degraded due to vibration. Absolute drift after mechanical and thermal equilibration as well as drift after movement of a holder may cause loss of resolution in any direction.
