Exploring the octanol–water partition coefficient dataset using deep learning techniques and data augmentation

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Nadin Ulrich ◽  
Kai-Uwe Goss ◽  
Andrea Ebert

Abstract: Today, more and more data are freely available. Based on these large datasets, deep neural networks (DNNs) are rapidly gaining relevance in computational chemistry. Here, we explore the potential of DNNs to predict chemical properties from chemical structures. We selected the octanol-water partition coefficient (log P) as an example, which plays an essential role in environmental chemistry and toxicology but also in chemical analysis. The predictive performance of the developed DNN is good, with an RMSE of 0.47 log units on the test dataset and an RMSE of 0.33 for an external dataset from the SAMPL6 challenge. To this end, we trained the DNN using data augmentation considering all potential tautomeric forms of the chemicals. We further demonstrate how DNN models can help in the curation of the log P dataset by identifying potential errors, and address limitations of the dataset itself.
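For readers unfamiliar with the error metric quoted in the abstract above, the following is a minimal sketch of how a root-mean-square error in log units can be computed. The function name `rmse` and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences of values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and experimental log P values (illustrative only).
predicted_log_p = [1.2, 3.4, -0.5, 2.0]
observed_log_p = [1.0, 3.9, -0.3, 2.4]
error = rmse(predicted_log_p, observed_log_p)
```

Because log P is a base-10 logarithm, an error of this kind is expressed in log units, so a value near 0.5 corresponds to roughly a factor of three in the partition coefficient itself.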

2018 ◽  
Vol 8 (12) ◽  
pp. 2512 ◽  
Author(s):  
Ghouthi Boukli Hacene ◽  
Vincent Gripon ◽  
Nicolas Farrugia ◽  
Matthieu Arzel ◽  
Michel Jezequel

Deep learning-based methods have reached state-of-the-art performance by relying on large quantities of available data and computational power. Such methods remain highly inappropriate when facing a major open machine learning problem: incrementally learning new classes and examples over time. Combining the outstanding performance of Deep Neural Networks (DNNs) with the flexibility of incremental learning techniques is a promising avenue of research. In this contribution, we introduce Transfer Incremental Learning using Data Augmentation (TILDA). TILDA is based on pre-trained DNNs as feature extractors, robust selection of feature vectors in subspaces using a nearest-class-mean based technique, majority votes and data augmentation at both the training and the prediction stages. Experiments on challenging vision datasets demonstrate the ability of the proposed method to perform low-complexity incremental learning while achieving significantly better accuracy than existing incremental counterparts.
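The nearest-class-mean idea mentioned in the abstract above can be sketched as follows. This is only an illustration of the general technique, assuming feature vectors have already been produced by some pre-trained network; it does not reproduce TILDA's subspace selection, majority voting, or data augmentation, and the class name `NearestClassMean` is hypothetical.

```python
class NearestClassMean:
    """Incremental nearest-class-mean classifier over fixed feature vectors."""

    def __init__(self):
        self.sums = {}    # class label -> running sum of feature vectors
        self.counts = {}  # class label -> number of examples seen so far

    def partial_fit(self, features, label):
        """Add one example; previously unseen classes can appear at any time."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1

    def predict(self, features):
        """Return the label whose class mean is closest (squared Euclidean)."""
        def distance(label):
            n = self.counts[label]
            return sum((f - s / n) ** 2
                       for f, s in zip(features, self.sums[label]))
        return min(self.sums, key=distance)


# Hypothetical 2-D features standing in for DNN embeddings.
clf = NearestClassMean()
clf.partial_fit([0.0, 0.0], "cat")
clf.partial_fit([0.2, 0.0], "cat")
clf.partial_fit([1.0, 1.0], "dog")
first_prediction = clf.predict([0.9, 0.8])

# A new class is added later without revisiting earlier examples.
clf.partial_fit([5.0, 5.0], "bird")
second_prediction = clf.predict([4.8, 5.1])
```

Only a running sum and a count are stored per class, which is why this family of methods suits incremental settings: adding a class or an example never requires retraining on earlier data.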


2011 ◽  
Vol 356-360 ◽  
pp. 83-88 ◽  
Author(s):  
Shu Qiao ◽  
Kun Xie ◽  
Chuan Fu ◽  
Jie Pan

Polychlorinated dibenzo-p-dioxins and dibenzofurans (PCDD/Fs) are an important group of persistent organic pollutants. Quantitative structure–property relationship (QSPR) modeling is a powerful approach for predicting the properties of environmental organic pollutants from their structure descriptors. In this study, a QSPR model is established for estimating the n-octanol/water partition coefficient (log KOW) of PCDD/Fs. A three-dimensional holographic vector of atomic interaction field (3D-HoVAIF) is used to describe the chemical structures. An SMR-PLS QSAR model was built, and good correlation coefficients and cross-validated correlation coefficients were obtained. The predictive capability of the models has also been demonstrated by leave-one-out cross-validation. Moreover, values estimated by the optimum model are presented for those PCDD/Fs that lack experimental data.


2019 ◽  
Author(s):  
Lucas Ribeiro De Abreu ◽  
Reinaldo Augusto da Costa Bianchi

RoboCup Soccer is one of the largest competitions in the robotics field of research. It treats the soccer match as a challenge for robots and aims for robots to win a match against humans by the year 2050. The vision module is a critical system for the robots because it needs to quickly locate and classify objects of interest in order to generate the next best action. In this paper, an approach using Convolutional Neural Networks for object detection is described. The soccer ball is the chosen object, and three state-of-the-art convolutional neural network architectures were trained for the experiment using data augmentation and transfer learning techniques. The models were evaluated on a test set, yielding promising results in precision and frames per second. The best model achieved an average precision of 0.972 at an intersection over union of 50% and 9.64 frames per second, running on CPU.
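The intersection-over-union threshold mentioned in the abstract above can be illustrated with a short sketch. It assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form; the function names and the box format are assumptions for illustration and are not taken from the paper.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - intersection)
    return intersection / union if union > 0 else 0.0


def is_true_positive(predicted_box, ground_truth_box, threshold=0.5):
    """A detection counts as correct when its IoU reaches the threshold."""
    return intersection_over_union(predicted_box, ground_truth_box) >= threshold
```

With a threshold of 0.5, a predicted ball box is counted as a correct detection only if it overlaps the annotated box by at least half of their combined area.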


2019 ◽  
Author(s):  
Floriane Montanari ◽  
Lara Kuhnke ◽  
Antonius ter Laak ◽  
Djork-Arné Clevert

Simple physico-chemical properties like logD, solubility or serum albumin binding have a direct impact on the likelihood of success of compounds in clinical trials. Here, we collected all the Bayer in-house data related to these properties and applied machine learning techniques to predict them for new compounds. We report that, for the endpoints studied here, a multitask graph convolutional network appears to be a highly competitive choice. The new model shows increased predictive performance on all endpoints compared to previous modeling methods.


2021 ◽  
Author(s):  
Massimiliano Greco ◽  
Giovanni Angelotti ◽  
Pier Francesco Caruso ◽  
Alberto Zanella ◽  
Niccolò Stomeo ◽  
...  

Abstract Introduction: SARS-CoV-2 infection was first identified at the end of 2019 in China, and subsequently spread globally. COVID-19 disease frequently affects the lungs, leading to bilateral viral pneumonia and progressing in some cases to severe respiratory failure requiring ICU admission and mechanical ventilation. Risk stratification at ICU admission is fundamental for resource allocation and decision making, considering that baseline comorbidities, age, and patient conditions at admission have been associated with poorer outcomes. Supervised machine learning techniques are increasingly widespread in clinical medicine and can predict mortality and test associations, reaching high predictive performance. We assessed the performance of a machine learning approach to predict mortality in COVID-19 patients admitted to ICU using data from the Lombardy ICU Network. Methods: This is a secondary analysis of prospectively collected data from the Lombardy ICU Network. To predict survival at 7, 14 and 28 days, we built two different models: model A included patient demographics, medications before admission and comorbidities, while model B also included the data of the first day since ICU admission. 10-fold cross-validation was repeated 2500 times to ensure optimal hyperparameter choice. The only constraint imposed on model optimization was the choice of logistic regression as the final layer, to increase clinical interpretability. Different imputation and over-sampling techniques were employed in model training. Results: 1503 patients were included, with 766 deaths (51%). Exploratory analysis and Kaplan-Meier curves demonstrated an association of mortality with age and gender. Models A and B reached the greatest predictive performance at 28 days (AUC 0.77 and 0.79), with lower performance at 14 days (AUC 0.72 and 0.74) and 7 days (AUC 0.68 and 0.71). Male gender, age and number of comorbidities were strongly associated with mortality in both models. Among comorbidities, chronic kidney disease and chronic obstructive pulmonary disease demonstrated an association. Mode of ventilatory assistance at ICU admission and fraction of inspired oxygen were associated with mortality in model B. Conclusions: Supervised machine learning models demonstrated good performance in the prediction of 28-day mortality. 7-day and 14-day predictions demonstrated lower performance. Machine learning techniques may be useful in emergency phases to reach higher predictive performance with reduced human supervision using complex data.
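The AUC values reported in the abstract above can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The sketch below computes AUC directly from that definition; the function name `roc_auc` and the toy data are illustrative assumptions, and the study's own evaluation code is not shown in the source.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: iterable of 0/1 outcomes (1 = event, e.g. death).
    scores: predicted risks, higher meaning more likely to be positive.
    Tied scores count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy outcomes and predicted risks (illustrative only).
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the 0.68 to 0.79 range quoted in the results.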


2019 ◽  
Vol 35 (14) ◽  
pp. i164-i172 ◽  
Author(s):  
Dai Hai Nguyen ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka

Abstract Motivation: Metabolite identification is an important task in metabolomics to enhance the knowledge of biological systems. A number of machine learning-based methods have been proposed for this task, which predict the chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: (i) predicting fingerprints from spectra; (ii) searching a database for chemical compounds corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large in order to cover all possible substructures and chemical properties, and are therefore heavily redundant, in the sense of containing many molecular (sub)structures irrelevant to the task, causing limited predictive performance and slow prediction. Results: We propose ADAPTIVE, which has two parts: learning two mappings (i) from structures to molecular vectors and (ii) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data, to be consistent with both spectra and chemical structures of metabolites. In more detail, molecular vectors are generated by a model parameterized by a message passing neural network, and parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of the Hilbert-Schmidt Independence Criterion. Molecular vectors generated by this model are compact and, importantly, adaptive (specific) to both the given data and the task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method of metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE using benchmark data, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency.
Availability and implementation: The code will be accessible at http://www.bic.kyoto-u.ac.jp/pathway/tools/ADAPTIVE after the acceptance of this article.
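As a rough illustration of the Hilbert-Schmidt Independence Criterion named in the abstract above, the sketch below computes the commonly used biased empirical estimate, trace(K H L H) / (n - 1)^2, with Gaussian kernels. The kernel choice, bandwidth, normalisation convention, and function names are assumptions made for this example; the abstract does not state which variants ADAPTIVE uses.

```python
import math


def gaussian_kernel_matrix(points, bandwidth=1.0):
    """Pairwise Gaussian (RBF) kernel matrix for a list of vectors."""
    def kernel(a, b):
        squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    n = len(points)
    return [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]


def center(matrix):
    """Double-center a square matrix: H M H with H = I - (1/n) * ones."""
    n = len(matrix)
    row_means = [sum(row) / n for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[matrix[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC between paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    k_centered = center(gaussian_kernel_matrix(xs, bandwidth))
    l_matrix = gaussian_kernel_matrix(ys, bandwidth)
    # trace(K H L H) equals the elementwise sum of (H K H) * L for symmetric L.
    trace = sum(k_centered[i][j] * l_matrix[i][j]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Toy paired samples (illustrative only).
xs = [[0.0], [1.0], [2.0], [3.0]]
dependent = hsic(xs, xs)
constant = hsic(xs, [[1.0]] * 4)
```

A larger value indicates stronger statistical dependence between the two sets of vectors, which is why maximizing it can align learned molecular vectors with their spectra. A constant second variable gives a value of zero up to rounding error.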


2021 ◽  
Author(s):  
Lucas Ribeiro de Abreu

RoboCup Soccer is one of the largest initiatives in the robotics field of research. This initiative treats the soccer match as a challenge for robots and aims for robots to win a match against humans by the year 2050. The vision module is a critical system for the robots because it needs to quickly locate and classify objects of interest in order to generate the next best action. This work evaluates deep neural networks for the detection of the ball and robots. For this task, five convolutional neural network architectures were trained using data augmentation and transfer learning techniques. The models were evaluated on a test set, yielding promising results in precision and frames per second. The best model achieved an mAP of 0.98 and 14.7 frames per second, running on CPU.


Molecules ◽  
2019 ◽  
Vol 25 (1) ◽  
pp. 44 ◽  
Author(s):  
Floriane Montanari ◽  
Lara Kuhnke ◽  
Antonius Ter Laak ◽  
Djork-Arné Clevert

Simple physico-chemical properties, like logD, solubility, or melting point, can reveal a great deal about how a compound under development might later behave. These data are typically measured for most compounds in drug discovery projects in a medium throughput fashion. Collecting and assembling all the Bayer in-house data related to these properties allowed us to apply powerful machine learning techniques to predict the outcome of those assays for new compounds. In this paper, we report our finding that, especially for predicting physicochemical ADMET endpoints, a multitask graph convolutional approach appears a highly competitive choice. For seven endpoints of interest, we compared the performance of that approach to fully connected neural networks and different single task models. The new model shows increased predictive performance compared to previous modeling methods and will allow early prioritization of compounds even before they are synthesized. In addition, our model follows the generalized solubility equation without being explicitly trained under this constraint.
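The solubility relation referred to at the end of the abstract above is commonly stated as log S = 0.5 - 0.01 (MP - 25) - log Kow, with the melting point MP in degrees Celsius and S the molar aqueous solubility. The sketch below encodes that commonly cited form. It is an assumption that this is the exact variant the authors compared against, and the function name is hypothetical.

```python
def gse_log_solubility(melting_point_c, log_kow):
    """Estimate log10 molar aqueous solubility from melting point and log Kow.

    Uses the commonly cited form: log S = 0.5 - 0.01 * (MP - 25) - log Kow.
    For compounds that are liquid at 25 degrees Celsius the melting-point
    term is conventionally set to zero.
    """
    melting_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * melting_term - log_kow


# Hypothetical compound: melting point 125 degrees Celsius, log Kow of 2.0.
example_log_s = gse_log_solubility(125.0, 2.0)
```

The relation ties together two of the endpoints named in the abstract, melting point and lipophilicity, so a multitask model that predicts them jointly could plausibly recover it without being constrained to.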

