Uncertainty-Informed Deep Transfer Learning of PFAS Toxicity

This article presents our recent study of computational methods for predicting the toxicity of PFAS, known as “forever chemicals,” from their chemical structures, based on an evaluation of multiple machine learning methods. To address the scarcity of PFAS toxicity data, we investigated a deep “transfer learning” method that leverages toxicity information from the entire organic chemical domain. We also developed an uncertainty-informed workflow incorporating the SelectiveNet architecture, which can guide future high-throughput screening using knowledge of chemical structures.
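As an illustration of the uncertainty-informed idea, the sketch below computes the two quantities usually reported for a selective (abstaining) predictor: coverage, the fraction of samples the model chooses to predict on, and selective risk, the average error on those samples only. It is a minimal sketch and not the authors' implementation. The selection scores, the 0.5 threshold, and the function name `selective_metrics` are assumptions made for illustration; in a SelectiveNet-style model the scores would come from a learned selection head.

```python
from typing import List, Tuple


def selective_metrics(
    errors: List[float],     # per-sample loss, e.g. 0/1 misclassification
    selection: List[float],  # hypothetical selection score in [0, 1] per sample
    threshold: float = 0.5,  # assumed cut-off: predict only when score >= threshold
) -> Tuple[float, float]:
    """Return (coverage, selective_risk) for an abstaining predictor."""
    if not errors or len(errors) != len(selection):
        raise ValueError("errors and selection must be non-empty and equal length")
    accepted = [e for e, s in zip(errors, selection) if s >= threshold]
    coverage = len(accepted) / len(errors)
    # Risk is averaged over accepted samples only; undefined at zero coverage.
    risk = sum(accepted) / len(accepted) if accepted else float("nan")
    return coverage, risk


# Toy data (made up): 1 marks a wrong prediction, 0 a correct one.
errors = [0, 1, 0, 0, 1, 0]
selection = [0.9, 0.2, 0.8, 0.7, 0.6, 0.4]
coverage, risk = selective_metrics(errors, selection)
print(f"coverage={coverage:.2f}, selective risk={risk:.2f}")
```

Raising the threshold typically lowers coverage and, if the selection scores are informative, lowers selective risk as well, which is the trade-off a screening workflow would tune.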


Using Machine Learning Methods to Predict Experimental High Throughput Screening Data

Combinatorial Chemistry & High Throughput Screening ◽ 10.2174/138620710791292958 ◽ 2010 ◽ Vol 13 (5) ◽ pp. 430-441 ◽ Cited By ~ 5

Author(s): Cherif Mballo ◽ Vladimir Makarenkov

Keyword(s): Machine Learning ◽ High Throughput ◽ High Throughput Screening ◽ Learning Methods ◽ Machine Learning Methods


Virtual High Throughput Screening Using Machine Learning Methods

Studies in Classification, Data Analysis, and Knowledge Organization - Classification as a Tool for Research ◽ 10.1007/978-3-642-10745-0_56 ◽ 2010 ◽ pp. 517-524

Author(s): Cherif Mballo ◽ Vladimir Makarenkov

Keyword(s): Machine Learning ◽ High Throughput ◽ High Throughput Screening ◽ Learning Methods ◽ Machine Learning Methods


Integrating Machine/Deep Learning Methods and Filtering Techniques for Reliable Mineral Phase Segmentation of 3D X-ray Computed Tomography Images

Energies ◽ 10.3390/en14154595 ◽ 2021 ◽ Vol 14 (15) ◽ pp. 4595

Author(s): Parisa Asadi ◽ Lauren E. Beckingham

Keyword(s): Machine Learning ◽ Deep Learning ◽ Random Forest ◽ CT Images ◽ CT Imaging ◽ Learning Method ◽ Learning Methods ◽ X-Ray ◽ Machine Learning Methods ◽ Filtering Techniques

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation in these images is essential but, as with any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation is a promising and powerful approach to reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and a deep learning method to assess the potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as the modified U-Net model, were applied to the extracted input features. The models’ performances were then compared to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed that considering more feature dimensions is promising and that all classification algorithms achieved high accuracy, ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with the linear combination of focal and dice loss also performed well, with an accuracy of 0.91 and 0.93 for Mancos and Marcellus, respectively. In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.
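The "linear combination of focal and dice loss" mentioned above can be made concrete with a small worked example. The sketch below is a simplified binary, per-pixel version written in plain Python; the study segments multiple mineral phases with a U-Net, so this is not the authors' loss. The weight `alpha`, the focusing parameter `gamma`, and the smoothing constant are assumed values chosen for illustration.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean binary focal loss; down-weights pixels that are already well classified."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)   # clamp to keep log() finite
        pt = p if t == 1 else 1.0 - p     # probability assigned to the true class
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(probs)


def dice_loss(probs: Sequence[float], targets: Sequence[int],
              smooth: float = 1.0) -> float:
    """1 - soft Dice coefficient; measures overlap between prediction and mask."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs: Sequence[float], targets: Sequence[int],
                  alpha: float = 0.5) -> float:
    """Linear combination: alpha * focal + (1 - alpha) * dice (alpha is assumed)."""
    return alpha * focal_loss(probs, targets) + (1.0 - alpha) * dice_loss(probs, targets)


# Toy 4-pixel mask (made up): a perfect prediction versus an inverted one.
mask = [1, 0, 1, 0]
good = combined_loss([1.0, 0.0, 1.0, 0.0], mask)
bad = combined_loss([0.1, 0.9, 0.2, 0.8], mask)
print(f"good={good:.6f}, bad={bad:.6f}")
```

The focal term addresses class imbalance at the pixel level, while the dice term rewards overlap of whole regions, which is one common motivation for combining the two in segmentation tasks.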


Accelerating organic solar cell material's discovery: high-throughput screening and big data

Energy & Environmental Science ◽ 10.1039/d1ee00559f ◽ 2021

Author(s): Xabier Rodríguez-Martínez ◽ Enrique Pascual-San-José ◽ Mariano Campoy-Quiles

Keyword(s): Machine Learning ◽ Big Data ◽ High Throughput ◽ Organic Solar Cells ◽ High Throughput Screening ◽ Organic Solar Cell ◽ State Of The Art ◽ Review Article ◽ Machine Learning Algorithms ◽ Device Optimization

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.


A deep neural network model for packing density predictions and its application in the study of 1.5 million organic molecules

Chemical Science ◽ 10.1039/c9sc02677k ◽ 2019 ◽ Vol 10 (36) ◽ pp. 8374-8383 ◽ Cited By ~ 1

Author(s): Mohammad Atif Faiz Afzal ◽ Aditya Sonpal ◽ Mojtaba Haghighatlari ◽ Andrew J. Schultz ◽ Johannes Hachmann

Keyword(s): Neural Network ◽ Machine Learning ◽ Refractive Index ◽ High Throughput ◽ Neural Network Model ◽ High Throughput Screening ◽ Deep Neural Network ◽ Organic Molecules ◽ High Refractive Index ◽ Computational Pipeline

Computational pipeline for the accelerated discovery of organic materials with high refractive index via high-throughput screening and machine learning.


Structure-based drug design against Trypanosoma brucei methionyl-tRNA synthetase

Acta Crystallographica Section A Foundations and Advances ◽ 10.1107/s2053273314092912 ◽ 2014 ◽ Vol 70 (a1) ◽ pp. C708-C708

Author(s): Cho Yeow Koh ◽ Jasmine Nguyen ◽ Sayaka Shibata ◽ Zhongsheng Zhang ◽ Ranae Ranade ◽ ...

Keyword(s): Crystal Structures ◽ Trypanosoma Brucei ◽ High Throughput ◽ High Throughput Screening ◽ Protozoan Parasite ◽ tRNA Synthetase ◽ Structure Based Drug Design ◽ Chemical Structures ◽ IC50 Values ◽ Methionyl tRNA Synthetase

Infection by the protozoan parasite Trypanosoma brucei causes human African trypanosomiasis, commonly known as sleeping sickness. The disease is fatal without treatment, yet current therapeutic options are inadequate due to toxicity, difficulty of administration, and emerging resistance. Therefore, methionyl-tRNA synthetase of T. brucei (TbMetRS) is being targeted for the development of new antitrypanosomal drugs. We have recently completed a high-throughput screening campaign against TbMetRS using a library of 364,131 compounds at The Scripps Research Institute Molecular Screening Center. Here we outline our strategy to integrate the power of crystal structures with high-throughput screening in a drug discovery project. We applied the rapid crystal soaking procedure, used earlier to obtain structures of TbMetRS in complex with inhibitors[1], to approximately 70 high-throughput screening hits. This resulted in more than 20 crystal structures of TbMetRS·hit complexes. These hits cover a large diversity of chemical structures, with IC50 values between 200 nM and 10 µM. Based on the solved structures and existing knowledge drawn from other in-house inhibitors, the IC50 value of the most promising hit has been improved. Further development of the compounds into potent TbMetRS inhibitors with desirable pharmacokinetic properties is ongoing and will continue to benefit from information derived from crystal structures.


Inductive Transfer Learning for Molecular Activity Prediction: Next-Gen QSAR Models with MolPMoFiT

10.26434/chemrxiv.9978743.v2 ◽ 2020

Author(s): Xinhao Li ◽ Denis Fourches

Keyword(s): Transfer Learning ◽ High Throughput Screening ◽ Structure Prediction ◽ Large Scale ◽ High Reliability ◽ Structural Features ◽ Fine Tuning ◽ QSAR Modeling ◽ Chemical Structures ◽ Effective Transfer

Deep neural networks can learn directly from chemical structures, without extensive user-driven selection of descriptors, to predict molecular properties and activities with high reliability. However, these approaches typically require large training sets to learn the endpoint-specific structural features and ensure reasonable prediction accuracy. Even though large datasets are becoming the new normal in drug discovery, especially for high-throughput screening or metabolomics datasets, one should also consider smaller datasets with endpoints that are challenging to model and forecast. Thus, it would be highly relevant to make better use of the tremendous compendium of unlabeled compounds in publicly available datasets to improve model performance for the user’s particular series of compounds. In this study, we propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method based on self-supervised pre-training followed by task-specific fine-tuning for QSPR/QSAR modeling. A large-scale molecular structure prediction model is pre-trained on one million unlabeled molecules from ChEMBL in a self-supervised manner and can then be fine-tuned on various QSPR/QSAR tasks for smaller chemical datasets with specific endpoints. Herein, the method is evaluated on four benchmark datasets (lipophilicity, FreeSolv, HIV, and blood-brain barrier penetration). The results show that the method achieves strong performance on all four datasets compared to other state-of-the-art machine learning modeling techniques reported in the literature so far.


Application of Life Cycle Assessment and Machine Learning for High-Throughput Screening of Green Chemical Substitutes

10.26434/chemrxiv.12210860.v1 ◽ 2020

Author(s): Xinzhe Zhu ◽ Chi-Hung Ho ◽ Xiaonan Wang

Keyword(s): Machine Learning ◽ Life Cycle Assessment ◽ Life Cycle ◽ Production Process ◽ Environmental Impacts ◽ High Throughput ◽ High Throughput Screening ◽ Molecular Descriptors ◽ Trifluoroacetic Anhydride ◽ Chemical Materials

The production process of many active pharmaceutical ingredients, such as sitagliptin, can cause severe environmental problems due to the use of toxic chemical materials and production infrastructure, energy consumption, and waste treatment. The environmental impacts of the sitagliptin production process were estimated with the life cycle assessment (LCA) method, which suggested that the use of chemical materials contributed the major environmental impacts. The Eco-indicator 99 and ReCiPe endpoint methods confirmed that chemical feedstock accounted for 83% and 70% of the life-cycle impact, respectively. Among all the chemical materials used in the sitagliptin production process, trifluoroacetic anhydride was identified as the most influential factor in most impact categories according to the results of the ReCiPe midpoint method. Therefore, high-throughput screening was performed to find green chemical substitutes for the target chemical (i.e., trifluoroacetic anhydride) in the following three steps. First, the thirty most similar chemicals were obtained from two million candidate alternatives in the PubChem database based on their molecular descriptors. Next, deep learning neural network models were developed to predict life-cycle impact from the chemicals in the Ecoinvent v3.5 database with known LCA values and corresponding molecular descriptors. Finally, after the LCA data of these similar chemicals were predicted using the well-trained machine learning models, 1,2-ethanediyl ester was shown to be one of the potential greener substitutes. The case study demonstrated the applicability of the novel framework for screening green chemical substitutes and optimizing the pharmaceutical manufacturing process.
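The first screening step, picking the most similar candidates by molecular descriptors, can be sketched as a top-k nearest-neighbor search. The abstract does not state which similarity measure was used, so Euclidean distance is an assumption here, and the candidate names and two-dimensional descriptor vectors are made-up toy values. This is a minimal sketch of the idea rather than the authors' pipeline.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def most_similar(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int = 30,  # the abstract reports keeping the thirty most similar chemicals
) -> List[Tuple[str, float]]:
    """Return the k candidates closest to the target descriptor vector."""
    # Assumed similarity measure: smaller Euclidean distance means more similar.
    scored = ((name, math.dist(target, vec)) for name, vec in candidates.items())
    # nsmallest keeps only k items in memory, which suits very large candidate sets.
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


# Hypothetical descriptor vectors, for illustration only.
target_descriptors = [1.0, 2.0]
candidate_descriptors = {
    "cand_a": [1.0, 2.1],
    "cand_b": [5.0, 5.0],
    "cand_c": [1.2, 2.0],
}
top = most_similar(target_descriptors, candidate_descriptors, k=2)
print(top)
```

In practice the descriptors would usually be scaled to comparable ranges before computing distances, since otherwise a single large-valued descriptor dominates the ranking.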
