Uncertainty-Informed Deep Transfer Learning of PFAS Toxicity

Author(s):  
Jeremy Feinstein ◽  
Ganesh Sivaraman ◽  
Kurt Picel ◽  
Brian Peters ◽  
Alvaro Vazquez-Mayagoitia ◽  
...  

In this article, we present a computational methodology for predicting the toxicity of per- and polyfluoroalkyl substances (PFAS), known as “forever chemicals”, from their chemical structures, based on an evaluation of multiple machine learning methods. To address the scarcity of PFAS toxicity data, we investigated a deep transfer learning method that leverages toxicity information from across the entire organic chemical domain. We also developed an uncertainty-informed workflow that incorporates the SelectiveNet architecture and can help guide future high-throughput screening using knowledge of chemical structures.
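The uncertainty-informed idea can be illustrated with a minimal sketch of selective prediction, in which a model abstains when a selection score is low. This is not the authors' implementation: the function names, the 0.5 threshold, and the use of mean absolute error on a numeric endpoint are all assumptions made for illustration, and the selection scores are taken as given rather than produced by a trained SelectiveNet.

```python
from typing import List, Optional, Sequence, Tuple


def selective_predict(
    predictions: Sequence[float],
    selection_scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the article
) -> List[Optional[float]]:
    """Keep a prediction only when its selection score reaches the threshold."""
    if len(predictions) != len(selection_scores):
        raise ValueError("predictions and selection_scores must have equal length")
    return [
        p if s >= threshold else None  # None marks an abstention
        for p, s in zip(predictions, selection_scores)
    ]


def coverage_and_risk(
    selected: Sequence[Optional[float]],
    targets: Sequence[float],
) -> Tuple[float, Optional[float]]:
    """Return (coverage, mean absolute error over the accepted predictions)."""
    if len(selected) != len(targets):
        raise ValueError("selected and targets must have equal length")
    if not targets:
        return 0.0, None
    kept = [(p, t) for p, t in zip(selected, targets) if p is not None]
    coverage = len(kept) / len(targets)
    if not kept:
        return coverage, None  # the model abstained on every sample
    risk = sum(abs(p - t) for p, t in kept) / len(kept)
    return coverage, risk
```

Raising the threshold trades coverage for lower risk on the accepted predictions, which is the trade-off a screening workflow would tune.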

2021


Energies ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4595
Author(s):  
Parisa Asadi ◽  
Lauren E. Beckingham

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation of these images is essential but, as with any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation is a promising and powerful approach to reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and a deep learning method to assess the potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as a modified U-Net model, were applied to the extracted input features. The models’ performances were then compared to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed that considering more feature dimensions is promising and that all classification algorithms achieve high accuracy, ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with a linear combination of focal and dice loss also performed well, with accuracies of 0.91 and 0.93 for Mancos and Marcellus, respectively. In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.
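A linear combination of focal and dice loss, as mentioned for the U-Net model, can be sketched as follows. This is a simplified binary, per-pixel version in plain Python, not the authors' code: the mixing weight, the focusing parameter `gamma`, and the smoothing constant are assumed values, and a real multi-phase segmentation would apply the same idea per class on tensors.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], labels: Sequence[int],
               gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Mean binary focal loss: -(1 - p_t)^gamma * log(p_t)."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal in length")
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
        p_t = min(max(p_t, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def dice_loss(probs: Sequence[float], labels: Sequence[int],
              smooth: float = 1.0) -> float:
    """Soft dice loss: 1 - (2 * overlap + smooth) / (sum(p) + sum(y) + smooth)."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal in length")
    overlap = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * overlap + smooth) / (sum(probs) + sum(labels) + smooth)


def combined_loss(probs: Sequence[float], labels: Sequence[int],
                  weight: float = 0.5) -> float:
    """Linear mix: weight * focal + (1 - weight) * dice (weight is an assumption)."""
    return weight * focal_loss(probs, labels) + (1.0 - weight) * dice_loss(probs, labels)
```

The focal term down-weights pixels that are already classified confidently, while the dice term scores region overlap directly, so the mix is a common choice when phases are imbalanced.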


Author(s):  
Xabier Rodríguez-Martínez ◽  
Enrique Pascual-San-José ◽  
Mariano Campoy-Quiles

This review article presents the state-of-the-art in high-throughput computational and experimental screening routines with application in organic solar cells, including materials discovery, device optimization and machine-learning algorithms.


2019 ◽  
Vol 10 (36) ◽  
pp. 8374-8383 ◽  
Author(s):  
Mohammad Atif Faiz Afzal ◽  
Aditya Sonpal ◽  
Mojtaba Haghighatlari ◽  
Andrew J. Schultz ◽  
Johannes Hachmann

Computational pipeline for the accelerated discovery of organic materials with high refractive index via high-throughput screening and machine learning.


2014 ◽  
Vol 70 (a1) ◽  
pp. C708-C708
Author(s):  
Cho Yeow Koh ◽  
Jasmine Nguyen ◽  
Sayaka Shibata ◽  
Zhongsheng Zhang ◽  
Ranae Ranade ◽  
...  

Infection by the protozoan parasite Trypanosoma brucei causes human African trypanosomiasis, commonly known as sleeping sickness. The disease is fatal without treatment, yet current therapeutic options are inadequate due to toxicity, difficulty of administration, and emerging resistance. Therefore, methionyl-tRNA synthetase of T. brucei (TbMetRS) is being targeted for the development of new antitrypanosomal drugs. We have recently completed a high-throughput screening campaign against TbMetRS using a library of 364,131 compounds at The Scripps Research Institute Molecular Screening Center. Here we outline our strategy to integrate the power of crystal structures with high-throughput screening in a drug discovery project. We applied the rapid crystal soaking procedure, reported earlier for obtaining structures of TbMetRS in complex with inhibitors[1], to approximately 70 high-throughput screening hits. This resulted in more than 20 crystal structures of TbMetRS·hit complexes. These hits cover a large diversity of chemical structures, with IC50 values between 200 nM and 10 µM. Based on the solved structures and existing knowledge drawn from other in-house inhibitors, the IC50 value of the most promising hit has been improved. Further development of the compounds into potent TbMetRS inhibitors with desirable pharmacokinetic properties is ongoing and will continue to benefit from information derived from crystal structures.


2020 ◽  
Author(s):  
Xinhao Li ◽  
Denis Fourches

Deep neural networks can directly learn from chemical structures without extensive, user-driven selection of descriptors in order to predict molecular properties/activities with high reliability. But these approaches typically require large training sets to learn the endpoint-specific structural features and ensure reasonable prediction accuracy. Even though large datasets are becoming the new normal in drug discovery, especially when it comes to high-throughput screening or metabolomics datasets, one should also consider smaller datasets with challenging endpoints to model and forecast. Thus, it would be highly relevant to better utilize the tremendous compendium of unlabeled compounds from publicly-available datasets for improving the model performances for the user’s particular series of compounds. In this study, we propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method based on self-supervised pre-training + task-specific fine-tuning for QSPR/QSAR modeling. A large-scale molecular structure prediction model is pre-trained using one million unlabeled molecules from ChEMBL in a self-supervised learning manner, and can then be fine-tuned on various QSPR/QSAR tasks for smaller chemical datasets with specific endpoints. Herein, the method is evaluated on four benchmark datasets (lipophilicity, FreeSolv, HIV, and blood-brain barrier penetration). The results showed the method can achieve strong performances for all four datasets compared to other state-of-the-art machine learning modeling techniques reported in the literature so far.


2020 ◽  
Author(s):  
Xinzhe Zhu ◽  
Chi-Hung Ho ◽  
Xiaonan Wang

The production of many active pharmaceutical ingredients, such as sitagliptin, can cause severe environmental problems due to the use of toxic chemical materials and production infrastructure, energy consumption, and waste treatment. The environmental impacts of the sitagliptin production process were estimated using life cycle assessment (LCA), which suggested that the use of chemical materials was the major contributor to environmental impact. The Eco-indicator 99 and ReCiPe endpoint methods indicated that chemical feedstock accounted for 83% and 70% of the life-cycle impact, respectively. Among all the chemical materials used in the sitagliptin production process, trifluoroacetic anhydride was identified as the most influential factor in most impact categories according to the results of the ReCiPe midpoint method. Therefore, high-throughput screening was performed to seek green chemical substitutes for the target chemical (i.e., trifluoroacetic anhydride) in three steps. First, the thirty most similar chemicals were obtained from two million candidate alternatives in the PubChem database based on their molecular descriptors. Next, deep learning neural network models were developed to predict life-cycle impact, trained on the chemicals in the Ecoinvent v3.5 database with known LCA values and corresponding molecular descriptors. Finally, after the LCA data of these similar chemicals were predicted using the trained models, 1,2-ethanediyl ester was identified as one of the potential greener substitutes. The case study demonstrated the applicability of the novel framework to screen green chemical substitutes and optimize the pharmaceutical manufacturing process.
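The first screening step, selecting the most similar candidates by molecular descriptors, can be sketched as a nearest-neighbour ranking. The abstract does not state which similarity measure or descriptors were used, so the Euclidean distance, the function name `top_k_similar`, and the toy descriptor vectors below are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def top_k_similar(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int = 30,  # the abstract mentions thirty candidates
) -> List[str]:
    """Rank candidates by Euclidean distance to the target descriptor vector."""
    for name, vec in candidates.items():
        if len(vec) != len(target):
            raise ValueError(f"descriptor length mismatch for {name!r}")
    ranked = sorted(candidates.items(), key=lambda item: math.dist(target, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical two-dimensional descriptors; real descriptors would be
# higher-dimensional and typically standardized before computing distances.
toy_candidates = {"a": (1.0, 0.0), "b": (3.0, 4.0), "c": (0.0, 0.5)}
closest = top_k_similar((0.0, 0.0), toy_candidates, k=2)
```

In a full pipeline, the shortlisted names would then be passed to the trained impact-prediction models described in the second and third steps.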


2015 ◽  
Vol 11 (12) ◽  
pp. 3362-3377 ◽  
Author(s):  
Vinay Randhawa ◽  
Anil Kumar Singh ◽  
Vishal Acharya

Network-based and cheminformatics approaches identify novel lead molecules for CXCR4, a key gene prioritized in oral cancer.

