Iterative Experimental Design Based on Active Machine Learning Reduces the Experimental Burden Associated with Reaction Screening

Author(s):  
Natalie Eyke ◽  
William H. Green ◽  
Klavs F. Jensen

High-throughput reaction screening has emerged as a useful means of rapidly identifying the influence of key reaction variables on reaction outcomes. We show that active machine learning can further this objective by eliminating dependence on complete screens: it iteratively selects maximally informative experiments from the set of all possible experiments in the domain. To demonstrate our approach, we conduct retrospective analyses of preexisting results from high-throughput reaction screening experiments. We compare the test set errors of models trained on actively selected reactions with those of models trained on reactions selected at random from the same domain. We find that the degree to which models trained on actively selected data outperform models trained on randomly selected data depends on the domain being modeled, and that very low test set errors can be achieved when the dataset is heavily skewed toward low- or zero-yielding reactions. Our results confirm that the active learning algorithm is a useful experiment planning tool that can change the reaction screening paradigm by allowing discovery and process chemists to focus their screening efforts on generating a small amount of high-quality data.
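The active-learning loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes reactions are encoded as numeric condition vectors, uses a k-nearest-neighbour yield model, and picks the next experiment by query by committee (the candidate on which bootstrap-resampled models disagree most). The functions `knn_predict`, `most_informative`, `screen` and `mean_abs_error`, and the toy yield surface, are hypothetical. Setting `active=False` gives the random-selection baseline that the abstract compares against.

```python
import math
import random
import statistics


def knn_predict(train, x, k=3):
    """Predict the yield at condition vector x as the mean yield of its k nearest labelled reactions."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def most_informative(labeled, candidates, rng, n_models=5, k=3):
    """Query by committee: pick the candidate whose predicted yield varies most across bootstrap models."""
    committee = [[rng.choice(labeled) for _ in labeled] for _ in range(n_models)]

    def disagreement(x):
        return statistics.pvariance([knn_predict(member, x, k) for member in committee])

    return max(candidates, key=disagreement)


def screen(pool, run_experiment, budget, n_init=3, active=True, seed=0):
    """Label `budget` reactions from `pool`, chosen actively or uniformly at random."""
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)
    labeled = [(x, run_experiment(x)) for x in remaining[:n_init]]
    remaining = remaining[n_init:]
    while len(labeled) < budget and remaining:
        x = most_informative(labeled, remaining, rng) if active else rng.choice(remaining)
        remaining.remove(x)
        labeled.append((x, run_experiment(x)))
    return labeled


def mean_abs_error(labeled, test_points, run_experiment, k=3):
    """Test-set error of the kNN model trained on `labeled`."""
    return statistics.mean(abs(knn_predict(labeled, x, k) - run_experiment(x)) for x in test_points)


if __name__ == "__main__":
    # Hypothetical 5 x 5 grid of (catalyst index, solvent index); most yields are zero.
    pool = [(c, s) for c in range(5) for s in range(5)]
    toy_yield = lambda x: 0.0 if x[0] < 3 else 20.0 * x[1]
    for active in (True, False):
        picked = screen(pool, toy_yield, budget=10, active=active)
        print("active" if active else "random", mean_abs_error(picked, pool, toy_yield))
```

Comparing the two printed errors on a skewed toy domain mirrors the retrospective comparison only in spirit; on real screening data the model, descriptors and acquisition rule would each need to be chosen and validated separately.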

2020 ◽  
Author(s):  
Natalie Eyke ◽  
William H. Green ◽  
Klavs F. Jensen



Author(s):  
W. H. Marshall ◽  
G. Sierp ◽  
T. Barbalho ◽  
F. T. Christiansen ◽  
S. Scholz ◽  
...  

Chemosphere ◽  
2019 ◽  
Vol 234 ◽  
pp. 242-251 ◽  
Author(s):  
Mikaël Kedzierski ◽  
Mathilde Falcou-Préfol ◽  
Marie Emmanuelle Kerros ◽  
Maryvonne Henry ◽  
Maria Luiza Pedrotti ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-14
Author(s):  
Shilun Yang ◽  
Yanjia Shen ◽  
Wendan Lu ◽  
Yinglin Yang ◽  
Haigang Wang ◽  
...  

Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used clinically to treat stroke for over 1200 years. However, its pharmacological mechanisms have not yet been elucidated. The purpose of this study was to develop models for identifying compounds in XXMD that protect against hypoxia-induced and H2O2-induced brain cell damage. To this end, we designed a phenotype-based classification method using machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among the single classifiers, RF performed best, with average Matthews correlation coefficient (MCC) values of 0.725±0.014 from 5-fold cross-validation and 0.774±0.042 on the test set. The probability values calculated by the four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. These two validated models achieved MCC values of 0.968 and 0.993, respectively, in 5-fold cross-validation, and 0.874 and 0.959, respectively, on the test set. The two models were then used in virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Of these, two compounds significantly inhibited both H2O2-induced and Na2S2O4-induced neurotoxicity. Together, our findings suggest that machine learning approaches such as combined Bayesian models are a feasible way to predict neuroprotective compounds and to gain preliminary insight into the pharmacological mechanisms of TCM.
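Because the results above are reported as Matthews correlation coefficients, a short sketch of how MCC is computed from a binary confusion matrix may help when reading them. The function names are illustrative, and returning 0 for a degenerate confusion matrix is a common convention rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; ranges from -1 (total disagreement) to +1 (perfect)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mcc_from_labels(y_true, y_pred):
    """Count TP/TN/FP/FN for 0/1 labels (1 = neuroprotective, an assumed encoding) and compute MCC."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return matthews_corrcoef(tp, tn, fp, fn)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when active and inactive compounds are heavily imbalanced.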


Molecules ◽  
2020 ◽  
Vol 25 (6) ◽  
pp. 1452
Author(s):  
Igor Sieradzki ◽  
Damian Leśniak ◽  
Sabina Podlewska

A great variety of computational approaches support drug design, helping to select new potentially active compounds and to optimize their physicochemical and ADMET properties. Machine learning is a group of methods able to evaluate enormous amounts of data in a relatively short time. However, the quality of machine-learning-based predictions depends on the data supplied for model training. In this study, we used deep neural networks for compound activity prediction and developed dropout-based approaches for estimating prediction uncertainty. Several types of analyses were performed: we examined the relationships between prediction error, similarity to the training set, prediction uncertainty, and the number and standard deviation of activity values. We tested whether incorporating information about prediction uncertainty influences the ranking of compounds by predicted activity, and we used prediction uncertainty to search for potential errors in the ChEMBL database. The results indicate that incorporating information about the uncertainty of compound activity predictions can be of great help in virtual screening experiments.
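Dropout-based uncertainty estimation is commonly done with Monte Carlo dropout: dropout is kept active at prediction time, the network is evaluated many times, and the spread of the outputs serves as the uncertainty. The sketch below assumes that technique and a tiny hypothetical network with made-up weights (`W1`, `B1`, `W2`, `B2`); it is not the authors' architecture. `rank_compounds` shows one possible way to fold uncertainty into a ranking, a lower confidence bound `mean - kappa * std`, which is an illustrative choice rather than the scheme used in the study.

```python
import random
import statistics

# Hypothetical "trained" weights: 3 input descriptors -> 4 ReLU hidden units -> 1 activity value.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.6, 0.1, 0.9], [0.2, 0.2, 0.2]]
B1 = [0.0, 0.1, -0.1, 0.05]
W2 = [1.0, -0.7, 0.4, 0.9]
B2 = 5.0


def forward(x, drop_p, rng):
    """One stochastic forward pass with inverted dropout applied to the hidden layer."""
    out = B2
    for w_row, b, w_out in zip(W1, B1, W2):
        h = max(0.0, sum(w * xi for w, xi in zip(w_row, x)) + b)  # ReLU activation
        if rng.random() >= drop_p:  # this hidden unit survives the dropout mask
            out += w_out * h / (1.0 - drop_p)  # rescale kept units so the expected output is unchanged
    return out


def mc_dropout(x, n_samples=100, drop_p=0.2, seed=0):
    """Mean prediction and its standard deviation over n_samples random dropout masks."""
    rng = random.Random(seed)
    preds = [forward(x, drop_p, rng) for _ in range(n_samples)]
    return statistics.mean(preds), statistics.stdev(preds)


def rank_compounds(compounds, kappa=1.0):
    """Rank compound names by the lower confidence bound mean - kappa * std, highest first."""
    scores = {}
    for name, x in compounds.items():
        mean, std = mc_dropout(x)
        scores[name] = mean - kappa * std
    return sorted(scores, key=scores.get, reverse=True)
```

With `kappa=0` the ranking uses predicted activity alone; increasing `kappa` pushes compounds with uncertain predictions down the list, which is one way to probe whether uncertainty changes a virtual-screening ranking.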


2015 ◽  
Vol 71 (5) ◽  
pp. 1059-1067 ◽  
Author(s):  
Markus-Frederik Bohn ◽  
Celia A. Schiffer

High-throughput crystallographic approaches require integrated software solutions to minimize the need for manual effort. REdiii is a system that allows fully automated crystallographic structure solution by integrating existing crystallographic software into an adaptive and partly autonomous workflow engine. The program can be initiated after collecting the first frame of diffraction data and is able to perform processing, molecular-replacement phasing, chain tracing, ligand fitting and refinement without further user intervention. Preset values for each software component allow efficient progress with high-quality data and known parameters. The adaptive workflow engine can determine whether some parameters require modification and can choose alternative software strategies if the preconfigured solution is inadequate. This integrated pipeline aims to provide a comprehensive and efficient approach to screening for ligand-bound co-crystal structures while minimizing repetitive work and enabling a high-throughput scientific discovery process.
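The fallback behaviour of the adaptive workflow engine can be pictured as an ordered list of strategies per stage, tried until one passes a quality check. This is a hypothetical sketch of that control flow only; `Stage`, `run_pipeline`, the strategies and the `score` threshold are invented for illustration and do not correspond to REdiii's code or to any real crystallographic program.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    """One pipeline step (e.g. processing, phasing, refinement) with a preset strategy and fallbacks."""
    name: str
    strategies: List[Callable[[dict], Optional[dict]]]  # preset first, alternatives after
    acceptable: Callable[[dict], bool]  # quality check applied to a strategy's output


def run_pipeline(stages, data):
    """Run stages in order; within a stage, fall back to the next strategy until one passes its check."""
    log = []
    for stage in stages:
        for strategy in stage.strategies:
            result = strategy(data)
            ok = result is not None and stage.acceptable(result)
            log.append((stage.name, strategy.__name__, ok))
            if ok:
                data = result
                break
        else:  # loop finished without break: every strategy failed
            raise RuntimeError(f"stage {stage.name!r}: no strategy produced an acceptable result")
    return data, log


if __name__ == "__main__":
    def preset_phasing(d):  # hypothetical preset that yields a weak solution
        return {**d, "score": 0.2}

    def alternative_phasing(d):  # hypothetical alternative strategy that succeeds
        return {**d, "score": 0.9}

    phasing = Stage("phasing", [preset_phasing, alternative_phasing], lambda r: r["score"] > 0.5)
    print(run_pipeline([phasing], {}))
```

Keeping the preset first means well-behaved data sets pass straight through, while alternatives are only tried when the check fails.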


2018 ◽  
Author(s):  
Francisco R. Fields ◽  
Stephan D. Freed ◽  
Katelyn E. Carothers ◽  
Md Nafiz Hamid ◽  
Daniel E. Hammers ◽  
...  

Bacteriocins are ribosomally produced antimicrobial peptides that represent an untapped source of promising antibiotic alternatives. However, inherent challenges in isolating and identifying natural bacteriocins in substantial yield have limited their potential use as viable antimicrobial compounds. In this study, we developed a pipeline for designing and testing bacteriocin-derived compounds that combines sequence-free prediction of bacteriocins using a machine-learning algorithm with a simple biophysical trait filter to generate minimal 20-amino-acid peptide candidates that can be readily synthesized and evaluated for activity. We generated 28,895 20-mer peptides in total and scored them for charge, α-helicity, and hydrophobic moment, allowing us to identify putative peptide sequences with the highest potential for interaction with, and activity against, bacterial membranes. Of those, we selected sixteen sequences for synthesis and further study, and evaluated their antimicrobial, cytotoxic, and hemolytic activities. We show that the bacteriocin-based peptides with the highest overall scores for our biophysical parameters exhibited significant antimicrobial activity against E. coli and P. aeruginosa. Our combined method incorporates machine learning and biophysics-based minimal region determination to create an original approach for rapidly discovering novel bacteriocin candidates that are amenable to rapid synthesis and evaluation for therapeutic use.
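Two of the biophysical scores mentioned above are simple to compute for candidate 20-mers. The sketch assumes an approximate net charge at neutral pH (+1 per Lys/Arg, -1 per Asp/Glu, ignoring His and the termini) and the Eisenberg hydrophobic moment on the Eisenberg consensus hydrophobicity scale, assuming an ideal α-helix (100° per residue). α-helicity prediction and the machine-learning step are not covered, and the sort order in `top_candidates` (most cationic first, then largest moment) is a hypothetical stand-in for the study's actual scoring.

```python
import math

# Eisenberg consensus hydrophobicity scale, as commonly tabulated.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def net_charge(seq):
    """Approximate net charge at neutral pH: +1 for K/R, -1 for D/E (His and termini ignored)."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment per residue, assuming an ideal alpha-helix (100 degrees per residue)."""
    theta = math.radians(angle_deg)
    s = sum(EISENBERG[a] * math.sin(i * theta) for i, a in enumerate(seq))
    c = sum(EISENBERG[a] * math.cos(i * theta) for i, a in enumerate(seq))
    return math.hypot(s, c) / len(seq)


def windows(protein, width=20):
    """All overlapping peptides of the given width from a protein sequence."""
    return [protein[i:i + width] for i in range(len(protein) - width + 1)]


def top_candidates(protein, n=3, width=20):
    """Hypothetical ranking: most cationic windows first, ties broken by larger hydrophobic moment."""
    scored = sorted(windows(protein, width), key=lambda p: (net_charge(p), hydrophobic_moment(p)), reverse=True)
    return scored[:n]


if __name__ == "__main__":
    print(top_candidates("KLAKLAKKLAKLAKDEGLLKKLLKAL", n=2))  # hypothetical input sequence
```

A helix-forming peptide whose hydrophobic residues cluster on one face gives a large moment, which is the amphipathic pattern associated with membrane interaction.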


2020 ◽  
Author(s):  
Alan Mejia Maza ◽  
Seth Jarvis ◽  
Weaverly Colleen Lee ◽  
Thomas J. Cunningham ◽  
Giampietro Schiavo ◽  
...  

The neuromuscular junction (NMJ) is the peripheral synapse formed between a motor neuron axon terminal and a muscle fibre. NMJs are thought to be the primary site of peripheral pathology in many neuromuscular diseases, but their innervation/denervation status is often assessed qualitatively, with poorly systematized criteria that vary across studies, and separately from their 3D morphological structure. Here, we describe the development of ‘NMJ-Analyser’, a tool that automatically and comprehensively screens the morphology of NMJs and their corresponding innervation status. NMJ-Analyser generates 29 biologically relevant features to quantitatively define healthy and aberrant neuromuscular synapses and applies machine learning to diagnose NMJ degeneration. We validated this framework in longitudinal analyses of wildtype mice, as well as in four different neuromuscular disease models: three for amyotrophic lateral sclerosis (ALS) and one for peripheral neuropathy. We showed that structural changes at the NMJ initially occur in the nerve terminal in the mutant TDP43 and FUS ALS models. Using a machine learning algorithm, NMJ-Analyser identifies healthy and aberrant neuromuscular synapses with 95% accuracy (88% sensitivity and 97% specificity). Our results validate NMJ-Analyser as a robust platform for systematic, structural screening of NMJs and pave the way for transferable, cross-comparable, high-throughput studies of neuromuscular diseases.
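The three reported figures are linked. Assuming aberrant synapses are the positive class and all three metrics are computed on the same pooled evaluation set, overall accuracy is the average of sensitivity and specificity weighted by the fraction of aberrant synapses, so it must fall between 88% and 97%. A minimal sketch, with hypothetical function names:

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy), treating 'aberrant NMJ' as the positive class."""
    sensitivity = tp / (tp + fn)  # fraction of aberrant synapses flagged as aberrant
    specificity = tn / (tn + fp)  # fraction of healthy synapses classified as healthy
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy as the prevalence-weighted average of sensitivity and specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity
```

The identity explains why a classifier can report high accuracy while missing a noticeable share of aberrant synapses when healthy ones dominate the evaluation set.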


2020 ◽  
Vol 22 ◽  
pp. 145-160
Author(s):  
Darío Tilves Santiago ◽  
Carmén García Mateo ◽  
Soledad Torres Guijarro ◽  
Laura Docío Fernández ◽  
José Luis Alba Castro

Automatic sign language recognition (ASLR) is a complex task, not only because of the difficulty of dealing with highly dynamic video information, but also because almost every sign language (SL) can be considered an under-resourced language when it comes to language technology. Spanish sign language (LSE) is one of these under-resourced languages. Developing technology for LSE involves a number of technical challenges that must be tackled in a structured and sequential manner. In this paper, some problems of machine-learning-based ASLR are addressed. A review of publicly available datasets is given and a new dataset is presented. Current annotation methods and annotation programs are also discussed. The main conclusion of our review of existing datasets is that more high-quality data and annotations are needed.

