RegioSQM20: improved prediction of the regioselectivity of electrophilic aromatic substitutions

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Nicolai Ree ◽  
Andreas H. Göller ◽  
Jan H. Jensen

We present RegioSQM20, a new version of RegioSQM (Chem Sci 9:660, 2018), which predicts the regioselectivities of electrophilic aromatic substitution (EAS) reactions from the calculation of proton affinities. The following improvements have been made: The open source semiempirical tight binding program xtb is used instead of the closed source MOPAC program. Any low energy tautomeric forms of the input molecule are identified and regioselectivity predictions are made for each form. Finally, RegioSQM20 offers a qualitative prediction of the reactivity of each tautomer (low, medium, or high) based on the reaction center with the highest proton affinity. The inclusion of tautomers increases the success rate from 90.7% to 92.7%. RegioSQM20 is compared to two machine learning based models: one developed by Struble et al. (React Chem Eng 5:896, 2020) specifically for regioselectivity predictions of EAS reactions (WLN) and a more generally applicable reactivity predictor (IBM RXN) developed by Schwaller et al. (ACS Cent Sci 5:1572, 2019). RegioSQM20 and WLN offer roughly the same success rates for the entire data sets (without considering tautomers), while WLN is many orders of magnitude faster. The accuracy of the more general IBM RXN approach is somewhat lower: 76.3–85.0%, depending on the data set. The code is freely available under the MIT open source license and will be made available as a webservice (regiosqm.org) in the near future.
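The idea of picking reaction centers by proton affinity can be sketched in a few lines of Python. This is only an illustration: the function name `predict_sites`, the example energies, and the 1.0 kcal/mol tolerance are assumptions made for this sketch, not values taken from the abstract, and in practice the proton affinities would come from semiempirical calculations rather than a hand-written dictionary.

```python
def predict_sites(proton_affinities, cutoff=1.0):
    """Return atom indices whose proton affinity lies within `cutoff`
    (same units as the input, e.g. kcal/mol) of the highest value.

    `cutoff` is a hypothetical tolerance chosen for illustration.
    """
    if not proton_affinities:
        return []
    best = max(proton_affinities.values())
    return sorted(
        atom for atom, pa in proton_affinities.items() if best - pa <= cutoff
    )


# Made-up proton affinities (kcal/mol) for three aromatic carbon atoms.
example = {2: 201.3, 4: 205.9, 6: 205.2}
print(predict_sites(example))  # [4, 6]
```

Returning every site within a tolerance, rather than only the single best one, lets near-degenerate positions be reported together.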

2020 ◽  
Author(s):  
Nicolai Ree ◽  
Andreas Göller ◽  
Jan H. Jensen

We present RegioSQM20, a new version of RegioSQM (Chem. Sci. 2018, 9, 660), which predicts the regioselectivities of electrophilic aromatic substitution (EAS) reactions from the calculation of proton affinities. The following improvements have been made: The open source semiempirical tight binding program xtb is used instead of the closed source MOPAC program. Any low energy tautomeric forms of the input molecule are identified and regioselectivity predictions are made for each form. Finally, RegioSQM20 offers a qualitative prediction of the reactivity of each tautomer (low, medium, or high) based on the reaction center with the highest proton affinity. The inclusion of tautomers increases the success rate from 90.7% to 92.7%. RegioSQM20 is compared to two machine learning based models: one developed by Struble et al. (React. Chem. Eng. 2020, 5, 896) specifically for regioselectivity predictions of EAS reactions (WLN) and a more generally applicable reactivity predictor (IBM RXN) developed by Schwaller et al. (ACS Cent. Sci. 2019, 5, 1572). RegioSQM20 and WLN offer roughly the same success rates for the entire data sets (without considering tautomers), while WLN is many orders of magnitude faster. The accuracy of the more general IBM RXN approach is somewhat lower: 76.3–85.0%, depending on the data set. The code is freely available under the MIT open source license and will be made available as a webservice (regiosqm.org) in the near future.


2021 ◽  
Author(s):  
Nicolai Ree ◽  
Andreas H. Göller ◽  
Jan H. Jensen

We present RegioML, an atom-based machine learning model for predicting the regioselectivities of electrophilic aromatic substitution reactions. The model relies on CM5 atomic charges computed using semiempirical tight binding (GFN1-xTB) combined with the ensemble decision tree variant light gradient boosting machine (LightGBM). The model is trained and tested on 21,201 bromination reactions with 101K reaction centers, which are split into training, test, and out-of-sample datasets with 58K, 15K, and 27K reaction centers, respectively. The accuracy is 93% for the test set and 90% for the out-of-sample set, while the precision (the percentage of positive predictions that are correct) is 88% and 80%, respectively. The test-set performance is very similar to that of the graph-based WLN method developed by Struble et al. (React. Chem. Eng. 2020, 5, 896), though the comparison is complicated by the possibility that some of the test and out-of-sample molecules were used to train WLN. RegioML outperforms our physics-based RegioSQM20 method (J. Cheminform. 2021, 13:10), for which the precision is only 75%. Even for the out-of-sample dataset, RegioML slightly outperforms RegioSQM20. The good performance of RegioML and WLN is in large part due to the large datasets available for this type of reaction. However, for reactions where there is little experimental data, physics-based approaches like RegioSQM20 can be used to generate synthetic data for model training. We demonstrate this by showing that the performance of RegioSQM20 can be reproduced by an ML model trained on RegioSQM20-generated data.
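To make the two reported metrics concrete, the following sketch computes accuracy and precision for binary per-atom labels, using the definition of precision given above (the share of positive predictions that are correct). The function name and the toy labels are assumptions for illustration only; they are not data from the study.

```python
def accuracy_and_precision(y_true, y_pred):
    """Compute accuracy and precision for binary labels (1 = reactive site)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    # Precision is undefined without positive predictions; report 0.0 here.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    return accuracy, precision


# Toy labels for six reaction centers (made up for this example).
y_true = [1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(accuracy_and_precision(y_true, y_pred))
```

Because most atoms in a molecule are not reactive sites, accuracy can stay high even when several positive predictions are wrong, which is why precision is reported alongside it.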


Author(s):  
Ricardo Oliveira ◽  
Rafael Moreno

Federal, state, and local government agencies in the USA are investing heavily in disseminating the Open Data sets that each of them produces. The main driver behind this thrust is to increase agencies’ transparency and accountability, as well as to improve citizens’ awareness. However, not all Open Data sets are easy to access and integrate with other Open Data sets, even those from the same agency. The City and County of Denver Open Data Portal distributes several types of geospatial datasets; one of them is the city parcels layer, which contains 224,256 records. Although this data layer contains many pieces of information, it is incomplete for some custom purposes. Open-source software was used first to collect data from diverse City of Denver Open Data sets and then to upload them to a cloud repository, where they were processed using a cloud-hosted PostgreSQL installation and Python scripts. Our method was able to extract non-spatial information from a ‘not-ready-to-download’ source that could then be combined with the initial data set to enhance its potential use.


mSystems ◽  
2020 ◽  
Vol 5 (4) ◽  
Author(s):  
Robert A. Petit ◽  
Timothy D. Read

ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.
IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.


2021 ◽  
Author(s):  
Shogo Arai ◽  
Kazuya Konada ◽  
Naoya Yoshinaga ◽  
Akinari Kobayashi ◽  
Kazuhiro Kosuge

This study proposes a method for regrasping an object with a dual-arm robot equipped with general-purpose hands that is robust against grasping errors. One arm, the hand-over arm, hands the object over to the other arm, called the receiver arm. Because the hand-over arm first picks up the object with a general-purpose hand, the grasping error must be taken into account to increase the success rate of regrasping. In the online phase, the proposed method positions the object at an optimal pose for regrasping using an image-based visual servoing (IBVS) approach, which reduces the effect of the grasping error. In the planning phase, the proposed method computes the optimal pose for regrasping by maximizing the minimum singular value of the image Jacobian of IBVS to achieve high positioning accuracy, using a 3D model of the target object. To regrasp objects of various shapes robustly against image noise and changes in lighting, the image Jacobian of IBVS is computed by numerical differentiation using an actual data set. Computing the image Jacobian usually requires a large number of data sets, one for each candidate grasp. To reduce the number of data sets, we propose a conversion method for the image Jacobian that requires only one data set, corresponding to one representative grasp. The experimental results show that the proposed method regrasps target objects with general-purpose hands at high success rates and positions the target object with an error of less than 0.7 mm.


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 167
Author(s):  
Ivan Kholod ◽  
Evgeny Yanaki ◽  
Dmitry Fomichev ◽  
Evgeniy Shalugin ◽  
Evgenia Novikova ◽  
...  

The rapid development of Internet of Things (IoT) systems has led to the problem of managing and analyzing the large volumes of data that they generate. Traditional approaches that involve collection of data from IoT devices into one centralized repository for further analysis are not always applicable due to the large amount of collected data, the use of communication channels with limited bandwidth, security and privacy requirements, etc. Federated learning (FL) is an emerging approach that allows one to analyze data directly on data sources and to federate the results of each analysis to yield a result comparable to that of traditional centralized data processing. FL is being actively developed, and currently, there are several open-source frameworks that implement it. This article presents a comparative review and analysis of the existing open-source FL frameworks, including their applicability in IoT systems. The authors evaluated the following features of the frameworks: ease of use and deployment, development, analysis capabilities, accuracy, and performance. Three different data sets were used in the experiments: two signal data sets of different volumes and one image data set. To model low-power IoT devices, computing nodes with small resources were defined in the testbed. The research results revealed FL frameworks that could be applied in the IoT systems now, but with certain restrictions on their use.
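One common way to "federate the results" of local analyses is to average locally trained model parameters, weighting each client by how much data it holds. The abstract does not say which aggregation rule the reviewed frameworks use, so the sketch below is an assumption chosen for illustration; the function name and the toy numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count.

    Only the parameter vectors and sample counts are shared; the raw data
    stays on each client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("clients hold no samples")
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical IoT nodes holding 1 and 3 samples, respectively.
weights = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
print(federated_average(weights, sizes))  # [2.5, 3.5]
```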


2018 ◽  
Author(s):  
Sean Wilner ◽  
Katherine Wood ◽  
Daniel J. Simons

Raw data are often unavailable, and all that may remain of a data set are its summary statistics. When these data are integers on a fixed scale, such as Likert-style ratings, and their mean, standard deviation, and sample size are known, it is possible to reconstruct every raw distribution that gives rise to those summary statistics using a system of Diophantine equations. We have developed the open-source program CORVIDS (COmplete Reconstruction of Values In Diophantine Systems) to deterministically reconstruct raw data from summary statistics using this technique. The solutions generated by the program are provably complete. Here we describe the implementation, provide examples and use cases, and prove the correctness of the underlying mathematics. CORVIDS is open-source and available as source code or as stand-alone, user-friendly applications for macOS and Windows.
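The reconstruction problem can be illustrated with a brute-force search over all multisets of integer responses. This is not the Diophantine-equation technique that CORVIDS uses, and it scales poorly; it is a minimal sketch meant only to show that several raw distributions can share the same summary statistics. The function name, the rounding to two decimals, and the example statistics are assumptions made for this illustration.

```python
from itertools import combinations_with_replacement
from statistics import mean, stdev


def reconstruct(mean_target, sd_target, n, lo, hi, digits=2):
    """Enumerate every multiset of n integers in [lo, hi] whose mean and
    sample standard deviation match the targets after rounding."""
    solutions = []
    for combo in combinations_with_replacement(range(lo, hi + 1), n):
        if (round(mean(combo), digits) == round(mean_target, digits)
                and round(stdev(combo), digits) == round(sd_target, digits)):
            solutions.append(combo)
    return solutions


# Four responses on a 1-5 scale with reported mean 3.00 and SD 1.41.
for solution in reconstruct(3.0, 1.41, n=4, lo=1, hi=5):
    print(solution)
```

For these made-up statistics the search finds two distinct data sets, (1, 3, 4, 4) and (2, 2, 3, 5), which is why a complete enumeration of solutions matters.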


2011 ◽  
Vol 44 (6) ◽  
pp. 1182-1189 ◽  
Author(s):  
Jarosław A. Kalinowski ◽  
Anna Makal ◽  
Philip Coppens

A new method for determination of the orientation matrix of Laue X-ray data is presented. The method is based on matching of the experimental patterns of central reciprocal lattice rows projected on a unit sphere centered on the origin of the reciprocal lattice with the corresponding pattern of a monochromatic data set on the same material. This technique is applied to the complete data set and thus eliminates problems often encountered when single frames with a limited number of peaks are to be used for orientation matrix determination. Application of the method to a series of Laue data sets on organometallic crystals is described. The corresponding program is available under a Mozilla Public License-like open-source license.

