chemical space
Recently Published Documents





2022 ◽  
Martyna Cybularczyk-Cecotka ◽  
Jędrzej Predygier ◽  
Stefano Crespi ◽  
Joanna Szczepanik ◽  
Maciej Giedyk

Micellar photocatalysis has recently opened new avenues to activate strong carbon-halide bonds. So far, however, it has mainly explored strongly reducing conditions, restricting the available chemical space to radical or anionic reactivity. Here, we demonstrate a radical-polar crossover process involving cationic intermediates, which enables chemodivergent modification of chlorinated benzamide derivatives via either C-H arylation or N-dealkylation. The catalytic system operates under mild conditions, employing methylene blue as a photocatalyst and blue LEDs as the light source. Factors determining the reactivity of substrates and preliminary mechanistic studies are presented.

2022 ◽  
Vol 14 (1) ◽  
Alan Kerstjens ◽  
Hans De Winter

Given an objective function that predicts key properties of a molecule, goal-directed de novo molecular design is a useful tool to identify molecules that maximize or minimize said objective function. Nonetheless, a common drawback of these methods is that they tend to design synthetically unfeasible molecules. In this paper we describe a Lamarckian evolutionary algorithm for de novo drug design (LEADD). LEADD attempts to strike a balance between optimization power, synthetic accessibility of designed molecules and computational efficiency. To increase the likelihood of designing synthetically accessible molecules, LEADD represents molecules as graphs of molecular fragments, and limits the bonds that can be formed between them through knowledge-based pairwise atom type compatibility rules. A reference library of drug-like molecules is used to extract fragments, fragment preferences and compatibility rules. A novel set of genetic operators that enforce these rules in a computationally efficient manner is presented. To sample chemical space more efficiently we also explore a Lamarckian evolutionary mechanism that adapts the reproductive behavior of molecules. LEADD has been compared to both standard virtual screening and a comparable evolutionary algorithm using a standardized benchmark suite and was shown to be able to identify fitter molecules more efficiently. Moreover, the designed molecules are predicted to be easier to synthesize than those designed by other evolutionary algorithms.
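The idea of pairwise atom type compatibility rules can be illustrated with a minimal sketch. It is not LEADD's implementation: the function names, the representation of a reference library as a list of bonded atom-type pairs, and the atom-type labels are all assumptions made for illustration.

```python
def learn_compatibility(reference_bonds):
    """Collect every unordered atom-type pair seen bonded in a reference set.

    reference_bonds: iterable of (type_a, type_b) tuples. The atom-type
    labels are hypothetical placeholders, not LEADD's own typing scheme.
    """
    return {frozenset(pair) for pair in reference_bonds}


def can_bond(type_a, type_b, rules):
    """Allow a bond between two fragments only if the pair was observed."""
    return frozenset((type_a, type_b)) in rules


# Toy reference library: two observed bond types.
rules = learn_compatibility([("C.3", "N.am"), ("C.ar", "C.ar")])
print(can_bond("N.am", "C.3", rules))
print(can_bond("N.am", "N.am", rules))
```

Storing pairs as frozensets makes the lookup order-independent, so a rule learned as (carbon, nitrogen) also covers (nitrogen, carbon).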

2022 ◽  
Jan Řezáč

The Non-Covalent Interactions Atlas has been extended with two data sets of benchmark interaction energies in complexes dominated by London dispersion. The D1200 data set of equilibrium geometries provides a thorough sampling of an extended chemical space, while the D442×10 set features dissociation curves for selected complexes. In total, they provide 5,178 new CCSD(T)/CBS data points of the highest quality. The new data have been combined with previous NCIA data sets in a comprehensive test of dispersion-corrected DFT methods, identifying the ones that achieve high accuracy in all types of non-covalent interactions in a broad chemical space. Additional tests of dispersion-corrected MP2 and semiempirical QM methods are also reported.

2022 ◽  
Mingjian Wen ◽  
Samuel M. Blau ◽  
Xiaowei Xie ◽  
Shyam Dwaraknath ◽  
Kristin A. Persson

Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data to avoid overfitting, which would otherwise lead to low accuracy and transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction datasets. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find that they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer-learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
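The contrastive objective described above (two augmented views of the same reaction pulled together, other reactions pushed apart) can be sketched with an InfoNCE-style loss. The abstract does not state the exact loss or temperature used, so the function names, the temperature value, and the toy two-dimensional embeddings below are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss; view_a[i] and view_b[i] embed the same reaction."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the matching view.
        total += log_denominator - logits[i]
    return total / len(view_a)


# Toy embeddings (hypothetical): matched views give a lower loss than mismatched ones.
views = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(views, views) < info_nce(views, [[0.0, 1.0], [1.0, 0.0]]))
```

In a real pipeline the embeddings would come from the GNN encoder and the loss would be minimized by gradient descent; this sketch only evaluates the loss for fixed vectors.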

2022 ◽  
Shomik Verma ◽  
Miguel Rivera ◽  
David O. Scanlon ◽  
Aron Walsh

Understanding the excited state properties of molecules provides insights into how they interact with light. These interactions can be exploited to design compounds for photochemical applications, including enhanced spectral conversion of light to increase the efficiency of photovoltaic cells. While chemical discovery is time- and resource-intensive experimentally, computational chemistry can be used to screen large-scale databases for molecules of interest in a procedure known as high-throughput virtual screening. The first step usually involves a high-speed but low-accuracy method to screen large numbers of molecules (potentially millions) so that only the best candidates are evaluated with expensive methods. However, use of a coarse first-pass screening method can potentially result in high false positive or false negative rates. Therefore, this study uses machine learning to calibrate a high-throughput technique (xTB-sTDA) against a higher accuracy one (TD-DFT). Testing the calibration model shows a ~5-fold decrease in error in-domain and a ~3-fold decrease out-of-domain. The resulting mean absolute error of ~0.14 eV is in line with previous work in machine learning calibrations and outperforms previous work in linear calibration of xTB-sTDA. We then apply the calibration model to screen a 250k molecule database and map inaccuracies of xTB-sTDA in chemical space. We also show generalizability of the workflow by calibrating against a higher-level technique (CC2), yielding a similarly low error. Overall, this work demonstrates that machine learning can be used to develop a method that is both cheap and accurate for large-scale excited state screening, enabling accelerated molecular discovery across a variety of disciplines.
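Calibrating a cheap method against a more accurate one can be shown in its simplest form with a least-squares linear fit, scored by mean absolute error. This is a stand-in for the machine learning calibration the study describes, not a reproduction of it; the function names are hypothetical and the energies below are invented toy values, not data from the paper.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(predicted, reference):
    """Average absolute deviation between predictions and reference values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Invented toy excitation energies in eV (illustration only).
cheap = [2.0, 3.0, 4.0, 5.0]        # stands in for the fast method
reference = [2.5, 3.3, 4.1, 4.9]    # stands in for the accurate method

slope, intercept = fit_linear(cheap, reference)
calibrated = [slope * e + intercept for e in cheap]
print(mean_absolute_error(cheap, reference), mean_absolute_error(calibrated, reference))
```

A fair evaluation would fit on one set of molecules and measure the error on a held-out set, which is what the in-domain and out-of-domain figures in the abstract refer to.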

Marine Drugs ◽  
2022 ◽  
Vol 20 (1) ◽  
pp. 58
José X. Soares ◽  
Daniela R. P. Loureiro ◽  
Ana Laura Dias ◽  
Salete Reis ◽  
Madalena M. M. Pinto ◽  

The marine environment is an important source of specialized metabolites with valuable biological activities. Xanthones are a relevant chemical class of specialized metabolites found in this environment due to their structural variety and their biological activities. In this work, a comprehensive literature review of marine xanthones reported up to now was performed. A large number of bioactive xanthone derivatives (169) were identified, and their structures, biological activities, and natural sources were described. To characterize the chemical space occupied by marine-derived xanthones, molecular descriptors were calculated. For the analysis of the molecular descriptors, the xanthone derivatives were grouped into five structural categories (simple, prenylated, O-heterocyclic, complex, and hydroxanthones) and six biological activities (antitumor, antibacterial, antidiabetic, antifungal, antiviral, and miscellaneous). Moreover, the natural product-likeness and the drug-likeness of marine xanthones were also assessed. Marine xanthone derivatives are rewarding bioactive compounds and constitute a promising starting point for the design of other novel bioactive molecules.

2022 ◽  
Tao Zeng ◽  
B. Andes Hess ◽  
Fan Zhang ◽  
Ruibo Wu

Many computational methods are used to expand the open-ended border of chemical space. Natural products and their derivatives are an important source for drug discovery, and some algorithms are devoted to rapidly generating pseudo-natural products, but their synthetic accessibility and chemical interpretation have often been ignored or underestimated, which hampers experimental synthesis in practice. Herein, a bio-inspired strategy (named TeroGen) is proposed, in which the cyclization and decoration stages of terpenoid biosynthesis are mimicked by meta-dynamics simulations and deep learning models, respectively, to explore the chemical space of terpenoids. In the TeroGen protocol, synthetic accessibility is validated by reaction energetics (reaction barrier and reaction heat) based on the GFN2-xTB method. Chemical interpretation is an intrinsic feature, as the reaction pathway is bio-inspired and triggered by the RMSD-PP method in conjunction with an encoder-decoder architecture. This is quite distinct from conventional library/fragment-based or rule-based strategies: with TeroGen, new reaction routes can be explored to increase structural diversity. For example, although only a rather limited number of sesterterpenoids is included in our training set, TeroGen predicts more than 30,000 sesterterpenoids, ten times as many as the known sesterterpenoids (fewer than 2,500), and maps out the reaction network efficiently. In sum, TeroGen not only greatly expands the chemical space of terpenoids but also provides various plausible biosynthetic pathways, which are crucial clues for heterologous biosynthesis and for biomimetic and chemical synthesis of complicated terpenoids.

2022 ◽  
Srijit Seal ◽  
Jordi Carreras-Puigvert ◽  
Maria-Anna Trapotsi ◽  
Hongbin Yang ◽  
Ola Spjuth ◽  

Mitochondrial toxicity is an important safety endpoint in drug discovery. Models based solely on chemical structure for predicting mitochondrial toxicity are currently limited in accuracy, and their applicability domain is limited to the chemical space of the training compounds. In this work, we aimed to utilize both -omics and chemical data to push beyond the state of the art. We combined Cell Painting and Gene Expression data with chemical structural information from Morgan fingerprints for 382 chemical perturbants tested in the Tox21 mitochondrial membrane depolarization assay. We observed that mitochondrial toxicants significantly differ from non-toxic compounds in morphological space and identified compound clusters having similar mechanisms of mitochondrial toxicity, thereby indicating that morphological space provides biological insights related to mechanisms of action of this endpoint. We further showed that models combining Cell Painting, Gene Expression features and Morgan fingerprints improved model performance on an external test set of 236 compounds by 60% (in terms of F1 score) and improved extrapolation to new chemical space. The performance of our combined models was comparable with dedicated in vitro assays for mitochondrial toxicity, and they were able to detect mitochondrial toxicity where Tox21 assay outcomes were inconclusive because of cytotoxicity. Our results suggest that combining chemical descriptors with different levels of biological readouts enhances the detection of mitochondrial toxicants, with practical implications for use in drug discovery.
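The F1 score used to report the improvement is the harmonic mean of precision and recall. A minimal sketch of the standard definition follows; the function name and the toy labels are illustrative and unrelated to the study's data.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks the positive (toxic) class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    if true_pos == 0:
        # No correct positive predictions: F1 is conventionally 0.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): one hit, one false alarm, one miss.
print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because F1 ignores true negatives, it suits screens like this one where the toxic class is the one of interest and may be the minority.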
