Chemical language models enable navigation in sparsely populated chemical space

Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular-input line-entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score makes it possible to identify and rank the most promising molecular designs based on the probabilities learned by the CLM. Comparing “greedy” (beam search) with “explorative” (multinomial sampling) methods for SMILES generation by perplexity reveals certain advantages of multinomial sampling. Additionally, perplexity scoring is used to identify undesired biases introduced during model training, and it enables a new ranking system that removes those biases.
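As a rough illustration of the perplexity score mentioned in this abstract, the sketch below uses the standard definition: the exponential of the negative mean log-probability that a model assigned to each token of a sequence. The per-token probabilities and the two SMILES strings are made-up values for illustration, not output from any real CLM, and the abstract does not confirm that the authors use exactly this normalization.

```python
import math


def perplexity(token_probs):
    """Standard sequence perplexity: exp(-mean(ln p_i)) over the token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


# Hypothetical probabilities a CLM might assign to each token of two SMILES strings.
designs = {
    "CCO": [0.9, 0.8, 0.9],
    "C1CC1N": [0.5, 0.4, 0.6, 0.5, 0.3, 0.4],
}

# Lower perplexity means the model found the string less "surprising",
# so sorting in ascending order puts the most model-consistent design first.
ranked = sorted(designs, key=lambda smiles: perplexity(designs[smiles]))
```

Because the score is normalized by sequence length, long and short SMILES strings can be compared on the same scale, which a raw product of probabilities would not allow.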

Beam Search Sampling for Molecular Design and Intrinsic Prioritization with Machine Intelligence

10.26434/chemrxiv.14153408 ◽

2021 ◽

Author(s):

Michael Moret ◽

Moritz Helmstädter ◽

Francesca Grisoni ◽

Gisbert Schneider ◽

Daniel Merk

Keyword(s):

De Novo ◽

Molecular Design ◽

Search Algorithm ◽

Sampling Technique ◽

Machine Intelligence ◽

Language Models ◽

Scoring Functions ◽

Beam Search ◽

De Novo Drug Design ◽

Chemical Language

Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. In this work, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.
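To make the idea of beam search as a combined generation and scoring step more concrete, here is a minimal sketch on a toy next-token model. Everything in it is an assumption for illustration: `TOY_MODEL`, its probabilities, the `^` and `$` start and end markers, and the choice to condition only on the last token. It is not the authors' implementation, and a real CLM would condition on the full prefix. A multinomial sampler over the same toy model is included for contrast.

```python
import math
import random

# Hypothetical toy "language model": next-token probabilities given the last token.
# "^" starts a sequence and "$" ends it.
TOY_MODEL = {
    "^": {"C": 0.7, "N": 0.3},
    "C": {"C": 0.4, "O": 0.25, "$": 0.35},
    "N": {"C": 0.6, "$": 0.4},
    "O": {"$": 1.0},
}


def beam_search(model, width=2, max_len=4):
    """Keep the `width` most probable extensions at each step.

    Returns finished sequences with their total probability, best first.
    Sequences still unfinished after `max_len` steps are discarded.
    """
    beams = [(0.0, ["^"])]  # (cumulative log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = [
            (logp + math.log(p), seq + [tok])
            for logp, seq in beams
            for tok, p in model[seq[-1]].items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        # In this variant a finished sequence uses up one of the `width` slots.
        for logp, seq in candidates[:width]:
            (finished if seq[-1] == "$" else beams).append((logp, seq))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return [("".join(seq[1:-1]), math.exp(logp)) for logp, seq in finished]


def multinomial_sample(model, rng, max_len=4):
    """Draw one sequence by sampling each next token from the model distribution."""
    seq = ["^"]
    for _ in range(max_len):
        tokens, probs = zip(*model[seq[-1]].items())
        seq.append(rng.choices(tokens, weights=probs)[0])
        if seq[-1] == "$":
            break
    return "".join(tok for tok in seq[1:] if tok != "$")


beam_results = beam_search(TOY_MODEL)
rng = random.Random(0)
samples = [multinomial_sample(TOY_MODEL, rng) for _ in range(5)]
```

The beam output comes already ranked by the model's own probabilities, which is the sense in which no external scoring function is strictly needed. The sampler, by contrast, can reach lower-probability strings that a narrow beam never visits.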

Beam Search Sampling for Molecular Design and Intrinsic Prioritization with Machine Intelligence

10.26434/chemrxiv.14153408.v1 ◽

2021 ◽

Author(s):

Michael Moret ◽

Moritz Helmstädter ◽

Francesca Grisoni ◽

Gisbert Schneider ◽

Daniel Merk

Keyword(s):

De Novo ◽

Molecular Design ◽

Search Algorithm ◽

Sampling Technique ◽

Machine Intelligence ◽

Language Models ◽

Scoring Functions ◽

Beam Search ◽

De Novo Drug Design ◽

Chemical Language

Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. In this work, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.

Leveraging molecular structure and bioactivity with chemical language models for drug design

10.33774/chemrxiv-2021-xzgst ◽

2021 ◽

Author(s):

Michael Moret ◽

Francesca Grisoni ◽

Cyrill Brunner ◽

Gisbert Schneider

Keyword(s):

Molecular Structure ◽

De Novo ◽

Molecular Design ◽

Structural Information ◽

Molecular Structures ◽

Language Models ◽

Structure Generation ◽

Compound Screening ◽

Phosphoinositide 3 Kinase ◽

Chemical Language

Generative chemical language models (CLMs) can be used for de novo molecular structure generation. These CLMs learn from the structural information of known molecules to generate new ones. In this paper, we show that “hybrid” CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), we created a large collection of virtual molecules with a generative CLM. This primary virtual compound library was further refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ binders and non-binders by transfer learning. Several of the computer-generated molecular designs were commercially available, which allowed for fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design in low-data situations.

Statistical Language Models for Information Retrieval: A Critical Review

10.1561/9781601981875 ◽

2007 ◽

Cited By ~ 4

Author(s):

ChengXiang Zhai

Keyword(s):

Information Retrieval ◽

Critical Review ◽

Language Models ◽

Statistical Language Models

Total Synthesis of Axially-Chiral Cannabinols: A New Platform for Cannabinoid-Based Drug Discovery

10.26434/chemrxiv.10251035.v1 ◽

2019 ◽

Cited By ~ 1

Author(s):

Primali Navaratne ◽

Jenny Wilkerson ◽

Kavindri Ranasinghe ◽

Evgeniya Semenova ◽

Lance McMahon ◽

...

Keyword(s):

Drug Discovery ◽

Total Synthesis ◽

Ground State ◽

Chemical Space ◽

Modern Medicine ◽

The Novel ◽

Pharmaceutical Development ◽

Bioactive Component ◽

Axially Chiral ◽

New Directions

Phytocannabinoids, molecules isolated from cannabis, are gaining attention as promising leads in modern medicine, including pain management. Considering the urgent need for combating the opioid crisis, new directions for the design of cannabinoid-inspired analgesics are of immediate interest. In this regard, we have hypothesized that axially-chiral-cannabinols (ax-CBNs), unnatural (and unknown) isomers of cannabinol (CBN) may be valuable scaffolds for cannabinoid-inspired drug discovery. There are multiple reasons for thinking this: (a) ax-CBNs would have ground-state three-dimensionality akin to THC, a key bioactive component of cannabis, (b) ax-CBNs at their core structure are biaryl molecules, generally attractive platforms for pharmaceutical development due to their ease of functionalization and stability, and (c) atropisomerism with respect to phytocannabinoids is unexplored “chemical space.” Herein we report a scalable total synthesis of ax-CBNs, examine physical properties experimentally and computationally, and provide preliminary behavioral and analgesic analysis of the novel scaffolds.

Metal-, and Organocatalyst-Free One-Pot Assembly of Chiral Aza-Tricyclic Molecules: Creating Six Contiguous Stereocenters from 2-D-Structures and an Amino Acid

10.26434/chemrxiv.12757943.v1 ◽

2020 ◽

Author(s):

Dung Do

Keyword(s):

Amino Acid ◽

Chemical Space ◽

Structural Diversity ◽

Therapeutic Agents ◽

Chiral Molecules ◽

One Pot ◽

Complex Molecules ◽

Chiral Catalyst ◽

Chiral Complex ◽

Free Process

Chiral molecules with their defined 3-D structures are of paramount importance for the study of chemical biology and drug discovery. Having rich structural diversity and unique stereoisomerism, chiral molecules offer a large chemical space that can be explored for the design of new therapeutic agents [1]. In practice, chiral architectures are usually prepared by organometallic and organocatalytic processes in which a transition metal or an organocatalyst is tailor-made for the desired reactions. As a result, developing a method that enables rapid assembly of chiral complex molecules under metal- and organocatalyst-free conditions represents a daunting challenge. Here we developed a straightforward route to create a chiral 3-D structure from 2-D structures and an amino acid without any chiral catalyst. The center of this research is the design of a special chiral spiroimidazolidinone cyclohexadienone intermediate, a merger of a chiral reactive substrate with multiple nucleophilic/electrophilic sites and a transient organocatalyst. This unique substrate-catalyst (“subcatalyst”) dual role of the intermediate enhances the coordinational proximity of the chiral substrate and catalyst in the key Aza-Michael/Michael cascade, resulting in substantial steric discrimination and excellent overall diastereoselectivity. Although the “subcatalyst” (hidden catalyst) is not present among the reaction’s initial components, which renders the process chiral catalyst-free, it is strategically produced to promote sequential self-catalyzed reactions. The success of this methodology will pave the way for many efficient preparations of chiral complex molecules and aid the quest to create the next generation of therapeutic agents.

Reaction-based Enumeration, Active Learning, and Free Energy Calculations to Rapidly Explore Synthetically Tractable Chemical Space and Optimize Potency of Cyclin Dependent Kinase 2 Inhibitors

10.26434/chemrxiv.7841270.v2 ◽

2019 ◽

Author(s):

Kyle Konze ◽

Pieter Bos ◽

Markus Dahlgren ◽

Karl Leswing ◽

Ivan Tubert-Brohman ◽

...

Keyword(s):

Free Energy ◽

Drug Discovery ◽

Active Learning ◽

Large Scale ◽

Chemical Space ◽

Population Based ◽

Free Energy Calculations ◽

Computational Technique ◽

Cyclin Dependent Kinase ◽

Energy Calculations

We report a new computational technique, PathFinder, that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. Coupling PathFinder with active learning and cloud-based free energy calculations allows for large-scale potency predictions of compounds on a timescale that impacts drug discovery. The process is further accelerated by using a combination of population-based statistics and active learning techniques. Using this approach, we rapidly optimized R-groups and core hops for inhibitors of cyclin-dependent kinase 2. We explored more than 300,000 ideas and identified 35 ligands with diverse commercially available R-groups and a predicted IC50 < 100 nM, and four unique cores with a predicted IC50 < 100 nM. The rapid turnaround time and scale of chemical exploration suggest that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.

Reaction-based Enumeration, Active Learning, and Free Energy Calculations to Rapidly Explore Synthetically Tractable Chemical Space and Optimize Potency of Cyclin Dependent Kinase 2 Inhibitors

10.26434/chemrxiv.7841270 ◽

2019 ◽

Author(s):

Kyle Konze ◽

Pieter Bos ◽

Markus Dahlgren ◽

Karl Leswing ◽

Ivan Tubert-Brohman ◽

...

Keyword(s):

Free Energy ◽

Drug Discovery ◽

Active Learning ◽

Large Scale ◽

Chemical Space ◽

Population Based ◽

Free Energy Calculations ◽

Computational Technique ◽

Cyclin Dependent Kinase ◽

Energy Calculations

We report a new computational technique, PathFinder, that uses retrosynthetic analysis followed by combinatorial synthesis to generate novel compounds in synthetically accessible chemical space. Coupling PathFinder with active learning and cloud-based free energy calculations allows for large-scale potency predictions of compounds on a timescale that impacts drug discovery. The process is further accelerated by using a combination of population-based statistics and active learning techniques. Using this approach, we rapidly optimized R-groups and core hops for inhibitors of cyclin-dependent kinase 2. We explored more than 300,000 ideas and identified 35 ligands with diverse commercially available R-groups and a predicted IC50 < 100 nM, and four unique cores with a predicted IC50 < 100 nM. The rapid turnaround time and scale of chemical exploration suggest that this is a useful approach to accelerate the discovery of novel chemical matter in drug discovery campaigns.

General Cyclopropane Assembly via Enantioselective Redox-Active Carbene Transfer to Aliphatic Olefins

10.26434/chemrxiv.7436795 ◽

2018 ◽

Author(s):

Marc Montesinos-Magraner ◽

Matteo Costantini ◽

Rodrigo Ramirez-Contreras ◽

Michael E. Muratore ◽

Magnus J. Johansson ◽

...

Keyword(s):

Total Synthesis ◽

Asymmetric Synthesis ◽

Chemical Space ◽

Synthetic Approach ◽

Asymmetric Cyclopropanation ◽

Redox Active ◽

Wide Range ◽

Leaving Group ◽

Stereoelectronic Properties ◽

Carbene Transfer

Asymmetric cyclopropane synthesis currently requires bespoke strategies, methods, substrates and reagents, even when targeting similar compounds. This limits the speed and chemical space available for discovery campaigns. Here we introduce a practical and versatile diazocompound, and we demonstrate its performance in the first unified asymmetric synthesis of functionalized cyclopropanes. We found that the redox-active leaving group in this reagent enhances the reactivity and selectivity of geminal carbene transfer. This effect enabled the asymmetric cyclopropanation of a wide range of olefins including unactivated aliphatic alkenes, enabling the 3-step total synthesis of (–)-dictyopterene A. This unified synthetic approach delivers high enantioselectivities that are independent of the stereoelectronic properties of the functional groups transferred. Our results demonstrate that orthogonally-differentiated diazocompounds are viable and advantageous equivalents of single-carbon chirons.
