chemical language Latest Research Papers

Perplexity-based molecule ranking and bias estimation of chemical language models

Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures as textual representations, such as Simplified Molecular Input Line Entry System (SMILES) strings, without relying on explicit construction rules. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows the most promising molecular designs to be identified and ranked based on the probabilities learned by the CLM. Using perplexity to compare “greedy” (beam search) with “explorative” (multinomial sampling) methods for SMILES generation reveals certain advantages of multinomial sampling. Additionally, perplexity scoring is used to identify undesired model biases introduced during model training and enables a new ranking system that removes those biases.
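The perplexity score described above can be sketched from the per-token probabilities a CLM assigns to a SMILES string. This minimal example assumes the standard definition PPL = exp(-(1/N) Σ log p_i), where lower values mean the model finds the string more likely; the study's exact formulation (log base, handling of start and end tokens) may differ, and `perplexity` and `rank_by_perplexity` are hypothetical helper names, not published code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability of the tokens in one string.

    Assumes the standard natural-log definition; a base-2 variant ranks
    strings identically.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("token probabilities must lie in (0, 1]")
    mean_log_p = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_p)


def rank_by_perplexity(designs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Sort designs from lowest perplexity (most expected by the model) to highest.

    `designs` maps each SMILES string to the probabilities the (hypothetical)
    CLM assigned to its tokens.
    """
    scored = [(smiles, perplexity(probs)) for smiles, probs in designs.items()]
    return sorted(scored, key=lambda item: item[1])
```

As a sanity check, a string whose tokens each received probability 0.5 has a perplexity of 2, and one where every token had probability 1 has a perplexity of 1. Averaging over tokens, rather than summing, keeps long and short SMILES strings comparable.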


Leveraging molecular structure and bioactivity with chemical language models for drug design

10.33774/chemrxiv-2021-xzgst

2021

Author(s): Michael Moret, Francesca Grisoni, Cyrill Brunner, Gisbert Schneider

Keyword(s): Molecular Structure, De Novo, Molecular Design, Structural Information, Molecular Structures, Language Models, Structure Generation, Compound Screening, Phosphoinositide 3 Kinase, Chemical Language

Generative chemical language models (CLMs) can be used for de novo molecular structure generation. These CLMs learn from the structural information of known molecules to generate new ones. In this paper, we show that “hybrid” CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), we created a large collection of virtual molecules with a generative CLM. This primary virtual compound library was further refined using a CLM-based classifier for bioactivity prediction. This second, hybrid CLM was pretrained on patented molecular structures and fine-tuned by transfer learning on known PI3Kγ binders and non-binders. Several of the computer-generated designs were commercially available, which allowed for fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified. These results support the use of hybrid CLMs for virtual compound screening and activity-focused molecular design in low-data situations.
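The two-stage workflow described above (generate a virtual library, then refine it with a bioactivity classifier) can be sketched as a simple prioritization funnel. This is an illustration under stated assumptions: `predict_active` stands in for the fine-tuned hybrid CLM classifier and is any hypothetical function returning an estimated probability of activity, `toy_classifier` is an invented scoring rule, and the `threshold` and `top_k` defaults are arbitrary.

```python
from typing import Callable, Iterable, List, Tuple


def prioritize_library(
    library: Iterable[str],
    predict_active: Callable[[str], float],
    threshold: float = 0.5,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score generated SMILES with an activity classifier and keep the best ones.

    predict_active is a hypothetical stand-in for a fine-tuned CLM classifier
    that returns an estimated probability of activity for one SMILES string.
    """
    unique = list(dict.fromkeys(library))  # drop duplicate designs, keep order
    scored = [(smiles, predict_active(smiles)) for smiles in unique]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)  # most likely actives first
    return kept[:top_k]


def toy_classifier(smiles: str) -> float:
    """Invented scoring rule (fraction of 'N' characters), for illustration only."""
    return smiles.count("N") / len(smiles)
```

For example, `prioritize_library(["CCO", "CCN", "CCO", "NCCN"], toy_classifier, threshold=0.3)` keeps "NCCN" (score 0.5) ahead of "CCN" (about 0.33) and discards "CCO". Keeping the classifier as a plain callable lets the same funnel wrap any model that returns a probability.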


Much more than a toxin: hydrogen cyanide is a volatile modulator of bacterial behaviour

10.1101/2021.09.29.462390

2021

Author(s): Abhishek Anand, Laurent Falquet, Eliane Abou-Mansour, Floriane L'Haridon, Christoph Keel, ...

Keyword(s): Secondary Metabolites, Cytochrome C Oxidase, Hydrogen Cyanide, Biofilm Formation, Biotic Interactions, Cellular Functions, Bacterial Interactions, Signalling Molecule, Global Transcriptome, Chemical Language

Bacteria communicate with each other and with other organisms in a chemical language comprising both diffusible and volatile molecules, and volatiles have recently attracted increasing interest as mediators of bacterial interactions. One of the first volatile compounds found to play a role in biotic interactions is hydrogen cyanide (HCN), a well-known toxin that irreversibly binds to the key respiratory enzyme cytochrome c oxidase. Until now, the main ecological function of this molecule was thought to be the inhibition of competing microorganisms. Here we show that HCN is much more than a respiratory toxin and should be considered a major regulator of bacterial behaviour rather than a purely defensive secondary metabolite. Cyanogenesis occurs in both environmental and clinical Pseudomonas strains. Using cyanide-deficient mutants of two Pseudomonas strains, we demonstrate that HCN functions as an intracellular and extracellular volatile signalling molecule, which leads to global transcriptome reprogramming affecting growth, motility, and biofilm formation, as well as the production of other secondary metabolites such as siderophores and phenazines. Our data suggest that bacteria not only use endogenous HCN to control their own cellular functions, but can also remotely influence the behaviour of other bacteria sharing the same environment.


Chemical language models enable navigation in sparsely populated chemical space

Nature Machine Intelligence

10.1038/s42256-021-00368-1

2021

Author(s): Michael A. Skinnider, R. Greg Stacey, David S. Wishart, Leonard J. Foster

Keyword(s): Chemical Space, Language Models, Chemical Language


PySMILESUtils – Enabling deep learning with the SMILES chemical language

10.33774/chemrxiv-2021-kzhbs

2021

Author(s): Esben Bjerrum, Tobias Rastemo, Ross Irwin, Christos Kannas, Samuel Genheden

Keyword(s): Deep Learning, Chemical Language


Predicting Polymers’ Glass Transition Temperature by a Chemical Language Processing Model

Polymers

10.3390/polym13111898

2021, Vol 13 (11), pp. 1898

Author(s): Guang Chen, Lei Tao, Ying Li

Keyword(s): Glass Transition, Transition Temperature, Glass Transition Temperature, Language Processing, High Throughput Screening, Molecular Descriptors, Feature Representation, Structure Property, Repeat Units, Chemical Language

We propose a chemical language processing model to predict the glass transition temperature (Tg) of polymers through a polymer language (SMILES, Simplified Molecular Input Line Entry System) embedding and a recurrent neural network. The model receives only the SMILES strings of a polymer’s repeat units as inputs and treats them as sequential data at the character level. With this method, there is no need to calculate any additional molecular descriptors or fingerprints of polymers, which makes it very computationally efficient. More importantly, it avoids the difficulty of generating molecular descriptors for repeat units containing the polymerization point ‘*’. Results show that the trained model achieves reasonable prediction performance on the Tg of unseen polymers. The model is further applied for high-throughput screening of an unlabeled polymer database to identify high-temperature polymers suitable for applications in extreme environments. Our work demonstrates that the SMILES strings of polymer repeat units can serve as an effective feature representation for a chemical language processing model that predicts polymer Tg. The framework is general and can be used to construct structure–property relationships for other polymer properties.
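The character-level input described above can be illustrated by turning repeat-unit SMILES, including the polymerization point '*', into fixed-length integer sequences that an embedding layer and recurrent network could consume. This is a minimal sketch under assumptions: the vocabulary is built from the training strings, index 0 is reserved for padding, `build_vocab` and `encode` are hypothetical names, and the RNN itself is omitted.

```python
from typing import Dict, List

PAD = "<pad>"  # padding symbol, assumed to take index 0


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Assign an integer id to every character in the repeat-unit SMILES."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    vocab = {PAD: 0}
    for index, ch in enumerate(chars, start=1):
        vocab[ch] = index
    return vocab


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Character-level ids, truncated or right-padded to max_len.

    Raises KeyError for a character absent from the vocabulary.
    """
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Repeat units with '*' marking the polymerization points.
repeat_units = ["*CC*", "*CC(*)c1ccccc1"]  # polyethylene, polystyrene
vocab = build_vocab(repeat_units)
batch = [encode(s, vocab, max_len=16) for s in repeat_units]
```

Because '*' is simply another character in the vocabulary, no descriptor has to be defined for it. Strictly character-level splitting also turns two-letter symbols such as "Cl" into 'C' and 'l', which matches the character-level framing above; a regex-based SMILES tokenizer would be a different design choice.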


Predicting Polymer’s Glass Transition Temperature by A Chemical Language Processing Model

10.20944/preprints202105.0655.v1

2021

Author(s): Guang Chen, Lei Tao, Ying Li

Keyword(s): Glass Transition, Transition Temperature, Glass Transition Temperature, Language Processing, High Throughput Screening, Molecular Descriptors, Feature Representation, Structure Property, Repeat Units, Chemical Language

We propose a chemical language processing model to predict the glass transition temperature (Tg) of polymers through a polymer language (SMILES, Simplified Molecular Input Line Entry System) embedding and a recurrent neural network. The model receives only the SMILES strings of a polymer’s repeat units as inputs and treats them as sequential data at the character level. With this method, there is no need to calculate any additional molecular descriptors or fingerprints of polymers, which makes it simple and very computationally efficient. More importantly, it avoids the difficulty of generating molecular descriptors for repeat units containing the polymerization point ‘*’. Results show that the trained model achieves reasonable prediction accuracy on the Tg of unseen polymers. The model is further applied for high-throughput screening of an unlabeled polymer database to identify high-temperature polymers suitable for applications in extreme environments. Our work demonstrates that the SMILES strings of polymer repeat units can serve as an effective feature representation for a chemical language processing model that predicts Tg. The framework is general and can be used to construct structure–property relationships for other polymer properties.


Beam Search Sampling for Molecular Design and Intrinsic Prioritization with Machine Intelligence

10.26434/chemrxiv.14153408

2021

Author(s): Michael Moret, Moritz Helmstädter, Francesca Grisoni, Gisbert Schneider, Daniel Merk

Keyword(s): De Novo, Molecular Design, Search Algorithm, Sampling Technique, Machine Intelligence, Language Models, Scoring Functions, Beam Search, De Novo Drug Design, Chemical Language

Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, prioritizing and selecting the most promising computational designs remains challenging. In this work, we used the probabilities learnt by chemical language models together with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and showed low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique removes the strict need for external compound scoring functions, further extending the applicability of generative artificial intelligence to data-driven drug discovery.
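Beam search over a CLM can be sketched as keeping, at every step, only the `beam_width` prefixes with the highest cumulative log-probability; multinomial sampling, the “explorative” alternative named in the perplexity study, instead draws each next token at random in proportion to its probability. In this minimal sketch, `next_token_probs` is a hypothetical stand-in for a trained CLM, `$` is an assumed end-of-sequence token, and the `toy_model` probability table is invented for illustration only.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

END = "$"  # assumed end-of-sequence token (hypothetical)
NextTokenProbs = Callable[[str], Dict[str, float]]


def beam_search(next_token_probs: NextTokenProbs, beam_width: int,
                max_len: int, start: str = "") -> List[Tuple[str, float]]:
    """Return finished sequences (END stripped) ranked by summed log-probability."""
    beams: List[Tuple[str, float]] = [(start, 0.0)]
    finished: List[Tuple[str, float]] = []
    for _ in range(max_len):
        # Expand every live prefix by every token the model allows.
        candidates = [
            (prefix + tok, logp + math.log(p))
            for prefix, logp in beams
            for tok, p in next_token_probs(prefix).items()
            if p > 0.0
        ]
        # Keep only the beam_width best expansions; completed ones leave the beam.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_width]:
            if seq.endswith(END):
                finished.append((seq[:-len(END)], logp))
            else:
                beams.append((seq, logp))
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)


def sample_multinomial(next_token_probs: NextTokenProbs, max_len: int,
                       rng: random.Random, start: str = "") -> str:
    """Draw one sequence, choosing each token in proportion to its probability."""
    seq = start
    for _ in range(max_len):
        probs = next_token_probs(seq)
        tokens = list(probs)
        tok = rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
        if tok == END:
            return seq
        seq += tok
    return seq  # length limit reached without an end token


def toy_model(prefix: str) -> Dict[str, float]:
    """Invented next-token table standing in for a trained CLM."""
    table = {
        "": {"C": 0.6, "N": 0.4},
        "C": {"C": 0.5, "O": 0.3, END: 0.2},
        "N": {END: 1.0},
    }
    return table.get(prefix, {END: 1.0})
```

With this toy table, `beam_search(toy_model, beam_width=2, max_len=3)` ranks "N" (probability 0.4) above "CC" (0.6 × 0.5 = 0.3) even though "C" is the likelier first token, because it scores whole sequences. Repeated calls to `sample_multinomial` can also return lower-probability strings such as "CO" or "C", which is the extra diversity that sampling trades against beam search's focus on the highest-scoring sequences.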


Beam Search Sampling for Molecular Design and Intrinsic Prioritization with Machine Intelligence

10.26434/chemrxiv.14153408.v1

2021

Author(s): Michael Moret, Moritz Helmstädter, Francesca Grisoni, Gisbert Schneider, Daniel Merk

Keyword(s): De Novo, Molecular Design, Search Algorithm, Sampling Technique, Machine Intelligence, Language Models, Scoring Functions, Beam Search, De Novo Drug Design, Chemical Language

Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, prioritizing and selecting the most promising computational designs remains challenging. In this work, we used the probabilities learnt by chemical language models together with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and showed low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique removes the strict need for external compound scoring functions, further extending the applicability of generative artificial intelligence to data-driven drug discovery.


chemical language
Recently Published Documents


Generative Chemical Transformer: Neural Machine Learning of Molecular Geometric Structures from Chemical Language via Attention

Perplexity-based molecule ranking and bias estimation of chemical language models

Leveraging molecular structure and bioactivity with chemical language models for drug design

Much more than a toxin: hydrogen cyanide is a volatile modulator of bacterial behaviour

Chemical language models enable navigation in sparsely populated chemical space

PySMILESUtils – Enabling deep learning with the SMILES chemical language

Predicting Polymers’ Glass Transition Temperature by a Chemical Language Processing Model

Predicting Polymer’s Glass Transition Temperature by A Chemical Language Processing Model

Beam Search Sampling for Molecular Design and Intrinsic Prioritization with Machine Intelligence

Beam Search Sampling for Molecular Design and Intrinsic Prioritization with Machine Intelligence
