chemical language
Recently Published Documents


TOTAL DOCUMENTS

70
(FIVE YEARS 18)

H-INDEX

13
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Michael Moret ◽  
Francesca Grisoni ◽  
Paul Katzberger ◽  
Gisbert Schneider

Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular-input line-entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare “greedy” (beam search) with “explorative” (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring identifies undesired model biases introduced during model training and enables the development of a new ranking system to remove those biases.
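The perplexity ranking described in this abstract can be illustrated with a minimal Python sketch. It assumes the standard definition of perplexity (the exponential of the mean negative log-likelihood per token); the abstract does not spell out the authors' exact formula. The function names and the per-token probabilities are hypothetical and do not come from any trained model.

```python
import math


def perplexity(token_probs):
    """Perplexity of one sequence given the probability the model
    assigned to each of its tokens (standard definition, assumed here)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Mean negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def rank_by_perplexity(candidates):
    """Sort (smiles, token_probs) pairs from lowest to highest perplexity."""
    return sorted(candidates, key=lambda item: perplexity(item[1]))


# Hypothetical per-character probabilities, for illustration only.
candidates = [
    ("CCO", [0.5, 0.5, 0.5]),
    ("c1ccccc1", [0.9] * 8),
    ("CC(=O)O", [0.2] * 7),
]
ranked = rank_by_perplexity(candidates)
```

A lower perplexity means the model was, on average, more confident about each token, so `ranked[0]` is the design the model considers most consistent with what it learned.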


2021 ◽  
Author(s):  
Michael Moret ◽  
Francesca Grisoni ◽  
Cyrill Brunner ◽  
Gisbert Schneider

Generative chemical language models (CLMs) can be used for de novo molecular structure generation. These CLMs learn from the structural information of known molecules to generate new ones. In this paper, we show that “hybrid” CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), we created a large collection of virtual molecules with a generative CLM. This primary virtual compound library was further refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ binders and non-binders by transfer learning. Several of the computer-generated molecular designs were commercially available, which allowed for fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design in low-data situations.


2021 ◽  
Author(s):  
Abhishek Anand ◽  
Laurent Falquet ◽  
Eliane Abou-Mansour ◽  
Floriane L'Haridon ◽  
Christoph Keel ◽  
...  

Bacteria communicate with each other and with other organisms in a chemical language comprising both diffusible and volatile molecules, and volatiles have recently gained increasing interest as mediators of bacterial interactions. One of the first volatile compounds discovered to play a role in biotic interactions is hydrogen cyanide (HCN), a well-known toxin, which irreversibly binds to the key respiratory enzyme cytochrome c oxidase. The main ecological function of this molecule was until now thought to lie in the inhibition of competing microorganisms. Here we show that HCN is much more than a respiratory toxin and should be considered a major regulator of bacterial behaviour rather than a solely defensive secondary metabolite. Cyanogenesis occurs in both environmental and clinical Pseudomonas strains. Using cyanide-deficient mutants in two Pseudomonas strains, we demonstrate that HCN functions as an intracellular and extracellular volatile signalling molecule, which leads to global transcriptome reprogramming affecting growth, motility, and biofilm formation, as well as the production of other secondary metabolites such as siderophores and phenazines. Our data suggest that bacteria not only use endogenous HCN to control their own cellular functions, but are also able to remotely influence the behaviour of other bacteria sharing the same environment.


Author(s):  
Michael A. Skinnider ◽  
R. Greg Stacey ◽  
David S. Wishart ◽  
Leonard J. Foster

2021 ◽  
Author(s):  
Esben Bjerrum ◽  
Tobias Rastemo ◽  
Ross Irwin ◽  
Christos Kannas ◽  
Samuel Genheden

Polymers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 1898
Author(s):  
Guang Chen ◽  
Lei Tao ◽  
Ying Li

We propose a chemical language processing model that predicts the glass transition temperature (Tg) of polymers by combining a polymer language (SMILES, Simplified Molecular Input Line Entry System) embedding with a recurrent neural network. The model receives only the SMILES strings of a polymer’s repeat units as inputs and treats them as sequential data at the character level. This method requires no additional molecular descriptors or fingerprints, which makes it computationally efficient. More importantly, it avoids the difficulty of generating molecular descriptors for repeat units containing the polymerization point ‘*’. Results show that the trained model demonstrates reasonable prediction performance on the Tg of unseen polymers. The model is further applied to high-throughput screening of an unlabeled polymer database to identify high-temperature polymers desired for applications in extreme environments. Our work demonstrates that the SMILES strings of polymer repeat units can be used as an effective feature representation to develop a chemical language processing model for predicting polymer Tg. The framework is general and can be used to construct structure–property relationships for other polymer properties.
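The character-level input described in this abstract can be sketched in Python as follows. This covers only the encoding step that would precede an embedding layer and a recurrent network, not the authors' model. The helper names, the padding index 0, the sequence length of 16, and the three example repeat units are all assumptions made for illustration.

```python
def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to an integer id.
    Id 0 is reserved for padding (an assumption of this sketch)."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: idx + 1 for idx, ch in enumerate(chars)}


def encode(smiles, vocab, max_len):
    """Turn one SMILES string into a fixed-length list of character ids,
    truncating long strings and right-padding short ones with 0."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Illustrative repeat units; '*' marks the polymerization points.
repeat_units = ["*CC*", "*CC(*)c1ccccc1", "*CC(*)C"]
vocab = build_vocab(repeat_units)
encoded = [encode(s, vocab, max_len=16) for s in repeat_units]
```

Because `*` is treated as just another character, it receives its own id like any atom or bond symbol, so no descriptor calculation has to handle it specially.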


Author(s):  
Guang Chen ◽  
Lei Tao ◽  
Ying Li

We propose a chemical language processing model that predicts the glass transition temperature (Tg) of polymers by combining a polymer language (SMILES, Simplified Molecular Input Line Entry System) embedding with a recurrent neural network. The model receives only the SMILES strings of a polymer’s repeat units as inputs and treats them as sequential data at the character level. This method requires no additional molecular descriptors or fingerprints, which makes it computationally efficient and simple. More importantly, it avoids the difficulty of generating molecular descriptors for repeat units containing the polymerization point ‘*’. Results show that the trained model demonstrates reasonable prediction accuracy on the Tg of unseen polymers. The model is further applied to high-throughput screening of an unlabeled polymer database to identify high-temperature polymers desired for applications in extreme environments. Our work demonstrates that the SMILES strings of polymer repeat units can be used as an effective feature representation to develop a chemical language processing model for predicting Tg. The framework is general and can be used to construct structure-property relationships for other polymer properties.


2021 ◽  
Author(s):  
Michael Moret ◽  
Moritz Helmstädter ◽  
Francesca Grisoni ◽  
Gisbert Schneider ◽  
Daniel Merk

Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. In this work, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.
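How beam search turns learned token probabilities into both a design and a score can be sketched in Python. This is a generic beam search over a toy next-token table, not the authors' implementation. The table `TOY`, the start and end tokens `^` and `$`, and the beam width are hypothetical choices for illustration.

```python
import math


def beam_search(next_token_probs, start, end, width, max_len):
    """Generic beam search: keep the `width` most probable partial
    sequences at each step and collect sequences that reach `end`."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        expanded = []
        for tokens, score in beams:
            for tok, p in next_token_probs(tokens).items():
                if p <= 0.0:
                    continue
                candidate = (tokens + [tok], score + math.log(p))
                if tok == end:
                    finished.append(candidate)
                else:
                    expanded.append(candidate)
        if not expanded:
            break
        expanded.sort(key=lambda c: c[1], reverse=True)
        beams = expanded[:width]
    # Highest cumulative log-probability first.
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


# Toy stand-in for a trained model: the next-token distribution
# depends only on the last token (hypothetical values).
TOY = {
    "^": {"C": 0.7, "N": 0.3},
    "C": {"C": 0.2, "O": 0.5, "$": 0.3},
    "N": {"C": 0.6, "$": 0.4},
    "O": {"$": 1.0},
}


def toy_model(tokens):
    return TOY[tokens[-1]]


results = beam_search(toy_model, "^", "$", width=2, max_len=6)
best_tokens, best_logp = results[0]
best_smiles = "".join(best_tokens[1:-1])  # strip start and end tokens
```

The cumulative log-probability returned with each finished sequence is the model-intrinsic score: candidates can be ranked by it directly, without an external scoring function.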



