Too sweet: cheminformatics for deglycosylation in natural products

Abstract: Sugar units in natural products are pharmacokinetically important but often redundant, and they can therefore obstruct the study of the structure and function of the aglycon. It is therefore recommended to remove the sugars before a theoretical or experimental study of a molecule. Deglycogenases, enzymes specialized in removing sugars from small molecules, are often used in laboratories for this purpose. However, there is no standardized computational procedure to perform this task in silico. In this work, we present a systematic approach for the in silico removal of ring and linear sugars from molecular structures. Particular attention is given to molecules of biological origin and to their structural specificities. The approach is available in two forms: as a free and open web application and as standalone open-source software.
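The idea of removing ring sugars in silico can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the graph representation, the function names, and the `min_exo_oxygen_ratio` threshold are all assumptions made for illustration, and the sugar test (a 5- or 6-membered ring with exactly one ring oxygen and enough directly attached exocyclic oxygens) is a deliberately crude heuristic that ignores bond orders, stereochemistry, and linear sugars.

```python
def build_adjacency(elements, bonds):
    """Map every heavy atom index to the set of its bonded neighbours."""
    adj = {atom: set() for atom in elements}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def small_rings(adj, sizes=(5, 6)):
    """Enumerate simple cycles of the given sizes as frozensets of atoms."""
    rings = set()
    max_size = max(sizes)

    def walk(start, node, path):
        for nxt in adj[node]:
            if nxt == start and len(path) in sizes:
                rings.add(frozenset(path))
            elif nxt not in path and nxt > start and len(path) < max_size:
                # Only visit atoms with a larger index than the start atom,
                # so each cycle is explored from its smallest atom only.
                walk(start, nxt, path + [nxt])

    for start in adj:
        walk(start, start, [start])
    return rings


def is_sugar_like(ring, elements, adj, min_exo_oxygen_ratio=0.5):
    """Hypothetical heuristic: one ring O, otherwise C, many exocyclic O."""
    ring_elems = [elements[a] for a in ring]
    if ring_elems.count("O") != 1 or ring_elems.count("C") != len(ring) - 1:
        return False
    exo_oxygens = {n for a in ring for n in adj[a]
                   if n not in ring and elements[n] == "O"}
    return len(exo_oxygens) / len(ring) >= min_exo_oxygen_ratio


def remove_ring_sugars(elements, bonds):
    """Delete sugar-like rings and keep the largest remaining fragment."""
    adj = build_adjacency(elements, bonds)
    sugar_atoms = set()
    for ring in small_rings(adj):
        if is_sugar_like(ring, elements, adj):
            sugar_atoms |= ring
    remaining = set(elements) - sugar_atoms

    best, seen = set(), set()
    for atom in sorted(remaining):
        if atom in seen:
            continue
        component, stack = set(), [atom]
        while stack:
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            stack.extend(n for n in adj[cur]
                         if n in remaining and n not in component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy glucoside: a benzene ring (atoms 0-5) linked through oxygen 6 to a
# pyranose ring (atoms 7-12, ring oxygen 12) with its usual substituents.
elements = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "O",
            7: "C", 8: "C", 9: "C", 10: "C", 11: "C", 12: "O",
            13: "O", 14: "O", 15: "O", 16: "C", 17: "O"}
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 7),
         (8, 13), (9, 14), (10, 15), (11, 16), (16, 17)]

aglycon = remove_ring_sugars(elements, bonds)
print(sorted(aglycon))  # [0, 1, 2, 3, 4, 5, 6]
```

In this toy case the former glycosidic oxygen stays on the aglycon, so the result corresponds to the phenol skeleton. Keeping only the largest fragment is a simplification chosen for the sketch; a real tool has to decide what to do with non-terminal sugars and with small fragments.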


Too sweet: cheminformatics for deglycosylation in natural products

10.21203/rs.3.rs-50194/v3 ◽ 2020

Author(s): Jonas Schaub ◽ Achim Zielesny ◽ Christoph Steinbeck ◽ Maria Sorokina

Keyword(s): Natural Products ◽ Small Molecules ◽ Open Source Software ◽ In Silico ◽ Web Application ◽ Systematic Approach ◽ Computational Procedure ◽ Molecular Structures ◽ Biological Origin ◽ And Function



Too sweet: cheminformatics for deglycosylation in natural products

10.21203/rs.3.rs-50194/v1 ◽ 2020

Author(s): Jonas Schaub ◽ Achim Zielesny ◽ Christoph Steinbeck ◽ Maria Sorokina

Keyword(s): Natural Products ◽ Small Molecules ◽ Open Source Software ◽ In Silico ◽ Web Application ◽ Systematic Approach ◽ Computational Procedure ◽ Molecular Structures ◽ Biological Origin ◽ And Function



COCONUT online: Collection of Open Natural Products database

Journal of Cheminformatics ◽ 10.1186/s13321-020-00478-9 ◽ 2021 ◽ Vol 13 (1)

Author(s): Maria Sorokina ◽ Peter Merseburger ◽ Kohulan Rajan ◽ Mehmet Aziz Yirik ◽ Christoph Steinbeck

Keyword(s): Natural Products ◽ Small Molecules ◽ In Silico ◽ Web Interface ◽ Computational Screening ◽ Online Resource ◽ The World ◽ Potential Applications ◽ Living Organisms ◽ Application Fields

Abstract: Natural products (NPs) are small molecules produced by living organisms, and many of them are bioactive, which gives them potential applications in pharmacology and other industries. This potential has raised great interest in NP research around the world and in different application fields, and over the years generalist and thematic NP databases have multiplied. However, there is at this moment no online resource gathering all known NPs in one place, which would greatly simplify NP research and enable computational screening and other in silico applications. In this manuscript we present the online version of the COlleCtion of Open Natural prodUcTs (COCONUT): an aggregated dataset of elucidated and predicted NPs collected from open sources, together with a web interface to browse, search, and quickly download NPs. COCONUT web is freely available at https://coconut.naturalproducts.net.


In Silico Identification of Small Molecules as New Cdc25 Inhibitors through the Correlation between Chemosensitivity and Protein Expression Pattern

International Journal of Molecular Sciences ◽ 10.3390/ijms22073714 ◽ 2021 ◽ Vol 22 (7) ◽ pp. 3714

Author(s): Antonino Lauria ◽ Annamaria Martorana ◽ Gabriele La Monica ◽ Salvatore Mannino ◽ Giuseppe Mannino ◽ ...

Keyword(s): Cell Cycle ◽ Protein Expression ◽ Small Molecules ◽ Expression Pattern ◽ In Silico ◽ Molecular Structures ◽ Phase Cell ◽ Protein Expression Pattern ◽ Active Inhibitor

The cell division cycle 25 (Cdc25) protein family plays a crucial role in controlling cell proliferation, making it an excellent target for cancer therapy. In this work, a set of small molecules was identified as Cdc25 modulators by applying a mixed ligand-structure-based approach and taking advantage of the correlation between the chemosensitivity of selected structures and the protein expression pattern of the proposed target. In the first step of the in silico protocol, a set of molecules acting as Cdc25 inhibitors was identified through a new ligand-based protocol and the evaluation of a large database of molecular structures. Subsequently, induced-fit docking (IFD) studies allowed us to further reduce the number of compounds to be screened biologically. In vitro antiproliferative and enzymatic inhibition assays on the selected compounds led to the identification of new structurally heterogeneous inhibitors of Cdc25 proteins. Among them, J3955, the most active inhibitor, showed concentration-dependent antiproliferative activity against HepG2 cells, with GI50 in the low micromolar range. When J3955 was tested in cell-cycle perturbation experiments, it caused mitotic failure by G2/M-phase cell-cycle arrest. Finally, Western blotting analysis showed an increase in phosphorylated Cdk1 levels in cells exposed to J3955, indicating its specific influence on cellular pathways involving Cdc25 proteins.


COCONUT Online: Collection of Open Natural Products Database

10.21203/rs.3.rs-75600/v1 ◽ 2020

Author(s): Maria Sorokina ◽ Peter Merseburger ◽ Kohulan Rajan ◽ Mehmet Aziz Yirik ◽ Christoph Steinbeck

Keyword(s): Natural Products ◽ Small Molecules ◽ In Silico ◽ Web Interface ◽ Computational Screening ◽ Online Resource ◽ Potential Applications ◽ Living Organisms

Abstract: Natural products (NPs) are small molecules produced by living organisms with potential applications in pharmacology and other industries owing to their high bioactivity. Over the years, thematic NP databases have multiplied. However, there is no online resource gathering all known NPs in one place, which would greatly simplify NP research and enable computational screening and other in silico applications. Here we present the COlleCtion of Open Natural prodUcTs (COCONUT): an aggregated dataset of NPs available from different open sources, together with a web interface to browse, search, and quickly download NPs. COCONUT web is freely available at https://coconut.naturalproducts.net.


MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules From Their Mass Spectra

10.20944/preprints202110.0355.v1 ◽ 2021

Author(s): Aditya Divyakant Shrivastava ◽ Neil Swainston ◽ Soumitra Samanta ◽ Ivayla Roberts ◽ Marina Wright Muelas ◽ ...

Keyword(s): Mass Spectrum ◽ Small Molecules ◽ Small Molecule ◽ Molecular Identification ◽ In Silico ◽ Mass Spectra ◽ Effective Properties ◽ Molecular Structures ◽ Mass Spectral ◽ Chemical Structures

The ‘inverse problem’ of mass spectrometric molecular identification (‘given a mass spectrum, calculate/predict the 2D structure of the molecule whence it came’) is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem (‘calculate a small molecule’s likely fragmentation and hence at least some of its mass spectrum from its structure alone’) is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the ‘translation’ a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the ‘true’ molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are ‘similar’ to the top hit. In addition to using the ‘top hits’ directly, we can produce a rank order of these by ‘round-tripping’ candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500 Da or lower, including those in the last CASMI challenge (for which the results are known), getting 49/93 (53%) precisely correct. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. It seems to act as a Las Vegas algorithm, in that it either gives the correct answer or simply states that it cannot find one. The ability to create and to ‘learn’ millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.
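The Las Vegas-style behaviour described in the abstract (return an answer only when it checks out, otherwise report that none was found) can be sketched with a much-reduced consistency check. The code below is not MassGenie: it only tests whether a candidate's protonated molecular ion matches an observed m/z within a tolerance. The function names, the candidate list, and the 5 ppm default are assumptions made for illustration; the monoisotopic masses are standard values.

```python
# Monoisotopic masses (u) of the most abundant isotopes, plus the proton mass.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}
PROTON = 1.00727646688


def protonated_mz(formula):
    """m/z of the singly charged [M+H]+ ion for an element-count dict."""
    neutral = sum(MONOISOTOPIC[el] * count for el, count in formula.items())
    return neutral + PROTON


def first_consistent(candidates, observed_mz, ppm=5.0):
    """Return the first candidate whose [M+H]+ matches, or None.

    `candidates` is a ranked list of (name, formula) pairs; `ppm` is a
    hypothetical mass tolerance chosen for this sketch.
    """
    for name, formula in candidates:
        error_ppm = abs(protonated_mz(formula) - observed_mz) / observed_mz * 1e6
        if error_ppm <= ppm:
            return name
    return None  # no candidate is consistent with the observation


# Ranked candidates as a model might propose them (illustrative only).
candidates = [
    ("theobromine", {"C": 7, "H": 8, "N": 4, "O": 2}),
    ("caffeine", {"C": 8, "H": 10, "N": 4, "O": 2}),
]

print(first_consistent(candidates, 195.0877))  # caffeine
print(first_consistent(candidates, 300.0))     # None
```

Returning `None` instead of the best-scoring mismatch is the design choice that gives the Las Vegas flavour: the caller never receives a candidate that failed the check. A fuller round trip would compare predicted fragments as well, not just the precursor ion.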


MassGenie: a transformer-based deep learning method for identifying small molecules from their mass spectra

10.1101/2021.06.25.449969 ◽ 2021

Author(s): Aditya Divyakant Shrivastava ◽ Neil Swainston ◽ Soumitra Samanta ◽ Ivayla Roberts ◽ Marina Wright Muelas ◽ ...

Keyword(s): Mass Spectrum ◽ Small Molecules ◽ Small Molecule ◽ Molecular Identification ◽ In Silico ◽ Mass Spectra ◽ Effective Properties ◽ Molecular Structures ◽ Mass Spectral ◽ Chemical Structures

The ‘inverse problem’ of mass spectrometric molecular identification (‘given a mass spectrum, calculate the molecule whence it came’) is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem (‘calculate a small molecule’s likely fragmentation and hence at least some of its mass spectrum from its structure alone’) is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the ‘translation’ a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the ‘true’ molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are ‘similar’ to the top hit. In addition to using the ‘top hits’ directly, we can produce a rank order of these by ‘round-tripping’ candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500 Da or lower. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. The ability to create and to ‘learn’ millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.


MassGenie: A Transformer-Based Deep Learning Method for Identifying Small Molecules from Their Mass Spectra

Biomolecules ◽ 10.3390/biom11121793 ◽ 2021 ◽ Vol 11 (12) ◽ pp. 1793

Author(s): Aditya Divyakant Shrivastava ◽ Neil Swainston ◽ Soumitra Samanta ◽ Ivayla Roberts ◽ Marina Wright Muelas ◽ ...

Keyword(s): Mass Spectrum ◽ Small Molecules ◽ Molecular Identification ◽ In Silico ◽ Mass Spectra ◽ De Novo ◽ Effective Properties ◽ Molecular Structures ◽ Mass Spectral ◽ Chemical Structures



Scanning tunneling microscopy (STM) of DNA and RNA

Proceedings, annual meeting, Electron Microscopy Society of America ◽ 10.1017/s0424820100152148 ◽ 1989 ◽ Vol 47 ◽ pp. 34-35

Author(s): Patricia G. Arscott ◽ Gil Lee ◽ Victor A. Bloomfield ◽ D. Fennell Evans

Keyword(s): Molecular Structures ◽ Inorganic Materials ◽ Resolving Power ◽ Scanning Tunneling ◽ Tunneling Microscopy ◽ Form And Function ◽ Dna And Rna ◽ Calf Thymus ◽ And Function ◽ The Relationship

STM is one of the most promising techniques available for visualizing the fine details of biomolecular structure. It has been used to map the surface topography of inorganic materials in atomic dimensions, and thus has the resolving power not only to determine the conformation of small molecules but to distinguish site-specific features within a molecule. That level of detail is of critical importance in understanding the relationship between form and function in biological systems. The size, shape, and accessibility of molecular structures can be determined much more accurately by STM than by electron microscopy since no staining, shadowing or labeling with heavy metals is required, and there is no exposure to damaging radiation by electrons. Crystallography and most other physical techniques do not give information about individual molecules. We have obtained striking images of DNA and RNA, using calf thymus DNA and two synthetic polynucleotides, poly(dG-me5dC)·poly(dG-me5dC) and poly(rA)·poly(rU).
