Too sweet: cheminformatics for deglycosylation in natural products

2020 ◽  
Author(s):  
Jonas Schaub ◽  
Achim Zielesny ◽  
Christoph Steinbeck ◽  
Maria Sorokina

Abstract Sugar units in natural products are pharmacokinetically important but often redundant, and they therefore obstruct the study of the structure and function of the aglycon. It is therefore recommended to remove the sugars before a theoretical or experimental study of a molecule. Deglycogenases, enzymes specialized in removing sugars from small molecules, are often used in laboratories to perform this task. However, there is no standardized computational procedure to perform it in silico. In this work, we present a systematic approach for in silico removal of ring and linear sugars from molecular structures. Particular attention is given to molecules of biological origin and to their structural specificities. This approach is made available in two forms: through a free and open web application and as standalone open-source software.

2020 ◽  
Vol 12 (1) ◽  
Author(s):  
Jonas Schaub ◽  
Achim Zielesny ◽  
Christoph Steinbeck ◽  
Maria Sorokina

Abstract Sugar units in natural products are pharmacokinetically important but often redundant, and they therefore obstruct the study of the structure and function of the aglycon. It is therefore recommended to remove the sugars before a theoretical or experimental study of a molecule. Deglycogenases, enzymes specialized in removing sugars from small molecules, are often used in laboratories to perform this task. However, there is no standardized computational procedure to perform it in silico. In this work, we present a systematic approach for in silico removal of ring and linear sugars from molecular structures. Particular attention is given to molecules of biological origin and to their structural specificities. This approach is made available in two forms: through a free and open web application and as standalone open-source software.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Maria Sorokina ◽  
Peter Merseburger ◽  
Kohulan Rajan ◽  
Mehmet Aziz Yirik ◽  
Christoph Steinbeck

Abstract Natural products (NPs) are small molecules produced by living organisms, with potential applications in pharmacology and other industries because many of them are bioactive. This potential has raised great interest in NP research around the world and across application fields, and over the years general and thematic NP databases have multiplied. However, there is at present no online resource that gathers all known NPs in one place, which would greatly simplify NP research and allow computational screening and other in silico applications. In this manuscript we present the online version of the COlleCtion of Open Natural prodUcTs (COCONUT): an aggregated dataset of elucidated and predicted NPs collected from open sources, together with a web interface to browse, search, and quickly download NPs. COCONUT web is freely available at https://coconut.naturalproducts.net.


2021 ◽  
Vol 22 (7) ◽  
pp. 3714
Author(s):  
Antonino Lauria ◽  
Annamaria Martorana ◽  
Gabriele La Monica ◽  
Salvatore Mannino ◽  
Giuseppe Mannino ◽  
...  

The cell division cycle 25 (Cdc25) protein family plays a crucial role in controlling cell proliferation, making it an excellent target for cancer therapy. In this work, a set of small molecules were identified as Cdc25 modulators by applying a mixed ligand-structure-based approach and taking advantage of the correlation between the chemosensitivity of selected structures and the protein expression pattern of the proposed target. In the first step of the in silico protocol, a set of molecules acting as Cdc25 inhibitors were identified through a new ligand-based protocol and the evaluation of a large database of molecular structures. Subsequently, induced-fit docking (IFD) studies allowed us to further reduce the number of compounds biologically screened. In vitro antiproliferative and enzymatic inhibition assays on the selected compounds led to the identification of new structurally heterogeneous inhibitors of Cdc25 proteins. Among them, J3955, the most active inhibitor, showed concentration-dependent antiproliferative activity against HepG2 cells, with GI50 in the low micromolar range. When J3955 was tested in cell-cycle perturbation experiments, it caused mitotic failure by G2/M-phase cell-cycle arrest. Finally, Western blotting analysis showed an increment of phosphorylated Cdk1 levels in cells exposed to J3955, indicating its specific influence in cellular pathways involving Cdc25 proteins.


2020 ◽  
Author(s):  
Maria Sorokina ◽  
Peter Merseburger ◽  
Kohulan Rajan ◽  
Mehmet Aziz Yirik ◽  
Christoph Steinbeck

Abstract Natural products (NPs) are small molecules produced by living organisms, with potential applications in pharmacology and other industries owing to their high bioactivities. Over the years, thematic NP databases have multiplied. However, there is no online resource that gathers all known NPs in one place, which would greatly simplify NP research and allow computational screening and other in silico applications. Here we present the COlleCtion of Open Natural prodUcTs (COCONUT): an aggregated dataset of NPs available in different open sources, together with a web interface to browse, search, and quickly download NPs. COCONUT web is freely available at https://coconut.naturalproducts.net.


2021 ◽  
Author(s):  
Aditya Divyakant Shrivastava ◽  
Neil Swainston ◽  
Soumitra Samanta ◽  
Ivayla Roberts ◽  
Marina Wright Muelas ◽  
...  

The ‘inverse problem’ of mass spectrometric molecular identification (‘given a mass spectrum, calculate the molecule whence it came’) is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem (‘calculate a small molecule’s likely fragmentation and hence at least some of its mass spectrum from its structure alone’) is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the ‘translation’ a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the ‘true’ molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are ‘similar’ to the top hit.
In addition to using the ‘top hits’ directly, we can produce a rank order of these by ‘round-tripping’ candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500 Da or lower. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. The ability to create and to ‘learn’ millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.


Biomolecules ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 1793
Author(s):  
Aditya Divyakant Shrivastava ◽  
Neil Swainston ◽  
Soumitra Samanta ◽  
Ivayla Roberts ◽  
Marina Wright Muelas ◽  
...  

The ‘inverse problem’ of mass spectrometric molecular identification (‘given a mass spectrum, calculate/predict the 2D structure of the molecule whence it came’) is largely unsolved, and is especially acute in metabolomics where many small molecules remain unidentified. This is largely because the number of experimentally available electrospray mass spectra of small molecules is quite limited. However, the forward problem (‘calculate a small molecule’s likely fragmentation and hence at least some of its mass spectrum from its structure alone’) is much more tractable, because the strengths of different chemical bonds are roughly known. This kind of molecular identification problem may be cast as a language translation problem in which the source language is a list of high-resolution mass spectral peaks and the ‘translation’ a representation (for instance in SMILES) of the molecule. It is thus suitable for attack using the deep neural networks known as transformers. We here present MassGenie, a method that uses a transformer-based deep neural network, trained on ~6 million chemical structures with augmented SMILES encoding and their paired molecular fragments as generated in silico, explicitly including the protonated molecular ion. This architecture (containing some 400 million elements) is used to predict the structure of a molecule from the various fragments that may be expected to be observed when some of its bonds are broken. Despite being given essentially no detailed nor explicit rules about molecular fragmentation methods, isotope patterns, rearrangements, neutral losses, and the like, MassGenie learns the effective properties of the mass spectral fragment and valency space, and can generate candidate molecular structures that are very close or identical to those of the ‘true’ molecules. We also use VAE-Sim, a previously published variational autoencoder, to generate candidate molecules that are ‘similar’ to the top hit. 
In addition to using the ‘top hits’ directly, we can produce a rank order of these by ‘round-tripping’ candidate molecules and comparing them with the true molecules, where known. As a proof of principle, we confine ourselves to positive electrospray mass spectra from molecules with a molecular mass of 500 Da or lower, including those in the last CASMI challenge (for which the results are known), getting 49/93 (53%) precisely correct. The transformer method, applied here for the first time to mass spectral interpretation, works extremely effectively both for mass spectra generated in silico and on experimentally obtained mass spectra from pure compounds. It seems to act as a Las Vegas algorithm, in that it either gives the correct answer or simply states that it cannot find one. The ability to create and to ‘learn’ millions of fragmentation patterns in silico, and therefrom generate candidate structures (that do not have to be in existing libraries) directly, thus opens up entirely the field of de novo small molecule structure prediction from experimental mass spectra.
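
To make the role of the protonated molecular ion concrete, the following is a minimal sketch, not the MassGenie code, of how a candidate's [M+H]+ m/z could be computed from its elemental formula and checked against an observed peak. The function names, the 5 ppm tolerance, and the use of flat elemental formulas instead of SMILES are all assumptions made for illustration. Returning `None` when no candidate fits loosely mirrors the "correct answer or no answer" behaviour described in the abstract.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; standard reference values.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
}
PROTON = 1.007276466621  # proton mass in u


def parse_formula(formula):
    """Turn a flat formula such as 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def protonated_mz(formula):
    """m/z of the singly charged [M+H]+ ion for a neutral formula."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())
    return neutral + PROTON


def matches_peak(formula, observed_mz, tol_ppm=5.0):
    """True if the candidate's [M+H]+ lies within tol_ppm of the observed peak."""
    expected = protonated_mz(formula)
    return abs(observed_mz - expected) / expected * 1e6 <= tol_ppm


def first_consistent(candidates, observed_mz, tol_ppm=5.0):
    """Return the first candidate consistent with the peak, or None if none fits."""
    for formula in candidates:
        if matches_peak(formula, observed_mz, tol_ppm):
            return formula
    return None
```

A call such as `first_consistent(["C6H12O6", "C5H10O5"], 151.0601)` filters a hypothetical candidate list against a single observed peak. A real round-trip check would compare the full predicted fragment list rather than the molecular ion alone.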


Author(s):  
Patricia G. Arscott ◽  
Gil Lee ◽  
Victor A. Bloomfield ◽  
D. Fennell Evans

Scanning tunneling microscopy (STM) is one of the most promising techniques available for visualizing the fine details of biomolecular structure. It has been used to map the surface topography of inorganic materials in atomic dimensions, and thus has the resolving power not only to determine the conformation of small molecules but also to distinguish site-specific features within a molecule. That level of detail is of critical importance in understanding the relationship between form and function in biological systems. The size, shape, and accessibility of molecular structures can be determined much more accurately by STM than by electron microscopy, since no staining, shadowing, or labeling with heavy metals is required and there is no exposure to damaging electron radiation. Crystallography and most other physical techniques do not give information about individual molecules. We have obtained striking images of DNA and RNA, using calf thymus DNA and two synthetic polynucleotides, poly(dG-me5dC)·poly(dG-me5dC) and poly(rA)·poly(rU).

