fragmentation tree
Recently Published Documents


2021 ◽  
Author(s):  
Eleni Litsa ◽  
Vijil Chenthamarakshan ◽  
Payel Das ◽  
Lydia Kavraki

Elucidating the structure of a chemical compound is a fundamental task in chemistry, with applications in multiple domains, including the emerging field of metabolomics, which holds promise for drug discovery, precision medicine, and biomarker discovery. The common practice for elucidating the structure of a chemical compound is to obtain a mass spectrum and subsequently retrieve its structure from spectral databases. However, database retrieval methods fail to identify novel molecules that are not present in the reference database. In this work, we propose Spec2Mol, a deep learning architecture for molecular structure recommendation given mass spectra alone. Spec2Mol is inspired by the Speech2Text deep learning architectures for translating audio signals into text. Our approach is based on an encoder-decoder architecture. The encoder learns the spectra embeddings, while the decoder, pre-trained on a massive dataset of chemical structures for translating between different molecular representations, reconstructs SMILES sequences of the recommended chemical structures. We have evaluated Spec2Mol by assessing the molecular similarity between the recommended structures and the original structure. Our analysis showed that Spec2Mol is able to identify the presence of key substructures in the molecule from its mass spectrum, and performs on par with existing fragmentation-tree-based methods in recommending molecules for a given mass spectrum.


2021 ◽  
Author(s):  
Myriam Guillevic ◽  
Martin K. Vollmer ◽  
Matthias Hill ◽  
Paul Schlauri ◽  
Aurore Guillevic ◽  
...  

Non-target screening consists in searching for all substances present in a sample, suspected or unknown, with very little prior knowledge about the sample. This approach was introduced more than a decade ago in water analysis and forensics, but is still rarely used for indoor and atmospheric trace gas measurements, despite the urgent need for a better understanding of the composition of the atmosphere.

Recently, we have installed a novel analytical system at the Jungfraujoch high alpine station (3500 m.a.s.l., Switzerland), allowing us to conduct non-target screening of the atmosphere. The system is composed of a preconcentration unit followed by gas chromatography (GC), electron ionisation (EI), and time-of-flight high-resolution mass spectrometry (HRMS). This allows screening the air for all mass fragments from approx. 25 m/z up to 300 m/z, produced by compounds with boiling points from -128 °C (NF3, CF4) to +140 °C (e.g., CHBr3, chlorobenzene, parachlorobenzotrifluoride PCBTF).

Here, we present a new method to detect and identify unknown organic substances in ambient air using GC-EI-HRMS. We developed an algorithm combining the identification of atom assemblage for the detected fragments and the reconstruction of a pseudo-fragmentation tree, linking fragments belonging to the same substance. This supports in particular the identification of substances for which no mass spectrum is registered in databases. Moreover, we developed a quality control strategy to ensure that the compounds have been correctly identified and are separated from potential coelutants.

Finally, we present a selection of halogenated compounds newly detected in air, measured for the first time at the Jungfraujoch station.
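
The "identification of atom assemblage" step described above can be illustrated with a small sketch. This is not the authors' algorithm: it is a brute-force enumeration written for illustration, and the function name `candidate_formulas`, the element set, the count limits, and the 0.002 Da tolerance are all assumptions. It ignores the electron mass and isotopologues, and simply lists elemental compositions whose monoisotopic mass matches a measured fragment mass.

```python
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "F": 18.99840316, "Cl": 34.96885268}


def candidate_formulas(target_mass, max_counts, tol=0.002):
    """Return all element-count combinations whose mass is within tol of target_mass.

    max_counts maps each allowed element to the largest count to try
    (hypothetical search limits chosen by the user).
    """
    elements = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[el] + 1) for el in elements)):
        mass = sum(n * MONOISOTOPIC[el] for el, n in zip(elements, counts))
        if abs(mass - target_mass) <= tol:
            # Keep only elements that actually occur in the candidate.
            hits.append({el: n for el, n in zip(elements, counts) if n > 0})
    return hits


# Illustrative fragment mass close to that of a CF3 fragment (made-up measurement).
hits = candidate_formulas(68.9952, {"C": 5, "H": 12, "F": 4, "Cl": 2})
print(hits)
```

Exhaustive enumeration is only practical for small masses and few elements; a real workflow would also need to handle isotope patterns and measurement uncertainty.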


2019 ◽  
Author(s):  
Marcus Ludwig ◽  
Louis-Félix Nothias ◽  
Kai Dührkop ◽  
Irina Koester ◽  
Markus Fleischauer ◽  
...  

Abstract The confident high-throughput identification of small molecules remains one of the most challenging tasks in mass spectrometry-based metabolomics. SIRIUS has become a powerful tool for the interpretation of tandem mass spectra, and shows outstanding performance for identifying the molecular formula of a query compound, being the first step of structure identification. Nevertheless, the identification of both molecular formulas for large compounds above 500 Daltons and novel molecular formulas remains highly challenging. Here, we present ZODIAC, a network-based algorithm for the de novo estimation of molecular formulas. ZODIAC reranks SIRIUS’ molecular formula candidates, combining fragmentation tree computation with Bayesian statistics using Gibbs sampling. Through careful algorithm engineering, ZODIAC’s Gibbs sampling is very swift in practice. ZODIAC decreases incorrect annotations 16.2-fold on a challenging plant extract dataset with most compounds above 700 Dalton; we then show improvements on four additional, diverse datasets. Our analysis led to the discovery of compounds with novel molecular formulas such as C24H47BrNO8P which, as of today, is not present in any publicly available molecular structure databases.


2019 ◽  
Vol 61 (5-6) ◽  
pp. 285-292
Author(s):  
Kai Dührkop

Abstract Identification of small molecules remains a central question in analytical chemistry, in particular for natural product research, metabolomics, environmental research, and biomarker discovery. Mass spectrometry is the predominant technique for high-throughput analysis of small molecules. However, it reveals only information about the mass of molecules and, by using tandem mass spectrometry, about the mass of molecular fragments. Automated interpretation of mass spectra is often limited to searching in spectral libraries, such that we can only dereplicate molecules for which we have already recorded reference mass spectra. In my thesis “Computational methods for small molecule identification” we developed SIRIUS, a tool for the structural elucidation of small molecules with tandem mass spectrometry. The method first computes a hypothetical fragmentation tree using combinatorial optimization. By using a Bayesian statistical model, we can learn parameters and hyperparameters of the underlying scoring directly from data. We demonstrate that the statistical model, which was fitted on a small dataset, generalizes well across many different datasets and mass spectrometry instruments. In a second step the fragmentation tree is used to predict a molecular fingerprint using kernel support vector machines. The predicted fingerprint can be searched in a structure database to identify the molecular structure. We demonstrate that our machine learning model outperforms all other methods for this task, including its predecessor FingerID. SIRIUS is available as a command-line tool and as a user interface. The molecular fingerprint prediction is implemented as a web service and receives over one million requests per month.
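
To make the notion of a fragmentation tree concrete, here is a toy sketch. It is not SIRIUS's method: SIRIUS scores trees with a statistical model and solves a combinatorial optimization problem, whereas this example uses a simple greedy rule (attach each fragment to the smallest formula that contains it). The formulas and the helper names `is_subformula`, `neutral_loss`, and `build_tree` are made up for illustration.

```python
def is_subformula(child, parent):
    """True if child is a proper sub-formula of parent (no element count exceeds the parent's)."""
    return child != parent and all(parent.get(el, 0) >= n for el, n in child.items())


def neutral_loss(parent, child):
    """Element counts lost when going from parent to child."""
    diff = {el: n - child.get(el, 0) for el, n in parent.items()}
    return {el: n for el, n in diff.items() if n > 0}


def build_tree(formulas):
    """Greedy toy tree: map each node to the smallest formula that contains it.

    formulas maps a label to an element-count dict. The root (no containing
    formula) maps to None. This is an illustrative heuristic, not an optimal tree.
    """
    tree = {}
    for name, formula in formulas.items():
        parents = [p for p, pf in formulas.items() if is_subformula(formula, pf)]
        tree[name] = min(parents, key=lambda p: sum(formulas[p].values()), default=None)
    return tree


# Hypothetical precursor and two fragment formulas.
formulas = {
    "C4H8O2": {"C": 4, "H": 8, "O": 2},
    "C4H6O": {"C": 4, "H": 6, "O": 1},
    "C3H6": {"C": 3, "H": 6},
}
tree = build_tree(formulas)
for child, parent in tree.items():
    if parent is not None:
        print(parent, "->", child, "loss:", neutral_loss(formulas[parent], formulas[child]))
```

Each edge of the resulting tree corresponds to a neutral loss (here water, then carbon monoxide), which is the kind of structure a fragmentation tree is meant to expose.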


2018 ◽  
Vol 20 (6) ◽  
pp. 2028-2043 ◽  
Author(s):  
Dai Hai Nguyen ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka

Abstract Motivation: Metabolomics involves studies of a great number of metabolites, which are small molecules present in biological systems. They perform many important functions, such as energy transport, signaling, serving as building blocks of cells, and inhibition/catalysis. Understanding the biochemical characteristics of metabolites is an essential part of metabolomics and enlarges our knowledge of biological systems. It is also key to the development of many applications and areas such as biotechnology, biomedicine or pharmaceuticals. However, the identification of metabolites remains a challenging task in metabolomics, with a huge number of potentially interesting but unknown metabolites. The standard method for identifying metabolites is based on mass spectrometry (MS) preceded by a separation technique. Over many decades, many techniques with different approaches have been proposed for the MS-based metabolite identification task, which can be divided into the following four groups: mass spectra database, in silico fragmentation, fragmentation tree and machine learning. In this review paper, we thoroughly survey currently available tools for metabolite identification with a focus on in silico fragmentation and machine learning-based approaches. We also give an intensive discussion on advanced machine learning methods, which can lead to further improvement on this task.


2015 ◽  
Vol 112 (41) ◽  
pp. 12580-12585 ◽  
Author(s):  
Kai Dührkop ◽  
Huibin Shen ◽  
Marvin Meusel ◽  
Juho Rousu ◽  
Sebastian Böcker

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin.
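
The final step described above, searching a structure database with a predicted fingerprint, can be sketched as follows. This is an illustration under simplifying assumptions, not the scoring used in the paper: fingerprints are represented as sets of "on" bit indices, candidates are ranked by plain Tanimoto (Jaccard) similarity, and the candidate names and bit sets are invented.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def rank_candidates(predicted, candidates):
    """Sort database candidates by decreasing similarity to the predicted fingerprint."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical predicted fingerprint and candidate structures.
predicted = {1, 4, 7, 9}
candidates = {
    "candidate_A": {1, 4, 7, 9, 12},
    "candidate_B": {2, 4, 8},
    "candidate_C": {1, 4, 7, 9},
}
ranking = rank_candidates(predicted, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice a predicted fingerprint is probabilistic rather than binary, so a real scoring function would have to account for the uncertainty of each predicted bit.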


Oecologia ◽  
2014 ◽  
Vol 176 (1) ◽  
pp. 207-224 ◽  
Author(s):  
John O. Stireman ◽  
Hilary Devlin ◽  
Annie L. Doyle

2012 ◽  
Vol 36 (1) ◽  
pp. 66-72 ◽  
Author(s):  
Jerzy Bańbura ◽  
Mirosława Bańbura

Abstract The Great Tit Parus major and the Blue Tit Cyanistes caeruleus are the only Western Palearctic Parids that maintain numerous urban populations as well as forest populations. Because of their evolutionary history, both species are best adapted to different types of deciduous and mixed forests. Ecological conditions in cities differ from those prevailing in forests, especially in habitat fragmentation, tree species composition, microclimate, human activity, predators and food conditions. Tits breeding in cities start laying eggs earlier in the season, lay smaller clutches and fledge fewer fledglings of lower quality. Yet urban populations are often relatively stable in numbers. This may be because winter survival is higher in cities, owing to increased availability of food and milder weather.


2012 ◽  
Vol 419 (3-4) ◽  
pp. 211-222 ◽  
Author(s):  
Gerrit G. Langer ◽  
Guillaume X. Evrard ◽  
Ciaran G. Carolan ◽  
Victor S. Lamzin
