High-confidence structural annotation of metabolites absent from spectral libraries

Author(s):  
Martin A. Hoffmann ◽  
Louis-Félix Nothias ◽  
Marcus Ludwig ◽  
Markus Fleischauer ◽  
Emily C. Gentry ◽  
...  

Abstract: Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
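The abstract names kernel density P value estimation as one half of the confidence score but does not state the null distribution or the bandwidth. As a minimal sketch, assuming the P value is the right-tail probability of a hit's score under a Gaussian kernel density fitted to scores from a null population (for example, known-incorrect hits) and assuming Silverman's rule-of-thumb bandwidth, the calculation could look like this. The function name `gaussian_kde_pvalue` is hypothetical, not part of COSMIC.

```python
import math
from statistics import stdev
from typing import Sequence


def gaussian_kde_pvalue(score: float, null_scores: Sequence[float]) -> float:
    """Right-tail P(S >= score) under a Gaussian KDE fitted to null scores.

    Assumption: bandwidth from Silverman's rule, h = 1.06 * sd * n**(-1/5).
    """
    n = len(null_scores)
    if n < 2:
        raise ValueError("need at least two null scores")
    h = 1.06 * stdev(null_scores) * n ** (-0.2)
    if h <= 0:
        raise ValueError("null scores must not all be identical")
    # Each kernel is Normal(x_i, h^2); its survival function at `score` is
    # 1 - Phi((score - x_i) / h) = 0.5 * erfc((score - x_i) / (h * sqrt(2))).
    tail = sum(0.5 * math.erfc((score - x) / (h * math.sqrt(2))) for x in null_scores)
    # The KDE is an equal-weight mixture, so its tail is the mean kernel tail.
    return tail / n
```

For a null sample symmetric about zero, such as `[-2, -1, 0, 1, 2]`, a score of 0 receives P = 0.5, and the value falls toward 0 as the score moves into the right tail.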

2021 ◽  
Author(s):  
Martin A. Hoffmann ◽  
Louis-Félix Nothias ◽  
Marcus Ludwig ◽  
Markus Fleischauer ◽  
Emily C. Gentry ◽  
...  

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. As biological interpretation relies on accurate structure annotations, the ability to assign confidence to such annotations is a key outstanding problem. We introduce the COSMIC workflow that combines structure database generation, in silico annotation, and a confidence score consisting of kernel density p-value estimation and a Support Vector Machine with enforced directionality of features. In evaluation, COSMIC annotates a substantial number of hits at small false discovery rates, and outperforms spectral library search for this purpose. To demonstrate that COSMIC can annotate structures never reported before, we annotated twelve novel bile acid conjugates; nine structures were confirmed by manual evaluation and two structures using synthetic standards. Second, we annotated and manually evaluated 315 molecular structures in human samples currently absent from the Human Metabolome Database. Third, we applied COSMIC to 17,400 experimental runs and annotated 1,715 structures with high confidence that were absent from spectral libraries.
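Neither abstract spells out how "enforced directionality of features" is implemented. One common way to obtain this behavior, assumed here rather than taken from COSMIC, is to constrain the sign of each weight of a linear SVM so that a feature expected to indicate a correct annotation can only raise the decision score. The sketch below trains such a model by projected subgradient descent on the L2-regularized hinge loss; the function name `train_directional_svm`, the step-size schedule, and the default parameters are illustrative choices.

```python
import numpy as np


def train_directional_svm(X, y, signs, lam=0.01, lr=0.1, epochs=1000):
    """Linear SVM (L2-regularized hinge loss) with sign-constrained weights.

    Trained by projected subgradient descent. signs[j] = +1 forces w[j] >= 0,
    -1 forces w[j] <= 0, and 0 leaves w[j] unconstrained. Labels y are +1/-1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    signs = np.asarray(signs)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(1, epochs + 1):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        step = lr / np.sqrt(t)  # diminishing step size
        w -= step * grad_w
        b -= step * grad_b
        # Project back onto the feasible set: enforce each feature's direction.
        w[signs > 0] = np.maximum(w[signs > 0], 0.0)
        w[signs < 0] = np.minimum(w[signs < 0], 0.0)
    return w, b
```

Because the projection clips a violating weight to zero, a feature whose training evidence points the "wrong" way is switched off rather than allowed to flip its meaning.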


2016 ◽  
Vol 66 ◽  
pp. 67-75 ◽  
Author(s):  
Sankar Punnaivanam ◽  
Jerome Pastal Raj Sathiadhas ◽  
Vinoth Panneerselvam

2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 266.2-267
Author(s):  
W. Han ◽  
X. Wang ◽  
L. Li ◽  
S. Wichuk ◽  
E. Hutchings ◽  
...  

Background: Early diagnosis of rheumatoid arthritis (RA) is hampered by the suboptimal accuracy of currently available serological biomarkers. Metabolomics may reveal promising biomarker candidates associated with the biomolecular processes of RA. In this work, we applied a high-performance chemical isotope labeling (CIL) LC-MS technique for in-depth profiling of the amine/phenol submetabolome in serum samples. To avoid false positives and obtain high-confidence biomarker candidates, we analyzed three independent sets of serum samples collected from RA patients and healthy controls to examine the common effects.

Objectives: We aimed to identify a metabolite signature with consistently high accuracy for RA.

Methods: Serum samples were taken from three RA cohorts comprising 50, 49, and 131 RA patients, respectively. Each cohort included sex- and age-matched healthy controls: 50 in Cohort 1, 50 in Cohort 2, and 100 in Cohort 3. Among these 446 subjects, 75% were female and the average age was 52.5 years. Amine/phenol-containing metabolites were labeled with ¹²C-dansyl chloride to improve LC-MS detection. For each cohort, a pooled sample was prepared and labeled with the ¹³C-dansyl group to serve as the reference sample for relative quantification. Each individual sample was then mixed 1:1 with the pooled sample. Finally, an LC-QTOF-MS platform analyzed the mixtures and output the intensity ratios of ¹²C/¹³C peak pairs.

Results: 1,149 amine/phenol-containing metabolites were detected in all three sample sets. Of these, 134 were positively identified using our dansyl-labeling standard library, and 141 were matched to predicted retention times and mass values of dansyl-labeled human metabolites. In partial least squares discriminant analysis (PLS-DA), the overall amine/phenol submetabolome showed clear and consistent differences between healthy controls and the RA groups, with cross-validation Q² = 0.765, 0.745, and 0.793, respectively.

Significant metabolites were selected by fold change and a false-discovery-rate-adjusted Welch's t-test. In Cohort 1, 85 metabolites had higher concentrations in the RA samples than in the controls, and 89 had lower concentrations. The numbers of increased/decreased metabolites in Cohorts 2 and 3 were 87/26 and 90/53, respectively. Importantly, 59 significantly discriminatory metabolites were common to all three data sets (49 increased and 9 decreased). We selected the three with the highest univariate classification performance to form a biomarker panel, built the classifier with a linear support vector machine (SVM), and measured its performance with receiver operating characteristic (ROC) analysis. The area-under-the-curve (AUC) values (95% confidence interval) were 1.000 (1.000-1.000), 0.992 (0.967-1.000), and 0.902 (0.858-0.945) for the three cohorts, respectively. These results show the importance of examining multiple sample sets; even in the worst case (Cohort 3), our biomarker candidates could differentiate RA with 82.5% sensitivity and 82.5% specificity.

In Cohort 3 in particular, 30 RA patients were negative for anti-cyclic citrullinated peptide and rheumatoid factor, and our metabolite panel consistently performed well in differentiating these subjects from healthy controls.

Conclusion: Metabolites showing significant and consistent changes associated with RA were identified with high discriminative power.

Disclosure of Interests: Wei Han: None declared; Xiaohang Wang: None declared; Liang Li: None declared; Stephanie Wichuk: None declared; Edna Hutchings: None declared; Rana Dadashova: None declared; Joel Paschke: None declared; Walter P Maksymowych: Grant/research support from: received research and/or educational grants from Abbvie, Novartis, Pfizer, UCB; Consultant of: WPM is Chief Medical Officer of CARE Arthritis Limited and has been a consultant for or participated in advisory boards for Abbvie, Boehringer, Celgene, Eli-Lilly, Galapagos, Gilead, Janssen, Novartis, Pfizer, UCB; Speakers bureau: received speaker fees from Abbvie, Janssen, Novartis, Pfizer, UCB.
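The abstract selects metabolites by fold change and a "false-discovery-rate-adjusted" Welch's t-test without naming the adjustment or the cutoffs. A minimal sketch, assuming the Benjamini-Hochberg procedure and starting from per-metabolite Welch p-values that have already been computed (for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`). The function names and the cutoffs `q_cutoff=0.05` and `fc_cutoff=1.5` are hypothetical placeholders, not the study's values.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_metabolites(pvalues: Sequence[float], fold_changes: Sequence[float],
                       q_cutoff: float = 0.05, fc_cutoff: float = 1.5) -> List[int]:
    """Indices passing both the FDR cutoff and the fold-change cutoff.

    fold_changes are ratios (mean in RA / mean in controls); a metabolite
    passes if it rises or falls by at least a factor of fc_cutoff.
    """
    q = benjamini_hochberg(pvalues)
    return [i for i, (qi, fc) in enumerate(zip(q, fold_changes))
            if qi < q_cutoff and (fc >= fc_cutoff or fc <= 1.0 / fc_cutoff)]
```

With p-values `[0.001, 0.002, 0.03, 0.5]` and fold changes `[2.0, 0.4, 1.1, 5.0]`, the first two metabolites pass (one increased, one decreased); the third passes the FDR cutoff but not the fold-change cutoff.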


2021 ◽  
Author(s):  
M. Dendy Darma ◽  
M. Reza Faisal ◽  
Irwan Budiman ◽  
Rudy Herteno ◽  
Juliyatin Putri Utami ◽  
...  

2014 ◽  
Vol 42 (11) ◽  
pp. 1811-1819 ◽  
Author(s):  
Kouta Toshimoto ◽  
Naomi Wakayama ◽  
Makiko Kusama ◽  
Kazuya Maeda ◽  
Yuichi Sugiyama ◽  
...  

Molecules ◽  
2017 ◽  
Vol 22 (11) ◽  
pp. 1891 ◽  
Author(s):  
Xiaowei Zhao ◽  
Xiaosa Zhao ◽  
Lingling Bao ◽  
Yonggang Zhang ◽  
Jiangyan Dai ◽  
...  

2018 ◽  
Vol 7 (2.24) ◽  
pp. 428
Author(s):  
Rishi Khosla ◽  
Yashovardhan Singh ◽  
T Balachander

Mobile technologies have been trending for some time, and advances in machine learning have made them more powerful. Computer vision, computational analysis, and computer graphics have also evolved over time. In this project, we aim to identify the domains in which machine learning can be applied to enhance the capabilities of a mobile device, leading to a better and more sustainable mobile user experience. The methods we use are a convolutional neural network (CNN), a support vector machine (SVM), and the scale-invariant feature transform (SIFT). The project takes real-time images from a mobile device, performs classification and detection with TensorFlow, and reports each result with a confidence score.
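The project does not describe how its confidence score is computed. A common choice for a neural-network classifier, assumed here, is the softmax probability of the top-scoring class. A framework-free sketch with a hypothetical helper `top1_with_confidence`:

```python
import math
from typing import Sequence, Tuple


def top1_with_confidence(logits: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the highest-scoring label and its softmax probability."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best], exps[best] / total
```

For logits `[2.0, 1.0, 0.0]`, the top class receives a confidence of about 0.665.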


2020 ◽  
Vol 9 (1) ◽  
pp. 840-848

Caffeic acid derivatives are widely considered among the most pharmaceutically promising secondary metabolites for the treatment of a wide range of disorders and diseases. In this paper, the toxicity, ADME properties, and pharmaceutical activity of 16 caffeic acid derivatives are analyzed using the Toxtree software and the Molinspiration website. The results suggest that caffeoylmalic acid and dactylifric acid are the safest and most applicable compounds. They also suggest that modifying the molecular structures of chlorogenic acid and neochlorogenic acid could yield less toxic compounds better suited to oral consumption.
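The abstract ties suitability for oral consumption to the Molinspiration analysis without naming the criterion. Molinspiration's property calculator reports, among other descriptors, the number of violations of Lipinski's rule of five, so a plausible reading, assumed here, is that oral suitability was judged by that rule. A minimal sketch of the rule; the `Descriptors` class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # molecular weight, g/mol
    log_p: float        # octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors (OH + NH groups)
    h_acceptors: int    # hydrogen-bond acceptors (N + O atoms)


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def likely_orally_bioavailable(d: Descriptors) -> bool:
    # The rule flags likely poor absorption when more than one criterion is violated.
    return lipinski_violations(d) <= 1
```

With hypothetical descriptors MW = 600 and logP = 6 (two violations), `likely_orally_bioavailable` returns `False`.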


2009 ◽  
Vol 6 (2) ◽  
pp. 165-190 ◽  
Author(s):  
Mou'ath Hourani ◽  
Emary El

Gene expression data often contain missing expression values. Because effective clustering analysis, like many other algorithms for gene expression data analysis, requires a complete matrix of gene array values, choosing the most effective missing-value estimation method is necessary. In this paper, the most commonly used imputation methods in the literature are critically reviewed and analyzed to explain their proper use and weaknesses and to summarize observations on each published method. From this analysis, we conclude that the Local Least Squares (LLS) and Support Vector Regression (SVR) algorithms achieve the best performance. SVR can be considered a complementary algorithm to LLS, especially when applied to noisy data. However, both algorithms suffer from deficiencies in choosing the number of selected genes (K) and the appropriate kernel function. To overcome these drawbacks, a new method is needed that chooses these parameters automatically while keeping its computational complexity acceptable.
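To make the role of K concrete, here is a minimal sketch of local least squares imputation. It assumes Euclidean distance on the target gene's observed columns for neighbor selection and uses only genes with no missing values as candidate neighbors; published LLS variants differ in these details. The function name `lls_impute` is hypothetical.

```python
import numpy as np


def lls_impute(X, k):
    """Local least squares (LLS) imputation sketch; NaN marks a missing value.

    Rows are genes, columns are arrays/conditions; k is the number of
    selected neighbor genes (the K discussed above).
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    # Candidate neighbors: genes with no missing values (a simplifying assumption).
    complete = np.flatnonzero(~missing.any(axis=1))
    if complete.size == 0:
        raise ValueError("need at least one gene without missing values")
    for i in np.flatnonzero(missing.any(axis=1)):
        obs = ~missing[i]
        if not obs.any():
            continue  # nothing to regress on; leave the row as NaN
        # Euclidean distance to each candidate over the target's observed columns.
        dist = np.linalg.norm(X[complete][:, obs] - X[i, obs], axis=1)
        nbrs = complete[np.argsort(dist)[:k]]
        A = X[nbrs][:, obs]   # k x (number of observed columns)
        B = X[nbrs][:, ~obs]  # k x (number of missing columns)
        # Least squares: find x minimizing ||A.T @ x - w||, w = observed target values.
        x, *_ = np.linalg.lstsq(A.T, X[i, obs], rcond=None)
        # Missing values are the same linear combination of the neighbors.
        out[i, ~obs] = B.T @ x
    return out
```

If the target gene is an exact linear combination of its neighbors on the observed columns, the sketch recovers the missing value exactly; with noisy data the choice of k controls how many genes enter the regression.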

