Reproducible Untargeted Metabolomics Data Analysis Workflow for Exhaustive MS/MS Annotation

2021 ◽  
Author(s):  
Miao Yu ◽  
Georgia Dolios ◽  
Lauren Petrick

<p>Unknown features in untargeted metabolomics and non-targeted analysis (NTA) are identified using fragment ions from MS/MS spectra to predict the structures of the unknown compounds. Precursor ions are commonly selected for fragmentation using data-dependent acquisition (DDA) strategies or, following statistical analysis, using targeted MS/MS approaches. However, the precursor ions selected by DDA cover only a biased subset of the peaks or features found in full scan data. In addition, different statistical analyses can select different precursor ions for MS/MS analysis, which makes <i>post-hoc</i> validation of ions selected by new statistical methods impossible when the precursor ions were selected by the original statistical method. Here we propose an automated, exhaustive, statistical model-free workflow, paired mass distance-dependent analysis (PMDDA), for untargeted mass spectrometry identification of unknown compounds. By removing redundant peaks and performing pseudo-targeted MS/MS analysis on independent peaks, we can comprehensively cover unknown compounds found in full scan analysis using a “one peak for one compound” workflow without a priori redundant peak information. We show that, compared to DDA, PMDDA is more comprehensive and more robust against sample matrix effects. Further, more compounds were identified by database annotation using PMDDA than with CAMERA and RAMClustR. Finally, compounds with signals in both positive and negative modes can be identified by the PMDDA workflow to further reduce redundancies. The whole workflow is fully reproducible as a Docker image, xcmsrocker, with both the original data and the data processing template.</p>
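The "one peak for one compound" idea can be illustrated with a minimal sketch. It is not the authors' PMDDA implementation: the function name `independent_peaks`, the two mass distances in `REDUNDANT_PMDS`, the tolerances, and the rule of keeping the most intense peak per group are all assumptions made for illustration. Co-eluting peaks whose m/z difference matches a listed distance are treated as redundant signals of the same compound.

```python
from itertools import combinations

# Assumed redundancy links: standard mass differences in Da.
# The choice of these two entries is illustrative, not taken from the paper.
REDUNDANT_PMDS = {
    "13C isotope": 1.00336,
    "Na vs H adduct": 21.98194,
}


def independent_peaks(peaks, rt_tol=2.0, mz_tol=0.002):
    """Keep one peak per group of co-eluting, mass-distance-linked peaks.

    peaks: list of (mz, rt_seconds, intensity) tuples.
    Returns the retained peaks sorted by m/z.
    """
    parent = list(range(len(peaks)))  # union-find forest over peak indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(peaks)), 2):
        (mz_i, rt_i, _), (mz_j, rt_j, _) = peaks[i], peaks[j]
        if abs(rt_i - rt_j) > rt_tol:
            continue  # peaks do not co-elute, so they are not linked
        dist = abs(mz_i - mz_j)
        if any(abs(dist - pmd) <= mz_tol for pmd in REDUNDANT_PMDS.values()):
            parent[find(i)] = find(j)  # merge the two groups

    # Assumption: the most intense peak represents each group.
    best = {}
    for idx, peak in enumerate(peaks):
        root = find(idx)
        if root not in best or peak[2] > best[root][2]:
            best[root] = peak
    return sorted(best.values())
```

With four hypothetical peaks, where the second and third are an isotope peak and a sodium adduct of the first, only the first and the unrelated fourth peak would be sent on for pseudo-targeted MS/MS.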


Metabolites ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 8
Author(s):  
Michiel Bongaerts ◽  
Ramon Bonte ◽  
Serwet Demirdas ◽  
Edwin H. Jacobs ◽  
Esmee Oussoren ◽  
...  

Untargeted metabolomics is an emerging technology in the laboratory diagnosis of inborn errors of metabolism (IEM). Analysis of a large number of reference samples is crucial for correcting variations in metabolite concentrations that result from factors such as diet, age, and gender, in order to judge whether metabolite levels are abnormal. However, a large number of reference samples requires the use of out-of-batch samples, which is hampered by the semi-quantitative nature of untargeted metabolomics data, i.e., technical variations between batches. Methods to merge and accurately normalize data from multiple batches are urgently needed. Based on six metrics, we compared the existing normalization methods on their ability to reduce the batch effects from nine independently processed batches. Many of these showed only marginal performance, which motivated us to develop Metchalizer, a normalization method that uses 10 stable isotope-labeled internal standards and a mixed effect model. In addition, we propose a regression model with age and sex as covariates, fitted on reference samples obtained from all nine batches. Metchalizer applied to log-transformed data showed the most promising performance in batch effect removal, as well as in the detection of 195 known biomarkers across 49 IEM patient samples, and performed at least comparably to an approach utilizing 15 within-batch reference samples. Furthermore, our regression model indicates that 6.5–37% of the considered features showed significant age-dependent variations. Our comprehensive comparison of normalization methods showed that our Log-Metchalizer approach enables the use of out-of-batch reference samples to establish clinically relevant reference values for metabolite concentrations. These findings open up the possibility of using large-scale out-of-batch reference samples in a clinical setting, increasing throughput and detection accuracy.
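The reference regression with age and sex as covariates can be sketched as follows. This is a minimal illustration under stated assumptions, not Metchalizer itself: it uses ordinary least squares in place of the mixed effect model, and the names `fit_ols`, `design`, `fit_reference`, and `z_score`, the 0/1 coding of sex, and the residual-based z-score are all hypothetical.

```python
import math


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def design(age, sex):
    """Design row: intercept, age, and sex coded 1.0 for "F" (assumed coding)."""
    return [1.0, float(age), 1.0 if sex == "F" else 0.0]


def fit_reference(references):
    """Fit log-level ~ age + sex on (age, sex, log_level) reference samples.

    Returns the coefficients and the residual standard deviation.
    """
    rows = [design(age, sex) for age, sex, _ in references]
    y = [level for _, _, level in references]
    coefs = fit_ols(rows, y)
    resid = [t - sum(c * x for c, x in zip(coefs, r)) for r, t in zip(rows, y)]
    dof = len(rows) - len(coefs)
    return coefs, math.sqrt(sum(e * e for e in resid) / dof)


def z_score(coefs, resid_sd, age, sex, log_level):
    """Deviation of a patient value from the covariate-adjusted expectation."""
    expected = sum(c * x for c, x in zip(coefs, design(age, sex)))
    return (log_level - expected) / resid_sd
```

A patient value would then be judged against the expectation for that patient's age and sex rather than against a single pooled reference mean.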


2014 ◽  
Vol 240 (4) ◽  
pp. 488-497 ◽  
Author(s):  
Nianbai Fang ◽  
Shanggong Yu ◽  
Martin JJ Ronis ◽  
Thomas M Badger

2014 ◽  
Vol 29 (5) ◽  
pp. 903 ◽  
Author(s):  
Jitka Míková ◽  
Jan Košler ◽  
Michael Wiedenbeck

2021 ◽  
Author(s):  
Li Chen ◽  
Wenyun Lu ◽  
Lin Wang ◽  
Xi Xing ◽  
Xin Teng ◽  
...  

A primary goal of metabolomics is to identify all biologically important metabolites. One powerful approach is liquid chromatography-high resolution mass spectrometry (LC-MS), yet most LC-MS peaks remain unidentified. Here, we present a global network optimization approach, NetID, to annotate untargeted LC-MS metabolomics data. We consider all experimentally observed ion peaks together and assign annotations to all of them simultaneously so as to maximize a score that considers properties of peaks (known masses, retention times, MS/MS fragmentation patterns) as well as network constraints that arise from mass differences between peaks. Global optimization results in accurate peak assignment and trackable peak-peak relationships. Applying this approach to yeast and mouse data, we identify a half-dozen novel metabolites, including thiamine and taurine derivatives. Isotope tracer studies indicate active flux through these metabolites. Thus, NetID applies existing metabolomic knowledge and global optimization to annotate untargeted metabolomics data, revealing novel metabolites.
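The network constraints based on mass differences between peaks can be illustrated with a small sketch of edge construction only. It does not perform NetID's global optimization or scoring, and the function name `mass_difference_edges`, the three example transformations, and the 5 ppm tolerance are assumptions chosen for illustration.

```python
# Assumed example transformations: monoisotopic mass differences in Da.
TRANSFORMS = {"CH2": 14.01565, "O": 15.99491, "H2O": 18.01056}


def mass_difference_edges(mzs, ppm=5.0):
    """Return (low_index, high_index, transform_name) for matching peak pairs.

    Two peaks are connected when their m/z difference equals a listed
    transformation within a ppm tolerance scaled to the heavier peak.
    """
    edges = []
    for i, low in enumerate(mzs):
        for j, high in enumerate(mzs):
            if high <= low:
                continue  # consider each unordered pair once, light to heavy
            diff = high - low
            tol = high * ppm * 1e-6
            for name, delta in TRANSFORMS.items():
                if abs(diff - delta) <= tol:
                    edges.append((i, j, name))
    return edges
```

In a full annotation workflow, edges like these would be candidate relationships that a scoring step accepts or rejects; here they are only enumerated.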


2003 ◽  
Vol 53 ◽  
Author(s):  
Lilian Chavez-Kus ◽  
Eduardo Salamuni

The study area covers the municipality of Curitiba, Paraná, which sits on rocks of the Guabirotuba Formation. This formation in turn overlies the Atuba Complex, formed by deformed granitoids, gneisses, and amphibolites that are cut by fractures allowing groundwater to circulate, making it a fractured aquifer. The aim of the research is to characterize and compare hydrogeological data on the aquifer through conventional statistical analysis, in order to identify the geographic distribution and characteristics of the deep tubular wells. Existing hydrogeological databases were supplemented and updated, bringing together information on 1,297 wells drilled between 1950 and 2001. Most of the boreholes were georeferenced in the field. The statistical analysis provided familiarity with the data and detected existing patterns of irregularity, as well as trends and groupings. Over the last 10 years the number of new boreholes increased, reflecting growing demand for groundwater for various uses. Groundwater use is most pronounced in the city center and surrounding neighborhoods, followed by the industrial neighborhoods and those with a concentration of services. The mean depth of the tubular wells is 112 m, and depths can reach 390 m. The variables “depth” and “water inflow” are correlated down to a depth of 220 m, which is the maximum depth verified. The horizon in which groundwater circulates is not limited to a single level, owing to the structures generated by brittle tectonics. Although the overall mean flow rate is 3.6 m³/h, cases of up to 44 m³/h occur. The extreme flow rates are found where boreholes reached structures, or intersections of structures, that clearly favor groundwater circulation. Heterogeneity in the local hydrogeological variables was also characterized. The results will allow the data to be homogenized more easily in future work through structural and geostatistical analysis of the fractured aquifer variables. The work allows the hydrogeological data to be mapped according to land use and the political-administrative subdivision of the municipality.

STATISTICAL ANALYSIS OF HYDROGEOLOGICAL DATA OF TUBULAR WELLS CURITIBA’S MUNICIPALITY-PARANÁ

Extended Abstract

The study area consists solely of the municipality of Curitiba (PR), which lies almost entirely on the rocks of the Guabirotuba Formation, in the Curitiba Basin. This formation overlies the Atuba Complex, formed by deformed granitoids, gneisses, and amphibolites that are cut by an intricate network of generally open fractures allowing effective circulation of groundwater. The main objective of this research is to identify and compare the hydrogeological data on the fractured aquifer of the Atuba Complex through conventional statistical analysis. The statistics were used to identify the geographic distribution and characteristics of the variables relating to the deep tubular wells of the study area (figure 1). In the research, hydrogeological databases previously examined by Salamuni (1981), Nogueira Filho (1997) and Salamuni (1998) were supplemented, updated, and reorganized. Information was thereby gathered from 1,297 tubular wells, with boreholes drilled from 1950 to 2001 (figure 4). Most of the boreholes were georeferenced in the field with the aid of a GPS. The statistical analysis provided familiarity with the data and allowed patterns of irregularity to be detected. Moreover, by identifying the structure of the original data, trends and groupings were also detected (figures 5 and 6).

Over the last 10 years there was steep growth in the number of new boreholes (figures 12 and 13), demonstrating a sharp rise in demand for groundwater to supply the city of Curitiba. Groundwater use is greatest in the city center and surrounding neighborhoods, followed by the industrial neighborhoods and those with a notable concentration of services (figure 6). The mean depth of the tubular wells in the municipality is 112 m, and in extreme cases it reaches 390 m (figures 10 and 11). The variables “depth” and “water inflow” are positively correlated down to 220 m below the land surface (figures 17 to 21). Below this horizon there is no information that permits speculation. The horizon in which groundwater circulates is not limited to a single level, as a result of structural conditioning generated by brittle tectonics. Although the overall mean flow rate is 3.6 m³/h, exceptional cases of up to 44 m³/h are observed (figures 7 to 9). This heterogeneity shows that the variability of the phenomenon in the area is large. The extreme flow rates mark places where boreholes reached structural alignments or intersections of structures favorable to groundwater circulation. The statistical analysis confirmed a heterogeneous spatial pattern in the hydrogeological variables of the area. This is a first step in the treatment of this information, which will in future allow the data to be homogenized more easily through structural and geostatistical analysis of the fractured aquifer variables. The study gives a good picture of the spatial distribution of the hydrogeological data according to land use and the political-administrative subdivision of the municipality.

