A new pipeline for the normalization and pooling of metabolomics data

Metabolites ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 631
Author(s):  
Vivian Viallon ◽  
Mathilde His ◽  
Sabina Rinaldi ◽  
Marie Breeur ◽  
Audrey Gicquiau ◽  
...  

Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges, as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use different sample types (e.g., serum versus plasma) that are collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusion of the least informative observations and metabolites, removal of outliers, and imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including samples’ originating study and batch, and preserve biological variation while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists.
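The sequential structure of such a pipeline can be illustrated with a small sketch. This is not the authors' implementation: the missingness threshold, the half-minimum imputation rule, and all function names are assumptions made for illustration. Outlier removal and the PC-PR2 step are omitted, and per-study mean centering on the log scale is used as a deliberately simplified stand-in for the linear mixed models of step (iii).

```python
import math
from collections import defaultdict


def drop_sparse_metabolites(rows, metabolites, max_missing=0.4):
    """Keep metabolites whose fraction of missing values (None) is at most
    max_missing. The 0.4 threshold is an assumption, not the paper's value."""
    kept = []
    for m in metabolites:
        n_missing = sum(1 for r in rows if r[m] is None)
        if n_missing / len(rows) <= max_missing:
            kept.append(m)
    return kept


def impute_half_min(rows, metabolites):
    """Replace missing values in place by half the smallest observed value
    (an assumed imputation rule, chosen for simplicity)."""
    for m in metabolites:
        observed = [r[m] for r in rows if r[m] is not None]
        fill = min(observed) / 2
        for r in rows:
            if r[m] is None:
                r[m] = fill


def center_by_study(rows, metabolites):
    """Log-transform each metabolite, subtract the study mean and add back
    the overall mean. Simplified stand-in for a mixed-model correction."""
    out = [{"study": r["study"]} for r in rows]
    for m in metabolites:
        logs = [math.log(r[m]) for r in rows]
        overall = sum(logs) / len(logs)
        by_study = defaultdict(list)
        for r, v in zip(rows, logs):
            by_study[r["study"]].append(v)
        study_mean = {s: sum(v) / len(v) for s, v in by_study.items()}
        for o, v in zip(out, logs):
            o[m] = v - study_mean[o["study"]] + overall
    return out


# Toy data: two hypothetical studies, one well-measured and one sparse metabolite.
rows = [
    {"study": "A", "glucose": 5.0, "rare": None},
    {"study": "A", "glucose": None, "rare": None},
    {"study": "A", "glucose": 4.0, "rare": 1.0},
    {"study": "B", "glucose": 8.0, "rare": None},
    {"study": "B", "glucose": 10.0, "rare": None},
]
kept = drop_sparse_metabolites(rows, ["glucose", "rare"])
impute_half_min(rows, kept)
normalized = center_by_study(rows, kept)
print(kept)
print(normalized)
```

After centering, the study-level means of each retained metabolite coincide on the log scale. Unlike a mixed model, this simple centering does not separate study effects from genuine biological differences between study populations, nor does it allow for study-specific residual variances.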


2021 ◽  
Vol 99 (Supplement_1) ◽  
pp. 17-17
Author(s):  
Lexi M Ostrand ◽  
Melanie D Trenhaile-Grannemann ◽  
Garrett See ◽  
Ty B Schmidt ◽  
Eric Psota ◽  
...  

Abstract Overall activity and behavior are integral components of sows remaining productive in the herd. This investigation studied overall activity of group housed replacement gilts and the heritability of various activity traits. Beginning around 20 wk of age, video recorded data of approximately 75 gilts/group for a total of 2,378 gilts over 32 groups was collected for 7 consecutive d using the NUtrack System, which tracks distance travelled (m), avg speed (m/s), angle rotated (degrees), and time standing (s), sitting (s), eating (s), and lying (s). The recorded phenotypes were standardized to the distribution observed within a pen for each group. The final values used for analysis were the average daily standardized values. Data were analyzed using mixed models (RStudio V 1.2.5033) including effects of sire, dam, dam’s sire and dam, dam’s grandsire and granddam, farrowing group, barn, pen, and on-test date. Sire had an effect on every activity trait (P < 0.001), and dam had an effect on average speed (P < 0.001). The dam’s sire had an effect on all activity traits (P < 0.001) and the dam’s grandsire had an effect on average speed (P < 0.001). Heritabilities and variance components of activity traits were estimated in ASReml 4 using an animal model with a two-generation pedigree. Genetic variances are 0.17 +/- 0.029, 0.19 +/- 0.034, and 0.11 +/- 0.024, residual variances are 0.37 +/- 0.023, 0.41 +/- 0.027, and 0.41 +/- 0.022, phenotypic variances are 0.54 +/- 0.018, 0.60 +/- 0.020, and 0.52 +/- 0.016, and heritabilities are 0.32 +/- 0.048, 0.32 +/- 0.049, and 0.21 +/- 0.044 for average speed, distance, and lie, respectively. NUtrack offers potential to aid in selection decisions. Given the results presented herein, continued investigation into these activity traits and their association with sow longevity is warranted.
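The reported heritabilities can be cross-checked against the variance components. The sketch below assumes the usual definition of heritability as genetic variance divided by phenotypic variance, with phenotypic variance taken as genetic plus residual variance. The abstract does not state this formula, although the reported phenotypic variances do equal the sum of the two components. The function name is illustrative.

```python
def heritability(genetic_var, residual_var):
    """Heritability as genetic variance over phenotypic variance,
    assuming phenotypic variance = genetic + residual variance."""
    phenotypic_var = genetic_var + residual_var
    return genetic_var / phenotypic_var


# (genetic variance, residual variance) as reported in the abstract.
components = {
    "average speed": (0.17, 0.37),
    "distance": (0.19, 0.41),
    "lie": (0.11, 0.41),
}

for trait, (genetic, residual) in components.items():
    print(f"{trait}: h2 = {heritability(genetic, residual):.3f}")
```

Because the inputs are rounded to two decimals, the recomputed values agree with the reported 0.32, 0.32, and 0.21 only to within about 0.01.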


2020 ◽  
Vol 36 (12) ◽  
pp. 3913-3915
Author(s):  
Hemi Luan ◽  
Xingen Jiang ◽  
Fenfen Ji ◽  
Zhangzhang Lan ◽  
Zongwei Cai ◽  
...  

Abstract Motivation: Liquid chromatography–mass spectrometry-based non-targeted metabolomics is routinely performed to qualitatively and quantitatively analyze a tremendous number of metabolite signals in complex biological samples. However, many popular software tools commonly report false-positive peaks as metabolite signals, resulting in unreliable measurements. Results: To reduce false-positive calling, we developed an interactive web tool, termed CPVA, for visualization and accurate annotation of the detected peaks in non-targeted metabolomics data. We used a chromatogram-centric strategy to unfold the characteristics of chromatographic peaks through visualization of peak morphology metrics, with additional functions to annotate adducts, isotopes and contaminants. CPVA is a free, user-friendly tool that helps users identify peak background noise and contaminants, resulting in a decrease in false-positive or redundant peak calling and thereby improving the data quality of non-targeted metabolomics studies. Availability and implementation: CPVA is freely available at http://cpva.eastus.cloudapp.azure.com. Source code and installation instructions are available on GitHub: https://github.com/13479776/cpva. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 98 (4) ◽  
pp. 897-907
Author(s):  
Gaofeng Jia ◽  
Helen M. Booker

Multi-environment trials are conducted to evaluate the performance of cultivars. In a combined analysis, the mixed model is superior to an analysis of variance for evaluating and comparing cultivars and dealing with an unbalanced data structure. This study seeks to identify the optimal models using the Saskatchewan Variety Performance Group post-registration regional trial data for flax. Yield data were collected for 15 entries in post-registration tests conducted in Saskatchewan from 2007 to 2016 (except 2011) and 16 mixed models with homogeneous or heterogeneous residual errors were compared. A compound symmetry model with heterogeneous residual error (CSR) had the best fit, with a normal distribution of residuals and a mean of zero fitted to the trial data for each year. The compound symmetry model with homogeneous residual error (CS) and a model extending the CSR to higher dimensions (DIAGR) were the next best models in most cases. Five hundred random samples from a two-stage sampling method were produced to determine the optimal models suitable for various environments. The CSR model was superior to other models for 396 out of 500 samples (79.2%). The top three models, CSR, CS, and DIAGR, had higher statistical power and could be used to assess the yield stability of the new flax cultivars. Optimal mixed models are recommended for future data analysis of new flax cultivars in regional tests.


2019 ◽  
Vol 48 (3) ◽  
pp. 978-993 ◽  
Author(s):  
Tuulia Tynkkynen ◽  
Qin Wang ◽  
Jussi Ekholm ◽  
Olga Anufrieva ◽  
Pauli Ohukainen ◽  
...  

Abstract Background: Quantitative molecular data from urine are rare in epidemiology and genetics. NMR spectroscopy could provide these data in high throughput, and it has already been applied in epidemiological settings to analyse urine samples. However, quantitative protocols for large-scale applications are not available. Methods: We describe in detail how to prepare urine samples and perform NMR experiments to obtain quantitative metabolic information. Semi-automated quantitative line shape fitting analyses were set up for 43 metabolites and applied to data from various analytical test samples and from 1004 individuals from a population-based epidemiological cohort. Novel analyses on how urine metabolites associate with quantitative serum NMR metabolomics data (61 metabolic measures; n = 995) were performed. In addition, confirmatory genome-wide analyses of urine metabolites were conducted (n = 578). The fully automated quantitative regression-based spectral analysis is demonstrated for creatinine and glucose (n = 4548). Results: Intra-assay metabolite variations were mostly <5%, indicating high robustness and accuracy of urine NMR spectroscopy methodology per se. Intra-individual metabolite variations were large, ranging from 6% to 194%. However, population-based inter-individual metabolite variations were even larger (from 14% to 1655%), providing a sound base for epidemiological applications. Metabolic associations between urine and serum were found to be clearly weaker than those within serum and within urine, indicating that urinary metabolomics data provide independent metabolic information. Two previous genome-wide hits for formate and 2-hydroxyisobutyrate were replicated at genome-wide significance. Conclusion: Quantitative urine metabolomics data suggest broad novelty for systems epidemiology. A roadmap for an open access methodology is provided.


Author(s):  
Josephine Asafu-Adjei ◽  
Mahlet G. Tadesse ◽  
Brent Coull ◽  
Raji Balasubramanian ◽  
Michael Lev ◽  
...  

Abstract Matched case-control designs are currently used in many biomedical applications. To ensure high efficiency and statistical power in identifying features that best discriminate cases from controls, it is important to account for the use of matched designs. However, in the setting of high dimensional data, few variable selection methods account for matching. Bayesian approaches to variable selection have several advantages, including the fact that such approaches visit a wider range of model subsets. In this paper, we propose a variable selection method to account for case-control matching in a Bayesian context and apply it using simulation studies, a matched brain imaging study conducted at Massachusetts General Hospital, and a matched cardiovascular biomarker study conducted by the High Risk Plaque Initiative.


2014 ◽  
Vol 30 (22) ◽  
pp. 3287-3288 ◽  
Author(s):  
Michael Nodzenski ◽  
Michael J. Muehlbauer ◽  
James R. Bain ◽  
Anna C. Reisetter ◽  
William L. Lowe ◽  
...  

Author(s):  
Renata Bujak ◽  
Emilia Daghir-Wojtkowiak ◽  
Roman Kaliszan ◽  
Michał J. Markuszewski
