PLS-Based and Regularization-Based Methods for the Selection of Relevant Variables in Non-targeted Metabolomics Data

Author(s):  
Renata Bujak ◽  
Emilia Daghir-Wojtkowiak ◽  
Roman Kaliszan ◽  
Michał J. Markuszewski
2020 ◽  
Vol 36 (12) ◽  
pp. 3913-3915
Author(s):  
Hemi Luan ◽  
Xingen Jiang ◽  
Fenfen Ji ◽  
Zhangzhang Lan ◽  
Zongwei Cai ◽  
...  

Motivation: Liquid chromatography–mass spectrometry-based non-targeted metabolomics is routinely performed to qualitatively and quantitatively analyze a tremendous number of metabolite signals in complex biological samples. However, many popular software tools commonly report false-positive peaks in the datasets as metabolite signals, resulting in unreliable measurements. Results: To reduce false-positive calling, we developed an interactive web tool, termed CPVA, for visualization and accurate annotation of the detected peaks in non-targeted metabolomics data. We used a chromatogram-centric strategy to unfold the characteristics of chromatographic peaks through visualization of peak morphology metrics, with additional functions to annotate adducts, isotopes and contaminants. CPVA is a free, user-friendly tool that helps users identify background noise peaks and contaminants, decreasing false-positive or redundant peak calling and thereby improving the data quality of non-targeted metabolomics studies. Availability and implementation: CPVA is freely available at http://cpva.eastus.cloudapp.azure.com. Source code and installation instructions are available on GitHub: https://github.com/13479776/cpva. Supplementary information: Supplementary data are available at Bioinformatics online.
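
The isotope annotation mentioned in the abstract can be illustrated with a minimal sketch. This is not CPVA's implementation, which the abstract does not describe. It assumes singly charged ions, a peak list with hypothetical field names (`mz`, `rt`, `intensity`), illustrative tolerances, and the heuristic that the M+1 isotope peak of a small molecule is less intense than its monoisotopic peak.

```python
# Mass difference between 13C and 12C in Da (singly charged ions assumed).
C13_DELTA = 1.0033548


def annotate_c13_isotopes(peaks, mz_tol=0.005, rt_tol=0.05):
    """Return (monoisotopic_index, isotope_index) pairs.

    peaks: list of dicts with hypothetical keys 'mz', 'rt', 'intensity'.
    mz_tol and rt_tol are illustrative tolerances, not values from CPVA.
    """
    pairs = []
    for i, mono in enumerate(peaks):
        for j, cand in enumerate(peaks):
            if i == j:
                continue
            # Isotopologues of one compound co-elute, so retention times must agree.
            same_rt = abs(mono["rt"] - cand["rt"]) <= rt_tol
            # The candidate must sit one 13C mass step above the monoisotopic peak.
            mz_match = abs((cand["mz"] - mono["mz"]) - C13_DELTA) <= mz_tol
            # Heuristic assumption: M+1 is weaker than M for small molecules.
            weaker = cand["intensity"] < mono["intensity"]
            if same_rt and mz_match and weaker:
                pairs.append((i, j))
    return pairs
```

Peaks flagged as isotopes in this way could then be collapsed onto their monoisotopic peak, which is one route to the reduction in redundant peak calling that the abstract describes.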


Plant Disease ◽  
2022 ◽  
Author(s):  
Francisco Beluzán ◽  
Xavier Miarnau ◽  
Laura Torguet ◽  
Lourdes Zazurca ◽  
Paloma Abad-Campos ◽  
...  

Twenty-five almond cultivars were assessed for susceptibility to Diaporthe amygdali, the causal agent of twig canker and shoot blight disease. In laboratory experiments, growing twigs were inoculated with four D. amygdali isolates. Moreover, growing shoots of almond cultivars grafted onto INRA ‘GF-677’ rootstock were used in four-year field inoculations with one D. amygdali isolate. In both types of experiments, the inoculum consisted of agar plugs with mycelium, which were inserted underneath the bark, and the lengths of the lesions caused by the fungus were measured. Necrotic lesions were observed in the inoculated almond cultivars in both laboratory and field tests, confirming the susceptibility of all the evaluated cultivars to all the inoculated isolates of D. amygdali. Cultivars were grouped as susceptible or very susceptible according to a cluster analysis. The relationship between some agronomic traits and cultivar susceptibility was also investigated. Blooming and ripening times were found to be relevant variables for explaining cultivar performance with respect to D. amygdali susceptibility. Late and very late blooming, and early and medium ripening cultivars were highly susceptible to D. amygdali. Our results may provide valuable information that could assist ongoing breeding programs for this crop and the selection of cultivars for new almond plantations.


2018 ◽  
Vol 34 (7) ◽  
Author(s):  
Manuel Lozano ◽  
Lara Manyes ◽  
Juanjo Peiró ◽  
Adina Iftimi ◽  
José María Ramada

Multidisciplinary research in public health is approached using methods from many scientific disciplines. One of the main characteristics of this type of research is that it deals with large data sets. Classic statistical variable selection methods, known as “screen and clean” and used in a single step, select the variables with the greatest explanatory weight in the model. These methods, commonly used in public health research, may induce masking and multicollinearity, excluding variables that experts in each discipline consider relevant and skewing the result. Some specific techniques, such as penalized regressions and Bayesian statistics, are used to solve this problem. They offer more balanced results among subsets of variables, but with less restrictive selection thresholds. Using a combination of classical methods, a three-step procedure is proposed in this manuscript that captures the relevant variables of each scientific discipline, minimizes the number of variables selected in each of them, and obtains a balanced distribution that explains most of the variability. This procedure was applied to a dataset from a public health research study. Compared with the single-step methods, the proposed method shows a greater reduction in the number of variables, as well as a balanced distribution among the scientific disciplines associated with the response variable. We propose an innovative procedure for variable selection and apply it to our dataset. Furthermore, we compare the new method with the classic single-step procedures.


2014 ◽  
Vol 30 (22) ◽  
pp. 3287-3288 ◽  
Author(s):  
Michael Nodzenski ◽  
Michael J. Muehlbauer ◽  
James R. Bain ◽  
Anna C. Reisetter ◽  
William L. Lowe ◽  
...  

Metabolomics ◽  
2014 ◽  
Vol 11 (3) ◽  
pp. 764-777 ◽  
Author(s):  
Alexander Kaever ◽  
Manuel Landesfeind ◽  
Kirstin Feussner ◽  
Alina Mosblech ◽  
Ingo Heilmann ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yuan Zhou ◽  
Botao Fa ◽  
Ting Wei ◽  
Jianle Sun ◽  
Zhangsheng Yu ◽  
...  

Investigation of the genetic basis of traits or clinical outcomes relies heavily on identifying relevant variables in molecular data. However, characteristics such as the high dimensionality and complex correlation structures of these data hinder the development of related methods, resulting in false positives and false negatives. We developed a variable importance measure, termed the ECAR scores, that evaluates the importance of variables in the dataset. Based on these scores, ranking and selection of variables can be achieved simultaneously. Unlike most current approaches, the ECAR scores aim to rank the influential variables as high as possible while maintaining the grouping property, instead of selecting the ones that are merely predictive. The performance of the ECAR scores is tested and compared to other methods on simulated, semi-synthetic, and real datasets. Results showed that the ECAR scores improve on the CAR scores in terms of the accuracy of variable selection and the predictive power of high-ranked variables. They also outperform other classic methods such as lasso and stability selection when there is a high degree of correlation among influential variables. As an application, we used the ECAR scores to analyze genes associated with forced expiratory volume in the first second in patients with lung cancer and reported six associated genes.
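
For readers unfamiliar with the CAR scores that the ECAR scores are compared against, the usual definition can be written as follows. The notation is an assumption introduced here and does not appear in the abstract: P is the correlation matrix of the predictors, P_{Xy} is the vector of marginal correlations between each predictor and the response, and R^2 is the coefficient of determination of the linear model. The modification that turns CAR scores into ECAR scores is not described in the abstract and is not reproduced here.

```latex
% CAR scores: marginal correlations after decorrelating the predictors
\omega = P^{-1/2} \, P_{Xy}

% The squared CAR scores decompose the explained variance
\omega^{\top} \omega = P_{yX} \, P^{-1} \, P_{Xy} = R^{2}
```

Ranking variables by the squared entries of \omega therefore assigns each predictor a share of R^2, which is what makes the scores usable for ranking and selection at the same time.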


2021 ◽  
Vol 9 ◽  
Author(s):  
Kayla A. Carter ◽  
Christopher D. Simpson ◽  
Daniel Raftery ◽  
Marissa G. Baker

Objectives: Despite the widespread use of manganese (Mn) in industrial settings and its association with adverse neurological outcomes, a validated and reliable biomarker for Mn exposure is still elusive. Here, we utilize targeted metabolomics to investigate metabolic differences between Mn-exposed and -unexposed workers, which could inform a putative biomarker for Mn and lead to increased understanding of Mn toxicity. Methods: End-of-shift spot urine samples collected from Mn-exposed (n = 17) and unexposed (n = 15) workers underwent a targeted assay of 362 metabolites using LC-MS/MS; 224 were quantified and retained for analysis. Differences in metabolite abundances between exposed and unexposed workers were tested with a Benjamini-Hochberg adjusted Wilcoxon Rank-Sum test. We explored perturbed pathways related to exposure using a pathway analysis. Results: Seven metabolites were significantly differentially abundant between exposed and unexposed workers (FDR ≤ 0.1), including n-isobutyrylglycine, cholic acid, anserine, beta-alanine, methionine, n-isovalerylglycine, and threonine. Three pathways were significantly perturbed in exposed workers and had an impact score >0.5: beta-alanine metabolism, histidine metabolism, and glycine, serine, and threonine metabolism. Conclusion: This is one of the few studies utilizing targeted metabolomics to explore differences between Mn-exposed and -unexposed workers. Metabolite and pathway analysis showed amino acid metabolism was perturbed in these Mn-exposed workers. Amino acids have also been shown to be perturbed in other occupational cohorts exposed to Mn. Additional research is needed to characterize the biological importance of amino acids in the Mn exposure-disease continuum, and to determine how to appropriately utilize and interpret metabolomics data collected from occupational cohorts.
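
The testing step described in the Methods can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: the rank-sum p-value uses the normal approximation with a tie correction and no continuity correction (the abstract does not say whether an exact test was used), and the function names are hypothetical.

```python
import math


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank over the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In a setting like the one described, `rank_sum_pvalue` would be applied once per metabolite to the exposed and unexposed abundances, and metabolites whose `benjamini_hochberg` value is at most 0.1 would be reported, matching the FDR ≤ 0.1 threshold stated in the Results.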


2021 ◽  
Author(s):  
Vivian Viallon ◽  
Mathilde His ◽  
Sabina Rinaldi ◽  
Marie Breeur ◽  
Audrey Gicquiau ◽  
...  

Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges, as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use different sample types (e.g., serum versus plasma) collected, treated and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusion of the least informative observations and metabolites and removal of outliers; imputation of missing data; (ii) identification of the main sources of variability through PC-PR2 analysis; (iii) application of linear mixed models to remove unwanted variability, including that due to each sample's originating study and batch, and to preserve biological variation while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists.


Molecules ◽  
2019 ◽  
Vol 24 (20) ◽  
pp. 3695 ◽  
Author(s):  
Tao Wang ◽  
Qingjun Zou ◽  
Qiaosheng Guo ◽  
Feng Yang ◽  
Liwei Wu ◽  
...  

Chrysanthemum morifolium cv. “Hangju” is an important medicinal material with many functions in China. Flavonoids, as the main secondary metabolites, are a major class of medicinal components in “Hangju”, and their composition and content can change significantly after flooding. This study mimicked the flooding stress of “Hangju” during flower bud differentiation and detected its metabolites in different growth stages. From widely targeted metabolomics data, 661 metabolites were detected, of which 46 differential metabolites exist simultaneously in the different growth stages of “Hangju”. The top three types of the 46 differential metabolites were flavone C-glycosides, flavonol and flavone. Our results demonstrated that the accumulation of flavonoids differed between growth stages of “Hangju”; however, quercetin, eriodictyol and most of the flavone C-glycosides were significantly enhanced in the two stages after flooding stress. The expression of key enzyme genes in the flavonoid synthesis pathway was determined using RT-qPCR, which verified the consistency of the expression levels of CHI, F3H, DFR and ANS with the content of the corresponding flavonoids. A regulatory network of flavonoid biosynthesis was established to illustrate that flooding stress can change the accumulation of flavonoids by affecting the expression of the corresponding key enzymes in the flavonoid synthesis pathway.

