MoDentify: a tool for phenotype-driven module identification in multilevel metabolomics networks

2018
Author(s): Kieu Trinh Do, David J.N.-P. Rasp, Gabi Kastenmüller, Karsten Suhre, Jan Krumsiek

Abstract Summary Metabolomics is an established tool to gain insights into (patho)physiological outcomes. Associations of metabolism with such outcomes are expected to span functional modules, which are defined as sets of correlating metabolites that are coordinately regulated. Moreover, these associations occur at different scales, from entire pathways to only a few metabolites, an aspect that has not been addressed by previous methods. Here we present MoDentify, a freely available R package to identify regulated modules in metabolomics networks at different layers of resolution. Importantly, MoDentify shows higher statistical power than classical association analysis. Moreover, the package offers direct visualization of results as interactive networks in Cytoscape. We present an application example using a complex, multifluid metabolomics dataset. Owing to its generic character, the method is widely applicable to any dataset with a phenotype variable, a data matrix, and optional pathway annotations. Availability and Implementation MoDentify is freely available from GitHub: https://github.com/krumsiek/MoDentify. The package vignette contains a detailed tutorial of the analysis.

2019, Vol 35 (18), pp. 3524-3526
Author(s): Yonghui Dong, Liron Feldberg, Asaph Aharoni

Abstract Motivation The use of stable isotope labeling is highly advantageous for structure elucidation in metabolomics studies. However, computational tools dealing with multiple-precursor-based labeling studies are still missing. Hence, we developed Miso, an R package providing an automated and efficient data analysis workflow to detect the complete repertoire of labeled molecules from multiple-precursor-based labeling experiments. Results The capability of Miso is demonstrated by the analysis of liquid chromatography-mass spectrometry data obtained from duckweed plants fed with one unlabeled and two differently labeled forms of tyrosine (unlabeled tyrosine, tyrosine-²H₄ and tyrosine-¹³C₉¹⁵N₁). The resulting data matrix generated by Miso contains sets of unlabeled and labeled ions with their retention time, m/z values and number of labeled atoms that can be directly utilized for database query and biological studies. Availability and implementation Miso is publicly available on the CRAN repository (https://cran.r-project.org/web/packages/Miso). A reproducible case study and a detailed tutorial are available from GitHub (https://github.com/YonghuiDong/Miso_example). Supplementary information Supplementary data are available at Bioinformatics online.
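
To make the labeling scheme in the Miso abstract concrete, the following minimal Python sketch computes the m/z offsets one would expect between an unlabeled ion and its fully labeled counterparts for the two tyrosine precursors named above. It is not Miso's algorithm or API: the function name `label_shift`, the assumption of complete label incorporation, and the default single charge are illustrative choices; only the isotope mass differences are standard values.

```python
# Mass differences (in Da) between heavy and light isotopes (standard values).
ISOTOPE_SHIFT = {
    "2H": 2.01410178 - 1.00782503,     # deuterium vs. protium
    "13C": 13.00335484 - 12.0,         # carbon-13 vs. carbon-12
    "15N": 15.00010890 - 14.00307400,  # nitrogen-15 vs. nitrogen-14
}


def label_shift(labels, charge=1):
    """Expected m/z shift for a label dict such as {"13C": 9, "15N": 1}.

    Hypothetical helper: assumes every labeled position is retained in the ion.
    """
    mass_shift = sum(ISOTOPE_SHIFT[iso] * count for iso, count in labels.items())
    return mass_shift / abs(charge)


shift_d4 = label_shift({"2H": 4})                # tyrosine-2H4 precursor
shift_c9n1 = label_shift({"13C": 9, "15N": 1})   # tyrosine-13C9,15N1 precursor
print(f"2H4 shift: {shift_d4:.4f} m/z")
print(f"13C9,15N1 shift: {shift_c9n1:.4f} m/z")
```

Ions that co-elute and differ by these offsets are candidates for an unlabeled/labeled set; metabolites that retain only part of the precursor would show a smaller number of labeled atoms and a correspondingly smaller shift.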


Author(s): Irzam Sarfraz, Muhammad Asif, Joshua D Campbell

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
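
The storage idea described in this abstract (keep the parent assay once, record each subset as row and column indices, and remember which parent it came from) can be sketched in a few lines of Python. This is only an illustration of the concept, not the ExperimentSubset API: the class `SubsetContainer` and its methods `create_subset`, `parent_of` and `get` are hypothetical names, and plain nested lists stand in for assay matrices.

```python
class SubsetContainer:
    """Hypothetical container: stores each assay once and subsets as index lists."""

    def __init__(self, assays):
        self.assays = dict(assays)  # assay name -> matrix (list of rows)
        self.subsets = {}           # subset name -> (parent name, row idx, col idx)

    def create_subset(self, name, parent, rows, cols):
        """Register a subset by indices only; no assay data is copied."""
        if parent not in self.assays:
            raise KeyError(f"unknown parent assay: {parent}")
        self.subsets[name] = (parent, list(rows), list(cols))

    def parent_of(self, name):
        """Provenance lookup: which assay was this subset taken from?"""
        return self.subsets[name][0]

    def get(self, name):
        """Return a full assay, or materialize a subset on demand."""
        if name in self.assays:
            return self.assays[name]
        parent, rows, cols = self.subsets[name]
        matrix = self.assays[parent]
        return [[matrix[r][c] for c in cols] for r in rows]


counts = [[5, 0, 3], [1, 2, 0], [0, 7, 4]]  # toy features x samples matrix
exp = SubsetContainer({"counts": counts})
exp.create_subset("filtered", parent="counts", rows=[0, 2], cols=[0, 2])
print(exp.get("filtered"))
print(exp.parent_of("filtered"))
```

Because a subset is only a pair of index lists, memory grows with the number of selected rows and columns rather than with the size of the assay, and the link to the parent is kept for free.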


2019, Vol 35 (21), pp. 4356-4363
Author(s): Gaëlle Lefort, Laurence Liaubet, Cécile Canlet, Patrick Tardivel, Marie-Christine Père, ...

Abstract Motivation In metabolomics, the detection of new biomarkers from Nuclear Magnetic Resonance (NMR) spectra is a promising approach. However, this analysis remains difficult due to the lack of a complete workflow that handles spectra pre-processing, automatic identification and quantification of metabolites and statistical analyses in a reproducible way. Results We present ASICS, an R package that contains a complete workflow to analyse spectra from NMR experiments. It contains an automatic approach to identify and quantify metabolites in a complex mixture spectrum and uses the results of the quantification in untargeted and targeted statistical analyses. ASICS was shown to improve the precision of quantification in comparison to existing methods on two independent datasets. In addition, ASICS successfully recovered most metabolites that a manual expert analysis based on bucketing had found important for explaining a two-level condition describing the samples. It also found new relevant metabolites involved in metabolic pathways related to risk factors associated with the condition. Availability and implementation ASICS is distributed as an R package, available on Bioconductor. Supplementary information Supplementary data are available at Bioinformatics online.


2021, Vol 20 (1)
Author(s): Ian D. Buller, Derek W. Brown, Timothy A. Myers, Rena R. Jones, Mitchell J. Machiela

Abstract Background Cancer epidemiology studies require sufficient power to assess spatial relationships between exposures and cancer incidence accurately. However, methods for power calculations of spatial statistics are complicated and underdeveloped, and therefore underutilized by investigators. The spatial relative risk function, a cluster detection technique that detects spatial clusters of point-level data for two groups (e.g., cancer cases and controls, two exposure groups), is a commonly used spatial statistic but does not have a readily available power calculation for study design. Results We developed sparrpowR as an open-source R package to estimate the statistical power of the spatial relative risk function. sparrpowR generates simulated data applying user-defined parameters (e.g., sample size, locations) to detect spatial clusters with high statistical power. We present applications of sparrpowR that perform a power calculation for a study designed to detect a spatial cluster of incident cancer in relation to a point source of numerous environmental emissions. The conducted power calculations demonstrate the functionality and utility of sparrpowR to calculate the local power for spatial cluster detection. Conclusions sparrpowR improves the current capacity of investigators to calculate the statistical power of spatial clusters, which assists in designing more efficient studies. This newly developed R package addresses a critically underdeveloped gap in cancer epidemiology by estimating statistical power for a common spatial cluster detection technique.


2020, Vol 45 (4), pp. 446-474
Author(s): Zuchao Shen, Benjamin Kelcey

Conventional optimal design frameworks consider a narrow range of sampling cost structures, which constricts their capacity to identify the most powerful and efficient designs. We relax several constraints of previous optimal design frameworks by allowing for variable sampling costs in cluster-randomized trials. The proposed framework introduces additional design considerations and has the potential to identify designs with more statistical power, even when some parameters are constrained due to immutable practical concerns. The results also suggest that the gains in efficiency introduced through the expanded framework are fairly robust to misspecifications of the expanded cost structure and concomitant design parameters (e.g., intraclass correlation coefficient). The proposed framework is implemented in the R package odr.


2020, Vol 10
Author(s): Jiafeng Zheng, Tongqiang Zhang, Wei Guo, Caili Zhou, Xiaojian Cui, ...

Background Acute myelogenous leukemia (AML) is a common pediatric malignancy in children younger than 15 years old. Although the overall survival (OS) has been improved in recent years, the mechanisms of AML remain largely unknown. Hence, the purpose of this study is to explore the differentially methylated genes and to investigate the underlying mechanism in AML initiation and progression based on bioinformatic analysis. Methods Methylation array data and gene expression data were obtained from TARGET Data Matrix. The consensus clustering analysis was performed using the ConsensusClusterPlus R package. The global DNA methylation was analyzed using the methylationArrayAnalysis R package, and differentially methylated genes (DMGs) and differentially expressed genes (DEGs) were identified using the Limma R package. In addition, the biological function was analyzed using the clusterProfiler R package. The correlation between DMGs and DEGs was determined using the psych R package. Moreover, the correlation between DMGs and AML was assessed using the varElect online tool, and the overall survival and progression-free survival were analyzed using the survival R package. Results All AML samples in this study were divided into three clusters at k = 3. Based on consensus clustering, we identified 1,146 CpGs, including 40 hypermethylated and 1,106 hypomethylated CpGs in AML. In addition, a total of 529 DEGs were identified, including 270 upregulated and 259 downregulated DEGs in AML. The function analysis showed that DEGs were significantly enriched in AML-related biological processes. Moreover, the correlation between DMGs and DEGs indicated that seven DMGs directly interacted with AML. CD34, HOXA7, and CD96 showed the strongest correlation with AML. Further, we explored three CpG sites, cg03583857, cg26511321 and cg04039397 of CD34, HOXA7, and CD96, which acted as clinical prognostic biomarkers. Conclusion Our study identified three novel methylated genes in AML and also explored the mechanism of methylated genes in AML. Our findings may provide novel potential prognostic markers for AML.


Author(s): Marne C Hagemeijer, Annelotte M Vonk, Nikhil T Awatade, Iris A L Silva, Christian Tischer, ...

Abstract Motivation The forskolin-induced swelling (FIS) assay has become the preferential assay to predict the efficacy of approved and investigational CFTR-modulating drugs for individuals with cystic fibrosis (CF). Currently, no standardized quantification method of FIS data exists thereby hampering inter-laboratory reproducibility. Results We developed a complete open-source workflow for standardized high-content analysis of CFTR function measurements in intestinal organoids using raw microscopy images as input. The workflow includes tools for (i) file and metadata handling; (ii) image quantification and (iii) statistical analysis. Our workflow reproduced results generated by published proprietary analysis protocols and enables standardized CFTR function measurements in CF organoids. Availability All workflow components are open-source and freely available: the htmrenamer R package for file handling https://github.com/hmbotelho/htmrenamer; CellProfiler and ImageJ analysis scripts/pipelines https://github.com/hmbotelho/FIS_image_analysis; the Organoid Analyst application for statistical analysis https://github.com/hmbotelho/organoid_analyst; detailed usage instructions and a demonstration dataset https://github.com/hmbotelho/FIS_analysis. Distributed under GPL v3.0. Supplementary information Supplementary information and a stepwise guide for software installation and data analysis for training purposes are available at Bioinformatics online.


2019
Author(s): Andrew J. Bass, John D. Storey

Analysis of biological data often involves the simultaneous testing of thousands of genes. This requires two key steps: the ranking of genes and the selection of important genes based on a significance threshold. One such testing procedure, called the "optimal discovery procedure" (ODP), leverages information across different tests to provide an optimal ranking of genes. This approach can lead to substantial improvements in statistical power compared to other methods. However, current applications of the ODP have only been established for simple study designs using microarray technology. Here we extend this work to the analysis of complex study designs and RNA sequencing studies. We then apply our extended framework to a static RNA sequencing study, a longitudinal and an independent sampling time-series study, and an independent sampling dose-response study. We find that our method shows improved performance compared to other testing procedures, finding more differentially expressed genes and increasing power for enrichment analysis. Thus the extended ODP enables a superior significance analysis of genomic studies. The algorithm is implemented in our freely available R package called edge.


2021
Author(s): Himel Mallick, Suvo Chatterjee, Shrabanti Chowdhury, Saptarshi Chatterjee, Ali Rahnavard, ...

Summary The performance of computational methods and software to identify differentially expressed genes in single-cell RNA-sequencing (scRNA-seq) has been shown to be influenced by several factors, including the choice of the normalization method used and the choice of the experimental platform (or library preparation protocol) to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on its own assumptions to model scRNA-seq data. Here, we propose to use generalized linear models with the Tweedie distribution that can flexibly capture a large dynamic range of observed scRNA-seq data across experimental platforms induced by heavy tails, sparsity, or different count distributions to model the technological variability in scRNA-seq expression profiles. We also propose a zero-inflated Tweedie model that allows zero probability mass to exceed that of a traditional Tweedie distribution to model zero-inflated scRNA-seq data with excessive zero counts. Using both synthetic and published plate- and droplet-based scRNA-seq datasets, we performed a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms the state-of-the-art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open-source software (R package) is available at https://github.com/himelmallick/Tweedieverse.


2021
Author(s): Maximilian Maier, Daniel Lakens

The default use of an alpha level of 0.05 is suboptimal for two reasons. First, decisions based on data can be made more efficiently by choosing an alpha level that minimizes the combined Type 1 and Type 2 error rate. Second, it is possible that in studies with very high statistical power p-values lower than the alpha level can be more likely when the null hypothesis is true, than when the alternative hypothesis is true (i.e., Lindley's paradox). This manuscript explains two approaches that can be used to justify a better choice of an alpha level than relying on the default threshold of 0.05. The first approach is based on the idea to either minimize or balance Type 1 and Type 2 error rates. The second approach lowers the alpha level as a function of the sample size to prevent Lindley's paradox. An R package and Shiny app are provided to perform the required calculations. Both approaches have their limitations (e.g., the challenge of specifying relative costs and priors), but can offer an improvement to current practices, especially when sample sizes are large. The use of alpha levels that have a better justification should improve statistical inferences and can increase the efficiency and informativeness of scientific research.
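
The first approach in this abstract, choosing the alpha level that minimizes the combined Type 1 and Type 2 error rate, can be illustrated with a small Python sketch. It is not the authors' R package or Shiny app: it assumes a one-sided, one-sample z-test with known unit variance, equal costs for the two error types and equal prior odds for the hypotheses, and the names `type2_error` and `minimize_combined_error` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def type2_error(alpha, effect_size, n):
    """Type 2 error rate (beta) of a one-sided, one-sample z-test with unit variance."""
    critical = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(critical - effect_size * n ** 0.5)


def minimize_combined_error(effect_size, n, steps=10_000):
    """Grid-search the alpha that minimizes alpha + beta (equal costs and priors assumed)."""
    candidates = (i / steps for i in range(1, steps))
    return min(candidates, key=lambda a: a + type2_error(a, effect_size, n))


for n in (20, 50, 200):
    alpha = minimize_combined_error(effect_size=0.5, n=n)
    beta = type2_error(alpha, 0.5, n)
    print(f"n={n}: alpha={alpha:.4f}, beta={beta:.4f}")
```

Under these simplifying assumptions the optimum places the critical value halfway between the null and alternative means, so the justified alpha is well above 0.05 for small samples and far below it for large ones. Unequal error costs or prior odds would enter as weights on the two terms in the objective.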

