Systematic evaluation of normalization methods for glycomics data based on performance of network inference

Metabolites ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 271 ◽  
Author(s):  
Elisa Benedetti ◽  
Nathalie Gerstner ◽  
Maja Pučić-Baković ◽  
Toma Keser ◽  
Karli R. Reiding ◽  
...  

Glycomics measurements, like all other high-throughput technologies, are subject to technical variation due to fluctuations in experimental conditions. The removal of this non-biological signal from the data is referred to as normalization. In contrast to other omics data types, a systematic evaluation of normalization options for glycomics data has not been published so far. In this paper, we assess the quality of different normalization strategies for glycomics data with a novel approach. It has previously been shown that Gaussian Graphical Models (GGMs) inferred from glycomics data can identify enzymatic steps in glycan synthesis pathways in a data-driven fashion. Building on this finding, we quantify the quality of a given normalization method by how well a GGM inferred from the respective normalized data reconstructs known synthesis reactions in the glycosylation pathway. The method therefore exploits a biological measure of goodness. We analyzed 23 different normalization combinations applied to six large-scale glycomics cohorts across three experimental platforms: Liquid Chromatography-Electrospray Ionization-Mass Spectrometry (LC-ESI-MS), Ultra-High-Performance Liquid Chromatography with Fluorescence Detection (UHPLC-FLD), and Matrix-Assisted Laser Desorption/Ionization-Fourier-Transform Ion Cyclotron Resonance-Mass Spectrometry (MALDI-FTICR-MS). Based on our results, we recommend normalizing glycan data using the ‘Probabilistic Quotient’ method followed by log-transformation, irrespective of the measurement platform. This recommendation is further supported by an additional analysis in which we ranked normalization methods by the strength of their statistical associations with age, a factor known to be associated with glycomics measurements.
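The recommended pipeline is straightforward to prototype. Below is a minimal Python sketch of Probabilistic Quotient Normalization followed by log-transformation, assuming a strictly positive samples-by-glycans abundance matrix; the function name, the NumPy implementation, and the initial total-area scaling are illustrative assumptions, not the authors' code.

import numpy as np

def pqn_log_normalize(X):
    """Probabilistic Quotient Normalization + log-transform (illustrative sketch).

    X: strictly positive matrix, rows = samples, columns = glycan features.
    """
    # Optional total-area scaling, a common first step (assumption).
    X = X / X.sum(axis=1, keepdims=True)
    # Reference spectrum: per-feature median across all samples.
    reference = np.median(X, axis=0)
    # Per-sample dilution factor: median of feature-wise quotients vs. the reference.
    dilution = np.median(X / reference, axis=1, keepdims=True)
    # Remove the dilution effect, then log-transform.
    return np.log(X / dilution)

Taking the median of the quotients makes the dilution estimate robust to large changes in individual glycans, which is what distinguishes the probabilistic quotient from simple total-area normalization.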


2019 ◽  
Author(s):  
Arshdeep Sekhon ◽  
Beilun Wang ◽  
Yanjun Qi

Abstract We focus on integrating different types of extra knowledge (beyond the observed samples) for estimating the sparse structure change between two p-dimensional Gaussian Graphical Models (i.e., differential GGMs). Previous differential GGM estimators either fail to include additional knowledge or cannot scale up to high-dimensional (large p) settings. This paper proposes a novel method, KDiffNet, that incorporates Additional Knowledge in identifying Differential Networks via an Elementary Estimator. We design a novel hybrid norm as a superposition of two structured norms guided by the extra edge information and the additional node group knowledge. KDiffNet is solved through a fast parallel proximal algorithm, enabling it to work in large-scale settings. KDiffNet can incorporate various combinations of existing knowledge without re-designing the optimization. Through rigorous statistical analysis we show that, while considering more evidence, KDiffNet achieves the same convergence rate as the state-of-the-art. Empirically, on multiple synthetic datasets and one real-world fMRI brain dataset, KDiffNet significantly outperforms the cutting-edge baselines in prediction performance, while achieving the same level of time cost or less.
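For orientation, the sketch below shows the naive two-step baseline that differential-GGM estimators such as KDiffNet improve upon: fit one sparse precision matrix per condition with the graphical lasso and subtract. KDiffNet's hybrid-norm elementary estimator, its use of extra edge and node-group knowledge, and its parallel proximal solver are not reproduced here; scikit-learn's GraphicalLasso and the variable names are assumptions for illustration.

import numpy as np
from sklearn.covariance import GraphicalLasso

def naive_differential_ggm(X_case, X_ctrl, alpha=0.1):
    """Difference of two independently estimated sparse precision matrices.

    Nonzero entries of the result are candidate edges of the differential network.
    """
    prec_case = GraphicalLasso(alpha=alpha).fit(X_case).precision_
    prec_ctrl = GraphicalLasso(alpha=alpha).fit(X_ctrl).precision_
    return prec_case - prec_ctrl

This baseline estimates each network separately and therefore cannot exploit the sparsity of the difference itself, which is precisely the structure that dedicated differential estimators target.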


2021 ◽  
Vol 17 (3) ◽  
pp. 1-33
Author(s):  
Beilun Wang ◽  
Jiaqi Zhang ◽  
Yan Zhang ◽  
Meng Wang ◽  
Sen Wang

Recently, the Internet of Things (IoT) has received significant interest due to its rapid development, but IoT applications still face two challenges: the heterogeneity and the large scale of IoT data. How to efficiently integrate and process these complicated data therefore becomes an essential problem. In this article, we focus on analyzing the variable dependencies of data collected from different edge devices in an IoT network. Because data from different devices are heterogeneous and variable dependencies can be characterized by a graphical model, we address the problem of jointly estimating multiple, high-dimensional, sparse Gaussian Graphical Models for many related tasks (edge devices). This is an important goal in many fields, as many IoT networks collect massive multi-task data and require the analysis of heterogeneous data in many scenarios. Past works on joint estimation are non-distributed and involve computationally expensive, complex non-smooth optimizations. To address these problems, we propose a novel approach: Multi-FST. Multi-FST can be efficiently implemented on a cloud-server-based IoT network: the cloud server carries a low computational load, and the IoT devices communicate with the server asynchronously, leading to efficiency. Multi-FST shows significant improvement over baselines when tested on various datasets.
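As a point of reference, the sketch below illustrates the multi-task GGM setting in plain Python: one sparse precision matrix per edge device, plus a simple majority vote for structure shared across devices. This is an illustrative, non-distributed baseline under assumed variable names; it is not the Multi-FST algorithm and omits its asynchronous cloud/device protocol.

import numpy as np
from sklearn.covariance import GraphicalLasso

def per_device_ggms(device_data, alpha=0.05):
    """device_data: list of (n_i x p) sample matrices, one per edge device."""
    return [GraphicalLasso(alpha=alpha).fit(X).precision_ for X in device_data]

def shared_edges(precisions, tol=1e-6, min_fraction=0.5):
    """Boolean p x p mask of edges nonzero in at least min_fraction of the tasks."""
    votes = sum((np.abs(P) > tol).astype(int) for P in precisions)
    return votes >= min_fraction * len(precisions)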


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Deniz Seçilmiş ◽  
Thomas Hillerton ◽  
Daniel Morgan ◽  
Andreas Tjärnberg ◽  
Sven Nelander ◽  
...  

Abstract The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights into a biological system. To explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data, in which ~1000 genes have been perturbed and their expression levels quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR), making them too uninformative to infer accurate GRNs. We developed a gene reduction pipeline in which we eliminate uninformative genes from the system using an SNR-based selection criterion until an informative subset is reached. The results show that our pipeline can identify an informative subset within an overall uninformative dataset, allowing inference of accurate subset GRNs. The accurate GRNs were functionally characterized, and potential novel cancer-related regulatory interactions were identified.
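A minimal sketch of such an SNR-driven reduction loop is shown below. The per-gene SNR definition (response variance across perturbations over replicate noise variance) and the threshold are assumptions for illustration; the authors' exact selection criterion may differ.

import numpy as np

def gene_snr(response, noise):
    """Per-gene SNR: variance across perturbations / variance across replicates."""
    return response.var(axis=1) / noise.var(axis=1)

def reduce_genes(response, noise, snr_threshold=1.0):
    """Drop the lowest-SNR gene until every remaining gene passes the threshold.

    With this simplified per-gene SNR the loop reduces to a one-shot filter;
    a system-level criterion recomputed after each removal would make the
    iteration essential, as in the pipeline described above.
    """
    keep = np.arange(response.shape[0])
    while keep.size > 1:
        snr = gene_snr(response[keep], noise[keep])
        if snr.min() >= snr_threshold:
            break
        keep = np.delete(keep, snr.argmin())
    return keep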


2019 ◽  
Vol 20 (S24) ◽  
Author(s):  
Zachary B. Abrams ◽  
Travis S. Johnson ◽  
Kun Huang ◽  
Philip R. O. Payne ◽  
Kevin Coombes

Abstract Background RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease. However, sequencing technologies often unintentionally introduce experimental error into RNA sequencing data. To counteract this, normalization methods are applied as standard, with the intent of reducing the non-biologically derived variability inherent in transcriptomic measurements. However, the comparative efficacy of the various normalization techniques has not been tested in a standardized manner. Here we propose tests that evaluate numerous normalization techniques and apply them to a large-scale standard data set. These tests comprise a protocol that allows researchers to measure the amount of non-biological variability present in any data set after normalization has been performed, a crucial step in assessing the biological validity of data following normalization. Results In this study we present two tests to assess the validity of normalization methods applied to a large-scale data set collected for systematic evaluation purposes. We tested various RNA-Seq normalization procedures and concluded that transcripts per million (TPM) was the best-performing normalization method, based on its preservation of biological signal relative to the other methods tested. Conclusion Normalization is of vital importance for accurately interpreting the results of genomic and transcriptomic experiments. More work, however, needs to be done to optimize normalization methods for RNA-Seq data. The present effort helps pave the way for more systematic evaluations of normalization methods across different platforms. With our proposed schema, researchers can evaluate their own or future normalization methods to further improve the field of RNA-Seq normalization.
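For concreteness, here is a minimal sketch of TPM normalization as conventionally defined: correct read counts for gene length first, then scale each sample so its length-corrected rates sum to one million. The variable names and the availability of gene lengths in kilobases are assumptions.

import numpy as np

def tpm(counts, gene_lengths_kb):
    """counts: genes x samples matrix of raw read counts;
    gene_lengths_kb: per-gene transcript length in kilobases.
    """
    # Reads per kilobase: remove the length bias within each sample.
    rpk = counts / gene_lengths_kb[:, None]
    # Scale so each sample (column) sums to one million.
    return rpk / rpk.sum(axis=0, keepdims=True) * 1e6

Because the length correction happens before the library-size scaling, TPM values are directly comparable across samples as relative transcript proportions.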


Author(s):  
A. Babirad

Cerebrovascular diseases are a global problem today and, according to forecasts, will remain one in the near future. The main risk factors for the development of ischemic disorders of the cerebral circulation include obesity and aging, arterial hypertension, smoking, diabetes mellitus, and heart disease. An effective strategy for the prevention of cerebrovascular events is based on the implementation of large-scale risk control measures, including the use of antiplatelet and anticoagulant therapy and invasive interventions such as atherectomy, angioplasty, and stenting. Accordingly, the combined efforts of neurologists, cardiologists, vascular surgeons, endocrinologists, and other specialists form the basis for achieving an acceptable clinical outcome. A review of the SF-36 instrument for assessing quality of life in patients who have suffered an ischemic stroke is presented. Quality-of-life assessment is a recognized indicator in international medical practice and research, and it is also used to evaluate the quality of health systems and in general sociological research.

