GMQN: A Reference-Based Method for Correcting Batch Effects and Probe Bias in HumanMethylation BeadChip

The Illumina HumanMethylation BeadChip is one of the most cost-effective methods for quantifying DNA methylation at single-base resolution across the human genome, which has made it a routine platform for epigenome-wide association studies. Tens of thousands of DNA methylation array samples have accumulated in public databases, providing a rich resource for data integration and further analysis. However, the majority of public DNA methylation data are deposited as processed data without the background probes that are widely used in data normalization. Here, we present Gaussian mixture quantile normalization (GMQN), a reference-based method for correcting batch effects as well as probe bias in HumanMethylation BeadChip data. Availability and implementation: https://github.com/MengweiLi-project/gmqn.
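
The abstract names the technique but gives no details. The Python sketch below illustrates one possible reading of Gaussian mixture quantile normalization. It fits a two-component Gaussian mixture (by EM) to a target sample and to a reference. Each target value is assigned to its more probable component and then mapped to the same quantile of the matching reference component. The function names `fit_gmm_1d` and `gm_quantile_normalize` are hypothetical. The sketch works on a single vector of values and is not the implementation in the gmqn repository, which may differ in the input it models and in how it handles probe types.

```python
import numpy as np


def fit_gmm_1d(x, n_iter=500, tol=1e-9):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, sds) sorted by mean, so component 0 is the
    lower mode and component 1 the upper mode.
    """
    x = np.asarray(x, dtype=float)
    # Initialise the two means at the lower and upper quartiles.
    mu = np.percentile(x, [25, 75])
    sd = np.full(2, x.std() / 2 + 1e-6)
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities and responsibilities.
        dens = w / (sd * np.sqrt(2 * np.pi)) * np.exp(
            -0.5 * ((x[:, None] - mu) / sd) ** 2)
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        resp = dens / total
        # M-step: re-estimate weights, means and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        # Stop once the log-likelihood no longer improves meaningfully.
        ll = np.log(total).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    order = np.argsort(mu)
    return w[order], mu[order], sd[order]


def gm_quantile_normalize(target, reference):
    """Map target values onto the reference, component by component.

    Hypothetical helper: each value keeps its quantile within its most
    probable Gaussian component, expressed in the matching reference
    component. No clipping to [0, 1] is applied.
    """
    target = np.asarray(target, dtype=float)
    w_t, mu_t, sd_t = fit_gmm_1d(target)
    _, mu_r, sd_r = fit_gmm_1d(reference)
    # Hard assignment to the component with the larger posterior weight.
    dens = w_t / sd_t * np.exp(-0.5 * ((target[:, None] - mu_t) / sd_t) ** 2)
    k = dens.argmax(axis=1)
    # Gaussian-to-Gaussian quantile mapping is linear in the z-score.
    z = (target - mu_t[k]) / sd_t[k]
    return mu_r[k] + sd_r[k] * z
```

A quantile map between two Gaussians is linear. Each component's mapping therefore reduces to standardizing against the target component, then rescaling with the reference component's mean and standard deviation.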

DOI: 10.1101/2021.09.06.459116 (2021)

Author(s): Zhuang Xiong, Mengwei Li, Yingke Ma, Rujiao Li, Yiming Bao

Keyword(s): DNA Methylation, Association Studies, Cost Effective, Gaussian Mixture, Data Normalization, Batch Effects, Methylation Array, Base Level, Great Support, DNA Methylation Array

Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data

BMC Bioinformatics, Vol 20 (1), 2019. DOI: 10.1186/s12859-019-3040-x. Cited by ~1

Author(s): Kipoong Kim, Hokeun Sun

Keyword(s): DNA Methylation, Biological Network, Association Studies, Genetic Network, High Dimensional, Methylation Array, Array Data, Network Information, DNA Methylation Array, Control Association

Background: In human genetic association studies with high-dimensional gene expression data, it is well known that statistical selection methods that use prior biological network knowledge, such as genetic and signaling pathways, can outperform methods that ignore genetic network structure in terms of true positive selection. In recent epigenetic research on case-control association studies, a relatively large number of statistical methods have been proposed to identify cancer-related CpG sites and their corresponding genes from high-dimensional DNA methylation array data. However, most existing methods are not designed to use genetic network information, even though methylation levels of linked genes in genetic networks tend to be highly correlated.

Results: We propose a new approach that combines data dimension reduction techniques with network-based regularization to identify outcome-related genes in high-dimensional DNA methylation data. In simulation studies, we demonstrated that the proposed approach outperforms statistical methods that do not use genetic network information in terms of true positive selection. We also applied it to 450K DNA methylation array data for four breast invasive carcinoma subtypes from The Cancer Genome Atlas (TCGA) project.

Conclusions: The proposed variable selection approach can incorporate prior biological network information into the analysis of high-dimensional DNA methylation array data. It first captures gene-level signals from multiple CpG sites using a data dimension reduction technique and then performs network-based regularization using biological network graph information. It can select potentially cancer-related genes and genetic pathways that were missed by existing methods.
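
The Python sketch below illustrates the two-stage idea in this abstract. Several CpG sites are summarized per gene, and coefficients are then penalized along a gene network. It is a simplified stand-in, not the authors' method. Gene-level signals are taken as first principal component scores. The network-based regularization is a Laplacian-penalized ridge regression with a continuous outcome and a closed-form solution, whereas the authors' variable selection approach targets a case-control outcome and selects genes. All function names are hypothetical.

```python
import numpy as np


def gene_level_pc1(cpg_matrix, gene_of_cpg, genes):
    """Summarise each gene by the first principal component score of its CpGs.

    cpg_matrix: samples x CpGs array of methylation values.
    gene_of_cpg: gene label for each CpG column (hypothetical annotation).
    genes: gene labels, in the order the output columns should follow.
    The sign of each PC score is arbitrary.
    """
    labels = np.asarray(gene_of_cpg)
    scores = []
    for g in genes:
        block = cpg_matrix[:, labels == g]
        centred = block - block.mean(axis=0)
        # The leading right singular vector is the first PC loading.
        _, _, vt = np.linalg.svd(centred, full_matrices=False)
        scores.append(centred @ vt[0])
    return np.column_stack(scores)


def normalized_laplacian(adjacency):
    """L = diag(deg > 0) - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    a = np.asarray(adjacency, dtype=float)
    deg = a.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return np.diag((deg > 0).astype(float)) - inv_sqrt[:, None] * a * inv_sqrt[None, :]


def network_ridge(x, y, adjacency, lam_net=1.0, lam_ridge=1e-3):
    """Closed-form minimiser of ||y - Xb||^2 + lam_net * b'Lb + lam_ridge * ||b||^2."""
    x = x - x.mean(axis=0)  # centre so no intercept is needed
    y = y - y.mean()
    lap = normalized_laplacian(adjacency)
    lhs = x.T @ x + lam_net * lap + lam_ridge * np.eye(x.shape[1])
    return np.linalg.solve(lhs, x.T @ y)
```

With a large `lam_net`, coefficients of genes linked in the network (after scaling by node degree) are pulled toward each other. Because PC signs are arbitrary, gene-level coefficients are interpretable only up to sign.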

The ENmix DNA methylation analysis pipeline for Illumina BeadChip and comparisons with seven other preprocessing pipelines

Clinical Epigenetics, Vol 13 (1), 2021. DOI: 10.1186/s13148-021-01207-1

Author(s): Zongli Xu, Liang Niu, Jack A. Taylor

Keyword(s): DNA Methylation, Intraclass Correlation, Correlation Coefficients, Cost Effective, Data Preprocessing, Absolute Difference, Computationally Efficient, Methylation Array, Raw Data, DNA Methylation Array

Background: Illumina DNA methylation arrays are high-throughput platforms for cost-effective genome-wide profiling of individual CpGs. Experimental and technical factors introduce appreciable measurement variation, some of which can be mitigated by careful “preprocessing” of raw data.

Methods: Here we describe the ENmix preprocessing pipeline and compare it with seven published alternative pipelines (ChAMP, Illumina, SWAN, Funnorm, Noob, wateRmelon, and RnBeads). We use two large sets of duplicate sample measurements from 450K and EPIC arrays, along with mixtures of isogenic methylated and unmethylated cell line DNA, to compare raw data with data preprocessed by the different pipelines.

Results: Our evaluations show that the ENmix pipeline performs best, with significantly higher correlation and lower absolute difference between duplicate pairs, higher intraclass correlation coefficients (ICC), and smaller deviations from the expected methylation level in mixture experiments. In addition to the pipeline function, the ENmix software provides an integrated set of functions for reading raw data files from mouse and human arrays, quality control, data preprocessing, visualization, detection of differentially methylated regions (DMRs), estimation of cell type proportions, and calculation of methylation age clocks. ENmix is computationally efficient and flexible, and it supports parallel computing. To facilitate further evaluations, we make all datasets and evaluation code publicly available.

Conclusion: Careful selection of robust data preprocessing methods is critical for DNA methylation array studies. In our evaluations, ENmix outperformed the other pipelines in minimizing experimental variation and improving data quality and study power.
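
The Results paragraph names three duplicate-pair criteria: correlation, absolute difference, and ICC. The Python sketch below computes all three for one CpG. The abstract does not say which ICC form was used. The sketch assumes the one-way random-effects ICC(1,1), and the function name `duplicate_pair_metrics` is hypothetical.

```python
import numpy as np


def duplicate_pair_metrics(rep1, rep2):
    """Agreement between duplicate measurements at one CpG.

    rep1[i] and rep2[i] are beta values from the two replicates of sample i.
    Returns (Pearson r, mean absolute difference, ICC(1,1)).
    ICC(1,1) is an assumed choice: one-way random effects, single measurement.
    """
    x = np.column_stack([rep1, rep2]).astype(float)  # n samples x k = 2
    n, k = x.shape
    r = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
    mean_abs_diff = np.abs(x[:, 0] - x[:, 1]).mean()
    # One-way ANOVA mean squares: between samples and within samples.
    sample_means = x.mean(axis=1)
    msb = k * ((sample_means - x.mean()) ** 2).sum() / (n - 1)
    msw = ((x - sample_means[:, None]) ** 2).sum() / (n * (k - 1))
    icc = (msb - msw) / (msb + (k - 1) * msw)
    return r, mean_abs_diff, icc
```

Applied to every CpG, these per-probe values can be summarized (for example, by their medians) to compare preprocessing pipelines on the same duplicate set.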

Batch Effects and Pathway Analysis: Two Potential Perils in Cancer Studies Involving DNA Methylation Array Analysis

Cancer Epidemiology Biomarkers & Prevention, Vol 22 (6), pp. 1052-1060, 2013. DOI: 10.1158/1055-9965.epi-13-0114. Cited by ~58

Author(s): Kristin N. Harper, Brandilyn A. Peters, Mary V. Gamble

Keyword(s): DNA Methylation, Pathway Analysis, Batch Effects, Array Analysis, Methylation Array, DNA Methylation Array, Cancer Studies

EWAS Data Hub: a resource of DNA methylation array data and metadata

Nucleic Acids Research, Vol 48 (D1), pp. D890-D895, 2019. DOI: 10.1093/nar/gkz840. Cited by ~6

Author(s): Zhuang Xiong, Mengwei Li, Fei Yang, Yingke Ma, Jian Sang, ...

Keyword(s): DNA Methylation, Complex Traits, Cell Types, Great Promise, Methylation Array, Array Data, The Past, Comprehensive Collection, DNA Methylation Array, Brain Parts

The epigenome-wide association study (EWAS) has become an effective strategy for exploring the epigenetic basis of complex traits. Over the past decade, a large amount of epigenetic data, especially data from DNA methylation arrays, has accumulated as a result of numerous EWAS projects. We present EWAS Data Hub (https://bigd.big.ac.cn/ewas/datahub), a resource for collecting and normalizing DNA methylation array data and archiving the associated metadata. The current release of EWAS Data Hub integrates a comprehensive collection of DNA methylation array data from 75,344 samples and employs an effective normalization method to remove batch effects among different datasets. Taking advantage of both massive high-quality DNA methylation data and standardized metadata, EWAS Data Hub provides reference DNA methylation profiles under different contexts, covering 81 tissues/cell types (including 25 brain parts and 25 blood cell types), six ancestry categories, and 67 diseases (including 39 cancers). In summary, EWAS Data Hub holds great promise for aiding the retrieval and discovery of methylation-based biomarkers for phenotype characterization, clinical treatment, and health care.
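
The abstract does not describe how the reference profiles are built. One simple reading is sketched below in Python under that assumption. Each probe within each context (for example, a tissue) is summarized by its median and interquartile range across samples. A new sample can then be compared against those quartiles. The function names `reference_profiles` and `outside_reference`, the median/quartile summary, and the 1.5 × IQR fence are illustrative choices, not EWAS Data Hub's procedure.

```python
import numpy as np


def reference_profiles(beta, contexts):
    """Per-context, per-probe median and interquartile range of beta values.

    beta: samples x probes array (NaN marks a missing value).
    contexts: one context label per sample, e.g. a tissue name.
    Returns {context: (median, q1, q3)}, each an array with one entry per probe.
    """
    beta = np.asarray(beta, dtype=float)
    labels = np.asarray(contexts)
    profiles = {}
    for ctx in np.unique(labels):
        block = beta[labels == ctx]
        q1, med, q3 = np.nanpercentile(block, [25, 50, 75], axis=0)
        profiles[str(ctx)] = (med, q1, q3)
    return profiles


def outside_reference(sample, profile, k=1.5):
    """Hypothetical check: flag probes beyond k * IQR from the reference quartiles."""
    med, q1, q3 = profile
    iqr = q3 - q1
    return (sample < q1 - k * iqr) | (sample > q3 + k * iqr)
```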

MBRS-14. INTEGRATING CLINICAL AND GENOMIC CHARACTERISTICS IN PEDIATRIC MEDULLOBLASTOMA SUBTYPES IN A SINGLE COHORT IN TAIWAN

Neuro-Oncology, Vol 22 (Supplement_3), pp. iii400-iii401, 2020. DOI: 10.1093/neuonc/noaa222.531

Author(s): Kuo-Sheng Wu, Tai-Tong Wong

Keyword(s): DNA Methylation, Cluster Analysis, Treatment Strategies, Clinical Results, Tumor Location, Molecular Subgroups, Methylation Array, Metastatic Rate, Pediatric Medulloblastoma, DNA Methylation Array

Background: Medulloblastoma (MB) has been classified into four molecular subgroups, WNT, SHH, group 3 (G3), and group 4 (G4), which differ demographically and clinically. In 2017, heterogeneity within MB was proposed, comprising 12 subtypes with distinct molecular and clinical characteristics.

Patients and methods: We retrieved 52 pediatric MBs and performed RNA-seq and DNA methylation array profiling. Subtype cluster analysis was performed using the similarity network fusion (SNF) method. Using clinical results and molecular profiles, we characterized the MB subtypes by age, gender, histological variant, tumor location, metastasis status, survival, and cytogenetic and genetic aberrations.

Results: In this cohort, SNF cluster analysis classified the 52 childhood MBs into 11 subtypes. WNT tumors showed no metastasis and a 100% survival rate, and all were located at the midline in the fourth ventricle. Monosomy 6 was present in the WNT α subtype but not in the β subtype. SHH α and β occurred in children, whereas SHH γ occurred in infants. Among SHH tumors, the α subtype showed the worst outcome. G3 γ showed the highest metastatic rate and the worst survival, associated with MYC amplification. G4 α had the highest metastatic rate, whereas G4 γ showed the worst survival.

Conclusion: We identified molecular subgroups and subtypes of MB in children in our cohort based on gene expression and DNA methylation profiles. The results may contribute to establishing optimal nationwide diagnostic and treatment strategies for MB in infants and children.
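
Similarity network fusion combines one sample-similarity network per data type (in this study, for instance, gene expression and DNA methylation) into a single fused network that can then be clustered. The Python sketch below follows the general SNF recipe. It builds a scaled exponential kernel per view, derives a full and a k-nearest-neighbor normalized kernel from it, and applies iterative cross-view diffusion. It is simplified: it row-normalizes after each iteration and omits the final clustering step. The function names and parameter defaults are illustrative assumptions, and it is not the pipeline used in this study.

```python
import numpy as np


def affinity(x, k=10, mu=0.5):
    """Scaled exponential similarity kernel for one data view (samples x features)."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    # Mean distance from each sample to its k nearest neighbours (column 0 is itself).
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3
    return np.exp(-d ** 2 / (mu * eps + np.finfo(float).eps))


def full_kernel(w):
    """P: off-diagonal similarities sum to 1/2 per row; the diagonal is 1/2."""
    p = w.copy()
    np.fill_diagonal(p, 0.0)
    p = p / (2 * p.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return p


def sparse_kernel(w, k=10):
    """S: keep each sample's k most similar neighbours and row-normalise."""
    w = w.copy()
    np.fill_diagonal(w, 0.0)
    rows = np.arange(w.shape[0])[:, None]
    nearest = np.argsort(-w, axis=1)[:, :k]
    s = np.zeros_like(w)
    s[rows, nearest] = w[rows, nearest]
    return s / s.sum(axis=1, keepdims=True)


def snf(views, k=10, t=20, mu=0.5):
    """Fuse two or more views into one samples x samples similarity matrix."""
    ws = [affinity(np.asarray(v, dtype=float), k, mu) for v in views]
    ps = [full_kernel(w) for w in ws]
    ss = [sparse_kernel(w, k) for w in ws]
    m = len(ps)
    for _ in range(t):
        updated = []
        for v in range(m):
            # Diffuse the average of the other views' networks through view v's kNN graph.
            others = sum(ps[u] for u in range(m) if u != v) / (m - 1)
            p = ss[v] @ others @ ss[v].T
            p = p / p.sum(axis=1, keepdims=True)  # simplification: row-normalise
            updated.append((p + p.T) / 2)          # keep the network symmetric
        ps = updated
    return sum(ps) / m
```

The fused matrix would then be partitioned into subtypes, typically with spectral clustering.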
