The ENmix DNA methylation analysis pipeline for Illumina BeadChip and comparisons with seven other preprocessing pipelines

Abstract. Background: Illumina DNA methylation arrays are high-throughput platforms for cost-effective genome-wide profiling of individual CpGs. Experimental and technical factors introduce appreciable measurement variation, some of which can be mitigated by careful “preprocessing” of raw data. Methods: Here we describe the ENmix preprocessing pipeline and compare it to a set of seven published alternative pipelines (ChAMP, Illumina, SWAN, Funnorm, Noob, wateRmelon, and RnBeads). We use two large sets of duplicate sample measurements from 450K and EPIC arrays, along with mixtures of isogenic methylated and unmethylated cell line DNA, to compare raw data with data preprocessed by each pipeline. Results: Our evaluations show that the ENmix pipeline performs best, with significantly higher correlation and lower absolute difference between duplicate pairs, higher intraclass correlation coefficients (ICC), and smaller deviations from the expected methylation level in mixture experiments. In addition to the pipeline function, ENmix software provides an integrated set of functions for reading in raw data files from mouse and human arrays, quality control, data preprocessing, visualization, detection of differentially methylated regions (DMRs), estimation of cell type proportions, and calculation of methylation age clocks. ENmix is computationally efficient and flexible, and it supports parallel computing. To facilitate further evaluations, we make all datasets and evaluation code publicly available. Conclusion: Careful selection of robust data preprocessing methods is critical for DNA methylation array studies. In our evaluations, ENmix outperformed the other pipelines in minimizing experimental variation and improving data quality and study power.
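The duplicate-pair criteria named in the abstract (correlation and absolute difference between replicate measurements) can be illustrated with a minimal Python sketch. This is not the ENmix evaluation code: the function name `duplicate_pair_metrics`, the use of Pearson correlation, and the averaging of absolute differences over CpGs are assumptions made for illustration only.

```python
import math


def duplicate_pair_metrics(beta_a, beta_b):
    """Compare two replicate measurements of the same sample.

    beta_a, beta_b: sequences of methylation beta values (0..1),
    one entry per CpG, in the same CpG order.
    Returns (Pearson correlation, mean absolute difference).
    Hypothetical helper; not part of any published pipeline.
    """
    if len(beta_a) != len(beta_b):
        raise ValueError("replicates must cover the same CpGs")
    # Drop CpGs that are missing (NaN) in either replicate.
    pairs = [(a, b) for a, b in zip(beta_a, beta_b)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared CpGs")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant values")
    corr = cov / math.sqrt(var_a * var_b)
    mean_abs_diff = sum(abs(a - b) for a, b in pairs) / n
    return corr, mean_abs_diff
```

Under these assumptions, a preprocessing pipeline that reduces technical noise should raise the correlation and lower the mean absolute difference for the same duplicate pair.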


GMQN: A reference-based method for correcting batch effects as well as probes bias in HumanMethylation BeadChip

10.1101/2021.09.06.459116 ◽

2021 ◽

Author(s):

Zhuang Xiong ◽

Mengwei Li ◽

Yingke Ma ◽

Rujiao Li ◽

Yiming Bao

Keyword(s):

Dna Methylation ◽

Association Studies ◽

Cost Effective ◽

Gaussian Mixture ◽

Data Normalization ◽

Batch Effects ◽

Methylation Array ◽

Base Level ◽

Great Support ◽

Dna Methylation Array

The Illumina HumanMethylation BeadChip is one of the most cost-effective ways to quantify DNA methylation levels at the single-base level across the human genome, which makes it a routine platform for epigenome-wide association studies. Tens of thousands of DNA methylation array samples have accumulated in public databases, providing great support for data integration and further analysis. However, the majority of public DNA methylation data are deposited as processed data without the background probes that are widely used in data normalization. Here we present Gaussian mixture quantile normalization (GMQN), a reference-based method for correcting batch effects as well as probe bias in the HumanMethylation BeadChip. Availability and implementation: https://github.com/MengweiLi-project/gmqn.


GMQN: A Reference-Based Method for Correcting Batch Effects and Probe Bias in HumanMethylation BeadChip

Frontiers in Genetics ◽

10.3389/fgene.2021.810985 ◽

2022 ◽

Vol 12 ◽

Author(s):

Zhuang Xiong ◽

Mengwei Li ◽

Yingke Ma ◽

Rujiao Li ◽

Yiming Bao

Keyword(s):

Dna Methylation ◽

Association Studies ◽

Cost Effective ◽

Gaussian Mixture ◽

Data Normalization ◽

Batch Effects ◽

Methylation Array ◽

Great Support ◽

Single Base Resolution ◽

Dna Methylation Array

The Illumina HumanMethylation BeadChip is one of the most cost-effective methods to quantify DNA methylation levels at single-base resolution across the human genome, which makes it a routine platform for epigenome-wide association studies. Tens of thousands of DNA methylation array samples have accumulated in public databases, providing great support for data integration and further analysis. However, the majority of public DNA methylation data are deposited as processed data without the background probes that are widely used in data normalization. Here, we present Gaussian mixture quantile normalization (GMQN), a reference-based method for correcting batch effects as well as probe bias in the HumanMethylation BeadChip. Availability and implementation: https://github.com/MengweiLi-project/gmqn.


EWAS Data Hub: a resource of DNA methylation array data and metadata

Nucleic Acids Research ◽

10.1093/nar/gkz840 ◽

2019 ◽

Vol 48 (D1) ◽

pp. D890-D895 ◽

Cited By ~ 6

Author(s):

Zhuang Xiong ◽

Mengwei Li ◽

Fei Yang ◽

Yingke Ma ◽

Jian Sang ◽

...

Keyword(s):

Dna Methylation ◽

Complex Traits ◽

Cell Types ◽

Great Promise ◽

Methylation Array ◽

Array Data ◽

The Past ◽

Comprehensive Collection ◽

Dna Methylation Array ◽

Brain Parts

Abstract. Epigenome-wide association studies (EWAS) have become an effective strategy to explore the epigenetic basis of complex traits. Over the past decade, a large amount of epigenetic data, especially data from DNA methylation arrays, has accumulated as the result of numerous EWAS projects. We present EWAS Data Hub (https://bigd.big.ac.cn/ewas/datahub), a resource for collecting and normalizing DNA methylation array data as well as archiving associated metadata. The current release of EWAS Data Hub integrates a comprehensive collection of DNA methylation array data from 75 344 samples and employs an effective normalization method to remove batch effects among different datasets. Accordingly, taking advantage of both massive high-quality DNA methylation data and standardized metadata, EWAS Data Hub provides reference DNA methylation profiles under different contexts, involving 81 tissues/cell types (including 25 brain parts and 25 blood cell types), six ancestry categories, and 67 diseases (including 39 cancers). In summary, EWAS Data Hub bears great promise to aid the retrieval and discovery of methylation-based biomarkers for phenotype characterization, clinical treatment and health care.


MBRS-14. INTEGRATING CLINICAL AND GENOMIC CHARACTERISTICS IN PEDIATRIC MEDULLOBLASTOMA SUBTYPES IN A SINGLE COHORT IN TAIWAN

Neuro-Oncology ◽

10.1093/neuonc/noaa222.531 ◽

2020 ◽

Vol 22 (Supplement_3) ◽

pp. iii400-iii401

Author(s):

Kuo-Sheng Wu ◽

Tai-Tong Wong

Keyword(s):

Dna Methylation ◽

Cluster Analysis ◽

Treatment Strategies ◽

Clinical Results ◽

Tumor Location ◽

Molecular Subgroups ◽

Methylation Array ◽

Metastatic Rate ◽

Pediatric Medulloblastoma ◽

Dna Methylation Array

Abstract. BACKGROUND: Medulloblastoma (MB) has been classified into 4 molecular subgroups: WNT, SHH, group 3 (G3), and group 4 (G4), which differ demographically and clinically. In 2017, heterogeneity within MB was proposed, with 12 subtypes showing distinct molecular and clinical characteristics. PATIENTS AND METHODS: We retrieved 52 MBs in children to perform RNA-Seq and DNA methylation array analysis. Subtype cluster analysis was performed by the similarity network fusion (SNF) method. Using clinical results and molecular profiles, we identified characteristics of the MB subtypes, including age, gender, histological variants, tumor location, metastasis status, survival, and cytogenetic and genetic aberrations. RESULTS: In this cohort series, 52 childhood MBs were classified into 11 subtypes by SNF cluster analysis. WNT tumors showed no metastasis and a 100% survival rate. All WNT tumors were located on the midline in the fourth ventricle. Monosomy 6 was present in the WNT α subtype but not in the β subtype. SHH α and β occurred in children, while SHH γ occurred in infants. Among SHH tumors, the α subtype showed the worst outcome. G3 γ showed the highest metastatic rate and the worst survival, associated with MYC amplification. G4 α had the highest metastatic rate; however, G4 γ showed the worst survival. CONCLUSION: We identified molecular subgroups and subtypes of MBs based on gene expression and DNA methylation profiles in children in our cohort series. The results may contribute to the establishment of nationwide optimal diagnosis and treatment strategies for MBs in infants and children.


Model-Based Clustering of DNA Methylation Array Data

Translational Bioinformatics - Computational and Statistical Epigenomics ◽

10.1007/978-94-017-9927-0_5 ◽

2015 ◽

pp. 91-123

Author(s):

Devin C. Koestler ◽

E. Andrés Houseman

Keyword(s):

Dna Methylation ◽

Methylation Array ◽

Array Data ◽

Model Based Clustering ◽

Model Based ◽

Dna Methylation Array


Abstract LB-176: A novel high density DNA methylation array with single CpG site resolution

10.1158/1538-7445.am2011-lb-176 ◽

2011 ◽

Cited By ~ 3

Author(s):

Marina Bibikova ◽

Bret Barnes ◽

Chan Tsan ◽

Vincent Ho ◽

Brandy Klotzle ◽

...

Keyword(s):

Dna Methylation ◽

High Density ◽

Methylation Array ◽

Cpg Site ◽

Dna Methylation Array


DNA methylation array analysis identifies breast cancer associated RPTOR, MGRN1 and RAPSN hypomethylation in peripheral blood DNA

Oncotarget ◽

10.18632/oncotarget.11640 ◽

2016 ◽

Vol 7 (39) ◽

pp. 64191-64202 ◽

Cited By ~ 19

Author(s):

Qiuqiong Tang ◽

Tim Holland-Letz ◽

Alla Slynko ◽

Katarina Cuk ◽

Frederik Marme ◽

...

Keyword(s):

Breast Cancer ◽

Dna Methylation ◽

Peripheral Blood ◽

Array Analysis ◽

Methylation Array ◽

Dna Methylation Array


Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) and in Silico Peptide Mass Libraries

Molecular & Cellular Proteomics ◽

10.1074/mcp.tir120.002061 ◽

2020 ◽

Vol 19 (12) ◽

pp. 2125-2138 ◽

Cited By ~ 1

Author(s):

Peter Lasch ◽

Andy Schneider ◽

Christian Blumenscheit ◽

Joerg Doellinger

Keyword(s):

Liquid Chromatography ◽

In Silico ◽

Correlation Coefficients ◽

Cost Effective ◽

Microbial Pathogens ◽

Computationally Efficient ◽

Microbial Identification ◽

Peptide Mass ◽

Substantial Progress ◽

Identification Analysis

Over the past decade, modern methods of mass spectrometry (MS) have emerged that allow reliable, fast and cost-effective identification of pathogenic microorganisms. Although MALDI-TOF MS has already revolutionized the way microorganisms are identified, recent years have also witnessed substantial progress in the development of liquid chromatography (LC)-MS based proteomics for microbiological applications. For example, LC-tandem MS (LC-MS2) has been proposed for microbial characterization by means of multiple discriminative peptides that enable identification at the species level, or sometimes at the strain level. However, such investigations can be laborious and time-consuming, especially if the experimental LC-MS2 data are tested against sequence databases covering a broad panel of different microbiological taxa. In this proof-of-concept study, we present an alternative bottom-up proteomics method for microbial identification. The proposed approach involves efficient extraction of proteins from cultivated microbial cells, digestion by trypsin and LC-MS measurements. Peptide masses are then extracted from MS1 data and systematically tested against an in silico library of all possible peptide mass data compiled in-house. The library has been computed from the UniProt Knowledgebase covering the Swiss-Prot and TrEMBL databases and comprises more than 12,000 strain-specific in silico profiles, each containing tens of thousands of peptide mass entries. Identification analysis involves computation of score values derived from correlation coefficients between experimental and strain-specific in silico peptide mass profiles and compilation of score ranking lists. The taxonomic positions of the microbial samples are then determined by using the best-matching database entries. The suggested method is computationally efficient (less than 2 min per sample) and has been successfully tested on a set of 39 LC-MS1 peak lists obtained from 19 different microbial pathogens. The proposed method is rapid, simple and automatable, and we foresee wide application potential for future microbiological applications.
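The identification step described above (correlation-based scores between an experimental peptide mass list and each in silico profile, followed by a score ranking list) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how mass lists are turned into comparable profiles, so the fixed-width binning, the mass range, the use of Pearson correlation on binary vectors, and all function names here are hypothetical.

```python
import math


def mass_profile(masses, lo=500.0, hi=5000.0, bin_width=1.0):
    """Binary presence vector over fixed-width mass bins (assumed scheme)."""
    n_bins = int(math.ceil((hi - lo) / bin_width))
    vec = [0.0] * n_bins
    for m in masses:
        if lo <= m < hi:
            idx = min(int((m - lo) / bin_width), n_bins - 1)
            vec[idx] = 1.0
    return vec


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_library(experimental_masses, library):
    """Score every library entry and return (name, score), best first.

    library: dict mapping a strain name to its list of peptide masses.
    """
    exp_vec = mass_profile(experimental_masses)
    scores = [(name, pearson(exp_vec, mass_profile(masses)))
              for name, masses in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With a toy library of two made-up strains, the entry sharing more mass bins with the experimental list ranks first; a real library of the size described would need a sparse representation rather than dense Python lists.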
