The ENmix DNA methylation analysis pipeline for Illumina BeadChip and comparisons with seven other preprocessing pipelines

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Zongli Xu ◽  
Liang Niu ◽  
Jack A. Taylor

Abstract Background: Illumina DNA methylation arrays are high-throughput platforms for cost-effective genome-wide profiling of individual CpGs. Experimental and technical factors introduce appreciable measurement variation, some of which can be mitigated by careful “preprocessing” of raw data. Methods: Here we describe the ENmix preprocessing pipeline and compare it to a set of seven published alternative pipelines (ChAMP, Illumina, SWAN, Funnorm, Noob, wateRmelon, and RnBeads). We use two large sets of duplicate sample measurements with 450K and EPIC arrays, along with mixtures of isogenic methylated and unmethylated cell line DNA, to compare raw data with data preprocessed via the different pipelines. Results: Our evaluations show that the ENmix pipeline performs best, with significantly higher correlation and lower absolute difference between duplicate pairs, higher intraclass correlation coefficients (ICC), and smaller deviations from expected methylation levels in mixture experiments. In addition to the pipeline function, ENmix software provides an integrated set of functions for reading in raw data files from mouse and human arrays, quality control, data preprocessing, visualization, detection of differentially methylated regions (DMRs), estimation of cell type proportions, and calculation of methylation age clocks. ENmix is computationally efficient and flexible, and it allows parallel computing. To facilitate further evaluations, we make all datasets and evaluation code publicly available. Conclusion: Careful selection of robust data preprocessing methods is critical for DNA methylation array studies. ENmix outperformed other pipelines in our evaluations, minimizing experimental variation and improving data quality and study power.
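
The duplicate-pair metrics named in this abstract (correlation, absolute difference, and ICC) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names are hypothetical, and the one-way random-effects form of the ICC is an assumption, since the abstract does not state which ICC variant was used.

```python
import numpy as np

def duplicate_pair_metrics(beta_a, beta_b):
    """Compare two replicate measurements of the same sample.

    beta_a, beta_b: 1-D arrays of methylation beta values, one entry per CpG.
    Returns (Pearson correlation, mean absolute difference).
    """
    beta_a = np.asarray(beta_a, dtype=float)
    beta_b = np.asarray(beta_b, dtype=float)
    corr = np.corrcoef(beta_a, beta_b)[0, 1]
    mean_abs_diff = np.mean(np.abs(beta_a - beta_b))
    return corr, mean_abs_diff

def icc_oneway(values):
    """One-way random-effects ICC for a single CpG (assumed variant).

    values: array of shape (n_subjects, k_replicates).
    """
    values = np.asarray(values, dtype=float)
    n, k = values.shape
    grand_mean = values.mean()
    subject_means = values.mean(axis=1)
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((values - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Under these assumptions, a better preprocessing pipeline would yield a correlation and ICC closer to 1 and a smaller mean absolute difference between duplicates.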

2021 ◽  
Author(s):  
Zhuang Xiong ◽  
Mengwei Li ◽  
Yingke Ma ◽  
Rujiao Li ◽  
Yiming Bao

The Illumina HumanMethylation BeadChip is one of the most cost-effective ways to quantify DNA methylation levels at the single-base level across the human genome, which makes it a routine platform for epigenome-wide association studies. Tens of thousands of DNA methylation array samples have accumulated in public databases, providing great support for data integration and further analysis. However, the majority of public DNA methylation data are deposited as processed data without the background probes that are widely used in data normalization. Here we present Gaussian mixture quantile normalization (GMQN), a reference-based method for correcting batch effects as well as probe bias in the HumanMethylation BeadChip. Availability and implementation: https://github.com/MengweiLi-project/gmqn.


2022 ◽  
Vol 12 ◽  
Author(s):  
Zhuang Xiong ◽  
Mengwei Li ◽  
Yingke Ma ◽  
Rujiao Li ◽  
Yiming Bao

The Illumina HumanMethylation BeadChip is one of the most cost-effective methods to quantify DNA methylation levels at single-base resolution across the human genome, which makes it a routine platform for epigenome-wide association studies. It has accumulated tens of thousands of DNA methylation array samples in public databases, providing great support for data integration and further analysis. However, the majority of public DNA methylation data are deposited as processed data without background probes which are widely used in data normalization. Here, we present Gaussian mixture quantile normalization (GMQN), a reference based method for correcting batch effects as well as probe bias in the HumanMethylation BeadChip. Availability and implementation: https://github.com/MengweiLi-project/gmqn.


2019 ◽  
Vol 48 (D1) ◽  
pp. D890-D895 ◽  
Author(s):  
Zhuang Xiong ◽  
Mengwei Li ◽  
Fei Yang ◽  
Yingke Ma ◽  
Jian Sang ◽  
...  

Abstract The epigenome-wide association study (EWAS) has become an effective strategy to explore the epigenetic basis of complex traits. Over the past decade, a large amount of epigenetic data, especially data sourced from DNA methylation arrays, has accumulated as the result of numerous EWAS projects. We present EWAS Data Hub (https://bigd.big.ac.cn/ewas/datahub), a resource for collecting and normalizing DNA methylation array data as well as archiving associated metadata. The current release of EWAS Data Hub integrates a comprehensive collection of DNA methylation array data from 75 344 samples and employs an effective normalization method to remove batch effects among different datasets. Accordingly, taking advantage of both massive high-quality DNA methylation data and standardized metadata, EWAS Data Hub provides reference DNA methylation profiles under different contexts, involving 81 tissues/cell types (including 25 brain parts and 25 blood cell types), six ancestry categories, and 67 diseases (including 39 cancers). In summary, EWAS Data Hub bears great promise to aid the retrieval and discovery of methylation-based biomarkers for phenotype characterization, clinical treatment and health care.


2020 ◽  
Vol 22 (Supplement_3) ◽  
pp. iii400-iii401
Author(s):  
Kuo-Sheng Wu ◽  
Tai-Tong Wong

Abstract BACKGROUND Medulloblastoma (MB) is classified into 4 molecular subgroups, WNT, SHH, group 3 (G3), and group 4 (G4), which differ demographically and clinically. In 2017, heterogeneity within MB was proposed, with 12 subtypes having distinct molecular and clinical characteristics. PATIENTS AND METHODS We retrieved 52 MBs in children to perform RNA-Seq and DNA methylation array analysis. Subtype cluster analysis was performed by the similarity network fusion (SNF) method. Using clinical results and molecular profiles, we identified characteristics including age, gender, histological variants, tumor location, metastasis status, survival, and cytogenetic and genetic aberrations among MB subtypes. RESULTS In this cohort series, 52 childhood MBs were classified into 11 subtypes by SNF cluster analysis. WNT tumors showed no metastasis and a 100% survival rate. All WNT tumors were located on the midline in the 4th ventricle. Monosomy 6 was present in the WNT α subtype but not in the β subtype. SHH α and β occurred in children, while SHH γ occurred in infants. Among SHH tumors, the α subtype showed the worst outcome. G3 γ showed the highest metastatic rate and worst survival, associated with MYC amplification. G4 α had the highest metastatic rate; however, G4 γ showed the worst survival. CONCLUSION We identified molecular subgroups and subtypes of MBs based on gene expression and DNA methylation profiles in children in our cohort series. The results may contribute to the establishment of nationwide, correlated optimal diagnosis and treatment strategies for MBs in infants and children.


Author(s):  
Marina Bibikova ◽  
Bret Barnes ◽  
Chan Tsan ◽  
Vincent Ho ◽  
Brandy Klotzle ◽  
...  

Oncotarget ◽  
2016 ◽  
Vol 7 (39) ◽  
pp. 64191-64202 ◽  
Author(s):  
Qiuqiong Tang ◽  
Tim Holland-Letz ◽  
Alla Slynko ◽  
Katarina Cuk ◽  
Frederik Marme ◽  
...  

2020 ◽  
Vol 19 (12) ◽  
pp. 2125-2138 ◽  
Author(s):  
Peter Lasch ◽  
Andy Schneider ◽  
Christian Blumenscheit ◽  
Joerg Doellinger

Over the past decade, modern methods of mass spectrometry (MS) have emerged that allow reliable, fast and cost-effective identification of pathogenic microorganisms. Although MALDI-TOF MS has already revolutionized the way microorganisms are identified, recent years have also witnessed substantial progress in the development of liquid chromatography (LC)-MS based proteomics for microbiological applications. For example, LC-tandem MS (LC-MS2) has been proposed for microbial characterization by means of multiple discriminative peptides that enable identification at the species level, or sometimes at the strain level. However, such investigations can be laborious and time-consuming, especially if the experimental LC-MS2 data are tested against sequence databases covering a broad panel of different microbiological taxa. In this proof-of-concept study, we present an alternative bottom-up proteomics method for microbial identification. The proposed approach involves efficient extraction of proteins from cultivated microbial cells, digestion by trypsin and LC-MS measurements. Peptide masses are then extracted from MS1 data and systematically tested against an in silico library of all possible peptide mass data compiled in-house. The library has been computed from the UniProt Knowledgebase covering the Swiss-Prot and TrEMBL databases and comprises more than 12,000 strain-specific in silico profiles, each containing tens of thousands of peptide mass entries. Identification analysis involves computation of score values derived from correlation coefficients between experimental and strain-specific in silico peptide mass profiles and compilation of score ranking lists. The taxonomic positions of the microbial samples are then determined by using the best-matching database entries. The suggested method is computationally efficient (less than 2 min per sample) and has been successfully tested on a set of 39 LC-MS1 peak lists obtained from 19 different microbial pathogens.
The proposed method is rapid, simple and automatable and we foresee wide application potential for future microbiological applications.
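
The scoring step described in this abstract, correlating an experimental peptide mass profile with strain-specific in silico profiles and ranking the results, can be sketched in Python as follows. This is an illustration only, not the authors' implementation: the fixed-width mass binning, the mass range, the function names and the strain labels are all assumptions introduced here.

```python
import numpy as np

def mass_profile(masses, lo=500.0, hi=5000.0, bin_width=1.0):
    """Convert peptide masses (Da) into a binary presence vector.

    The binning scheme and mass range are assumptions for this sketch.
    """
    n_bins = int(np.ceil((hi - lo) / bin_width))
    profile = np.zeros(n_bins)
    for m in masses:
        if lo <= m < hi:
            profile[int((m - lo) // bin_width)] = 1.0
    return profile

def rank_matches(experimental, library):
    """Rank library entries by Pearson correlation with the experimental profile.

    experimental: list of observed peptide masses.
    library: dict mapping a (hypothetical) strain label to its in silico masses.
    Every profile must contain at least one mass inside the binned range,
    otherwise the correlation is undefined.
    """
    exp_profile = mass_profile(experimental)
    scores = {
        name: float(np.corrcoef(exp_profile, mass_profile(masses))[0, 1])
        for name, masses in library.items()
    }
    # Highest score first; the top entry is the best-matching library profile.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, the taxonomic assignment would be read from the first element of the list returned by `rank_matches`.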


2013 ◽  
Vol 109 (6) ◽  
pp. 1394-1402 ◽  
Author(s):  
C S Wilhelm-Benartzi ◽  
D C Koestler ◽  
M R Karagas ◽  
J M Flanagan ◽  
B C Christensen ◽  
...  
