cnAnalysis450k: an R package for comparative analysis of 450k/EPIC Illumina methylation array derived copy number data

2017 ◽  
Vol 33 (15) ◽  
pp. 2266-2272 ◽  
Author(s):  
Maximilian Knoll ◽  
Jürgen Debus ◽  
Amir Abdollahi
2020 ◽  
Vol 22 (Supplement_3) ◽  
pp. iii427-iii428
Author(s):  
Alan Mackay ◽  
Yura Grabovska ◽  
Matthew Clarke ◽  
Diana Carvalho ◽  
Sara Temelso ◽  
...  

Abstract Methylation array-based molecular profiling has redefined the classification of brain tumours and now forms an important part of their integrated diagnosis, providing both subgroup assignment and genome-wide DNA copy number profiles. These latter data can be used to identify intragenic breakpoints, which are frequently associated with structural variations resulting in therapeutically targetable oncogenic fusion genes. To systematically assess the landscape of these alterations, we combined publicly available methylation datasets resulting in a total of 5660 CNS tumours, around half paediatric, and including >1000 high grade glioma and DIPG. These were analysed by standard methodology (MNP, conumee), and intragenic breakpoint enrichment was compared within methylation subgroups, superfamilies, and tumours with no high-scoring classification. Benchmarking included sequence-verified cases such as infant hemispheric gliomas (IHG) with ALK (15%) and ROS1 (7%) fusions, and pathognomonic alterations associated with specific entities such as RELA-EPN, MYB-LGG and HGNET-MN1. We identified previously unreported enrichments of well-recognised fusion targets such as NTRK2 in GBM_MID and NTRK3 in DMG_K27 (both 5%), MET in A_IDH / A_IDH_HG (3–5%), and FGFR1/3 in GBM_G34 (8–9%). Novel recurrent kinase gene candidates to be verified and explored further include IGF1R in 2–12% of cases spanning glioma subgroups, and TIE1 in poorly classified tumours. This latter ‘NOS’ group was also enriched in various transcription factor targets of breakpoints, including TCF4 and PLAGL2. Despite limitations due to sample quality, resolution or balanced translocations, breakpoint analysis of methylation copy number profiles provides simple screening for structural rearrangements which may directly influence targeted therapy in paediatric CNS tumours.
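The idea of an intragenic breakpoint in the abstract above can be illustrated with a minimal sketch. It assumes copy number segments are already available as (chromosome, start, end, log2 ratio) records and treats the boundary between two adjacent segments on the same chromosome as a breakpoint. The `Segment` and `Gene` types, the function names, and all coordinates are hypothetical and chosen for illustration only; this is not the authors' pipeline or the conumee API.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    log2_ratio: float


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def breakpoints(segments: Iterable[Segment]) -> Iterator[Tuple[str, int]]:
    """Yield (chrom, position) where two adjacent segments on one chromosome meet."""
    ordered = sorted(segments, key=lambda s: (s.chrom, s.start))
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom == right.chrom:
            yield left.chrom, right.start


def intragenic_breakpoints(
    segments: Iterable[Segment], genes: Iterable[Gene]
) -> Dict[str, List[int]]:
    """Map gene name -> breakpoint positions lying strictly inside the gene."""
    gene_list = list(genes)
    hits: Dict[str, List[int]] = {}
    for chrom, pos in breakpoints(segments):
        for gene in gene_list:
            if gene.chrom == chrom and gene.start < pos < gene.end:
                hits.setdefault(gene.name, []).append(pos)
    return hits


# Hypothetical toy data: coordinates and gene names are made up.
example_segments = [
    Segment("chr1", 1, 1000, 0.0),
    Segment("chr1", 1001, 5000, 0.6),
    Segment("chr2", 1, 3000, 0.0),
]
example_genes = [
    Gene("GENE_A", "chr1", 800, 1200),
    Gene("GENE_B", "chr2", 100, 200),
]
example_hits = intragenic_breakpoints(example_segments, example_genes)
```

In this toy input only GENE_A contains a segment boundary. A real analysis would also need to handle probe density, segment noise, and balanced rearrangements, which the abstract lists as limitations.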


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jovana Maksimovic ◽  
Alicia Oshlack ◽  
Belinda Phipson

Abstract DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalization, and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches, and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.


2020 ◽  
Vol 22 (Supplement_3) ◽  
pp. iii389-iii389
Author(s):  
Rahul Kumar ◽  
Maximilian Deng ◽  
Kyle Smith ◽  
Anthony Liu ◽  
Girish Dhall ◽  
...  

Abstract INTRODUCTION The next generation of clinical trials for relapsed medulloblastoma demands a thorough understanding of the clinical behavior of relapsed tumors as well as the molecular relationship to their diagnostic counterparts. METHODS A multi-institutional molecular cohort of patient-matched (n=126 patients) diagnostic MBs and relapses/subsequent malignancies was profiled by DNA methylation array. Entity, subgroup classification, and genome-wide copy-number aberrations were assigned while parallel next-generation (whole-exome or targeted panel) sequencing on the majority of the cohort facilitated inference of somatic driver mutations. RESULTS Comprised of WNT (2%), SHH (41%), Group 3 (18%), Group 4 (39%), primary tumors retained subgroup affiliation at relapse with the notable exception of 10% of cases. The majority (8/13) of discrepant classifications were determined to be secondary glioblastomas. Additionally, rare (n=3) subgroup-switching events of Group 4 primary tumors to Group 3 relapses were identified coincident with MYC/MYCN pathway alterations. Amongst truly relapsing MBs, copy-number analyses suggest somatic clonal divergence between primary MBs and their respective relapses with Group 3 (55% of alterations shared) and Group 4 tumors (63% alterations shared) sharing a larger proportion of cytogenetic alterations compared to SHH tumors (42% alterations shared; Chi-square p-value < 0.001). Subgroup- and gene-specific patterns of conservation and divergence amongst putative driver genes were also observed. CONCLUSION Integrated molecular analysis of relapsed MB discloses potential mechanisms underlying treatment failure and disease recurrence while motivating rational implementation of relapse-specific therapies. The degree of genetic divergence between primary and relapsed MBs varied by subgroup but suggested considerably higher conservation than prior estimates.


2011 ◽  
Vol 10 ◽  
pp. CIN.S6873 ◽  
Author(s):  
Susann Stjernqvist ◽  
Tobias Rydén ◽  
Chris D. Greenman

SNP allelic copy number data provides intensity measurements for the two different alleles separately. We present a method that estimates the number of copies of each allele at each SNP position, using a continuous-index hidden Markov model. The method is especially suited for cancer data, since it includes the fraction of normal tissue contamination, often present when studying data from cancer tumors, into the model. The continuous-index structure takes into account the distances between the SNPs, and is thereby appropriate also when SNPs are unequally spaced. In a simulation study we show that the method performs favorably compared to previous methods even with as much as 70% normal contamination. We also provide results from applications to clinical data produced using the Affymetrix genome-wide SNP 6.0 platform.


2011 ◽  
Vol 27 (11) ◽  
pp. 1473-1480 ◽  
Author(s):  
Guoqiang Yu ◽  
Bai Zhang ◽  
G. Steven Bova ◽  
Jianfeng Xu ◽  
Ie-Ming Shih ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Jinghang Zhou ◽  
Liyuan Liu ◽  
Thomas J. Lopdell ◽  
Dorian J. Garrick ◽  
Yuangang Shi

Detection of CNVs (copy number variants) and ROH (runs of homozygosity) from SNP (single nucleotide polymorphism) genotyping data is often required in genomic studies. The post-analysis of CNV and ROH generally involves many steps, potentially across multiple computing platforms, which requires the researchers to be familiar with many different tools. In order to get around this problem and improve research efficiency, we present an R package that integrates the summarization, annotation, map conversion, comparison and visualization functions involved in studies of CNV and ROH. This one-stop post-analysis system is standardized, comprehensive, reproducible, timesaving, and user-friendly for researchers in humans and most diploid livestock species.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10849
Author(s):  
Maximilian Knoll ◽  
Jennifer Furkel ◽  
Juergen Debus ◽  
Amir Abdollahi

Background Model building is a crucial part of omics based biomedical research to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing error between model predictions and given classification (maximizing accuracy). Human ratings/classifications, however, might be error prone, with discordance rates between experts of 5–15%. We therefore evaluate if a feature pre-filtering step might improve identification of features associated with true underlying groups. Methods Data were simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth comprising 2–10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as classification. For feature preselection, standard cross-validation (V2) was compared to a novel heuristic (V1) applying univariate testing, multiplicity adjustment and cross-validation on switched dependent (classification) and independent (features) variables. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass-ROC). As use case, multiple feature selection/classification methods were benchmarked against the novel heuristic to identify prognostically different G-CIMP negative glioblastoma tumors from the TCGA-GBM 450k methylation array data cohort, starting from a fuzzy UMAP-based rough and erroneous separation. Results V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3–10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54–1.00) for V1 (Benjamini-Hochberg) and 0.70 (0.28–1.00) for V2; 13% (n = 616) of V2 models showed AUCs ≤ 50% for 25 samples and 100 features. For larger numbers of features and samples, median AUCs were 0.75 (range 0.59–1.00) for V1 and 0.54 (range 0.32–0.75) for V2. In the TCGA-GBM data, modelBuildR allowed best prognostic separation of patients with highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random forest based method. Conclusions The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR).


2014 ◽  
Vol 13s4 ◽  
pp. CIN.S13978
Author(s):  
Yen-Tsung Huang ◽  
Thomas Hsu ◽  
David C. Christiani

The effects of copy number alterations make up a significant part of the tumor genome profile, but pathway analyses of these alterations are still not well established. We propose a novel method to analyze multiple copy numbers of genes within a pathway, termed Test for the Effect of a Gene Set with Copy Number data (TEGS-CN). TEGS-CN was adapted from TEGS, a method that we previously developed for gene expression data using a variance component score test. With additional development, we extend the method to analyze DNA copy number data, accounting for different sizes and thus various numbers of copy number probes in genes. The test statistic follows a mixture of χ² distributions that can be obtained using permutation with scaled χ² approximation. We conducted simulation studies to evaluate the size and the power of TEGS-CN and to compare its performance with TEGS. We analyzed genome-wide copy number data from 264 patients with non-small-cell lung cancer. With the Molecular Signatures Database (MSigDB) pathway database, the genome-wide copy number data can be classified into 1814 biological pathways or gene sets. We investigated associations of the copy number profile of the 1814 gene sets with pack-years of cigarette smoking. Our analysis revealed five pathways with significant P values after Bonferroni adjustment (<2.8 × 10⁻⁵), including the PTEN pathway (7.8 × 10⁻⁷), the gene set up-regulated under heat shock (3.6 × 10⁻⁶), the gene sets involved in the immune profile for rejection of kidney transplantation (9.2 × 10⁻⁶) and for transcriptional control of leukocytes (2.2 × 10⁻⁵), and the ganglioside biosynthesis pathway (2.7 × 10⁻⁵). In conclusion, we present a new method for pathway analyses of copy number data, and causal mechanisms of the five pathways require further study.
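The Bonferroni threshold quoted in the abstract above can be checked with a short calculation. The sketch assumes a family-wise significance level of 0.05, which the abstract does not state explicitly but which is consistent with the reported cutoff for 1814 gene sets. The dictionary keys are shorthand labels introduced here; the P values are those reported in the abstract.

```python
# Assumption: family-wise alpha of 0.05 (not stated in the abstract).
alpha = 0.05
n_gene_sets = 1814

# Bonferroni adjustment: divide alpha by the number of tests performed.
threshold = alpha / n_gene_sets

# P values as reported in the abstract; labels are shorthand.
reported_p_values = {
    "PTEN pathway": 7.8e-7,
    "heat shock up-regulated": 3.6e-6,
    "kidney transplant rejection": 9.2e-6,
    "leukocyte transcriptional control": 2.2e-5,
    "ganglioside biosynthesis": 2.7e-5,
}

# A gene set passes if its P value is below the adjusted threshold.
significant = {name: p < threshold for name, p in reported_p_values.items()}

print(f"Bonferroni threshold: {threshold:.2e}")
for name, passed in significant.items():
    print(f"{name}: {'significant' if passed else 'not significant'}")
```

Under this assumption the threshold is roughly 2.8 × 10⁻⁵, and all five reported P values fall below it.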

