Filtering high-dimensional methylation marks with extremely small sample size: an application to gastric cancer data

Author(s):  
Xin Chen ◽  
Qingrun Zhang ◽  
Thierry Chekouo

Abstract Background: DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations among a large number of potentially polymorphic DNA methylation sites is challenging. Such high-dimensional data pose two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious problems. A method that quickly filters candidate sites to narrow down targets for downstream analyses is therefore urgently needed. Methods: BACkPAy is a pre-screening Bayesian approach that detects biologically meaningful clusters of potential differential methylation levels with small sample size. BACkPAy prioritizes potentially important biomarkers using the Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e. non-differential) sites whose methylation levels are flat across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with 3 tissue types, each containing 3 gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) and compared its results with those obtained by BACkPAy. Then, Cox proportional hazards regression models were used to visualize prognostically significant markers in a survival analysis of The Cancer Genome Atlas (TCGA) data. Results: Using BACkPAy, we identified 8 biologically meaningful clusters/groups of differential probes in the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e. predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. Conclusions: We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample size in gastric cancer. 
We revealed that RDH13, CLDN11, TMTC1, UCHL1 and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and that the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
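
The abstract states that BACkPAy prioritizes sites with a Bayesian FDR approach but does not spell out the selection rule. As a rough illustration only, the sketch below shows one common form of Bayesian FDR control: given each site's posterior probability of being non-differential, it keeps the largest set of sites whose average posterior null probability stays at or below a target level. The function name `bayesian_fdr_select`, the input values, and the 0.05 target are assumptions made for this example; this is not BACkPAy's actual implementation.

```python
def bayesian_fdr_select(post_null, alpha=0.05):
    """Return indices of sites flagged as differential.

    post_null[i] is the posterior probability that site i is
    non-differential (hypothetical input, e.g. from a fitted model).
    Sites are added in order of increasing post_null while the running
    mean of post_null over the selected set (the estimated Bayesian FDR)
    stays at or below alpha.
    """
    order = sorted(range(len(post_null)), key=lambda i: post_null[i])
    selected = []
    running_sum = 0.0
    for i in order:
        running_sum += post_null[i]
        # The running mean never decreases because values are sorted,
        # so the first violation marks the largest admissible set.
        if running_sum / (len(selected) + 1) > alpha:
            break
        selected.append(i)
    return selected


# Made-up posterior null probabilities for six sites.
post_null = [0.01, 0.90, 0.03, 0.20, 0.02, 0.60]
flagged = sorted(bayesian_fdr_select(post_null, alpha=0.05))
print(flagged)
```

Because the rule averages posterior probabilities rather than adjusting p-values, it can still flag sites when the per-site evidence is strong even though the number of samples is small, which is the setting the abstract describes.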

2021 ◽  
Vol 12 ◽  
Author(s):  
Xin Chen ◽  
Qingrun Zhang ◽  
Thierry Chekouo

DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations among a large number of potentially polymorphic DNA methylation sites is challenging. Such high-dimensional data pose two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious problems. A method that quickly filters candidate sites to narrow down targets for downstream analyses is therefore urgently needed. BACkPAy is a pre-screening Bayesian approach that detects biologically meaningful patterns of potential differential methylation levels with small sample size. BACkPAy prioritizes potentially important biomarkers using the Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e., non-differential) sites whose methylation levels are flat across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with three tissue types, each containing three gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) and compared its results with those obtained by BACkPAy. Then, Cox proportional hazards regression models were used to visualize prognostically significant markers in a survival analysis of The Cancer Genome Atlas (TCGA) data. Using BACkPAy, we identified eight biologically meaningful patterns/groups of differential probes in the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample size in gastric cancer. 
We revealed that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and that the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
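
For readers unfamiliar with the Benjamini-Hochberg adjustment named above as the comparison method, here is a minimal sketch of the standard step-up procedure in plain Python. The function name `benjamini_hochberg` and the p-values are invented for illustration; the sketch does not reproduce the LIMMA pipeline or the study's data.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha.

    Standard step-up rule: sort the m p-values, find the largest rank k
    with p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


# Made-up p-values: each is below 0.05 on its own or close to it,
# but none passes its rank-specific threshold once m = 3 tests are considered.
print(benjamini_hochberg([0.04, 0.03, 0.2], alpha=0.05))

# Made-up p-values where the two smallest survive the adjustment.
print(benjamini_hochberg(
    [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216],
    alpha=0.05,
))
```

The thresholds shrink as the number of tests m grows, so with genome-wide probe counts and only a few samples per group it is plausible for no probe to pass, which is consistent with the null LIMMA result reported in the abstract.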


Blood ◽  
2010 ◽  
Vol 116 (21) ◽  
pp. 608-608
Author(s):  
Matthew J. Walter ◽  
Dong Shen ◽  
Jin Shao ◽  
Li Ding ◽  
Marcus Grillot ◽  
...  

Abstract 608 Myelodysplastic syndrome (MDS) genomes are characterized by global DNA hypomethylation with concomitant hypermethylation of gene promoter regions compared to CD34+ cells from normal bone marrow samples. Currently, the underlying mechanism of altered DNA methylation in MDS genomes and the critical target genes affected by methylation remain largely unknown. The methylation of CpG dinucleotides in humans is mediated by DNA methyltransferases, including DNMT1, DNMT3A, and DNMT3B. DNMT3A and DNMT3B are the dominant DNA methyltransferases involved in de novo DNA methylation and act independently of replication, whereas DNMT1 acts predominantly during replication to maintain methylation on hemimethylated DNA. The function of these proteins in cancer cells is less well defined. Our group recently found that DNMT3A mutations are common in de novo acute myeloid leukemia (62/281 cases, 22%) and are associated with poor survival (Ley, et al, unpublished), providing a rationale for examining the mutation status of DNMT3A in MDS patients. MDS cases (n=150) were classified according to the French-American-British (FAB) system. The patients included refractory anemia (RA; n=67), RA with ringed sideroblasts (RARS; n=5), RA with excess blasts (RAEB; n=72), and RA with excess blasts in transformation (RAEB-T; n=6). The median International Prognostic Scoring System (IPSS) score was 1 (range 0–3), and the median myeloblast count was 4% (range 0–28%). We designed and validated 28 primer pairs covering the coding sequences and splice sites of all 23 exons for DNMT3A. Paired DNA samples were obtained from the bone marrow (tumor) and skin (normal) of each patient so that somatic mutations could be distinguished from inherited variants/polymorphisms. 17,120 reads were produced by capillary sequencing, providing at least 1X coverage for 82.6% of the target sequence (low/no coverage was obtained for 2 out of 28 amplicons). 
A semiautomated analysis pipeline was used to identify sequence variants and we restricted our analysis to nonsynonymous and splice site nucleotide changes. All mutations were confirmed by independent PCR and sequencing. We identified nonsynonymous DNMT3A mutations in 12/150 bone marrow samples (8% of cases). All the mutations were heterozygous (10 missense, 1 nonsense, 1 frameshift) and were computationally predicted (by SIFT and/or PolyPhen2) to have deleterious functional consequences. DNMT3A mRNA is expressed in normal CD34+ bone marrow cells and was expressed in all MDS patient samples tested (n=28), independent of mutation status. There was no difference in the expression level of total DNMT3A mRNA in CD34+ cells harvested from mutant (n=3) vs. non-mutant MDS samples (n=25). Amino acid R882, located in the methyltransferase domain of DNMT3A, was the most common mutation site, accounting for 4/12 mutations. The clinical characteristics of the 12 patients with DNMT3A mutations were similar to those of the 138 patients without mutations. Specifically, DNMT3A mutations were present in all MDS FAB subtypes (excluding CMML which was not tested) and in patients with IPSS scores ranging from 0–3. Mutations were not associated with a specific karyotype. In addition, there was no correlation between mutation detection and the myeloblast count of the banked bone marrow specimen, suggesting that mutations were not missed due to the cellular heterogeneity in the samples. We compared the overall (OS) and event-free survival (EFS) of the 12 patients with DNMT3A mutations vs. 138 patients without a mutation and observed a significantly worse OS in patients with mutations (p=0.02), with a median survival of 433 and 945 days, respectively. There was a trend towards worse EFS for patients with mutations (p=0.05). 
A multivariate analysis for outcomes could not be performed due to the small number of patients with mutations, indicating that a larger cohort from a clinical trial will be needed to properly address the effect of DNMT3A mutations on outcomes. The small sample size also precluded us from addressing whether the response to the hypomethylating agents 5-azacytidine or decitabine correlated with the mutation status of DNMT3A. If validated in larger cohort studies, we propose that DNMT3A mutation status could help risk-stratify de novo MDS patients for more aggressive treatment early in their disease course. Disclosures: Westervelt: Novartis: Honoraria; Celgene: Honoraria, Speakers Bureau. DiPersio: Genzyme: Honoraria.


2016 ◽  
Vol 143 ◽  
pp. 127-142 ◽  
Author(s):  
Kai Dong ◽  
Herbert Pang ◽  
Tiejun Tong ◽  
Marc G. Genton

2012 ◽  
Vol 2012 ◽  
pp. 1-18
Author(s):  
Jiajuan Liang

High-dimensional data with a small sample size, such as microarray data and image data, are commonly encountered in practical problems in which many variables have to be measured but repeating the measurements many times is too costly or time-consuming. Analysis of this kind of data poses a great challenge for statisticians. In this paper, we develop a new graphical method for testing spherical symmetry that is especially suitable for high-dimensional data with small sample size. The new graphical method, together with its local acceptance regions, provides a quick visual check of the assumption of spherical symmetry. The performance of the new graphical method is demonstrated by a Monte Carlo study and illustrated with a real data set.


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 146
Author(s):  
Xuan Xie ◽  
Hui Feng ◽  
Bo Hu

Knowledge of the bandwidth is crucial for sampling, reconstructing, or estimating a graph signal (GS). However, the bandwidth is typically unknown in practice. In this paper, we focus on detecting the bandwidth of a bandlimited GS with a small sample size, where the number of spectral components of the GS to be tested may greatly exceed the sample size. To control the significance of the result, the detection procedure is implemented by multi-stage testing. In each stage, a Bayesian score test, which introduces a prior on the spectral components, is adopted to address the high-dimensional challenge. By setting different priors in each stage, we make the test more powerful against alternatives that have similar bandwidth to the null hypothesis. We prove that the Bayesian score test is locally most powerful in expectation against the alternatives following the given prior. Finally, numerical analysis shows that our method performs well in bandwidth detection and is robust to noise.

