Gene Set Analysis for time-to-event outcome with the Generalized Berk–Jones statistic

2021 ◽  
Author(s):  
Laura Villain ◽  
Thomas Ferté ◽  
Rodolphe Thiébaut ◽  
Boris P. Hejblum

Gene set analysis makes it possible to evaluate the impact of groups of genes on an outcome of interest, such as the occurrence of a disease. Through the definition of gene sets, gene set analysis takes biological knowledge into account and makes the results easier to interpret, while improving statistical power compared with a gene-wise analysis. In the time-to-event context few methods exist, and most of them do not account for the correlation within a gene set, which can be strong. As the Generalized Berk–Jones statistic has shown great consistency and incorporates this correlation into the test statistic, we adapted it to the time-to-event context using a Cox model. We compared our approach with other Cox-model-based methods and showed that the Generalized Berk–Jones statistic offers great adaptability, meaning that it can be used with all kinds of data structures. We applied the different methods in two contexts: glioma and breast cancer. In terms of statistical power, our results were similar to those of the other Cox-model-based methods, but with greater accuracy. In the breast cancer setting, we showed better statistical power than methods based on the kernel machine score.
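The Generalized Berk–Jones (GBJ) statistic extends the classical Berk–Jones (BJ) statistic, which compares each sorted p-value in a set with its expected position through a binomial Kullback–Leibler divergence. The sketch below computes plain BJ for the p-values of one gene set, assuming those p-values already exist (for example from per-gene Cox models) and treating them as independent. It does not implement the correlation adjustment that defines GBJ or the p-value of the statistic itself; restricting the scan to the smallest half of the p-values is a common convention that is assumed here, and the function names are hypothetical.

```python
import math


def kl_bernoulli(x, y):
    """Binomial Kullback-Leibler divergence D(x, y) for 0 < x < 1 and 0 < y < 1."""
    return x * math.log(x / y) + (1.0 - x) * math.log((1.0 - x) / (1.0 - y))


def berk_jones(pvalues, eps=1e-300):
    """Classical Berk-Jones statistic for one gene set, assuming independent p-values.

    Takes the maximum of d * D(i/d, p_(i)) over the smallest half of the sorted
    p-values, keeping only positions where p_(i) < i/d (i.e. more small
    p-values than expected under the global null).
    """
    d = len(pvalues)
    if d < 2:
        raise ValueError("need at least two p-values")
    # Floor at eps so that p-values reported as 0 do not produce log(0)
    p_sorted = sorted(max(p, eps) for p in pvalues)
    stat = 0.0
    for i in range(1, d // 2 + 1):
        x, y = i / d, p_sorted[i - 1]
        if y < x:  # only positions where the observed p-value is "too small"
            stat = max(stat, d * kl_bernoulli(x, y))
    return stat


print(berk_jones([0.001, 0.2, 0.5, 0.9]))  # ≈ 4.66
```

A set whose p-values are all at or above their expected positions (for example `[0.5, 0.9, 0.95, 0.99]`) gets a statistic of 0.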

Author(s):  
Birgit Debrabant ◽  
Mette Soerensen

We discuss the use of modified Kolmogorov–Smirnov (KS) statistics in the context of gene set analysis and review the corresponding null and alternative hypotheses. In particular, we show that, when the impact of highly significant genes is enhanced in the calculation of the test statistic, the corresponding test can be considered a test of the classical self-contained null hypothesis. We use simulations to estimate the power for different kinds of alternatives and to assess the impact of the weight parameter of the modified KS statistic on power. Finally, we show the analogy between the weight parameter and the genesis and distribution of the gene-level statistics, and illustrate the effects of differential weighting in a real-life example.
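A widely used modified KS statistic is the weighted running sum from GSEA: walking down the ranked gene list, the sum steps up at genes in the set by an amount proportional to |r_j|^p and steps down at other genes by a constant. A weight p = 0 gives an unweighted KS-type walk, and larger p enhances the impact of highly significant genes. The sketch below assumes this GSEA-style weighting stands in for the "weight parameter" discussed above and that the list is already sorted by decreasing gene-level statistic; the exact modification studied in the abstract may differ, and the function name is hypothetical.

```python
def weighted_ks_enrichment(ranked_scores, in_set, weight=1.0):
    """GSEA-style weighted Kolmogorov-Smirnov enrichment score.

    ranked_scores: gene-level statistics sorted in decreasing order.
    in_set: booleans of the same length, True if the gene belongs to the set.
    weight: exponent p applied to |score| for set members (p = 0: equal weights).
    Returns the signed running-sum value with the largest absolute deviation.
    """
    n = len(ranked_scores)
    n_hits = sum(in_set)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    hit_weights = [abs(s) ** weight if hit else 0.0
                   for s, hit in zip(ranked_scores, in_set)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0.0:
        raise ValueError("all set members have zero weight")
    miss_step = 1.0 / (n - n_hits)  # every non-member lowers the sum equally
    p_hit = p_miss = 0.0
    best = 0.0
    for w, hit in zip(hit_weights, in_set):
        if hit:
            p_hit += w / total_hit_weight
        else:
            p_miss += miss_step
        running = p_hit - p_miss
        if abs(running) > abs(best):
            best = running
    return best


scores = [3.0, 2.0, 1.0, 0.5]
print(weighted_ks_enrichment(scores, [True, False, False, True], weight=1.0))  # 6/7 ≈ 0.857
```

Raising `weight` increases the share of the running sum contributed by the top-ranked member (here the gene with score 3.0), which is the mechanism by which highly significant genes gain influence.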


2012 ◽  
Vol 98 (4) ◽  
pp. 428-433 ◽  
Author(s):  
Mahmood Reza Gohari ◽  
Reza Khodabakhshi ◽  
Javad Shahidi ◽  
Zeinab Moghadami Fard ◽  
Hossein Foadzi ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Michal Marczyk ◽  
Agnieszka Macioszek ◽  
Joanna Tobiasz ◽  
Joanna Polanska ◽  
Joanna Zyla

A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which may lie in the region of the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to keep only the most relevant SNP per gene; however, other integration methods could be applied here. Also, the problem of nonrandom association of alleles at two or more loci (linkage disequilibrium, LD) is often neglected. Here, we tested the impact of different integration methods and of LD correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). Association of SNPs with the phenotype was calculated using a modified McNemar’s test. Results for SNPs mapped to the same gene were integrated using the Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. GSA results on GWAS data were compared with those obtained on gene expression data. Comparing all evaluation metrics across GSA algorithms, integration methods and LD correction, we highlighted CERNO and MAGENTA with Stouffer integration as the most efficient. Applying LD correction increased the prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer integration was used with LD correction, sensitivity and reproducibility were also better. In specific combinations, any integration method was beneficial compared with the minimum p-value method.
The correlation between GSA results at the genomic and transcriptomic levels was highest when Stouffer integration was combined with LD correction. We thoroughly evaluated different approaches to GSA in GWAS in terms of performance to guide others in selecting the most effective combinations. We showed that LD correction and Stouffer integration can increase the performance of enrichment analysis, and we encourage the use of these techniques.
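The three gene-level integration rules named above (Fisher, Stouffer and the minimum p-value) can be sketched as follows. Both Fisher and Stouffer assume independent SNP-level p-values, which is exactly what linkage disequilibrium violates; the LD correction used in the study is not reproduced here. The `min_p` helper mirrors the "most relevant SNP per gene" practice without any correction for the number of SNPs in the gene, which is an assumption of this sketch. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(log p) ~ chi-square with 2k df under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def stouffer_combine(pvalues, weights=None):
    """Stouffer's Z on one-sided p-values: Z = sum(w_i z_i) / sqrt(sum(w_i^2))."""
    std_normal = NormalDist()
    weights = weights or [1.0] * len(pvalues)
    # z_i = Phi^{-1}(1 - p_i), written as -Phi^{-1}(p_i) for numerical stability
    z = sum(-w * std_normal.inv_cdf(p) for w, p in zip(weights, pvalues))
    z /= math.sqrt(sum(w * w for w in weights))
    return std_normal.cdf(-z)  # upper-tail p-value of the combined Z


def min_p(pvalues):
    """Keep only the most significant SNP (no multiplicity correction)."""
    return min(pvalues)


gene_snps = [0.01, 0.01]
print(fisher_combine(gene_snps), stouffer_combine(gene_snps), min_p(gene_snps))
```

For two SNPs each with p = 0.01, Fisher gives about 0.00102 and Stouffer about 0.0005, both smaller than the minimum p-value of 0.01; with correlated SNPs in LD these combined values would be overly optimistic, which is why LD correction matters.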


Cancers ◽  
2021 ◽  
Vol 13 (24) ◽  
pp. 6375
Author(s):  
Hitomi Mori ◽  
Kohei Saeki ◽  
Gregory Chang ◽  
Jinhui Wang ◽  
Xiwei Wu ◽  
...  

A response to endocrine therapy does not require 100% ER positivity. Furthermore, while estrogen typically promotes the progression of hormone-dependent breast cancer via activation of estrogen receptor (ER)-α, estrogen-induced tumor suppression in ER+ breast cancer has been observed clinically. Having established estrogen-stimulated (SC31) and estrogen-suppressed (GS3) patient-derived xenograft (PDX) models, we performed single-cell RNA sequencing analysis to determine the impact of estrogen on ESR1+ and ESR1– tumor cells. We found that 17β-estradiol (E2)-induced suppression of GS3 occurred through wild-type and unamplified ERα. E2 upregulated the expression of estrogen-dependent genes in both SC31 and GS3; however, E2 induced cell cycle progression in SC31, while it resulted in cell cycle arrest in GS3. Importantly, these gene expression changes occurred in both ESR1+ and ESR1– cells within the same breast tumors, demonstrating for the first time a differential effect of estrogen on ESR1– cells. E2 also upregulated a tumor-suppressor gene, IL-24, in GS3. After E2 treatment, the apoptosis gene set was upregulated and the G2M checkpoint gene set was downregulated in most IL-24+ cells. In summary, estrogen affected pathologically defined ER+ tumors differently, influencing both ESR1+ and ESR1– cells. Our results also suggest that IL-24 may be a marker of estrogen-suppressed tumors.


2011 ◽  
Vol 4 (8) ◽  
pp. 497-501
Author(s):  
Leo Alexander T ◽  
Pari Dayal L ◽  
Ponnuraja C ◽  
Venkatesan P

Author(s):  
Lilit Nersisyan ◽  
Henry Löffler-Wirth ◽  
Arsen Arakelyan ◽  
Hans Binder

Genome-wide ‘omics’ assays provide a comprehensive view of the molecular landscapes of healthy and diseased cells. Bioinformatics traditionally pursues a ‘gene-centered’ view by extracting lists of genes differentially expressed or methylated between healthy and diseased states. Biological knowledge mining is then performed by applying gene set techniques with libraries of functional gene sets obtained from independent studies. This analysis strategy neglects two facts: (i) that different disease states can be characterized by a series of functional modules of co-regulated genes, and (ii) that the topology of the underlying regulatory networks can induce complex expression patterns that require analysis methods beyond traditional gene set techniques. The authors here provide a knowledge discovery method that overcomes these shortcomings. It combines machine learning using self-organizing maps with pathway flow analysis. It extracts and visualizes regulatory modes from molecular omics data, maps them onto selected pathways and estimates the impact of pathway-activity changes. The authors illustrate the performance of the gene set and pathway signal flow methods using expression data from oncogenic pathway activation experiments and patient data on glioma, B-cell lymphoma and colorectal cancer.
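Self-organizing maps (SOMs) are a standard unsupervised method: each node of a 2-D grid holds a prototype vector, and training pulls the best-matching node and its grid neighbors toward each input, so similar inputs end up on nearby nodes. The sketch below is a generic online Kohonen SOM in plain Python, not the authors' pipeline. Assumptions: each input is a feature vector (hypothetically, one gene's expression profile across samples), a rectangular grid, a Gaussian neighborhood, and a linearly decaying learning rate and radius. The names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def _squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Return (row, col) of the grid node whose weight vector is closest to x."""
    cells = ((r, c) for r in range(len(weights)) for c in range(len(weights[0])))
    return min(cells, key=lambda rc: _squared_distance(weights[rc[0]][rc[1]], x))


def train_som(data, grid_h=10, grid_w=10, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen SOM training on a rectangular grid.

    data: list of equal-length feature vectors.
    Returns a grid_h x grid_w nested list of prototype (weight) vectors.
    """
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(grid_h, grid_w) / 2.0
    # Initialize every node with a copy of a randomly chosen input vector
    weights = [[list(rng.choice(data)) for _ in range(grid_w)] for _ in range(grid_h)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly toward 0
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks, floored at 0.5
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for r in range(grid_h):
            for c in range(grid_w):
                # Gaussian neighborhood on grid distance to the best-matching unit
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                node = weights[r][c]
                for k in range(len(node)):
                    node[k] += lr * h * (x[k] - node[k])
    return weights


profiles = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [0.3, 0.1], [4.8, 5.1]]
som = train_som(profiles, grid_h=3, grid_w=2, n_iter=200, seed=1)
print([best_matching_unit(som, p) for p in profiles])
```

After training, inputs assigned to the same or neighboring nodes can be read as candidate co-regulated modules; the pathway signal flow step described above would operate on top of such a map and is not sketched here.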


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Georgios Aivaliotis ◽  
Jan Palczewski ◽  
Rebecca Atkinson ◽  
Janet E. Cade ◽  
Michelle A. Morris

Survival analysis with cohort study data has traditionally been performed using Cox proportional hazards models. Random survival forests (RSFs), a machine learning method, now offer an alternative. Using the UK Women’s Cohort Study (n = 34,493), we evaluate two methods, a Cox model and an RSF, to investigate the association between Body Mass Index and time to breast cancer incidence. Robustness of the models was assessed by cross-validation and bootstrapping. Histograms of bootstrap coefficients are reported. C-indices and Integrated Brier Scores are reported for all models. In post-menopausal women, the Cox model Hazard Ratios (HR) for Overweight (OW) and Obese (O) were 1.25 (1.04, 1.51) and 1.28 (0.98, 1.68) respectively, and the RSF Odds Ratios (OR) with partial dependence on menopause for OW and O were 1.34 (1.31, 1.70) and 1.45 (1.42, 1.48). The HRs are non-significant. Only the RSF appears confident about the effect of weight status on time to event. Bootstrapping demonstrated that Cox model coefficients can vary considerably, weakening their interpretability. An RSF was used to produce partial dependence plots (PDPs) showing that OW and O weight status increase the probability of breast cancer incidence in post-menopausal women. All models have a relatively low C-index and a high Integrated Brier Score. The RSF overfits the data. In our study, the RSF can identify complex non-proportional-hazards patterns in the data and allows more complicated relationships to be investigated using PDPs, but it overfits, limiting extrapolation of results to new instances. Moreover, it is less easily interpreted than Cox models. The value of survival analysis remains paramount, and machine learning techniques such as RSF should therefore be considered as an additional method of analysis.
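The C-index reported for these models measures how well predicted risk orders the observed event times. A minimal sketch of Harrell's concordance index follows, assuming right-censored data, a risk score where larger means earlier expected event, and pairs with tied follow-up times simply skipped (a simplification; implementations differ in how they handle ties). The function name is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per subject; events: 1 if the event was observed,
    0 if censored; risk_scores: model output, larger = higher predicted risk.
    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still under follow-up afterwards (times[j] > times[i]).
    """
    concordant = 0.0
    comparable = 0
    for t_i, e_i, r_i in zip(times, events, risk_scores):
        if not e_i:
            continue  # a censored subject cannot anchor a comparable pair
        for t_j, r_j in zip(times, risk_scores):
            if t_j > t_i:
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0  # earlier event had the higher risk
                elif r_i == r_j:
                    concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the "relatively low C-index" above should be read.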


2020 ◽  
Vol 40 (7) ◽  
Author(s):  
Pashupati P. Mishra ◽  
Ismo Hänninen ◽  
Emma Raitoharju ◽  
Saara Marttila ◽  
Binisha H. Mishra ◽  
...  

Smoking, a major risk factor for morbidity, affects numerous regulatory systems of the human body, including DNA methylation. Most previous studies with genome-wide methylation data are based on conventional association analysis and early threshold-based gene set analysis, which lacks the sensitivity to reveal all the relevant effects of smoking. The aim of the present study was to investigate the impact of active smoking on DNA methylation at three biological levels: 5′-C-phosphate-G-3′ (CpG) sites, genes and functionally related genes (gene sets). Gene set analysis was done with mGSZ, a modern threshold-free method previously developed by us that uses all the genes in the experiment and their differential methylation scores. Applying such a method in a DNA methylation study is novel. Epigenome-wide methylation levels were profiled in whole blood from Young Finns Study (YFS) participants at the 2011 follow-up using Illumina Infinium HumanMethylation450 BeadChips. We identified three novel smoking-related CpG sites and replicated 57 previously identified ones. We found that smoking is associated with hypomethylation in shores (genomic regions 0–2 kilobases from CpG islands). We identified smoking-related methylation changes in 13 gene sets with false discovery rate (FDR) ≤ 0.05, among them olfactory receptor activity, the flagship novel finding of the present study. Overall, we extended current knowledge by identifying: (i) three novel smoking-related CpG sites, (ii) effects on average methylation in shores similar to those of aging, and (iii) a novel finding that the olfactory receptor activity pathway responds to tobacco smoke and toxin exposure through epigenetic mechanisms.
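For contrast with threshold-free methods such as mGSZ (not reproduced here), a threshold-based analysis first selects the genes passing a cutoff and then tests each gene set for over-representation, typically with a one-sided hypergeometric test. The sketch below also includes Benjamini–Hochberg adjustment as one common way to obtain FDR values; the abstract does not state which FDR procedure was used, so that choice is an assumption. Function names are hypothetical.

```python
from math import comb


def ora_pvalue(n_genes, n_selected, set_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    n_genes: genes in the universe; n_selected: genes passing the threshold;
    set_size: genes in the gene set; overlap: selected genes that fall in the set.
    """
    upper = min(set_size, n_selected)
    numerator = sum(
        comb(set_size, k) * comb(n_genes - set_size, n_selected - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(n_genes, n_selected)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [ora_pvalue(10, 3, 3, 3), ora_pvalue(100, 10, 20, 6), ora_pvalue(100, 10, 20, 1)]
print(benjamini_hochberg(raw))
```

The threshold step is what makes this approach lose information: a gene just above the cutoff contributes nothing, whereas threshold-free methods use every gene's score.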


2012 ◽  
Vol 30 (15_suppl) ◽  
pp. 1078-1078
Author(s):  
Lukas Schwentner ◽  
Regine Wolters ◽  
Igor Novopashenny ◽  
Manfred Wischnewsky ◽  
Rolf Kreienberg ◽  
...  

Background: Besides unifocal-unilateral (UU) breast cancer (BC), there are several subtypes, including multifocal, multicentric and bilateral BC. This study addresses the following questions: (1) Does localization (multifocal/multicentric/bilateral) influence BC mortality? (2) Does guideline-adherent adjuvant treatment have an impact in these BC subtypes? Methods: This German multi-center retrospective cohort study, called BRENDA, included 5277 patients from 1992 to 2005. The definition of guideline adherence was based on the German national S3 breast cancer guideline (2004). Results: 4085 (77.4%) patients had UU, 698 (13.2%) multifocal, 282 (5.3%) multicentric and 212 (4.0%) bilateral BC. Recurrence-free survival (RFS) in multifocal [p=0.003; HR=1.35 (95% CI: 1.11-1.65)], multicentric [p<0.001; HR=1.76 (95% CI: 1.31-2.34)] and bilateral [p<0.001; HR=2.28 (95% CI: 1.76-2.97)] BC was significantly lower than in UU BC. For overall survival (OAS), we found only a borderline difference between UU and unilateral-multifocal BC [p=0.057; HR=1.22 (95% CI: 0.99-1.48)], but significant differences between multicentric [p=0.018; HR=1.42 (95% CI: 1.06-1.90)] and bilateral [p<0.001; HR=2.87 (95% CI: 2.21-3.74)] BC and UU BC. Guideline-adherent adjuvant therapy had a significant impact [UU: p<0.001, HR=2.76, 95% CI: 2.25-3.38], [unilateral-multifocal: p=0.001, HR=2.04, 95% CI: 1.33-3.14], [unilateral-multicentric: p=0.020, HR=2.13, 95% CI: 1.13-4.01] and [bilateral: p=0.042, HR=2.10, 95% CI: 1.03-4.31]. After stratifying for 100% guideline-adherent treatment and adjusting for age, tumor size, nodal status and grading, there was no significant difference in RFS/OAS between patients with multifocal [p=0.282/p=0.610], multicentric [p=0.829/p=0.609] or bilateral BC [p=0.457/p=0.773] and patients with UU BC. Conclusions: Patients with multicentric and bilateral BC initially have a worse prognosis in terms of RFS and OAS.
However, when guideline-adherent adjuvant treatment was applied, significant differences in survival could no longer be demonstrated.
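The reported p-values can be cross-checked against the hazard ratios and their confidence intervals, assuming each interval is a Wald interval symmetric on the log-hazard scale: the standard error of log(HR) is recovered from the interval width on the log scale, SE = (ln U − ln L) / (2 · 1.96), and a two-sided normal p-value is computed from z = ln(HR) / SE. Because the HRs and limits are rounded, the result is only approximate. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def p_from_hr_ci(hr, lower, upper, level=0.95):
    """Approximate two-sided Wald p-value from a hazard ratio and its CI."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)             # about 1.96 for 95%
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


# Multifocal vs UU BC, RFS: HR=1.35 (95% CI: 1.11-1.65), reported p=0.003
print(round(p_from_hr_ci(1.35, 1.11, 1.65), 4))  # ≈ 0.003
```

The same check on the bilateral RFS row (HR=2.28, 95% CI: 1.76-2.97) gives a p-value far below 0.001, consistent with the reported p<0.001.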

