Unveiling the links between peptide identification and differential analysis FDR controls by means of a practical introduction to knockoff filters

Mapping Intimacies ◽

10.1101/2021.08.20.454134 ◽

2021 ◽

Author(s):

Lucas Etourneau ◽

Nelle Varoquaux ◽

Thomas Burger

Keyword(s):

Peptide Identification ◽

Differential Analysis ◽

P Values ◽

Multiple Test Correction ◽

Multiple Test ◽

Proteomic Data ◽

Alternative Method

In proteomic differential analysis, FDR control is often performed through a multiple test correction (i.e., the adjustment of the original p-values). In this protocol, we apply a recent alternative method, based on so-called knockoff filters. It shares interesting conceptual similarities with the target-decoy competition procedure, classically used in proteomics for FDR control at peptide identification. To provide practitioners with a unified understanding of FDR control in proteomics, we apply the knockoff procedure on real and simulated quantitative datasets. Leveraging these comparisons, we propose to adapt the knockoff procedure to better fit the specificities of quantitative proteomic data (mainly very few samples). The performance of the knockoff procedure is compared with that of the classical Benjamini-Hochberg procedure, thereby shedding new light on the strengths and weaknesses of target-decoy competition.
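The two FDR-control routes contrasted in this abstract can be sketched side by side. The block below is a minimal illustration, not the authors' protocol: `benjamini_hochberg` is the standard step-up rule on p-values, and `knockoff_plus_select` applies the knockoff+ threshold to signed statistics W (positive when the original feature beats its knockoff). The function names and the toy inputs are assumptions made for this example.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its own threshold.
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])


def knockoff_plus_select(w, q=0.1):
    """Indices selected by the knockoff+ threshold on signed statistics w.

    The ratio (1 + #{w_j <= -t}) / #{w_j >= t} estimates the FDR at
    threshold t, much like (decoys + 1) / targets in target-decoy competition.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return [j for j, x in enumerate(w) if x >= t]
    return []  # no threshold reaches the target FDR level


# Hypothetical toy inputs, chosen only to exercise both functions.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
w_stats = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 1.5, -0.5, 0.8]
bh_hits = benjamini_hochberg(pvals, alpha=0.05)
ko_hits = knockoff_plus_select(w_stats, q=0.2)
```

The knockoff rule never looks at a p-value: it counts how often the knockoff wins the competition, which is what makes the parallel with decoy counting explicit.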


T170. Effective Multiple Test Correction (MTC) for GWAS With Large Numbers of Correlated Genotypes and Phenotypes

Biological Psychiatry ◽

10.1016/j.biopsych.2019.03.493 ◽

2019 ◽

Vol 85 (10) ◽

pp. S195

Author(s):

Huma Asif ◽

Ney Alliey-Rodriguez ◽

Sarah Keedy ◽

Carol Tamminga ◽

Godfrey D. Pearlson ◽

...

Keyword(s):

Multiple Test Correction ◽

Multiple Test ◽

Large Numbers


Deciphering sex-specific genetic architectures using local Bayesian regressions

10.1101/653386 ◽

2019 ◽

Author(s):

Scott A Funkhouser ◽

Ana I Vazquez ◽

Juan P Steibel ◽

Catherine W Ernst ◽

Gustavo de los Campos

Keyword(s):

Linkage Disequilibrium ◽

Genetic Effects ◽

Genome Wide Association ◽

Phenotypic Variance ◽

Multiple Test Correction ◽

Multiple Test ◽

Specific Effects ◽

Genome Wide ◽

Causal Variants ◽

Complex Human Traits

Abstract: Many complex human traits exhibit differences between sexes. While numerous factors likely contribute to this phenomenon, growing evidence from genome-wide studies suggests a partial explanation: that males and females from the same population possess differing genetic architectures. Despite this, mapping gene-by-sex (G×S) interactions remains a challenge, likely because the magnitude of such an interaction is typically exceedingly small; traditional genome-wide association techniques may be underpowered to detect such events, partly due to the burden of multiple test correction. Here, we developed a local Bayesian regression (LBR) method to estimate sex-specific SNP marker effects after fully accounting for local linkage-disequilibrium (LD) patterns. This enabled us to infer sex-specific effects and G×S interactions either at the single SNP level, or by aggregating the effects of multiple SNPs to make inferences at the level of small LD-based regions. Using simulations in which there was imperfect LD between SNPs and causal variants, we showed that aggregating sex-specific marker effects with LBR provides improved power and resolution to detect G×S interactions over traditional single-SNP-based tests. When using LBR to analyze traits from the UK Biobank, we detected a relatively large G×S interaction impacting bone-mineral density within ABO and replicated many previously detected large-magnitude G×S interactions impacting waist-to-hip ratio. We also discovered many new G×S interactions impacting such traits as height and BMI within regions of the genome where both male- and female-specific effects explain a small proportion of phenotypic variance (R² < 1×10⁻⁴), but are enriched in known expression quantitative trait loci. By combining biobank-level data and techniques to estimate sex-specific SNP effects after accounting for local-LD patterns, we are providing evidence that numerous small-magnitude G×S interactions exist to influence sex differences in a variety of complex traits.

Author Summary: Many complex human traits are known to be influenced by an impressive number of causal variants each with very small effects, posing great challenges for genome-wide association studies (GWAS). To add to this challenge, many causal variants may possess context-dependent effects such as effects that are dependent on biological sex. While GWAS are commonly performed using specific methods in which one single nucleotide polymorphism (SNP) at a time is tested for association with a trait, we instead use methods more commonly found in the genomic prediction literature. Such methods are advantageous in that they are not burdened by multiple test correction in the same way as traditional GWAS techniques are, and can fully account for linkage-disequilibrium patterns to accurately estimate the true effects of SNP markers. Here we adapt such methods to estimate genetic effects within sexes and provide a powerful means to compare sex-specific genetic effects.


A Multiple Test Correction for Streams and Cascades of Statistical Hypothesis Tests

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16 ◽

10.1145/2939672.2939775 ◽

2016 ◽

Cited By ~ 6

Author(s):

Geoffrey I. Webb ◽

François Petitjean

Keyword(s):

Statistical Hypothesis ◽

Hypothesis Tests ◽

Multiple Test Correction ◽

Multiple Test


Erratum to: Global and multiple test procedures using ordered p-values – a review

Statistical Papers ◽

10.1007/s00362-013-0505-2 ◽

2013 ◽

Vol 55 (2) ◽

pp. 583-583 ◽

Cited By ~ 1

Author(s):

Gudrun Bernhard ◽

Markus Klein ◽

Gerhard Hommel

Keyword(s):

Test Procedures ◽

P Values ◽

Multiple Test


PCprophet: a framework for protein complex prediction and differential analysis using proteomic data

Nature Methods ◽

10.1038/s41592-021-01107-5 ◽

2021 ◽

Author(s):

Andrea Fossati ◽

Chen Li ◽

Federico Uliana ◽

Fabian Wendt ◽

Fabian Frommelt ◽

...

Keyword(s):

Protein Complex ◽

Differential Analysis ◽

Proteomic Data ◽

Protein Complex Prediction


A simulated metagenomic analysis of the gut microbiota of Anorexia Nervosa patients using PICRUSt

STEM Fellowship Journal ◽

10.17975/sfj-2017-012 ◽

2017 ◽

Vol 3 (2) ◽

pp. 1-3

Author(s):

Farhaan Kanji ◽

Himanshi Khurana ◽

Caitlin Sherry ◽

Evangeline Ng

Keyword(s):

Anorexia Nervosa ◽

Dna Sequences ◽

Normal Weight ◽

Data Transformations ◽

Fecal Samples ◽

Multiple Test Correction ◽

Multiple Test ◽

Propionate Metabolism ◽

Body Mass Index Increase ◽

History Of

Introduction: Mack et al. (2016) studied the fecal bacteria and archaea of 55 European normal-weight participants (NW), 55 European patients with anorexia nervosa (ANT1), and 44 ANT1 patients following a body mass index increase (ANT2). Spreadsheets of identified microbes and their relative abundance per patient were uploaded to the EBI Metagenomics web server by Mack et al. We aimed to further study the functions of the identified microbes using the PICRUSt algorithm (Langille, 2013) and see if these functions are consistent with published literature. Methods: Spreadsheets were downloaded from EBI Metagenomics (Project# ERP012549) in JSON Biom format and uploaded to a Galaxy cloud server hosting PICRUSt. All data transformations can be viewed at http://huttenhower.sph.harvard.edu/galaxy/u/farhaansgroup/h/anorexi-astem-2017 . Transformed datasets were downloaded, appended with a .biom file extension, converted to the SPF format using STAMP v2.1.3 (Parks, 2014), and merged into a single file using Microsoft Excel for analysis with STAMP. Differences in propionate metabolism between ANT1, ANT2, and NW samples were chosen for further study. Results & Discussion: The proportion of propionate metabolism genes was not significantly different between ANT1 and NW samples (p=0.08), but was different between ANT2 and NW samples (p=0.01) using a pair-wise Welch’s t-test (0.95 CI) with a Storey FDR multiple test correction. In comparison, Mack et al. detected no differences in propionate concentration between AN and NW fecal samples using gas chromatography, while Morito et al. (2015) found lower concentrations of propionate in Japanese AN versus NW fecal samples using liquid chromatography. Our discrepancy with Mack et al. could have arisen because PICRUSt cannot analyze the genes of eukaryotes, PICRUSt is limited by the depth and breadth of the gene annotations in the KEGG database, and our experimental setup cannot provide data on gene expression.
Moreover, 18% of V4 16S rRNA DNA sequences could not be matched to any bacteria or archaea by EBI Metagenomics. In conclusion, while in silico experiments can be useful to predict microbial functions in a sample, in this case, our PICRUSt-based hypothesis that fecal samples from Mack et al. would have different concentrations of propionate between AN and NW samples was not borne out by Mack et al.’s chromatography experiments. Nonetheless, the conflicting findings among our study, Mack et al., and Morito et al. warrant further research on whether microbes mediate carbohydrate metabolism differently in patients with a history of anorexia nervosa versus controls.
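The pair-wise comparison named in the results is Welch's unequal-variance t-test. As a minimal sketch of what that test computes (the study itself used STAMP, and the function name and toy samples below are assumptions made for illustration), the statistic and its Welch-Satterthwaite degrees of freedom are:

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Per-group squared standard errors, using the sample variance (n - 1).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical toy samples, not data from the study.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
```

Turning the statistic into a p-value needs the Student t distribution; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test, and the resulting p-values would then go through the multiple test correction.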


Gene expression profiles classifies the responsiveness of human osteosarcoma to doxorubicin, cisplatin and ifosfamide

Journal of Clinical Oncology ◽

10.1200/jco.2006.24.18_suppl.9534 ◽

2006 ◽

Vol 24 (18_suppl) ◽

pp. 9534-9534

Author(s):

S. Bruheim ◽

Y. Xi ◽

G. Nakajima ◽

J. Ju ◽

O. Fodstad

Keyword(s):

Gene Expression ◽

Expression Profiles ◽

Survival Rates ◽

Gene Expression Profiles ◽

Early Response ◽

Multiple Test Correction ◽

Multiple Test ◽

Human Osteosarcoma ◽

Response To Chemotherapy ◽

Cisplatin Sensitivity

9534 Background: Despite the increased survival rates of osteosarcoma patients attributed to adjuvant chemotherapy, at least one third of the patients still succumb to their disease. Furthermore, ultra-aggressive combination chemotherapy is associated with considerable acute and long-term toxicity. This is of particular concern in patients who may be cured by simpler and less toxic regimens or who do not have micrometastatic disease. Hence, further improvements in the management of osteosarcoma seemingly depend on diagnostic and prognostic tools that may allow for a more risk-adapted and individualized treatment. Methods: We have used GE Uniset Human 20K microarrays to obtain gene expression profiles from a panel of ten unique human osteosarcoma xenografts. For each of the three drugs (doxorubicin, cisplatin, and ifosfamide), the xenografts were grouped according to their response to chemotherapy: resistant, weakly sensitive, or sensitive. For each individual drug, a one-way ANOVA test with a Benjamini and Hochberg multiple test correction allowing a false discovery rate of 5% (doxorubicin, cisplatin) or 2% (ifosfamide) was used to identify genes with significantly differential expression. In addition, a 2-fold cutoff was applied to exclude smaller but still significant differences. Results: For doxorubicin and cisplatin, 59 and 120 genes, respectively, met these criteria. The expression levels of 25 genes overlapped between these two groups. For ifosfamide, 148 genes were selected; for 5 of them, the expression overlapped with cisplatin sensitivity related genes. In the lists, genes involved in mediating and regulating apoptosis were abundant, such as regulators of TGF signaling, ubiquitin-mediated protein degradation and members of the immediate early response protein family. Several genes whose products interact with components of the cytoskeleton were also identified.
Conclusion: We have used a unique strategy to screen for potential chemosensitivity markers by utilizing xenografts as training sets. No significant financial relationships to disclose.
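The selection rule in the methods combines an FDR threshold with a fold-change filter. A minimal sketch of that kind of two-step filter follows, assuming per-gene p-values and expression ratios are already available; the function names and toy values are hypothetical and do not come from the study.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(pvalues, fold_changes, fdr=0.05, min_fold=2.0):
    """Indices passing both the FDR level and the fold-change cutoff."""
    adjusted = bh_adjust(pvalues)
    cutoff = math.log2(min_fold)
    return [
        i
        for i, (p, fc) in enumerate(zip(adjusted, fold_changes))
        # abs(log2) treats 2-fold up and 2-fold down symmetrically.
        if p <= fdr and abs(math.log2(fc)) >= cutoff
    ]


# Hypothetical toy values: raw p-values and expression ratios for four genes.
raw_p = [0.01, 0.04, 0.03, 0.5]
ratios = [3.0, 2.5, 0.4, 1.1]
adj_p = bh_adjust(raw_p)
hits_5pct = select_genes(raw_p, ratios, fdr=0.05)
hits_6pct = select_genes(raw_p, ratios, fdr=0.06)
```

Working on the log2 scale is a design choice for this sketch: a ratio of 0.4 (a 2.5-fold decrease) passes the same cutoff as a ratio of 2.5.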


CoMeBack: DNA methylation array data analysis for co-methylated regions

Bioinformatics ◽

10.1093/bioinformatics/btaa049 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2675-2683

Author(s):

Evan Gatev ◽

Nicole Gladish ◽

Sara Mostafavi ◽

Michael S Kobor

Keyword(s):

Dna Methylation ◽

Statistical Power ◽

Association Studies ◽

Supplementary Information ◽

Statistical Dependence ◽

Methylation Array ◽

Array Data ◽

Multiple Test Correction ◽

Multiple Test ◽

Cpg Sites

Motivation: High-dimensional DNA methylation (DNAm) array coverage, while sparse in the context of the entire DNA methylome, still constitutes a very large number of CpG probes. The ensuing multiple-test corrections affect the statistical power to detect associations, likely contributing to prevalent limited reproducibility. Array probes measuring proximal CpG sites often have correlated levels of DNAm that may not only be biologically meaningful but also imply statistical dependence and redundancy. New methods that account for such correlations between adjacent probes may enable improved specificity, discovery and interpretation of statistical associations in DNAm array data. Results: We developed a method named Co-Methylation with genomic CpG Background (CoMeBack) that estimates DNA co-methylation, defined as proximal CpG probes with correlated DNAm across individuals. CoMeBack outputs co-methylated regions (CMRs), spanning sets of array probes constructed based on all genomic CpG sites, including those not measured on the array, and without any phenotypic variable inputs. This approach can reduce the multiple-test correction burden, while enhancing the discovery and specificity of statistical associations. We constructed and validated CMRs in whole blood, using publicly available Illumina Infinium 450 K array data from over 5000 individuals. These CMRs were enriched for enhancer chromatin states, and binding site motifs for several transcription factors involved in blood physiology. We illustrated how CMR-based epigenome-wide association studies can improve discovery and reduce false positives for associations with chronological age. Availability and implementation: https://bitbucket.org/flopflip/comeback. Supplementary information: Supplementary data are available at Bioinformatics online.


Genetic Analysis Reveals a Significant Contribution of CES1 to Prostate Cancer Progression in Taiwanese Men

Cancers ◽

10.3390/cancers12051346 ◽

2020 ◽

Vol 12 (5) ◽

pp. 1346

Author(s):

Chien-Chih Ke ◽

Lih-Chyang Chen ◽

Chia-Cheng Yu ◽

Wei-Chung Cheng ◽

Chao-Yuan Huang ◽

...

Keyword(s):

Prostate Cancer ◽

Cancer Patients ◽

Cancer Progression ◽

Meta Analysis ◽

Cholesterol Homeostasis ◽

Nucleotide Polymorphisms ◽

Prostate Cancer Progression ◽

Multiple Test Correction ◽

Multiple Test ◽

Cancer Aggressiveness

The genes that influence prostate cancer progression remain largely unknown. Since the carboxylesterase gene family plays a crucial role in xenobiotic metabolism and lipid/cholesterol homeostasis, we hypothesize that genetic variants in carboxylesterase genes may influence clinical outcomes for prostate cancer patients. A total of 478 (36 genotyped and 442 imputed) single nucleotide polymorphisms (SNPs) in five genes of the carboxylesterase family were assessed in terms of their associations with biochemical recurrence (BCR)-free survival in 643 Taiwanese patients with prostate cancer who underwent radical prostatectomy. The strongest association signal was shown in CES1 (P = 9.64 × 10⁻⁴ for genotyped SNP rs8192935 and P = 8.96 × 10⁻⁵ for imputed SNP rs8192950). After multiple test correction and adjustment for clinical covariates, CES1 rs8192935 (P = 9.67 × 10⁻⁴) and rs8192950 (P = 9.34 × 10⁻⁵) remained significant. These SNPs were correlated with CES1 expression levels, which in turn were associated with prostate cancer aggressiveness. Furthermore, our meta-analysis, including eight studies, indicated that a high CES1 expression predicted better outcomes among prostate cancer patients (hazard ratio 0.82, 95% confidence interval 0.70–0.97, P = 0.02). In conclusion, our findings suggest that CES1 rs8192935 and rs8192950 are associated with BCR and that CES1 plays a tumor suppressive role in prostate cancer.


Deciphering Sex-Specific Genetic Architectures Using Local Bayesian Regressions

Genetics ◽

10.1534/genetics.120.303120 ◽

2020 ◽

Vol 215 (1) ◽

pp. 231-241

Author(s):

Scott A. Funkhouser ◽

Ana I. Vazquez ◽

Juan P. Steibel ◽

Catherine W. Ernst ◽

Gustavo de los Campos

Keyword(s):

Bayesian Regression ◽

Specific Marker ◽

Phenotypic Variance ◽

Mineral Density ◽

Partial Explanation ◽

Multiple Test Correction ◽

Multiple Test ◽

Specific Effects ◽

Genome Wide ◽

The Uk

Many complex human traits exhibit differences between sexes. While numerous factors likely contribute to this phenomenon, growing evidence from genome-wide studies suggests a partial explanation: that males and females from the same population possess differing genetic architectures. Despite this, mapping gene-by-sex (G×S) interactions remains a challenge, likely because the magnitude of such an interaction is typically exceedingly small; traditional genome-wide association techniques may be underpowered to detect such events, due partly to the burden of multiple test correction. Here, we developed a local Bayesian regression (LBR) method to estimate sex-specific SNP marker effects after fully accounting for local linkage-disequilibrium (LD) patterns. This enabled us to infer sex-specific effects and G×S interactions either at the single SNP level, or by aggregating the effects of multiple SNPs to make inferences at the level of small LD-based regions. Using simulations in which there was imperfect LD between SNPs and causal variants, we showed that aggregating sex-specific marker effects with LBR provides improved power and resolution to detect G×S interactions over traditional single-SNP-based tests. When using LBR to analyze traits from the UK Biobank, we detected a relatively large G×S interaction impacting bone mineral density within ABO, and replicated many previously detected large-magnitude G×S interactions impacting waist-to-hip ratio. We also discovered many new G×S interactions impacting such traits as height and body mass index (BMI) within regions of the genome where both male- and female-specific effects explain a small proportion of phenotypic variance (R² < 1 × 10⁻⁴), but are enriched in known expression quantitative trait loci.
