TPES: tumor purity estimation from SNVs

Alessio Locallo; Davide Prandi; Tarcisio Fedrizzi; Francesca Demichelis

doi:10.1093/bioinformatics/btz406

Bioinformatics · 10.1093/bioinformatics/btz406 · 2019 · Vol 35 (21) · pp. 4433-4435 · Cited By ~ 4

Author(s): Alessio Locallo, Davide Prandi, Tarcisio Fedrizzi, Francesca Demichelis

Keyword(s): R Package, Computational Method, Supplementary Information, Sequencing Data, Single Nucleotide Variants, Tumor Purity, Whole Exome, Whole Exome Sequencing Data, Fraction Distribution, Tumor Genome

Abstract

Motivation: Tumor purity (TP) is the proportion of cancer cells in a tumor sample. TP affects the accurate assessment of molecular and genomic features assayed with NGS approaches. State-of-the-art tools mainly rely on somatic copy-number alterations (SCNAs) to quantify TP and therefore fail when a tumor genome is nearly euploid, i.e. ‘non-aberrant’ in terms of identifiable SCNAs.

Results: We introduce a computational method, tumor purity estimation from single-nucleotide variants (SNVs), or TPES, which derives TP from the allelic fraction distribution of SNVs. On whole-exome sequencing data from more than 7800 TCGA tumor samples, it showed high concordance with a range of TP tools (Spearman's correlation between 0.68 and 0.82 for samples with >9 SNVs) and rescued TP estimates for 1,194 samples (15%) pan-cancer.

Availability and implementation: TPES is available as an R package on CRAN and at https://bitbucket.org/l0ka/tpes.git.

Supplementary information: Supplementary data are available at Bioinformatics online.
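The basic relationship behind SNV-based purity estimation can be illustrated with a short Python sketch. It assumes, as a simplification, that the input allelic fractions come from heterozygous, clonal somatic SNVs in copy-neutral diploid regions, where the expected allelic fraction is TP/2, and it takes twice the peak of a Gaussian kernel density as the purity estimate. The function names, the bandwidth and the 10-SNV minimum (echoing the >9 SNVs figure in the abstract) are illustrative choices, not the TPES package's API or its exact procedure.

```python
import math
from typing import Optional, Sequence


def kde_peak(vafs: Sequence[float], bandwidth: float = 0.03, step: float = 0.001) -> float:
    """Return the allelic fraction in [0, 1] where a Gaussian kernel density of `vafs` peaks."""
    best_x, best_density = 0.0, -1.0
    for i in range(int(round(1.0 / step)) + 1):
        x = i * step
        # Unnormalised Gaussian KDE: the constant factor does not move the peak.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in vafs)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def purity_from_clonal_vafs(vafs: Sequence[float], min_snvs: int = 10,
                            bandwidth: float = 0.03) -> Optional[float]:
    """Estimate purity as twice the dominant VAF peak (hypothetical helper).

    Assumes `vafs` are heterozygous, clonal somatic SNVs in copy-neutral
    diploid regions, where the expected allelic fraction is purity / 2.
    """
    if len(vafs) < min_snvs:
        return None  # too few SNVs for a stable density estimate
    return min(1.0, 2.0 * kde_peak(vafs, bandwidth))


if __name__ == "__main__":
    example = [0.28, 0.29, 0.29, 0.30, 0.30, 0.30, 0.30, 0.31, 0.31, 0.32]
    print(purity_from_clonal_vafs(example))  # prints a value close to 0.6
```

Subclonal SNVs pull the density toward lower fractions, and copy-number changes move the expected fraction away from TP/2, so a practical tool has to select the clonal peak and account for local copy number; this sketch does neither.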

MAGOS: Discovering Subclones in Tumors Sequenced at Standard Depths

10.1101/790386 · 2019 · Cited By ~ 1

Author(s): Navid Ahmadinejad, Shayna Troftgruben, Carlo Maley, Junwen Wang, Li Liu

Keyword(s): Copy Number Variants, R Package, Computational Method, Intratumor Heterogeneity, Sequencing Data, Single Nucleotide Variants, Real World Data, Computation Efficiency, Whole Exome, Whole Exome Sequencing Data

Abstract

Understanding intratumor heterogeneity is critical to designing personalized treatments and improving clinical outcomes of cancers. Such investigations require accurate delineation of the subclonal composition of a tumor, which to date can only be reliably inferred from deep-sequencing data (>300x depth). To enable accurate subclonal discovery in tumors sequenced at standard depths (30-50x), we develop a novel computational method that incorporates an adaptive error model into statistical decomposition of mixed populations, which corrects the mean-variance dependency of sequencing data at the subclonal level. Tested on extensive computer simulations and real-world data, this new method, named model-based adaptive grouping of subclones (MAGOS), consistently outperforms existing methods on minimum sequencing depth, decomposition accuracy and computation efficiency. MAGOS supports subclone analysis using single nucleotide variants and copy number variants from one or more samples of an individual tumor. Applications of MAGOS to whole-exome sequencing data of 331 liver cancer samples discovered a significant association between subclonal diversity and patient overall survival. MAGOS is freely available as an R package on GitHub (https://github.com/liliulab/magos).
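Decomposing variant allele fractions into subclones is commonly framed as fitting a mixture model to read counts. The Python sketch below fits a plain binomial mixture by expectation-maximization; it is an assumed illustration of that general idea and does not include the adaptive error model that MAGOS adds to correct the mean-variance dependency. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log probability of k alt reads out of n under Binomial(n, p)."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def binomial_mixture_em(alt: Sequence[int], depth: Sequence[int],
                        init_vafs: Sequence[float], n_iter: int = 200,
                        tol: float = 1e-8) -> Tuple[List[float], List[float]]:
    """Fit a K-component binomial mixture to (alt, depth) read counts by EM.

    Returns (cluster VAFs, cluster weights). Each cluster is a candidate
    clone; the binomial likelihood lets low-depth variants count for less.
    """
    k = len(init_vafs)
    vafs, weights = list(init_vafs), [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each variant belongs to each cluster.
        resp, ll = [], 0.0
        for a, d in zip(alt, depth):
            logs = [math.log(weights[j]) + log_binom_pmf(a, d, vafs[j]) for j in range(k)]
            top = max(logs)
            total = sum(math.exp(x - top) for x in logs)
            ll += top + math.log(total)
            resp.append([math.exp(x - top) / total for x in logs])
        # M-step: re-estimate cluster weights and depth-weighted cluster VAFs.
        for j in range(k):
            r = [row[j] for row in resp]
            weights[j] = max(sum(r) / len(resp), 1e-12)
            alt_sum = sum(ri * a for ri, a in zip(r, alt))
            depth_sum = sum(ri * d for ri, d in zip(r, depth))
            vafs[j] = alt_sum / depth_sum if depth_sum > 0 else vafs[j]
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    return vafs, weights
```

At 30-50x the binomial variance of an allelic fraction is large, so clusters that are close in VAF overlap heavily; that depth-dependent noise is what the abstract says MAGOS's adaptive error model is designed to handle. The number of clusters and the starting VAFs also have to be chosen, which this sketch leaves to the caller.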

Global copy number profiling of cancer genomes

Bioinformatics · 10.1093/bioinformatics/btv676 · 2015 · Vol 32 (6) · pp. 926-928 · Cited By ~ 4

Author(s): Xuefeng Wang, Mengjie Chen, Xiaoqing Yu, Natapol Pornputtapong, Hao Chen, ...

Keyword(s): Copy Number, Supplementary Information, Supplementary Data, Sequencing Data, Exome Sequencing Data, Tumor Purity, Whole Exome, Cancer Genomes, Whole Exome Sequencing Data, Allele Specific

Abstract

Summary: In this article, we introduce a robust and efficient strategy for deriving global and allele-specific copy number alterations (CNAs) from cancer whole-exome sequencing data based on Log R ratios and B-allele frequencies. Applying the approach to over 200 skin cancer samples, we demonstrate its utility for discovering distinct CNA events and for deriving ancillary information such as tumor purity.

Availability and implementation: https://github.com/xfwang/CLOSE

Supplementary information: Supplementary data are available at Bioinformatics online.
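The two signals named in this abstract, Log R ratio (LRR) and B-allele frequency (BAF), relate to tumor purity through simple mixing arithmetic. The Python sketch below shows those textbook relationships for a one-copy loss in an otherwise diploid tumor; it assumes library-size normalization for LRR and germline heterozygous SNPs for BAF, and it is not the CLOSE implementation. Function names are hypothetical.

```python
import math


def log_r_ratio(tumor_depth: float, normal_depth: float,
                tumor_total: float, normal_total: float) -> float:
    """log2 of tumor/normal coverage after normalising by total reads in each library."""
    return math.log2((tumor_depth / tumor_total) / (normal_depth / normal_total))


def mirrored_baf(alt_reads: int, depth: int) -> float:
    """B-allele frequency folded onto [0.5, 1] so both alleles are treated alike."""
    return abs(alt_reads / depth - 0.5) + 0.5


def purity_from_hemizygous_loss(mbaf: float) -> float:
    """Purity p from the mirrored BAF of germline het SNPs in a one-copy-loss segment.

    Normal cells keep both alleles and tumour cells keep one, so the retained
    allele's fraction is 1 / (2 - p), hence p = 2 - 1 / mBAF.
    """
    if not 0.5 <= mbaf <= 1.0:
        raise ValueError("mirrored BAF must lie in [0.5, 1]")
    return 2.0 - 1.0 / mbaf


def expected_lrr_hemizygous_loss(purity: float) -> float:
    """Expected LRR of a one-copy loss: mean copy number (2 - p) relative to 2."""
    return math.log2(1.0 - purity / 2.0)
```

For example, a segment with mirrored BAF 1/1.4 ≈ 0.714 implies purity 0.6, and the same segment should show an LRR near log2(0.7) ≈ −0.51. Real data add tumor ploidy and subclonality, which this sketch ignores.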

ABEMUS: platform-specific and data-informed detection of somatic SNVs in cfDNA

Bioinformatics · 10.1093/bioinformatics/btaa016 · 2020 · Vol 36 (9) · pp. 2665-2674

Author(s): Nicola Casiraghi, Francesco Orlando, Yari Ciani, Jenny Xiang, Andrea Sboner, ...

Keyword(s): Cancer Patients, R Package, Circulating Tumor DNA, Supplementary Information, Sequencing Error, Sequencing Data, Single Nucleotide Variants, Liquid Biopsies, Non-Invasive, Cross Platform

Abstract

Motivation: The use of liquid biopsies for cancer patients enables non-invasive tracking of treatment response and tumor dynamics through single or serial blood draws. Next-generation sequencing assays allow the simultaneous interrogation of extended sets of somatic single-nucleotide variants (SNVs) in circulating cell-free DNA (cfDNA), a mixture of DNA molecules originating from both normal and tumor tissue cells. However, low circulating tumor DNA (ctDNA) fractions, together with sequencing background noise and potential tumor heterogeneity, challenge the ability to confidently call SNVs.

Results: We present a computational methodology, called Adaptive Base Error Model in Ultra-deep Sequencing data (ABEMUS), which combines platform-specific genetic knowledge and empirical signal to readily detect and quantify somatic SNVs in cfDNA. We tested the capability of our method to analyze data generated using different platforms with distinct sequencing error properties, and we compared ABEMUS performance with other popular SNV callers on both synthetic and real cancer patient sequencing data. Results show that ABEMUS performs better in most of the tested conditions, demonstrating its reliability in calling somatic SNVs with low variant allele frequencies in plasma samples with low ctDNA levels.

Availability and implementation: ABEMUS is cross-platform and can be installed as an R package. The source code is maintained on GitHub at http://github.com/cibiobcg/abemus, and it is also available on the official CRAN R repository.

Supplementary information: Supplementary data are available at Bioinformatics online.
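One common way to separate true low-frequency SNVs from sequencing noise, and the general idea behind a per-base background error model, is to estimate a position-specific error rate from normal samples and ask whether the tumor's alt-read count is unlikely under that rate. The Python sketch below does this with a pooled error rate and a one-sided binomial test; the smoothing, the alpha threshold and the function names are assumptions, and ABEMUS's actual platform-specific error model is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def background_error_rate(normal_alt: Sequence[int], normal_depth: Sequence[int]) -> float:
    """Pooled per-position error rate from a panel of normals (add-one smoothing)."""
    return (sum(normal_alt) + 1) / (sum(normal_depth) + 2)


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    logs = [math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p) for i in range(k, n + 1)]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(x - top) for x in logs))


def call_snv(tumor_alt: int, tumor_depth: int, normal_alt: Sequence[int],
             normal_depth: Sequence[int], alpha: float = 1e-6) -> Tuple[bool, float]:
    """Call a somatic SNV if the alt reads exceed what background error explains."""
    rate = background_error_rate(normal_alt, normal_depth)
    p_value = binom_sf(tumor_alt, tumor_depth, rate)
    return p_value < alpha, p_value
```

With three normals at 2000x carrying 1, 2 and 1 alt reads, the background rate is about 0.08%; 20 alt reads out of 2000 (1% VAF) are then far beyond what noise explains, whereas 3 alt reads are not.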

Integrated Analysis of Germline and Tumor DNA Identifies New Candidate Genes Involved in Familial Colorectal Cancer

Cancers · 10.3390/cancers11030362 · 2019 · Vol 11 (3) · pp. 362 · Cited By ~ 7

Author(s): Marcos Díaz-Gay, Sebastià Franch-Expósito, Coral Arnau-Collell, Solip Park, Fran Supek, ...

Keyword(s): Colorectal Cancer, Candidate Genes, Copy Number Variants, Integrated Analysis, Sequencing Data, Single Nucleotide Variants, Functional Studies, Mutational Profiling, Whole Exome, Whole Exome Sequencing Data

Colorectal cancer (CRC) aggregates in some families that carry no alterations in the known hereditary CRC genes. We aimed to identify new candidate genes potentially involved in germline predisposition to familial CRC. An integrated analysis of germline and tumor whole-exome sequencing data was performed in 18 unrelated CRC families. Deleterious single nucleotide variants (SNVs), short insertions and deletions (indels), copy number variants (CNVs) and loss of heterozygosity (LOH) were assessed as candidate first (germline) or second (somatic) hits. Candidate tumor suppressor genes were selected when alterations were detected in both germline and somatic DNA, fulfilling Knudson's two-hit hypothesis. Somatic mutational profiling and signature analysis were also performed. A series of germline-somatic variant pairs was detected. In all cases, the first hit was a rare SNV/indel, whereas the second hit was either a different SNV (3 genes) or LOH affecting the same gene (141 genes). BRCA2, BLM, ERCC2, RECQL, REV3L and RIF1 were among the most promising candidate genes for germline CRC predisposition. Our integrated analysis thus identified new candidate genes potentially involved in familial CRC. Further functional studies and replication in additional cohorts are required to confirm the selected candidates.
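The two-hit selection rule described in this abstract reduces to a per-gene join between germline and somatic calls. The Python sketch below assumes the calls have already been filtered to rare, deleterious variants and mapped to genes; the data layout, the function name and the gene and variant identifiers are hypothetical.

```python
from typing import Dict, List, Set


def two_hit_genes(germline: Dict[str, Set[str]], somatic_snvs: Dict[str, Set[str]],
                  loh_genes: Set[str]) -> Dict[str, List[str]]:
    """Return genes whose germline first hit is matched by a somatic second hit.

    germline and somatic_snvs map gene -> variant IDs; loh_genes lists genes
    inside tumour LOH segments. A somatic SNV only counts if it differs from
    the germline variant(s) in that gene.
    """
    candidates: Dict[str, List[str]] = {}
    for gene, first_hits in germline.items():
        second_hits = []
        if somatic_snvs.get(gene, set()) - first_hits:  # a different somatic SNV
            second_hits.append("SNV")
        if gene in loh_genes:
            second_hits.append("LOH")
        if second_hits:
            candidates[gene] = second_hits
    return candidates
```

A fuller implementation would also confirm that an LOH event removes the wild-type allele rather than the germline variant; this sketch only checks that the gene lies in an LOH segment.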

sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies

Bioinformatics · 10.1093/bioinformatics/btaa689 · 2020

Author(s): Thiago L A Miller, Fernanda Orpinelli Rego, José Leonel L Buzzo, Pedro A F Galante

Keyword(s): Supplementary Information, Whole Genome, Sequencing Data, Genomic Context, Gene Copies, Whole Exome, Whole Exome Sequencing Data, Processed Pseudogenes, Bona Fide, Insertion Sites

Abstract

Motivation: Retrocopies or processed pseudogenes are gene copies resulting from mRNA retrotransposition. These gene duplicates can be fixed, somatically inserted or polymorphic in the genome. However, knowledge regarding unfixed retrocopies (retroCNVs) is still limited, and the development of computational tools for effectively identifying and genotyping them is an urgent need.

Results: Here, we present sideRETRO, a pipeline dedicated not only to detecting retroCNVs in whole-genome or whole-exome sequencing data but also to revealing their insertion sites, zygosity and genomic context and classifying them as somatic or polymorphic events. We show that sideRETRO can identify novel retroCNVs and genotype them, in addition to finding polymorphic retroCNVs in whole-genome and whole-exome data. Therefore, sideRETRO fills a gap in the literature and presents an efficient and straightforward algorithm to accelerate the study of bona fide retroCNVs.

Availability and implementation: sideRETRO is available at https://github.com/galantelab/sideRETRO

Supplementary information: Supplementary data are available at Bioinformatics online.

Detection of Merkel cell polyomavirus using whole exome sequencing data

10.1101/2020.04.27.063214 · 2020 · Cited By ~ 2

Author(s): Sandra Garcia-Mulero, Ferran Moratalla-Navarro, Soraya Curiel-Olmo, Victor Moreno, José Pedro Vaqué, ...

Keyword(s): Exome Sequencing, Whole Exome Sequencing, Insertion Site, Merkel Cell Polyomavirus, Merkel Cell, Sequencing Data, Phenotypic Differences, Whole Exome, Whole Exome Sequencing Data, Tumor Genome

Abstract

Merkel cell carcinoma (MCC) is a highly malignant neuroendocrine tumor of the skin in which insertion of Merkel cell polyomavirus (MCV), a DNA virus, can be detected in 75–89% of cases. Etiologic and phenotypic differences exist between MCC tumors with and without the inserted virus, so it is important to distinguish MCV+ from MCV− MCC cases. Currently, MCV insertions in MCC genomes are detected using laboratory techniques. Here we report a freely available bioinformatics methodology to identify MCV+ MCC tumors using whole-exome sequencing (WES) data. WES data can also be used to infer the virus insertion site in the tumor genome. Our method was validated on a set of MCC samples previously characterized in the laboratory as MCV+ or MCV−, achieving 100% sensitivity and 62.5% specificity. Thus, with sufficient sequencing depth, WES can be used to detect the presence of MCV insertions in cancer samples.
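One simple way to call a viral insertion from WES is to count reads that align to the viral genome and apply a threshold; the reported sensitivity and specificity then follow from the resulting confusion counts. The Python sketch below assumes reads have already been aligned to an MCV reference; the reads-per-million threshold and the function names are hypothetical and are not the authors' published method.

```python
from typing import Sequence, Tuple


def is_mcv_positive(viral_reads: int, total_reads: int, min_reads_per_million: float) -> bool:
    """Flag a sample as MCV+ when reads aligned to the viral genome exceed a threshold."""
    return viral_reads / total_reads * 1e6 >= min_reads_per_million


def sensitivity_specificity(predicted: Sequence[bool],
                            truth: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / all positives; specificity = TN / all negatives."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    tn = sum((not p) and (not t) for p, t in zip(predicted, truth))
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives
```

Specificity of 62.5% is the ratio 5/8; the underlying sample counts are not given in this abstract. Raising the threshold would trade sensitivity for specificity, and the required threshold depends on sequencing depth, consistent with the abstract's caveat about depth.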

Biallelic novel mutations of the COL27A1 gene in a patient with Steel syndrome

Human Genome Variation · 10.1038/s41439-021-00149-7 · 2021 · Vol 8 (1)

Author(s): Jong Seop Kim, Hyoungseok Jeon, Hyeran Lee, Jung Min Ko, Yonghwan Kim, ...

Keyword(s): Hip Dysplasia, Large Deletion, Compound Heterozygous, Radial Head Dislocation, Sequencing Data, Novel Mutations, Exome Sequencing Data, Whole Exome, Whole Exome Sequencing Data, Carpal Coalition

Abstract

An 11-year-old Korean boy presented with short stature, hip dysplasia, radial head dislocation, carpal coalition, genu valgum, and fixed patellar dislocation and was clinically diagnosed with Steel syndrome. Scrutinizing the trio whole-exome sequencing data revealed novel compound heterozygous mutations of COL27A1 (c.[4229_4233dup]; [3718_5436del], p.[Gly1412Argfs*157];[Gly1240_Lys1812del]) in the proband, which were inherited from heterozygous parents. The maternal mutation was a large deletion encompassing exons 38–60, which was challenging to detect.
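The trio reasoning in this case report (one mutation from each heterozygous parent, hence compound heterozygous) can be written as a small check. The Python sketch below encodes genotypes as alt-allele counts and ignores de novo variants and genotyping errors; it is an illustrative assumption rather than the authors' pipeline, the type alias and function name are hypothetical, and a large deletion such as the maternal exon 38–60 deletion would first have to be called by a separate CNV method before entering this check.

```python
from typing import Sequence, Tuple

# Genotypes are alt-allele counts: 0 = hom ref, 1 = het, 2 = hom alt.
Trio = Tuple[int, int, int]  # (child, mother, father) at one variant


def is_compound_heterozygous(variants: Sequence[Trio]) -> bool:
    """True if the child carries het variants in one gene inherited in trans.

    Requires at least one het variant transmitted only by the mother and one
    transmitted only by the father, i.e. one on each parental haplotype.
    """
    maternal = any(c == 1 and m == 1 and f == 0 for c, m, f in variants)
    paternal = any(c == 1 and m == 0 and f == 1 for c, m, f in variants)
    return maternal and paternal
```

Two variants both inherited from the same parent lie on the same haplotype (in cis) and leave one functional copy, which is why the check insists on one variant from each side.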

ETumorMetastasis: A Network-based Algorithm Predicts Clinical Outcomes Using Whole-exome Sequencing Data of Cancer Patients

Genomics Proteomics & Bioinformatics · 10.1016/j.gpb.2020.06.009 · 2021 · Cited By ~ 1

Author(s): Jean-Sébastien Milanese, Chabane Tibiche, Naif Zaman, Jinfeng Zou, Pengyong Han, ...

Keyword(s): Exome Sequencing, Cancer Patients, Clinical Outcomes, Whole Exome Sequencing, Sequencing Data, Exome Sequencing Data, Whole Exome, Whole Exome Sequencing Data

Long runs of homozygosity are associated with Alzheimer’s disease

Translational Psychiatry · 10.1038/s41398-020-01145-1 · 2021 · Vol 11 (1)

Author(s): Sonia Moreno-Grau, Maria Victoria Fernández, Itziar de Rojas, Pablo Garcia-González, ...

Keyword(s): Alzheimer's Disease, European Ancestry, Runs Of Homozygosity, Sequencing Data, Outbred Population, Whole Exome, Whole Exome Sequencing Data, Recessive Effects

Abstract

Long runs of homozygosity (ROH) are contiguous stretches of homozygous genotypes and are a footprint of inbreeding and recessive inheritance. Recessive loci have been suggested for Alzheimer's disease (AD); however, the search for them has been limited to date. To investigate homozygosity in AD, we performed a fine-scale ROH analysis using 10 independent cohorts of European ancestry (11,919 AD cases and 9181 controls). We detected an increase of homozygosity in AD cases compared with controls [β_AVROH (95% CI) = 0.070 (0.037–0.104); P = 3.91 × 10⁻⁵; β_FROH (95% CI) = 0.043 (0.009–0.076); P = 0.013]. ROHs increasing the risk of AD (OR > 1) were significantly overrepresented compared with ROHs conferring protection (p < 2.20 × 10⁻¹⁶). A significant ROH association with AD risk was detected upstream of the HS3ST1 locus (chr4:11,189,482‒11,305,456; β (95% CI) = 1.09 (0.48‒1.48); p = 9.03 × 10⁻⁴), a locus previously related to AD. Next, to search for recessive candidate variants in ROHs, we constructed a homozygosity map of inbred AD cases extracted from an outbred population and explored ROH regions in whole-exome sequencing data (N = 1449). We detected a candidate marker, rs117458494, mapped to the SPON1 locus, which has previously been associated with amyloid metabolism. Here, we provide a research framework to search for recessive variants in AD using outbred populations. Our results show that AD cases have enriched homozygosity, suggesting that recessive effects may explain a proportion of AD heritability.
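Runs of homozygosity and the summary statistics behind the AVROH and FROH effect sizes can be computed from ordered genotype calls. The Python sketch below uses the simplest definition, runs of consecutive homozygous SNPs with no heterozygous call allowed, and assumes AVROH denotes average ROH length and FROH the fraction of the genome in ROH. Thresholds and function names are hypothetical; dedicated tools such as PLINK's `--homozyg` use scanning windows that tolerate occasional heterozygous or missing calls.

```python
from typing import List, Sequence, Tuple


def find_roh(positions: Sequence[int], genotypes: Sequence[int],
             min_snps: int, min_length_bp: int) -> List[Tuple[int, int]]:
    """Return (start, end) positions of runs of consecutive homozygous genotypes.

    genotypes are alt-allele counts (0, 1, 2); any heterozygous call (1)
    breaks a run. Runs with fewer than min_snps SNPs or spanning less than
    min_length_bp are dropped.
    """
    runs, start = [], None
    n = len(genotypes)
    for i in range(n + 1):
        homozygous = i < n and genotypes[i] in (0, 2)  # i == n closes any open run
        if homozygous and start is None:
            start = i
        elif not homozygous and start is not None:
            end = i - 1
            if end - start + 1 >= min_snps and positions[end] - positions[start] >= min_length_bp:
                runs.append((positions[start], positions[end]))
            start = None
    return runs


def froh(runs: Sequence[Tuple[int, int]], genome_length_bp: int) -> float:
    """Fraction of the genome covered by ROH (F_ROH)."""
    return sum(end - start for start, end in runs) / genome_length_bp


def mean_roh_length(runs: Sequence[Tuple[int, int]]) -> float:
    """Average ROH length in bp (0.0 when there are no runs)."""
    return sum(end - start for start, end in runs) / len(runs) if runs else 0.0
```

In an association study these per-individual summaries would then be regressed on case-control status with covariates, which is outside the scope of this sketch.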

Detection of differentially methylated CpG sites between tumor samples with uneven tumor purities

Bioinformatics · 10.1093/bioinformatics/btz885 · 2019 · Vol 36 (7) · pp. 2017-2024

Author(s): Weiwei Zhang, Ziyi Li, Nana Wei, Hua-Jun Wu, Xiaoqi Zheng

Keyword(s): Real Data, R Package, Differential Methylation, Least Square, Epigenetic Mechanism, Supplementary Information, CpG Sites, Tumor Purity, Different Sources, Normal Controls

Abstract

Motivation: Inference of differentially methylated (DM) CpG sites between two groups of tumor samples with different genotypes or phenotypes is a critical step to uncover the epigenetic mechanisms of tumorigenesis and to identify biomarkers for cancer subtyping. However, uneven distributions of tumor purity between the two groups are a major confounding factor and will lead to biased discovery of DM sites if not properly accounted for.

Results: We propose InfiniumDM, a generalized least squares model that adjusts for the tumor purity effect in differential methylation analysis. Our method is applicable to a variety of experimental designs, with or without normal controls and with different sources of normal tissue contamination. Using simulated datasets, we compared our method with conventional methods including minfi, limma and limma corrected for tumor purity. Our method shows significantly better performance across different differential methylation thresholds, sample sizes, mean purity deviations and other settings. We also applied the proposed method to breast cancer samples from the TCGA database to further evaluate its performance. Overall, both simulation and real data analyses demonstrate favorable performance over existing methods serving a similar purpose.

Availability and implementation: InfiniumDM is part of the R package InfiniumPurify, which is freely available from GitHub (https://github.com/Xiaoqizheng/InfiniumPurify).

Supplementary information: Supplementary data are available at Bioinformatics online.
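Why purity confounds differential methylation can be seen from a simple mixing model: each sample's β-value is a purity-weighted blend of normal and tumor methylation, β_i = μ_N + p_i (μ_T − μ_N) + δ p_i g_i, so a group difference δ confined to tumor cells appears in bulk data scaled by purity p_i. The Python sketch below fits that model by weighted least squares, a special case of generalized least squares with a diagonal covariance. The model form and the function names are assumptions for illustration; InfiniumDM's actual variance structure and design options are not reproduced.

```python
from typing import Dict, List, Optional, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def purity_adjusted_dm(beta: Sequence[float], purity: Sequence[float], group: Sequence[int],
                       weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Weighted least-squares fit of beta_i = mu_N + p_i*(mu_T - mu_N) + delta*p_i*g_i.

    group is 0/1; delta is the between-group methylation difference in the
    tumour component only, so unequal purity between groups does not leak into it.
    """
    n = len(beta)
    w = [1.0] * n if weights is None else list(weights)
    x = [[1.0, p, p * g] for p, g in zip(purity, group)]  # design matrix rows
    # Normal equations: (X^T W X) b = X^T W y
    xtwx = [[sum(w[i] * x[i][r] * x[i][c] for i in range(n)) for c in range(3)] for r in range(3)]
    xtwy = [sum(w[i] * x[i][r] * beta[i] for i in range(n)) for r in range(3)]
    mu_normal, tumor_minus_normal, delta = solve(xtwx, xtwy)
    return {"mu_normal": mu_normal,
            "mu_tumor_group0": mu_normal + tumor_minus_normal,
            "delta": delta}
```

Comparing raw group means instead would mix δ with any purity difference between the groups, which is the bias described in the Motivation.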
