Comparative analysis of somatic variant calling on matched FF and FFPE WGS from a metastatic prostate sample

2019 ◽  
Author(s):  
Louise de Schaetzen van Brienen ◽  
Maarten Larmuseau ◽  
Kim Van der Eecken ◽  
Jan Fostier ◽  
Piet Ost ◽  
...  

Abstract Background. Research grade Fresh Frozen (FF) DNA material is not yet routinely collected in clinical practice. Many hospitals, however, do collect and store Formalin Fixed Paraffin Embedded (FFPE) tumor samples. Consequently, the sample size of whole genome cancer cohort studies could be increased tremendously by including FFPE samples, although the presence of artifacts might obfuscate the variant calling. To assess whether FFPE material can be used for cohort studies, we performed an in-depth comparison of somatic SNVs called on matching FF and FFPE Whole Genome Sequence (WGS) samples extracted from the same prostate metastatic tumor. Results. We first compared the calls between FF and FFPE, showing that on average 50% of the calls in FF are recovered in FFPE, with notable differences between variant callers. Remarkably, this overlap was better than the overlap between different variant callers on the same sample. Inspecting the Variant Allele Frequency (VAF), we observed that many of the calls common to FF and FFPE belonged to the same clonal subpopulation but were detected at a lower VAF in FFPE. We also demonstrated that these calls receive higher significance scores and are often identified by more than one variant caller. Based on this observation, we propose a simple heuristic to perform reliable variant calling in FFPE samples. Our heuristic identified 3684 common calls at an F1-score of 0.83. Conclusion. This study illustrates that when using the correct variant calling strategy, the overlap between the FF and FFPE sample in somatic SNVs increases to such an extent that a large fraction of the calls detected in the FFPE sample are contained in the FF sample and the number of variants unique to each sample remains restricted. These results suggest that somatic variants derived from WGS of FFPE material can be used in cohort studies.
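
The evaluation described in this abstract (combining callers, then scoring FFPE calls against FF calls with an F1-score) can be illustrated with a minimal sketch. The abstract does not spell out the heuristic, so the rule used here (keep a variant reported by at least two callers) is only an assumption for illustration, and the caller names and variants are hypothetical toy data.

```python
from collections import Counter

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consensus_calls(calls_by_caller: dict[str, set[Variant]],
                    min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` callers (assumed rule)."""
    counts = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


def precision_recall_f1(test: set[Variant],
                        reference: set[Variant]) -> tuple[float, float, float]:
    """Score `test` calls against `reference` calls (here: FFPE against FF)."""
    tp = len(test & reference)
    precision = tp / len(test) if test else 0.0
    recall = tp / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: four FF calls and three callers run on the FFPE sample.
a, b, c, d = ("1", 100, "C", "T"), ("1", 200, "G", "A"), ("2", 50, "A", "G"), ("3", 7, "T", "C")
e, f, g = ("4", 9, "C", "T"), ("5", 11, "G", "A"), ("6", 13, "C", "A")

ff_calls = {a, b, c, d}
ffpe_by_caller = {
    "caller_a": {a, b, e},
    "caller_b": {a, b, c, f},
    "caller_c": {a, g},
}

ffpe_consensus = consensus_calls(ffpe_by_caller, min_callers=2)
precision, recall, f1 = precision_recall_f1(ffpe_consensus, ff_calls)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Treating the FF calls as the reference is itself a simplification, since the FF sample is not a ground truth; the score measures agreement between the two samples.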

2020 ◽  
Author(s):  
Louise de Schaetzen van Brienen ◽  
Maarten Larmuseau ◽  
Kim Van der Eecken ◽  
Frederic De Ryck ◽  
Pauline Robbe ◽  
...  

Abstract Background. Research grade Fresh Frozen (FF) DNA material is not yet routinely collected in clinical practice. Many hospitals, however, collect and store Formalin Fixed Paraffin Embedded (FFPE) tumor samples. Consequently, the sample size of whole genome cancer cohort studies could be increased tremendously by including FFPE samples, although the presence of artefacts might obfuscate the variant calling. To assess whether FFPE material can be used for cohort studies, we performed an in-depth comparison of somatic SNVs called on matching FF and FFPE Whole Genome Sequence (WGS) samples extracted from the same tumor. Results. We first compared the calls between an FF and an FFPE sample from a metastatic prostate tumor, showing that on average 50% of the calls in the FF sample are recovered in the FFPE sample, with notable differences between variant callers. Combining the variants of the different callers using a simple heuristic increases both the precision and the sensitivity of the variant calling. Validating the heuristic on nine additional matched FF-FFPE samples resulted in an average F1-score of 0.58, outperforming each of the individual callers. In addition, we could show that part of the discrepancy between the FF and the FFPE samples can be attributed to intra-tumor heterogeneity (ITH). Conclusion. This study illustrates that when using the correct variant calling strategy, the majority of clonal SNVs can be recovered in an FFPE sample with high precision and sensitivity. These results suggest that somatic variants derived from WGS of FFPE material can be used in cohort studies.


2019 ◽  
Author(s):  
Aditya Vijay Bhagwate ◽  
Yuanhang Liu ◽  
Stacey J. Winham ◽  
Samantha J. McDonough ◽  
Melody L. Stallings-Mann ◽  
...  

Abstract Background Archived formalin fixed paraffin embedded (FFPE) samples are valuable clinical resources to examine clinically relevant morphology features and also to study genetic changes. However, DNA quality and quantity of FFPE samples are often sub-optimal, and resulting NGS-based genetics variant detections are prone to false positives. Evaluations of wet-lab and bioinformatics approaches are needed to optimize variant detection from FFPE samples. Results As a pilot study, we designed within-subject triplicate samples of DNA derived from paired FFPE and fresh frozen breast tissues to highlight FFPE-specific artifacts. For FFPE samples, we tested two FFPE DNA extraction methods to determine impact of wet-lab procedures on variant calling: QIAGEN QIAamp DNA Mini Kit ("QA"), and QIAGEN GeneRead DNA FFPE Kit ("QGR"). We also used negative-control (NA12891) and positive control samples (Horizon Discovery Reference Standard FFPE). All DNA sample libraries were prepared for NGS according to the QIAseq Human Breast Cancer Targeted DNA Panel protocol and sequenced on the HiSeq 4000. Variant calling and filtering were performed using QIAGEN Gene Globe Data Portal. Detailed variant concordance comparisons and mutational signature analysis were performed to investigate effects of FFPE samples compared to paired fresh frozen samples, along with different library preparations. In this study, we found that five times or more variants were called with FFPE samples, compared to their paired fresh-frozen tissue samples even after applying molecular barcoding error-correction and default bioinformatics filtering recommended by the vendor. We also found that QGR as an optimized FFPE-DNA extraction approach leads to much fewer discordant variants between paired fresh frozen and FFPE samples. 
Approximately 92% of the uniquely called FFPE variants were in the low allelic frequency range (<5%), and collectively shared a “C>T|G>A” mutational signature known to be representative of FFPE artifacts resulting from cytosine deamination. Based on control samples and FFPE-frozen replicates, we derived an effective filtering strategy with associated empirical false-discovery estimates. Conclusions Through this study, we demonstrated feasibility of calling and filtering genetic variants from FFPE tissue samples using a combined strategy with molecular barcodes, optimized DNA extraction, and bioinformatics methods incorporating genomics context such as mutational signature and variant allelic frequency.
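
The two properties reported above for FFPE-only calls (a C>T or G>A substitution and an allelic frequency below 5%) suggest a simple flagging rule, sketched below. This is not the authors' filtering strategy, which the abstract does not detail; the `Call` record, the function names, and the example calls are hypothetical, and only the 5% threshold is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float  # variant allele frequency as a fraction in [0, 1]


# Substitutions consistent with cytosine deamination, read on either strand.
DEAMINATION = {("C", "T"), ("G", "A")}


def is_candidate_ffpe_artifact(call: Call, max_vaf: float = 0.05) -> bool:
    """Flag low-frequency C>T / G>A calls as possible FFPE artifacts."""
    substitution = (call.ref.upper(), call.alt.upper())
    return substitution in DEAMINATION and call.vaf < max_vaf


def split_calls(calls: list[Call], max_vaf: float = 0.05) -> tuple[list[Call], list[Call]]:
    """Return (kept, flagged) so that flagged calls can be reviewed, not silently dropped."""
    kept: list[Call] = []
    flagged: list[Call] = []
    for call in calls:
        (flagged if is_candidate_ffpe_artifact(call, max_vaf) else kept).append(call)
    return kept, flagged


# Hypothetical example calls.
calls = [
    Call("1", 1000, "C", "T", 0.03),  # low-VAF C>T: flagged
    Call("1", 2000, "G", "A", 0.40),  # G>A at high VAF: kept
    Call("2", 3000, "A", "G", 0.02),  # low VAF but not a deamination change: kept
    Call("3", 4000, "g", "a", 0.01),  # lowercase alleles are normalised: flagged
]
kept, flagged = split_calls(calls)
print(len(kept), "kept;", len(flagged), "flagged")
```

A rule like this also flags genuine subclonal C>T mutations, so in practice it would be combined with other evidence, such as the molecular barcodes and control samples mentioned in the abstract.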


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e13016-e13016
Author(s):  
Shannon Terrell Bailey ◽  
Belynda Hicks ◽  
Bin Zhu ◽  
Nan Hu ◽  
Phil R. Taylor ◽  
...  

e13016 Background: Whole-genome sequencing (WGS) of formalin-fixed, paraffin-embedded (FFPE) samples could enable novel insights from archival sample collections, yet robust FFPE WGS is challenged by fragmented DNA, uneven genomic coverage & sequencing artifacts attributed to FFPE fixation. We report our proprietary extraction & library preparation methodology (SeqPlus) with high quality, uniform WGS sequencing performance comparable to that from fresh-frozen samples. Methods: We analyzed 20 paired esophageal carcinoma (EC) samples i.e., primary tumors & matched germline samples to assess SeqPlus performance on 10-15-year-old FFPE tissues, measure variant concordance between WGS and a high-depth sequencing panel (269 genes, 400x coverage) & identify novel genomic features. Results: At a targeted 70x WGS tumor sequencing depth, 93% of the genome was covered by ≥ 20 reads, 99% of bases had 10x coverage & average duplicate reads were 31%. We noted similar transition/transversion ratios & mutational spectra as from fresh-frozen EC specimens, suggesting that extraction & library preparation contributes to prior FFPE artifacts. Concordance of tumor-specific SNVs & indels derived from WGS & targeted panel was high at 86%. All 76 targeted panel-detected variants above the WGS limit of detection (mutant allele frequency [MAF] > 10%) were detected by WGS, 2 variants (2 tumors) were detected only by WGS, and 12 variants at MAF ≤ 6% (9 tumors) were only detected by the targeted panel. Tumor WGS yielded SNV, indels & CNV findings beyond variants detected by targeted sequencing. WGS enabled detection of 10.4 putative cancer variants per tumor compared to 12 variants per patient from frozen specimens and a median of 7 (up to 16) cancer-associated variants in genes outside the targeted panel. WGS copy number analysis revealed CCND1, EGFR, TP63, and SOX2 amplification, CDKN2A/B deletion and additional unrecognized genomic aberrations. 
Conclusions: Our study reinforces the utility of high-quality, uniform WGS sequencing of archival FFPE cancer samples with SeqPlus and unlocks the potential for massive-scale retrospective genomic analysis of archived pathology samples with associated clinical & outcomes data.


Author(s):  
Shatha Alosaimi ◽  
Noëlle van Biljon ◽  
Denis Awany ◽  
Prisca K Thami ◽  
Joel Defo ◽  
...  

Abstract Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. When applied to these genetically diverse populations, VC tools may produce false positive and false negative results, which may lead to misleading conclusions in the prioritization of mutations and in assessing the clinical relevance and actionability of genes. The most prominent question is which tool or pipeline has high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic Whole Genome Sequencing (WGS) samples, mimicking the genetic profile of African and European subjects at different coverage levels (high/low), have been generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of these tools was assessed in terms of false positive and false negative call rates by comparing the simulated golden variants to the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African compared with European data. This highlights the need for development of VC approaches with high sensitivity and precision tailored for populations characterized by high genetic variation and low linkage disequilibrium.
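
For readers unfamiliar with the metrics quoted above, the following sketch computes sensitivity, PPV and the Matthews correlation coefficient from confusion-matrix counts using their standard definitions. The counts are hypothetical and are not taken from the study; how true negatives are counted for variant calls (for example, per simulated site) is an assumption that the abstract does not specify.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts from comparing one caller's output to simulated truth.
tp, fp, fn, tn = 90, 10, 10, 890

print(f"sensitivity={sensitivity(tp, fn):.3f}")
print(f"PPV={ppv(tp, fp):.3f}")
print(f"MCC={mcc(tp, fp, fn, tn):.3f}")
```

Unlike PPV alone, MCC also reflects missed variants, which is why a caller can reach a PPV near 1 while its MCC stays noticeably lower.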


2020 ◽  
Author(s):  
Anuraj Nayarisseri ◽  
Sanjeev Kumar Singh

Abstract We announce the complete genome sequence of Bacillus tequilensis, a biosurfactant-producing bacterium isolated from Chilika lake, Odisha, India (latitude and longitude: 19.8450 N 85.4788 E). The genome sequence is 4.47 Mb, consisting of 4,478,749 base pairs forming a circular chromosome with 528 scaffolds, 4492 protein-encoding genes (ORFs), 81 tRNA genes, and 114 ribosomal RNA transcription units. The total number of raw reads was 4,209,415 and the number of processed reads was 4,058,238, with 4492 predicted genes. The whole genome obtained in the present investigation was used for genome annotation, variant calling, variant annotation and comparative genome analysis with other existing Bacillus species. In this study we constructed a pathway which describes the biosurfactant metabolism of Bacillus tequilensis and identified genes such as SrfAD, SrfAC and SrfAA, which are involved in biosurfactant synthesis. The corresponding sequences were deposited in the GenBank database under accessions MUG02427.1, MUG02428.1, MUG02429.1 and MUG03515.1. The whole-genome sequence was submitted to GenBank under accession RMVO00000000, and the raw reads can be obtained from the NCBI SRA repository using accession SRX5023292.


2016 ◽  
Vol 48 (12) ◽  
pp. 922-927 ◽  
Author(s):  
Kari Branham ◽  
Hiroko Matsui ◽  
Pooja Biswas ◽  
Aditya A. Guru ◽  
Michael Hicks ◽  
...  

While more than 250 genes are known to cause inherited retinal degenerations (IRD), the genetic basis of disease remains unknown in nearly 40–50% of families. In this study we sought to identify the underlying cause of IRD in a family by whole genome sequence (WGS) analysis. Clinical characterization including standard ophthalmic examination, fundus photography, visual field testing, electroretinography, and review of medical and family history was performed. WGS was performed on affected and unaffected family members using Illumina HiSeq X10. Sequence reads were aligned to hg19 using BWA-MEM and variant calling was performed with Genome Analysis Toolkit. The called variants were annotated with SnpEff v4.11, PolyPhen v2.2.2, and CADD v1.3. Copy number variations were called using Genome STRiP (svtoolkit 2.00.1611) and SpeedSeq software. Variants were filtered to detect rare potentially deleterious variants segregating with disease. Candidate variants were validated by dideoxy sequencing. Clinical evaluation revealed typical adolescent-onset recessive retinitis pigmentosa (arRP) in affected members. WGS identified about 4 million variants in each individual. Two rare and potentially deleterious compound heterozygous variants, p.Arg281Cys and p.Arg487*, were identified in the gene ATP/GTP binding protein like 5 (AGBL5) as likely causal variants. No additional variants in IRD genes that segregated with disease were identified. Mutation analysis confirmed the segregation of these variants with the IRD in the pedigree. Homology models indicated destabilization of AGBL5 due to the p.Arg281Cys change. Our findings establish the involvement of mutations in AGBL5 in RP and validate the WGS variant filtering pipeline we designed.


2019 ◽  
Author(s):  
Aditya Vijay Bhagwate ◽  
Yuanhang Liu ◽  
Stacey J. Winham ◽  
Samantha J. McDonough ◽  
Melody L. Stallings-Mann ◽  
...  

Abstract Background: Archived formalin fixed paraffin embedded (FFPE) samples are valuable clinical resources to examine clinically relevant morphology features and also to study genetic changes. However, DNA quality and quantity of FFPE samples are sub-optimal, and resulting NGS-based genetics variant detections are prone to false positives. Evaluations of wet-lab and bioinformatics approaches are needed to optimize variant detection from FFPE samples. Results: As a pilot study, we designed within-subject triplicate samples of DNA derived from paired FFPE and fresh frozen breast tissues to highlight FFPE-specific artifacts. For FFPE samples, we tested two FFPE DNA extraction methods to determine impact of wet-lab procedures on variant calling: QIAGEN QIAamp DNA Mini Kit ("QA"), and QIAGEN GeneRead DNA FFPE Kit ("QGR"). We also used negative-control (NA12891) and positive control samples (Horizon Discovery Reference Standard FFPE). All DNA sample libraries were prepared for NGS according to the QIAseq Human Breast Cancer Targeted DNA Panel protocol and sequenced on the HiSeq 4000. Variant calling and filtering were performed using QIAGEN Gene Globe Data Portal. Detailed variant concordance comparisons and mutational signature analysis were performed to investigate effects of FFPE samples compared to paired fresh frozen samples, along with different DNA extraction methods. In this study, we found that five times or more variants were called with FFPE samples, compared to their paired fresh-frozen tissue samples even after applying molecular barcoding error-correction and default bioinformatics filtering recommended by the vendor. We also found that QGR as an optimized FFPE-DNA extraction approach leads to much fewer discordant variants between paired fresh frozen and FFPE samples. 
Approximately 92% of the uniquely called FFPE variants were of low allelic frequency range (<5%), and collectively shared a “C>T|G>A” mutational signature known to be representative of FFPE artifacts resulting from cytosine deamination. Based on control samples and FFPE-frozen replicates, we derived an effective filtering strategy with associated empirical false-discovery estimates. Conclusions: Through this study, we demonstrated feasibility of calling and filtering genetic variants from FFPE tissue samples using a combined strategy with molecular barcodes, optimized DNA extraction, and bioinformatics methods incorporating genomics context such as mutational signature and variant allelic frequency.

