A fully automated approach for quality control of cancer mutations in the era of high-resolution whole genome sequencing

2021
Author(s): Jacob Househam, William CH Cross, Giulio Caravagna

Abstract: Cancer is a global health issue that places enormous demands on healthcare systems. Basic research, the development of targeted treatments, and the utility of DNA sequencing in clinical settings have been significantly improved with the introduction of whole genome sequencing. However, the broad applications of this technology come with complications. To date, there has been very little standardisation in how data quality is assessed, leading to inconsistencies in analyses and disparate conclusions. Manual checking and complex consensus calling strategies often do not scale to large sample numbers, which leads to procedural bottlenecks. To address this issue, we present a quality control method that integrates point mutations, copy numbers, and other metrics into a single quantitative score. We demonstrate its power on 1,065 whole genomes from a large-scale pan-cancer cohort, and on multi-region data from two colorectal cancer patients. We highlight how our approach significantly improves the generation of cancer mutation data, providing visualisations for cross-referencing with other analyses. Our approach is fully automated, is designed to work downstream of any bioinformatic pipeline, and can automate tool parameterization, paving the way for fast computational assessment of data quality in the era of whole genome sequencing.

2018
Author(s): Adam C. Naj, Honghuang Lin, Badri N. Vardarajan, Simon White, Daniel Lancour, ...

Abstract: The Alzheimer’s Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed “consensus calling,” to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; the ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
Abbreviations: AD, Alzheimer’s disease; QC, Quality Control; LSSAC, Large-Scale Sequencing and Analysis Center; Broad, Broad Institute Genomics Service; Baylor, Baylor College of Medicine Human Genome Sequencing Center; WashU, Washington University-St. Louis McDonnell Genome Institute; WGS, whole genome sequencing; WES, whole exome sequencing; indel, insertion-deletion variants; VCF, variant call format; MI, Mendelian inconsistency; MC, Mendelian consistency; GWAS, genome-wide association study; VR, referent allele read depth; DP, overall read depth; MS, mapping score; GQ, genotype quality score; Ti/Tv, Transition/Transversion; CS, concordance code
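As an illustration of the Mendelian inconsistency check mentioned in the abstract above, the following is a minimal sketch and not the ADSP protocol itself. It assumes bi-allelic genotypes coded as alternate-allele counts (0, 1, 2, or `None` when missing), and the function names `is_mendelian_inconsistent` and `mi_rate` are hypothetical. For a single parent-offspring pair, a genotype pair is inconsistent only when the two individuals are opposite homozygotes, because they then share no allele.

```python
from typing import Optional, Sequence


def is_mendelian_inconsistent(parent: Optional[int], child: Optional[int]) -> bool:
    """Return True when a parent and child share no allele at a bi-allelic site.

    Genotypes are alternate-allele counts (0, 1, 2); None means missing.
    Assumption: missing genotypes are never counted as inconsistent.
    """
    if parent is None or child is None:
        return False
    # Opposite homozygotes (0 vs 2) are the only inconsistent duo configuration.
    return {parent, child} == {0, 2}


def mi_rate(parent_gts: Sequence[Optional[int]],
            child_gts: Sequence[Optional[int]]) -> float:
    """Fraction of jointly called sites that are Mendelian-inconsistent."""
    called = [(p, c) for p, c in zip(parent_gts, child_gts)
              if p is not None and c is not None]
    if not called:
        return 0.0
    inconsistent = sum(is_mendelian_inconsistent(p, c) for p, c in called)
    return inconsistent / len(called)
```

A per-pair rate like this could then be compared across candidate genotype filters, which is one plausible way to use MIs when tuning thresholds; the filters actually applied are not specified in the abstract.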


2016, Vol 94 (suppl_5), pp. 146-146
Author(s): D. M. Bickhart, L. Xu, J. L. Hutchison, J. B. Cole, D. J. Null, ...

2019
Author(s): Andrea Sanchini, Christine Jandrasits, Julius Tembrockhaus, Thomas Andreas Kohl, Christian Utpatel, ...

Abstract
Introduction: Improving the surveillance of tuberculosis (TB) is especially important for multidrug-resistant (MDR) and extensively drug-resistant (XDR)-TB. The large amount of publicly available whole-genome sequencing (WGS) data for TB gives us the chance to re-use data and to perform additional analysis at a large scale.
Aim: We assessed the usefulness of raw WGS data of global MDR/XDR-TB isolates available from public repositories to improve TB surveillance.
Methods: We extracted raw WGS data and the related metadata of Mycobacterium tuberculosis isolates available from the Sequence Read Archive. We compared this public dataset with WGS data and metadata of 131 MDR- and XDR-TB isolates from Germany in 2012-2013.
Results: We aggregated a dataset that includes 1,081 MDR and 250 XDR isolates among which we identified 133 molecular clusters. In 16 clusters, the isolates were from at least two different countries. For example, cluster2 included 56 MDR/XDR isolates from Moldova, Georgia, and Germany. By comparing the WGS data from Germany and the public dataset, we found that 11 clusters contained at least one isolate from Germany and at least one isolate from another country. We could, therefore, connect TB cases despite missing epidemiological information.
Conclusion: We demonstrated the added value of using WGS raw data from public repositories to contribute to TB surveillance. By comparing the German and the public dataset, we identified potential international transmission events. Thus, using this approach might support the interpretation of national surveillance results in an international context.


2018, Vol 56 (8)
Author(s): Cath Arnold, Kirstin Edwards, Meeta Desai, Steve Platt, Jonathan Green, ...

ABSTRACT Routine whole-genome analysis for infectious diseases can inform various scenarios pertaining to public health, including identification of microbial pathogens, relating individual cases to an outbreak of infectious disease, establishing an association between an outbreak of food poisoning and a specific food vehicle, inferring drug susceptibility, source tracing of contaminants, and study of variations in the genome that affect pathogenicity/virulence. We describe the setup, validation, and ongoing verification of a centralized whole-genome-sequencing (WGS) laboratory to carry out sequencing for these public health functions for the National Infection Services, Public Health England, in the United Kingdom. The performance characteristics and quality control metrics measured during validation and verification of the entire end-to-end process (accuracy, precision, reproducibility, and repeatability) are described and include information regarding the automated pass and release of data to service users without intervention.


2012, Vol 207 (4), pp. 675-686
Author(s): Kate E. Dingle, Xavier Didelot, M. Azim Ansari, David W. Eyre, Alison Vaughan, ...

mBio, 2016, Vol 7 (3)
Author(s): David M. Aanensen, Edward J. Feil, Matthew T. G. Holden, Janina Dordel, Corin A. Yeats, ...

ABSTRACT The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with epidemiological and resistance data. Geospatial visualization of the data is made possible by a generic software tool designed for public health purposes that is available at the project URL (http://www.microreact.org/project/EkUvg9uY?tt=rc). Our analysis demonstrates that high-risk clones can be identified on the basis of population level properties such as clonal relatedness, abundance, and spatial structuring and by inferring virulence and resistance properties on the basis of gene content. We also show that in silico predictions of antibiotic resistance profiles are at least as reliable as phenotypic testing. We argue that this work provides a comprehensive road map illustrating the three vital components for future molecular epidemiological surveillance: (i) large-scale structured surveys, (ii) WGS, and (iii) community-oriented database infrastructure and analysis tools.
IMPORTANCE The spread of antibiotic-resistant bacteria is a public health emergency of global concern, threatening medical intervention at every level of health care delivery. Several recent studies have demonstrated the promise of routine whole-genome sequencing (WGS) of bacterial pathogens for epidemiological surveillance, outbreak detection, and infection control. However, as this technology becomes more widely adopted, the key challenges of generating representative national and international data sets and the development of bioinformatic tools to manage and interpret the data become increasingly pertinent. This study provides a road map for the integration of WGS data into routine pathogen surveillance. We emphasize the importance of large-scale routine surveys to provide the population context for more targeted or localized investigation and the development of open-access bioinformatic tools to provide the means to combine and compare independently generated data with publicly available data sets.


2017, pp. gkx019
Author(s): Miaoxin Li, Jiang Li, Mulin Jun Li, Zhicheng Pan, Jacob Shujui Hsu, ...
