A heuristic platform for clinical interpretation of cancer genome sequencing data.

2012 ◽  
Vol 30 (15_suppl) ◽  
pp. 10502-10502
Author(s):  
Eliezer Mendel Van Allen ◽  
Nikhil Wagle ◽  
Gregory Kryukov ◽  
Alexis Ramos ◽  
Gad Getz ◽  
...  

10502 Background: The ability to identify and effectively sort the full spectrum of biologically and therapeutically relevant genetic alterations identified by massively parallel sequencing may improve cancer care. A major challenge involves rapid and rational categorization of data-intensive output, including somatic mutations, insertions/deletions, copy number alterations, and rearrangements, into ranked categories for clinician review. Methods: A database of clinically actionable alterations was created, consisting of over 100 annotated genes known to undergo somatic genomic alterations in cancer that may impact clinical decision-making. A heuristic algorithm was developed that first selects somatic alterations matching the clinically actionable alterations database. Remaining variants are sorted by additional heuristics: high-priority alterations based on presence in the Cancer Gene Census, biologically significant cancer genes based on presence in COSMIC or MSigDB, and low-priority alterations in the same gene family as biologically significant cancer genes. The heuristic algorithm was applied to whole exome sequencing data of clinical samples and to whole genome sequencing data from a cohort of prostate cancer samples processed using established Broad Institute pipelines. Results: Application of the heuristic algorithm to the prostate cancer whole genome rearrangement data identified 172 (out of 5978) rearrangements involving actionable genes (averaging 2-3 events per tumor). Furthermore, two clinical samples processed prospectively were analyzed, yielding three potentially actionable alterations for clinical review. Conclusions: The heuristic model for clinical interpretation of next-generation sequencing data may facilitate rapid analysis of tumor genomic information for clinician review by identifying and prioritizing alterations that can directly impact care. Our platform can also be applied to research data to prospectively explore clinically relevant findings from existing cohorts. Future analytical approaches using heuristic or probabilistic algorithms should underpin a robust prospective assessment of clinical cancer genome data.
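
The tiered sorting described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the gene sets, the gene-family lookup, the `Alteration` type, and the numeric tier labels are all hypothetical placeholders, since the abstract does not list the contents of the actionable database or the exact ranking scheme.

```python
from dataclasses import dataclass

# Hypothetical placeholder gene sets; real contents are not given in the abstract.
ACTIONABLE = {"GENE_A"}                  # curated clinically actionable database
CANCER_GENE_CENSUS = {"GENE_B"}          # high-priority tier
COSMIC_OR_MSIGDB = {"GENE_C"}            # biologically significant tier
GENE_FAMILY = {"GENE_C": "FAM1", "GENE_D": "FAM1"}  # assumed gene -> family map


@dataclass(frozen=True)
class Alteration:
    gene: str
    kind: str  # e.g. "mutation", "indel", "copy_number", "rearrangement"


def tier(alt: Alteration) -> int:
    """Return a rank (1 = review first) following the order in the abstract."""
    gene = alt.gene
    if gene in ACTIONABLE:
        return 1
    if gene in CANCER_GENE_CENSUS:
        return 2
    if gene in COSMIC_OR_MSIGDB:
        return 3
    family = GENE_FAMILY.get(gene)
    if family is not None and any(
        GENE_FAMILY.get(g) == family for g in COSMIC_OR_MSIGDB
    ):
        return 4  # same family as a biologically significant gene
    return 5      # no heuristic matched


def rank(alterations):
    """Sort alterations so the highest-priority tiers come first."""
    return sorted(alterations, key=tier)
```

Calling `rank` on a list of `Alteration` objects returns them ordered for review; because Python's sort is stable, alterations within a tier keep their input order.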

2019 ◽  
Author(s):  
Ronan M. Doyle ◽  
Denise M. O’Sullivan ◽  
Sean D. Aller ◽  
Sebastian Bruchmann ◽  
Taane Clark ◽  
...  

Abstract Background: Antimicrobial resistance (AMR) poses a threat to public health. Clinical microbiology laboratories typically rely on culturing bacteria for antimicrobial susceptibility testing (AST). As the implementation costs and technical barriers fall, whole-genome sequencing (WGS) has emerged as a ‘one-stop’ test for epidemiological and predictive AST results. Few published comparisons exist for the myriad analytical pipelines used for predicting AMR. To address this, we performed an inter-laboratory study providing sets of participating researchers with identical short-read WGS data sequenced from clinical isolates, allowing us to assess the reproducibility of the bioinformatic prediction of AMR between participants and identify problem cases and factors that lead to discordant results. Methods: We produced ten WGS datasets of varying quality from cultured carbapenem-resistant organisms obtained from clinical samples sequenced on either an Illumina NextSeq or HiSeq instrument. Nine participating teams (‘participants’) were provided these sequence data without any other contextual information. Each participant used their own pipeline to determine the species, the presence of resistance-associated genes, and to predict susceptibility or resistance to amikacin, gentamicin, ciprofloxacin and cefotaxime. Results: Individual participants predicted different numbers of AMR-associated genes and different gene variants from the same clinical samples. The quality of the sequence data, choice of bioinformatic pipeline and interpretation of the results all contributed to discordance between participants. Although much of the inaccurate gene variant annotation did not affect genotypic resistance predictions, we observed low specificity when compared to phenotypic AST results, but this improved in samples with higher read depths. Had the results been used to predict AST and guide treatment, a different antibiotic would have been recommended for each isolate by at least one participant. Conclusions: We found that participants produced discordant predictions from identical WGS data. These challenges, at the final analytical stage of using WGS to predict AMR, suggest the need for refinements when using this technology in clinical settings. Comprehensive public resistance sequence databases and standardisation in the comparisons between genotype and resistance phenotypes will be fundamental before AST prediction using WGS can be successfully implemented in standard clinical microbiology laboratories.
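
The two quantities discussed in the Results, discordance between participants and specificity against phenotypic AST, can be sketched as follows. This is an illustrative sketch only: the data layout (`{participant: {(isolate, drug): call}}`), the `"R"`/`"S"` labels, and the function names are assumptions, and the study's own scoring rules may differ.

```python
def discordant_cases(predictions):
    """Return (isolate, drug) pairs on which participants disagree.

    predictions: {participant: {(isolate, drug): "R" or "S"}} (assumed layout).
    """
    calls_seen = {}
    for per_case in predictions.values():
        for case, call in per_case.items():
            calls_seen.setdefault(case, set()).add(call)
    return sorted(case for case, calls in calls_seen.items() if len(calls) > 1)


def specificity(predicted, phenotype):
    """TN / (TN + FP), treating resistance ("R") as the positive class.

    Only cases that are phenotypically susceptible contribute.
    Returns None when no susceptible case was predicted.
    """
    true_neg = false_pos = 0
    for case, truth in phenotype.items():
        if truth != "S" or case not in predicted:
            continue
        if predicted[case] == "S":
            true_neg += 1
        else:
            false_pos += 1
    total = true_neg + false_pos
    return true_neg / total if total else None
```

Here a false positive is a genotypic resistance call for an isolate that phenotypic AST found susceptible, which is the kind of error that lowers specificity.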


NAR Cancer ◽  
2020 ◽  
Vol 2 (4) ◽  
Author(s):  
HoJoon Lee ◽  
Ahmed Shuaibi ◽  
John M Bell ◽  
Dmitri S Pavlichin ◽  
Hanlee P Ji

Abstract Cancer genome sequencing has led to important discoveries such as the identification of cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, there is often no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate the given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on a statistically significant difference in the frequency of k-mers with and without the mutation between matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations, including insertions and deletions, across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.
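
The idea of comparing k-mer frequencies with and without a mutation in tumor versus normal reads can be sketched for a single-base substitution. This is not KmerVC's code or interface: the function names are hypothetical, and the one-sided Fisher exact test is an assumed choice, since the abstract says only that a statistically significant difference in frequency is required.

```python
from math import comb


def kmers_spanning(seq, pos, k):
    """All k-mers of seq that cover the 0-based position pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return {seq[i:i + k] for i in range(start, end + 1)}


def count_hits(reads, kmers, k):
    """Count k-mer occurrences in reads that belong to the given set."""
    return sum(
        1
        for read in reads
        for i in range(len(read) - k + 1)
        if read[i:i + k] in kmers
    )


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value, P(X >= a), for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


def validate_snv(ref, alt, pos, tumor_reads, normal_reads, k):
    """p-value for enrichment of mutant k-mers in tumor relative to normal.

    ref and alt are equal-length local sequences differing at pos (assumption).
    k-mers shared by both alleles are dropped so each count is allele-specific.
    """
    ref_kmers = kmers_spanning(ref, pos, k)
    alt_kmers = kmers_spanning(alt, pos, k)
    mutant_only = alt_kmers - ref_kmers
    ref_only = ref_kmers - alt_kmers
    a = count_hits(tumor_reads, mutant_only, k)
    b = count_hits(tumor_reads, ref_only, k)
    c = count_hits(normal_reads, mutant_only, k)
    d = count_hits(normal_reads, ref_only, k)
    return fisher_greater(a, b, c, d)
```

A small p-value supports the mutation being present in the tumor but not the normal sample. Overlapping k-mers from one read are not independent observations, so the p-value in this sketch is only a rough indicator; k-mers that are non-unique in the genome would also need to be excluded in practice.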


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Matthew H. Bailey ◽  
William U. Meyerson ◽  
Lewis Jonathan Dursi ◽  
Liang-Bo Wang ◽  
...  

Abstract The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side from 746 TCGA samples, finding that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in WES consensus efforts. WGS captures both ~50% more variation in exonic regions and unobserved mutations in loci with variable GC-content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.
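
The kind of bookkeeping behind these percentages can be sketched with set operations on two call sets. The sketch makes several assumptions that are not in the abstract: mutations are keyed by `(chrom, pos, ref, alt)`, both call sets are already restricted to covered exonic regions, and overlap is measured as shared calls over the union. The consortium's actual definitions may differ.

```python
def compare_calls(wes, wgs, vaf_cutoff=0.15):
    """Summarise overlap between two call sets.

    wes, wgs: {(chrom, pos, ref, alt): variant allele fraction} (assumed layout).
    """
    shared = wes.keys() & wgs.keys()
    union = wes.keys() | wgs.keys()
    private_wes = wes.keys() - shared
    private_wgs = wgs.keys() - shared

    def low_vaf_fraction(private, calls):
        # Share of private calls whose VAF falls below the cutoff.
        if not private:
            return 0.0
        return sum(calls[m] < vaf_cutoff for m in private) / len(private)

    return {
        "overlap": len(shared) / len(union) if union else 0.0,
        "private_wes": len(private_wes),
        "private_wgs": len(private_wgs),
        "private_wes_low_vaf": low_vaf_fraction(private_wes, wes),
        "private_wgs_low_vaf": low_vaf_fraction(private_wgs, wgs),
    }
```

The default cutoff of 0.15 mirrors the VAF < 15% threshold quoted above; everything else in the sketch is illustrative.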


2017 ◽  
Author(s):  
Jeremiah A Wala ◽  
Ofer Shapira ◽  
Yilong Li ◽  
David Craft ◽  
Steven E Schumacher ◽  
...  

Abstract Cancer cells can acquire profound alterations to the structure of their genomes, including rearrangements that fuse distant DNA breakpoints. We analyze the distribution of somatic rearrangements across the cancer genome, using whole-genome sequencing data from 2,693 tumor-normal pairs. We observe substantial variation in the density of rearrangement breakpoints, with enrichment in open chromatin and sites with high densities of repetitive elements. After accounting for these patterns, we identify significantly recurrent breakpoints (SRBs) at 52 loci, including novel SRBs near BRD4 and AKR1C3. Taking into account both loci fused by a rearrangement, we observe different signatures resembling either single breaks followed by strand invasion or two separate breaks that become joined. Accounting for these signatures, we identify 90 pairs of loci that are significantly recurrently juxtaposed (SRJs). SRJs are primarily tumor-type specific and tend to involve genes with tissue-specific expression. SRJs were frequently associated with disruption of topology-associated domains, juxtaposition of enhancer elements, and increased expression of neighboring genes. Lastly, we find that the power to detect SRJs decreases for short rearrangements, and that reliable detection of all driver SRJs will require whole-genome sequencing data from an order of magnitude more cancer samples than currently available.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sung Yong Park ◽  
Gina Faraci ◽  
Pamela M. Ward ◽  
Jane F. Emerson ◽  
Ha Youn Lee

Abstract COVID-19 global cases have climbed to more than 33 million, with over a million total deaths, as of September 2020. Real-time massive SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no methods have simultaneously achieved high precision, simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients’ nasopharyngeal/oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing of the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who tested positive for COVID-19 between April and June 2020 in Los Angeles County, California, USA. These sequences were highly diverse in the G clade, with nine novel amino acid mutations including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.


2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Jacqueline King ◽  
Anne Pohlmann ◽  
Kamila Dziadek ◽  
Martin Beer ◽  
Kerstin Wernike

Abstract Background: As a global ruminant pathogen, bovine viral diarrhea virus (BVDV) is responsible for the disease bovine viral diarrhea, which has a variety of clinical presentations and causes severe economic losses worldwide. Classified within the Pestivirus genus, the species Pestivirus A and B (syn. BVDV-1, BVDV-2) are genetically differentiated into 21 BVDV-1 and four BVDV-2 subtypes. Commonly, the 5’ untranslated region and the Npro protein are utilized for subtyping. However, the genetic variability of BVDV limits former studies that analyzed genome fragments rather than full genomes. Results: To enable rapid and accessible whole-genome sequencing of both BVDV-1 and BVDV-2 strains, nanopore sequencing of twelve representative BVDV samples was performed on amplicons derived through a tiling PCR procedure. Covering a multitude of subtypes (1b, 1d, 1f, 2a, 2c), sample matrices (plasma, EDTA blood and ear notch), viral loads (Cq-values 19–32) and species (cattle and sheep), ten of the twelve samples produced whole genomes, with two low-titre samples presenting 96% genome coverage. Conclusions: Further phylogenetic analysis of the novel sequences emphasizes the necessity of whole-genome sequencing to identify novel strains and supplement lacking sequence information in public repositories. The proposed amplicon-based sequencing protocol allows rapid, inexpensive and accessible acquisition of complete BVDV genomes.

