VarStack: a Web Tool for Data Retrieval to Interpret Somatic Variants in Cancer

2020 ◽  
Author(s):  
Morgan Howard ◽  
Bruce Kane ◽  
Mary Lepry ◽  
Paul Stey ◽  
Ashok Ragavendran ◽  
...  

Abstract Background and objective Advances in tumor genome sequencing have created an urgent need for bioinformatics tools that support interpretation of the clinical significance of detected variants. VarStack is a web tool for retrieving somatic variant data in cancer from existing databases. Methods VarStack incorporates data from several publicly available databases and presents them in an easy-to-navigate user interface. It currently supports data from the Catalogue of Somatic Mutations in Cancer (COSMIC), gnomAD, cBioPortal, ClinVar, OncoKB and the UCSC Genome Browser. It retrieves the data from these databases and returns them to the user in a fraction of the time it would take to navigate each site manually. Results Users submit a variant with a gene symbol, peptide change and coding sequence change. In addition to their original query, they may select a variety of tumor-specific studies in cBioPortal to search. The results from the databases are presented in tabs, and users can export them as a CSV file. VarStack also has a batch search feature in which the user submits a list of variants and downloads a CSV file with the data from the databases. With the batch search and data download options, users can easily incorporate VarStack into their workflows or tools. VarStack saves time by providing variant data from multiple databases in an easy-to-export and interpretable format. Availability VarStack is freely available at https://varstack.brown.edu.

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Morgan Howard ◽  
Bruce Kane ◽  
Mary Lepry ◽  
Paul Stey ◽  
Ashok Ragavendran ◽  
...  

Abstract Advances in tumor genome sequencing have created an urgent need for bioinformatics tools that support interpretation of the clinical significance of detected variants. VarStack is a web tool for retrieving somatic variant data relating to cancer from existing databases. VarStack incorporates data from several publicly available databases and presents them in an easy-to-navigate user interface. It currently supports data from the Catalogue of Somatic Mutations in Cancer, gnomAD, cBioPortal, ClinVar, OncoKB, CIViC and the UCSC Genome Browser. It retrieves the data from these databases and returns them to the user in a fraction of the time it would take to navigate each site manually. Users submit a variant with a gene symbol, peptide change and coding sequence change. In addition to their original query, they may select a variety of tumor-specific studies in cBioPortal to search. The results from the databases are presented in tabs, and users can export them as an Excel file. VarStack also has a batch search feature in which the user can submit a list of variants and download an Excel file with the data from the databases. With the batch search and data download options, users can easily incorporate VarStack into their workflows or tools. VarStack saves time by providing somatic variant information from multiple databases in an easy-to-export and interpretable format. VarStack is freely available at https://varstack.brown.edu.
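VarStack's batch search accepts a list of variants, each described by a gene symbol, a peptide change and a coding sequence change. The exact upload format is not given here, so the following Python sketch only illustrates how such a list might be checked and written to CSV before submission. The column names, the simplified HGVS-style patterns and the function name build_batch_csv are hypothetical, not part of VarStack.

```python
import csv
import io
import re

# Hypothetical, deliberately simplified patterns: they accept only
# single-residue protein substitutions (e.g. p.V600E) and single-base
# coding DNA substitutions (e.g. c.1799T>A).
PROTEIN_RE = re.compile(r"^p\.[A-Z]\d+[A-Z*]$")
CDNA_RE = re.compile(r"^c\.\d+[ACGT]>[ACGT]$")


def build_batch_csv(variants):
    """Return CSV text with one (gene, peptide change, CDS change) row per variant.

    `variants` is an iterable of (gene, protein_change, cdna_change) tuples.
    Raises ValueError for notations outside the simplified patterns.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene", "protein_change", "cdna_change"])  # hypothetical header
    for gene, protein, cdna in variants:
        if not PROTEIN_RE.match(protein) or not CDNA_RE.match(cdna):
            raise ValueError(f"unsupported notation: {gene} {protein} {cdna}")
        writer.writerow([gene.upper(), protein, cdna])  # normalize gene symbol case
    return buf.getvalue()


if __name__ == "__main__":
    print(build_batch_csv([("BRAF", "p.V600E", "c.1799T>A"), ("KRAS", "p.G12D", "c.35G>A")]))
```

Restricting the patterns to substitutions keeps the sketch short; insertions, deletions and frameshifts would need broader patterns.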


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. sci-36-sci-36
Author(s):  
Elaine Mardis ◽  
Timothy J. Ley ◽  
Richard K. Wilson

Abstract For most patients with a sporadic presentation of acute myeloid leukemia (AML), neither the initiating nor the progression mutations responsible for disease are known. Recent attempts to identify key mutations with directed sequencing approaches, or with array-based genomic studies, have had limited success, suggesting that unbiased whole genome sequencing approaches may be required to identify most of the mutations responsible for AML pathogenesis. Until recently, whole genome sequencing has been impractical due to the high cost of conventional capillary-based sequencing and the large numbers of enriched primary tumor cells required to yield the necessary genomic DNA for library preparation. “Next Generation” sequencing approaches have changed this landscape dramatically. Using the Solexa/Illumina platform, we have now sequenced the genomic DNA of highly enriched tumor cells and normal skin cells obtained from a carefully selected patient with a typical presentation of FAB M1 AML. We obtained 98.2 billion bases of sequence from the cytogenetically normal tumor cell genome (32.7-fold haploid coverage), and 41.8 billion bases of sequence from the normal skin genome (13.9-fold coverage). Using these data, we detected diploid sequence coverage of 91% of 46,320 heterozygous SNPs defined in the tumor genome by array-based genotyping, and 83% diploid coverage of the skin genome. Of 2,647,695 well-supported single nucleotide variants in the tumor genome, 2,588,486 (97.7%) were also detected in the patient’s skin genome, defining them as inherited. Of the remaining variants, 8 have been fully validated as somatic mutations by conventional capillary sequencing using PCR-generated amplicons. We also detected somatic mutations in the FLT3 (ITD) and NPM1 genes (a classic NPMc mutation).
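The coverage figures above can be cross-checked with simple arithmetic: average haploid fold coverage is the number of sequenced bases divided by the haploid genome size. The Python sketch below inverts that relation. The genome size the authors divided by is not stated in the abstract, so the implied value (roughly 3.0 Gb for both samples) is an inference, and the 3.1 Gb reference size in the last line is an assumption for illustration.

```python
# Average haploid fold coverage = sequenced bases / haploid genome size,
# so a reported coverage also implies the genome size that was used.
def fold_coverage(total_bases: float, haploid_genome_size: float) -> float:
    return total_bases / haploid_genome_size


def implied_genome_size(total_bases: float, fold: float) -> float:
    return total_bases / fold


tumor_size = implied_genome_size(98.2e9, 32.7)  # tumor: 98.2 Gb at 32.7-fold
skin_size = implied_genome_size(41.8e9, 13.9)   # skin: 41.8 Gb at 13.9-fold
print(f"implied genome size: tumor {tumor_size / 1e9:.2f} Gb, skin {skin_size / 1e9:.2f} Gb")

# Share of well-supported tumor SNVs also seen in skin (called inherited)
total_snvs, shared_snvs = 2_647_695, 2_588_486
print(f"inherited: {shared_snvs / total_snvs:.2%}; remaining: {total_snvs - shared_snvs:,}")

# Coverage against an assumed 3.1 Gb haploid reference (hypothetical size)
print(f"{fold_coverage(98.2e9, 3.1e9):.1f}-fold")
```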
Based on deep read-count data of the novel variants on a 454 sequencer, we hypothesize that all of the mutations are in virtually all of the tumor cells, and all were retained at relapse 11 months later, suggesting that a single dominant clone contained all of the mutations. None of the novel mutations has previously been detected in AML cases (and none were found in any of 187 additional AML cases studied here). A number of additional potential somatic mutations in regions lying near genes (but not altering coding sequences) are currently being validated and tested for recurrence in other AML samples. Whole genome sequencing of a second M1 AML genome is now underway. These results demonstrate the power of unbiased whole genome sequencing approaches to discover cancer-associated mutations in novel candidate genes.
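The inference from deep read counts rests on a standard approximation: for a heterozygous mutation at a diploid, copy-number-neutral locus, the expected variant allele fraction (VAF) is half the fraction of tumor cells that carry it, scaled by tumor purity. A VAF close to 0.5 in a highly enriched sample therefore suggests the mutation is present in virtually all tumor cells. The Python sketch below applies that approximation; it is not the authors' stated procedure, and the function name and read counts are hypothetical.

```python
def cancer_cell_fraction(alt_reads: int, total_reads: int, purity: float = 1.0) -> float:
    """Estimate the fraction of tumor cells carrying a heterozygous SNV.

    Assumes a diploid, copy-number-neutral locus, so a clonal heterozygous
    mutation in a pure sample has an expected VAF of 0.5:
        CCF ~= 2 * VAF / purity   (capped at 1.0)
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    vaf = alt_reads / total_reads
    return min(1.0, 2 * vaf / purity)


# Hypothetical read counts, not data from the study:
print(cancer_cell_fraction(490, 1000))  # VAF near 0.5: nearly all tumor cells
print(cancer_cell_fraction(120, 1000))  # low VAF: consistent with a subclone
```

Copy-number changes and loss of heterozygosity break the factor of 2, which is why the approximation is limited to copy-neutral regions.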


2016 ◽  
Author(s):  
Stephen G. Gaffney ◽  
Jeffrey P. Townsend

ABSTRACT Summary PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation The web application is available at pathscore.publichealth.yale.edu. The site is implemented in Python and MySQL, and all major browsers are supported. Source code is available at github.com/sggaffney/pathscore under a GPLv3 license. Supplementary Information Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.
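PathScore's own statistic is described here only as a novel approach that identifies pathways enriched across patients. For orientation, the conventional baseline for pathway enrichment is a one-sided hypergeometric (over-representation) test, sketched below in Python. It is not PathScore's method, and the gene and pathway counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Read as: of `population` genes, `successes` belong to a pathway; if
    `draws` genes are mutated, how surprising is it that `k` or more of
    them fall in the pathway?
    """
    total = comb(population, draws)
    start = max(k, 0)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(start, upper + 1)
    )
    return tail / total  # exact big-integer ratio converted to float


# Hypothetical: 20,000 genes, a 100-gene pathway, 200 mutated genes, 5 hits
print(hypergeom_enrichment_p(5, 20_000, 100, 200))
```

This test pools all mutated genes into a single list, whereas the abstract describes PathScore as identifying pathways enriched across patients.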


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Lydia Y. Liu ◽  
Vinayak Bhandari ◽  
Adriana Salcedo ◽  
Shadrielle M. G. Espiritu ◽  
Quaid D. Morris ◽  
...  

Abstract Whole-genome sequencing can be used to estimate subclonal populations in tumours, and this intra-tumoural heterogeneity is linked to clinical outcomes. Many algorithms have been developed for subclonal reconstruction, but their variabilities and consistencies are largely unknown. We evaluate sixteen pipelines for reconstructing the evolutionary histories of 293 localized prostate cancers from single samples, and eighteen pipelines for the reconstruction of 10 tumours with multi-region sampling. We show that predictions of subclonal architecture and timing of somatic mutations vary extensively across pipelines. Pipelines show consistent types of biases, with those incorporating SomaticSniper and Battenberg preferentially predicting homogeneous cancer cell populations and those using MuTect tending to predict multiple populations of cancer cells. Subclonal reconstructions using multi-region sampling confirm that single-sample reconstructions systematically underestimate intra-tumoural heterogeneity, predicting on average fewer than half of the cancer cell populations identified by multi-region sequencing. Overall, these biases suggest caution in interpreting specific architectures and subclonal variants.


2020 ◽  
Vol 22 (11) ◽  
pp. 1892-1897 ◽  
Author(s):  
My Linh Thibodeau ◽  
Kieran O’Neill ◽  
Katherine Dixon ◽  
Caralyn Reisle ◽  
Karen L. Mungall ◽  
...  

Abstract Purpose Structural variants (SVs) may be an underestimated cause of hereditary cancer syndromes given the current limitations of short-read next-generation sequencing. Here we investigated the utility of long-read sequencing in resolving germline SVs in cancer susceptibility genes detected through short-read genome sequencing. Methods Known or suspected deleterious germline SVs were identified using Illumina genome sequencing across a cohort of 669 advanced cancer patients with paired tumor genome and transcriptome sequencing. Candidate SVs were subsequently assessed by Oxford Nanopore long-read sequencing. Results Nanopore sequencing confirmed eight simple pathogenic or likely pathogenic SVs, resolving three additional variants whose impact could not be fully elucidated through short-read sequencing. A recurrent sequencing artifact on chromosome 16p13 and one complex rearrangement on chromosome 5q35 were subsequently classified as likely benign, obviating the need for further clinical assessment. Variant configuration was further resolved in one case with a complex pathogenic rearrangement affecting TSC2. Conclusion Our findings demonstrate that long-read sequencing can improve the validation, resolution, and classification of germline SVs. This has important implications for return of results, cascade carrier testing, cancer screening, and prophylactic interventions.


Blood ◽  
2009 ◽  
Vol 114 (22) ◽  
pp. 3965-3965
Author(s):  
Lukas D. Wartman ◽  
Li Ding ◽  
David E. Larson ◽  
Michael D. McLellan ◽  
Heather Schmidt ◽  
...  

Abstract 3965 Poster Board III-901 We have recently established that whole genome sequencing is a valid, unbiased approach that can identify novel candidate mutations that may be important for AML pathogenesis (Ley et al Nature 2008, Mardis et al NEJM 2009). Acute promyelocytic leukemia (APL, FAB M3 AML) is a subtype of AML characterized by the t(15;17)(q22;q11.2) translocation that creates an oncogenic fusion gene, PML-RARA. Our laboratory has previously modeled APL in a mouse in an effort to understand the genetic events that lead to the disease. In our knock-in mouse model, a human PML-RARA cDNA was targeted to the 5' untranslated region of the mouse cathepsin G gene on chromosome 14 (mCG-PR). The targeting vector was transfected into the RW-4 embryonic stem cell line, derived from a 129/SvJ mouse. The transfected RW-4 cells were injected into C57Bl/6 blastocysts, and chimeric offspring were bred to C57Bl/6 mice. F1 129/SvJ x C57Bl/6 mice were subsequently backcrossed onto the B6/Taconic background for 10 generations before establishing a tumor watch. About 60% of the mCG-PR mice in the Bl/6 background develop a disease that closely resembles APL only after a latent period of 7-18 months, suggesting that additional progression mutations are required for APL development. Array-based genomic techniques (expression array studies and high-resolution CGH) have revealed some recurring genetic alterations that may be relevant for progression (e.g., an interstitial deletion of chromosome 2 and trisomy 15), but gene-specific progression mutations have not yet been identified. To begin to identify these mutations in an unbiased fashion, we sequenced a cytogenetically normal, diploid mouse APL genome using massively parallel DNA sequencing on the Illumina platform.
Since the tumor arose in a highly inbred mouse strain, we predicted that 15x coverage of the genome (approximately 40 billion base pairs of sequence) would be necessary to identify >90% of the heterozygous somatic mutations. We generated 2 Illumina paired-end libraries (insert sizes of 300-350 bp and 550-600 bp) and obtained 59.64 billion base pairs of sequence from 3 full sequencing runs; the reads that successfully mapped yielded 15.6x coverage. The sequence data predicted 87,778 heterozygous and 23,439 homozygous single nucleotide variants (SNVs) relative to the mouse C57Bl6/J reference sequence. Of the predicted heterozygous SNVs, 695 were non-synonymous (missense or nonsense, or altering a canonical splice site). Thus far, 80 of these putative non-synonymous SNVs have been further analyzed by Sanger sequencing of the original tumor DNA, with pooled B6/Taconic spleen DNA and pooled 129/SvJ spleen DNA as controls. Of these 80, 37 were false-positive calls, 37 were inherited SNPs from residual regions of the 129/SvJ genome, and 6 were present only in the tumor genome and were therefore candidate somatic mutations. These 6 were screened in 89 additional murine APL tumor samples derived from the same mouse model. Mutations in the Jarid2 (L915I) and Capns2 (N149S) genes occurred only in the proband, and are therefore of uncertain significance. The other 4 mutations were found in additional samples; 3 of these were derived from a common ancestor of the proband and the other affected mice, and were therefore not relevant for pathogenesis. The remaining recurring mutation was in the pseudokinase domain of JAK1 (V657F), and was identified in one other mouse that was not closely related to the proband. This mutation is orthologous to the known activating mutation V617F in human JAK2, and is identical to a recently described JAK1 pseudokinase domain mutation (V658F) found in human APL and T-ALL samples (EG Jeong et al, Clin Cancer Res 14: 3716, 2008).
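The coverage target can be motivated with a simple sampling model. Assuming, for illustration only, that read depth at a site is Poisson-distributed around the mean coverage and that each read carries the variant allele of a heterozygous mutation with probability 0.5, the number of variant-supporting reads is Poisson with half the mean depth. The Python sketch below computes the chance of seeing at least a chosen number of such reads. The threshold min_alt_reads is a hypothetical caller requirement; this is not the calculation the authors report, and real variant calling also depends on mapping quality, base quality and clonal fraction.

```python
import math


def detection_probability(mean_depth: float, min_alt_reads: int,
                          allele_fraction: float = 0.5) -> float:
    """Probability of observing >= min_alt_reads variant-supporting reads.

    Simplified model (an assumption): depth ~ Poisson(mean_depth), and each
    read independently carries the variant with probability allele_fraction.
    Thinning a Poisson count gives variant reads ~ Poisson(mean_depth * allele_fraction).
    """
    lam = mean_depth * allele_fraction
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(min_alt_reads))
    return 1.0 - below


for depth in (5, 10, 15, 30):
    print(depth, round(detection_probability(depth, min_alt_reads=3), 3))
```

Under this model, raising min_alt_reads lowers sensitivity at any fixed depth, and higher mean depth raises it.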
We are currently testing the functional significance of this mutation by expressing it in bone marrow cells derived from young WT vs. mCG-PR mice. In summary, unbiased whole genome sequencing of a mouse APL genome has identified a recurring mutation of JAK1 found in both human and mouse APL samples. This approach may allow us to rapidly identify progression mutations that are common to human and murine AML, and provides an important proof-of-concept that this mouse model of AML is functionally related to its human counterpart. Disclosures: No relevant conflicts of interest to declare.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 261-261 ◽  
Author(s):  
Lian Xu ◽  
Aliyah R. Sohani ◽  
Luca Arcaini ◽  
Zachary Hunter ◽  
Guang Yang ◽  
...  

Abstract 261 Lymphoplasmacytic lymphoma (LPL) and marginal zone lymphoma (MZL) are distinct clinicopathological entities under the WHO classification system for B-cell lymphomas. Differentiating LPL from MZL has been difficult due to overlapping clinical, morphological, histopathological, immunophenotypic, and cytogenetic features. We therefore sought to identify a molecular marker by which LPL could be differentiated from MZL. Whole genome sequencing of paired normal/tumor tissues from 10 LPL patients was used to identify somatic variants. These studies identified a somatic variant at position 38182641 on chromosome 3p22.2, a single nucleotide change from T→C in the myeloid differentiation primary response 88 (MYD88) gene, predicting a non-synonymous change at amino acid position 265 from leucine to proline (L265P), in 10 of 10 LPL patients. MYD88 L265P is an oncogenically active mutation in ABC-type DLBCL cell lines via activation of IRAK1/4/TRAF-6/NF-κB signaling, and is present in tumors from 29% of patients with the ABC subtype of DLBCL and 6% of patients with MALT lymphomas (Ngo et al, Nature 2011, 470:115–119). Building on these findings, we performed Sanger sequencing of MYD88 in malignant cells obtained from 51 patients with LPL, 49 of whom had an IgM monoclonal protein and were therefore classified as Waldenström's macroglobulinemia (WM) and 2 of whom had an IgG monoclonal protein, along with 46 patients with MZL, which included 21 splenic (SMZL), 20 extranodal (EMZL), and 5 nodal (NMZL) subtypes, as well as B-cells from 15 healthy donors. Among LPL patients, the MYD88 L265P variant was found in malignant cells from 46/51 (90.1%) cases, which included 44 patients with WM and 2 patients with IgG LPL. The MYD88 L265P variant was heterozygous in 42 and homozygous in 4 LPL patients.
By comparison, only 3/46 (6.5%) patients with MZL (1 SMZL, 1 EMZL, 1 NMZL) carried the MYD88 L265P variant (p<0.0001), in each case heterozygous; these included 2 patients (1 SMZL, 1 NMZL) with extensive bone marrow involvement and a monoclonal IgM protein, whose clinicopathological characteristics overlapped with LPL. The MYD88 L265P variant was absent from CD19+ cells of all 15 healthy donors. The results of this study demonstrate that the MYD88 L265P mutation is present in most patients with LPL and can be used to differentiate LPL from MZL. Disclosures: Treon: Millennium: Consultancy, Membership on an entity's Board of Directors or advisory committees; Celgene: Membership on an entity's Board of Directors or advisory committees.
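The abstract reports p<0.0001 for the difference in MYD88 L265P frequency between LPL (46/51) and MZL (3/46) without naming the test. Assuming a two-sided Fisher's exact test on the 2x2 table of carriers and non-carriers, the comparison can be reproduced with the Python sketch below. The test choice is an assumption, and the small relative tolerance used when comparing table probabilities is a common floating-point convention, not something stated in the source.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Counts from the abstract: (carriers, non-carriers) of MYD88 L265P
lpl = (46, 51 - 46)
mzl = (3, 46 - 3)
print(fisher_exact_two_sided(*lpl, *mzl))
```

With these counts the p-value falls far below 0.0001, consistent with the reported result.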


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 404-404 ◽  
Author(s):  
John S. Welch ◽  
David Larson ◽  
Li Ding ◽  
Michael D. McLellan ◽  
Tamara Lamprecht ◽  
...  

Abstract 404 To characterize the genomic events associated with distinct subtypes of AML, we used whole genome sequencing to compare 24 tumor/normal sample pairs from patients with normal karyotype (NK) M1-AML (12 cases) and t(15;17)-positive M3-AML (12 cases). All single nucleotide variants (SNVs), small insertions and deletions (indels), and cryptic structural variants (SVs) identified by whole genome sequencing (average coverage 28x) were validated using sample-specific custom Nimblegen capture arrays, followed by Illumina sequencing; an average coverage of 972 reads per somatic variant yielded 10,597 validated somatic variants (average 421/genome). Of these somatic mutations, 308 occurred in 286 unique genes; on average, 9.4 somatic mutations per genome had translational consequences. Several important themes emerged: 1) AML genomes contain a diverse range of recurrent mutations. We assessed the 286 mutated genes for recurrence in an additional 34 NK M1-AML cases and 9 M3-AML cases. We identified 51 recurrently mutated genes, including 37 that had not previously been described in AML; on average, each genome had 3 recurrently mutated genes (M1 = 3.2; M3 = 2.8, p = 0.32). 2) Many recurring mutations cluster in mutually exclusive pathways, suggesting pathophysiologic importance. The most commonly mutated genes were: FLT3 (36%), NPM1 (25%), DNMT3A (21%), IDH1 (18%), IDH2 (10%), TET2 (10%), ASXL1 (6%), NRAS (6%), TTN (6%), and WT1 (6%). In total, 3 genes (excluding PML-RARA) were mutated exclusively in M3 cases. 22 genes were found only in M1 cases (suggestive of alternative initiating mutations which occurred in methylation, signal transduction, and cohesin complex genes). 25 genes were mutated in both M1 and M3 genomes (suggestive of common progression mutations relevant for both subtypes).
A single mutation in a cell growth/signaling gene occurred in 38 of 67 cases (FLT3, NRAS, RUNX1, KIT, CACNA1E, CADM2, CSMD1); these mutations were mutually exclusive of one another, and many of them occurred in genomes with PML-RARA, suggesting that they are progression mutations. We also identified a new leukemic pathway: mutations were observed in all four genes that encode members of the cohesin complex (STAG2, SMC1A, SMC3, RAD21), which is involved in mitotic checkpoints and chromatid separation. The cohesin mutations were mutually exclusive of each other, and collectively occur in 10% of non-M3 AML patients. 3) AML genomes also contain hundreds of benign “passenger” mutations. On average 412 somatic mutations per genome were translationally silent or occurred outside of annotated genes. Both M1 and M3 cases had similar total numbers of mutations per genome, similar mutation types (which favored C>T/G>A transitions), and a similar random distribution of variants throughout the genome (which was affected neither by coding regions nor expression levels). This is consistent with our recent observations of random “passenger” mutations in hematopoietic stem cell (HSC) clones derived from normal patients (Ley et al manuscript in preparation), and suggests that most AML-associated mutations are not pathologic, but pre-existed in the HSC at the time of initial transformation. In both studies, the total number of SNVs per genome correlated positively with the age of the patient (R² = 0.48, p = 0.001), providing a possible explanation for the increasing incidence of AML in elderly patients. 4) NK M1 and M3 AML samples are mono- or oligo-clonal. By comparing the frequency of all somatic mutations within each sample, we could identify clusters of mutations with similar frequencies (leukemic clones) and determined that the average number of clones per genome was 1.8 (M1 = 1.5; M3 = 2.2; p = 0.04).
5) t(15;17) is resolved by a non-homologous end-joining repair pathway, since nucleotide resolution of all 12 t(15;17) breakpoints revealed inconsistent micro-homologies (0 to 7 bp). Summary: These data provide a genome-wide overview of NK and t(15;17) AML and provide important new insights into AML pathogenesis. AML genomes typically contain hundreds of random, non-genic mutations, but only a handful of recurrently mutated genes that are likely to be pathogenic because they cluster in mutually exclusive pathways; specific combinations of recurring mutations, as well as rare and private mutations, shape the leukemia phenotype in an individual patient, and help to explain the clinical heterogeneity of this disease. Disclosures: Westervelt: Novartis: Speakers Bureau.
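The clone counts in item 4 come from grouping mutations with similar frequencies, but the abstract does not describe the grouping method. Dedicated tools typically fit statistical mixture models that account for read depth and copy number; the Python sketch below is only the simplest possible illustration of the idea, grouping sorted variant allele fractions (VAFs) wherever consecutive values are close. The function name cluster_vafs, the gap threshold and the VAFs are hypothetical.

```python
def cluster_vafs(vafs, max_gap=0.1):
    """Group variant allele fractions into clusters of similar frequency.

    Deliberately naive 1-D approach (an assumption for illustration): sort
    the VAFs and start a new cluster whenever consecutive values differ by
    more than max_gap.
    """
    if not vafs:
        return []
    ordered = sorted(vafs)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])  # large jump: start a new putative clone
        else:
            clusters[-1].append(cur)
    return clusters


# Hypothetical VAFs: a founding clone near 0.5 and a subclone near 0.2
vafs = [0.48, 0.51, 0.47, 0.50, 0.21, 0.19, 0.23]
clusters = cluster_vafs(vafs)
print(len(clusters), [round(sum(c) / len(c), 3) for c in clusters])
```

A fixed gap threshold is fragile when clones have close frequencies or depth is low, which is why model-based clustering is preferred in practice.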

