A multi-platform reference for somatic structural variation detection

Author(s):  
Jose Espejo Valle-Inclan ◽  
Nicolle J.M. Besselink ◽  
Ewart de Bruijn ◽  
Daniel L. Cameron ◽  
Jana Ebler ◽  
...  

Abstract Accurate detection of somatic structural variation (SV) in cancer genomes remains a challenging problem. This is in part due to the lack of high-quality gold standard datasets that enable the benchmarking of experimental approaches and bioinformatic analysis pipelines for comprehensive somatic SV detection. Here, we approached this challenge by genome-wide somatic SV analysis of the paired melanoma and normal lymphoblastoid COLO829 cell lines using four different technologies: Illumina HiSeq, Oxford Nanopore, Pacific Biosciences and 10x Genomics. Based on the evidence from multiple technologies combined with extensive experimental validation, including Bionano optical mapping data and targeted detection of candidate breakpoint junctions, we compiled a comprehensive set of true somatic SVs, comprising all SV types. We demonstrate the utility of this resource by determining the SV detection performance of each technology as a function of tumor purity and sequence depth, highlighting the importance of assessing these parameters in cancer genomics projects and data analysis tool evaluation. The reference truth somatic SV dataset as well as the underlying raw multi-platform sequencing data are freely available and are an important resource for community somatic benchmarking efforts.
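To make the benchmarking idea in this abstract concrete, here is a minimal Python sketch of how a call set might be scored against a truth set of somatic SVs. It is not the authors' method: the `SV` record, the single-breakpoint representation, the `benchmark` function, and the 100 bp matching tolerance are all assumptions chosen for illustration, and the coordinates are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SV:
    """Hypothetical, simplified SV record: one breakpoint and a type label."""
    chrom: str
    pos: int
    svtype: str


def benchmark(calls: List[SV], truth: List[SV], tol: int = 100) -> Tuple[float, float]:
    """Return (precision, recall); each truth SV can be matched at most once."""
    unmatched = list(truth)
    true_positives = 0
    for call in calls:
        for ref in unmatched:
            same_event = (
                call.chrom == ref.chrom
                and call.svtype == ref.svtype
                and abs(call.pos - ref.pos) <= tol
            )
            if same_event:
                unmatched.remove(ref)  # safe: we stop iterating right after
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented example data, for illustration only.
truth_set = [
    SV("chr1", 1000, "DEL"),
    SV("chr2", 5000, "DUP"),
    SV("chr3", 7000, "INV"),
    SV("chr4", 900, "DEL"),
]
call_set = [
    SV("chr1", 1040, "DEL"),  # within tolerance of a truth deletion
    SV("chr2", 5000, "DUP"),  # exact match
    SV("chr5", 100, "DEL"),   # no counterpart in the truth set
]

precision, recall = benchmark(call_set, truth_set)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Repeating such a comparison on call sets produced at different simulated tumor purities or sequencing depths would give performance curves of the kind the abstract describes. A real comparison would also need to handle both breakends of each SV and their orientation.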

Author(s):  
Fatma E. Taşbent ◽  
Mehmet Özdemir ◽  
Özge M. Akcan ◽  
Esma K. Kurt

Abstract Objective Genome sequencing is useful for tracking mutations and variants of viral agents during pandemics. In this study, we performed next-generation sequencing of complete severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from pediatric patients. Methods Six pediatric patients aged 0 to 18 years who were positive for SARS-CoV-2 by reverse transcription polymerase chain reaction were included in this study. SARS-CoV-2 genome sequencing was performed using Oxford Nanopore Technologies MinION, following the ARTIC Network protocols. Sequencing data were obtained in FASTQ format and their quality was assessed. The sequence information of all samples was uploaded to the Global Initiative on Sharing All Influenza Data (GISAID) database. Genome, variant, clade, and phylogenetic tree analyses were performed with bioinformatic analysis. Results Of the six samples, two belonged to Nextstrain clade 20A, two to 20B, and two to 19A. According to Pango lineages, B.1.36, B.1.218, B.1, and B.1.260 lineages were detected. A total of 84 mutations were observed across all samples. None of the variants were classified as variants of concern (VOC) or variants of interest (VOI) according to the Pango database. Conclusion This study is the first comprehensive sequence analysis registered in the GISAID database reported from the Konya region in Turkey. Similar studies will be informative to track changes in the virus genome, obtain epidemiological data, guide studies on diagnosis and treatment, and evaluate vaccine efficacy.


2019 ◽  
Author(s):  
Scott V. Nguyen ◽  
David R. Greig ◽  
Daniel Hurley ◽  
Yu Cao ◽  
Evonne McCabe ◽  
...  

ABSTRACT A Gram-negative rod from the Yersinia genus was isolated from a clinical case of yersiniosis in the United Kingdom. Long read sequencing data from an Oxford Nanopore Technology (ONT) MinION in conjunction with Illumina HiSeq reads were used to generate a finished quality genome of this strain. Overall Genome Related Index (OGRI) of the strain was used to determine that it was a novel species within Yersinia, despite biochemical similarities to Yersinia enterocolitica. The 16S ribosomal RNA gene accessions are MN434982-MN434987 and the accession number for the complete and closed chromosome is CP043727. The type strain is CFS3336T (=NCTC 14382T/ =LMG Accession under process).


2019 ◽  
Author(s):  
Gerald J. Sun ◽  
David F. Jenkins ◽  
Pablo E. Cingolani ◽  
Jonathan R. Dry ◽  
Zhongwu Lai

Abstract Here we present a machine learning model, Deep Purity (DePuty), that leverages convolutional neural networks to accurately predict tumor purity from next-generation sequencing data from clinical samples without matched normals. As input, our model utilizes SNP-based copy number and minor allele frequency data formulated as a scatterplot image. With a representation matching that used by expert human annotators, we best an existing algorithm using only ~100 manually curated samples. Our simple, data-efficient approach can serve as a straightforward alternative to traditional, more complex statistical methods for building performant purity prediction models that enable downstream bioinformatic analysis of tumor variants and absolute copy number alterations relevant to cancer genomics.


2021 ◽  
Vol 5 (9) ◽  
pp. 2412-2425
Author(s):  
Suruchi Pacharne ◽  
Oliver M. Dovey ◽  
Jonathan L. Cooper ◽  
Muxin Gu ◽  
Mathias J. Friedrich ◽  
...  

Abstract Advances in cancer genomics have revealed genomic classes of acute myeloid leukemia (AML) characterized by class-defining mutations, such as chimeric fusion genes or mutations in genes such as NPM1, MLL, and CEBPA. These class-defining mutations frequently synergize with internal tandem duplications in FLT3 (FLT3-ITDs) to drive leukemogenesis. However, ∼20% of FLT3-ITD–positive AMLs bear no class-defining mutations, and mechanisms of leukemic transformation in these cases are unknown. To identify pathways that drive FLT3-ITD mutant AML in the absence of class-defining mutations, we performed an insertional mutagenesis (IM) screen in Flt3-ITD mice, using Sleeping Beauty transposons. All mice developed acute leukemia (predominantly AML) after a median of 73 days. Analysis of transposon insertions in 38 samples from Flt3-ITD/IM leukemic mice identified recurrent integrations at 22 loci, including Setbp1 (20/38), Ets1 (11/38), Ash1l (8/38), Notch1 (8/38), Erg (7/38), and Runx1 (5/38). Insertions at Setbp1 led exclusively to AML and activated a transcriptional program similar, but not identical, to those of NPM1-mutant and MLL-rearranged AMLs. Guide RNA targeting of Setbp1 was highly detrimental to Flt3ITD/+/Setbp1IM+, but not to Flt3ITD/+/Npm1cA/+, AMLs. Also, analysis of RNA-sequencing data from hundreds of human AMLs revealed that SETBP1 expression is significantly higher in FLT3-ITD AMLs lacking class-defining mutations. These findings suggest that SETBP1 overexpression collaborates with FLT3-ITD to drive a subtype of human AML. To identify genetic vulnerabilities of these AMLs, we performed genome-wide CRISPR-Cas9 screening in Flt3ITD/+/Setbp1IM+ AMLs and identified potential therapeutic targets, including Kdm1a, Brd3, Ezh2, and Hmgcr. Our study gives new insights into epigenetic pathways that can drive AMLs lacking class-defining mutations and proposes therapeutic approaches against such cases.


2020 ◽  
Vol 70 (4) ◽  
pp. 2382-2387 ◽  
Author(s):  
Scott V. Nguyen ◽  
David R. Greig ◽  
Daniel Hurley ◽  
Orla Donoghue ◽  
Yu Cao ◽  
...  

A Gram-negative rod from the Yersinia genus was isolated from a clinical case of yersiniosis in the United Kingdom. Long read sequencing data from an Oxford Nanopore Technologies (ONT) MinION in conjunction with Illumina HiSeq reads were used to generate a finished quality genome of this strain. Overall Genome Related Index (OGRI) of the strain was used to determine that it was a novel species within Yersinia, despite biochemical similarities to Yersinia enterocolitica. The 16S ribosomal RNA gene accessions are MN434982-MN434987 and the accession number for the complete and closed chromosome is CP043727. The type strain is CFS3336T (=NCTC 14382T/=LMG 31573T).


Epigenomes ◽  
2020 ◽  
Vol 4 (4) ◽  
pp. 24
Author(s):  
Debapriya Saha ◽  
Allison B. Norvil ◽  
Nadia A. Lanman ◽  
Humaira Gowher

Differential DNA methylation is characteristic of gene regulatory regions, such as enhancers, which mostly constitute low or intermediate CpG content in their DNA sequence. Consequently, quantification of changes in DNA methylation at these sites is challenging. Given that DNA methylation across most of the mammalian genome is maintained, the use of genome-wide bisulfite sequencing to measure fractional changes in DNA methylation at specific sites is an overexertion which is both expensive and cumbersome. Here, we developed a MethylRAD technique with an improved experimental plan and bioinformatic analysis tool to examine regional DNA methylation changes in embryonic stem cells (ESCs) during differentiation. The transcriptional silencing of pluripotency genes (PpGs) during ESC differentiation is accompanied by PpG enhancer (PpGe) silencing mediated by the demethylation of H3K4me1 by LSD1. Our MethylRAD data show that in the presence of LSD1 inhibitor, a significant fraction of LSD1-bound PpGe fails to gain DNA methylation. We further show that this effect is mostly observed in PpGes with low/intermediate CpG content. Underscoring the sensitivity and accuracy of MethylRAD sequencing, our study demonstrates that this method can detect small changes in DNA methylation in regulatory regions, including those with low/intermediate CpG content, thus asserting its use as a method of choice for diagnostic purposes.


2021 ◽  
Author(s):  
Marc-André Lemay ◽  
Jonas A. Sibbesen ◽  
Davoud Torkamaneh ◽  
Jérémie Hamel ◽  
Roger C. Levesque ◽  
...  

Background: Structural variant (SV) discovery based on short reads is challenging because SVs have complex signatures and tend to occur in repeated regions. The increasing availability of long-read technologies has greatly facilitated SV discovery; however, these technologies remain too costly to apply routinely to population-level studies. Here, we combined short-read and long-read sequencing technologies to provide a comprehensive population-scale assessment of structural variation in a panel of Canadian soybean cultivars. Results: We used Oxford Nanopore sequencing data (~12X mean coverage) for 17 samples to both benchmark SV calls made from the Illumina data and predict SVs that were subsequently genotyped in a population of 102 samples using Illumina data. Benchmarking results show that variants discovered using Oxford Nanopore can be accurately genotyped from the Illumina data. We first use the genotyped SVs for population structure analysis and show that results are comparable to those based on single-nucleotide variants. We observe that the population frequency and distribution within the genome of SVs are constrained by the location of genes. Gene Ontology and PFAM domain enrichment analyses also confirm previous reports that genes harboring high-frequency SVs are enriched for functions in defense response. Finally, we discover polymorphic transposable elements from the SVs and report evidence of the recent activity of a Stowaway MITE. Conclusions: Our results demonstrate that long-read and short-read sequencing technologies can be efficiently combined to enhance SV analysis in large populations, providing a reusable framework for their study in a wider range of samples and non-model species.


Genes ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 547 ◽  
Author(s):  
Shuaibin Wang ◽  
Qingwei Song ◽  
Shanshan Li ◽  
Zhigang Hu ◽  
Gangqiang Dong ◽  
...  

Diversity in structure and organization is one of the main features of angiosperm mitochondrial genomes (mitogenomes). The ultra-long reads of Oxford Nanopore Technology (ONT) provide an opportunity to obtain a complete mitogenome and investigate the structural variation in unprecedented detail. In this study, we compared mitogenome assembly methods using Illumina and/or ONT sequencing data and obtained the complete mitogenome (208 kb) of Chrysanthemum nankingense based on the hybrid assembly method. The mitogenome encoded 19 transfer RNA genes, three ribosomal RNA genes, and 34 protein-coding genes, with 21 group II introns disrupting eight intron-containing genes. A total of seven medium repeats were related to homologous recombination at different frequencies, as supported by the long ONT reads. Subsequently, we investigated the variations in gene content and constitution of 28 near-complete mitogenomes from Asteraceae. A total of six protein-coding genes were missing in all Asteraceae mitogenomes, while four other genes were not detected in some lineages. The core fragments (~88 kb) of the Asteraceae mitogenomes had a higher GC content (~46.7%) than the variable and specific fragments. The phylogenetic topology based on the core fragments of the Asteraceae mitogenomes was highly consistent with the topologies obtained from the corresponding plastid datasets. Our results highlighted the advantages of the complete assembly of the C. nankingense mitogenome and the investigation of its structural variation based on ONT sequencing data. Moreover, the method based on local collinear blocks of the mitogenomes could achieve the alignment of highly rearrangeable and variable plant mitogenomes as well as construct a robust phylogenetic topology.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Robert P. Adelson ◽  
Alan E. Renton ◽  
Wentian Li ◽  
Nir Barzilai ◽  
Gil Atzmon ◽  
...  

Abstract The success of next-generation sequencing depends on the accuracy of variant calls. Few objective protocols exist for QC following variant calling from whole genome sequencing (WGS) data. After applying QC filtering based on Genome Analysis Tool Kit (GATK) best practices, we used genotype discordance of eight samples that were sequenced twice each to evaluate the proportion of potentially inaccurate variant calls. We designed a QC pipeline involving hard filters to improve replicate genotype concordance, which indicates improved accuracy of genotype calls. Our pipeline analyzes the efficacy of each filtering step. We initially applied this strategy to well-characterized variants from the ClinVar database, and subsequently to the full WGS dataset. The genome-wide biallelic pipeline removed 82.11% of discordant and 14.89% of concordant genotypes, and improved the concordance rate from 98.53% to 99.69%. The variant-level read depth filter most improved the genome-wide biallelic concordance rate. We also adapted this pipeline for triallelic sites, given the increasing proportion of multiallelic sites as sample sizes increase. For triallelic sites containing only SNVs, the concordance rate improved from 97.68% to 99.80%. Our QC pipeline removes many potentially false positive calls that pass in GATK, and may inform future WGS studies prior to variant effect analysis.
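The replicate concordance metric described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `Calls` layout (site mapped to genotype and read depth), the function names, the depth threshold of 10, and the example genotypes are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout: site -> (genotype, read depth) for one replicate.
Calls = Dict[str, Tuple[str, int]]


def concordance_rate(rep1: Calls, rep2: Calls) -> float:
    """Fraction of sites called in both replicates with identical genotypes."""
    shared = rep1.keys() & rep2.keys()
    if not shared:
        return 0.0
    same = sum(1 for site in shared if rep1[site][0] == rep2[site][0])
    return same / len(shared)


def depth_filter(calls: Calls, min_depth: int) -> Calls:
    """Hard filter: keep only calls supported by at least min_depth reads."""
    return {site: call for site, call in calls.items() if call[1] >= min_depth}


# Invented genotypes for the same sample sequenced twice.
rep1 = {
    "chr1:100": ("0/1", 30),
    "chr1:200": ("1/1", 4),   # low depth, discordant with rep2
    "chr1:300": ("0/1", 25),
    "chr1:400": ("0/0", 28),
}
rep2 = {
    "chr1:100": ("0/1", 32),
    "chr1:200": ("0/1", 5),
    "chr1:300": ("0/1", 22),
    "chr1:400": ("0/0", 31),
}

before = concordance_rate(rep1, rep2)
after = concordance_rate(depth_filter(rep1, 10), depth_filter(rep2, 10))
print(f"concordance before={before:.2f} after={after:.2f}")
```

In this toy case the depth filter removes the single discordant, low-coverage site, so concordance rises. Evaluating each filter this way, by how many discordant versus concordant genotypes it removes, mirrors the per-step assessment the abstract reports.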


Author(s):  
Е.Н. Воропаева ◽  
Т.И. Поспелова ◽  
В.Н. Максимов ◽  
О.В. Березина ◽  
В.С. Карпова ◽  
...  

Whole-genome sequencing data from more than 1200 samples of patients with diffuse large B-cell lymphoma (DLBCL) in the cBioPortal for Cancer Genomics database were analyzed. We also performed our own whole-exome sequencing of 7 DLBCL samples with relapses in the central nervous system (CNS). Characteristics associated with a high risk of secondary CNS involvement in lymphoma were identified. Significant differences in mutation frequency between the subsets with CNS involvement at relapse and without CNS involvement were found for genes of the BCR/NF-kB pathway (MYD88, CD79B) and the chromatin remodeling system (ARID1A, SMARCA4).

