Loose ends in cancer genome structure

Recent pan-cancer studies have delineated patterns of structural genomic variation across thousands of tumor whole genome sequences. It is not known to what extent the shortcomings of short read (≤ 150 bp) whole genome sequencing (WGS) used for structural variant analysis have limited our understanding of cancer genome structure. To formally address this, we introduce the concept of "loose ends": copy number alterations that cannot be mapped to a rearrangement by WGS but can be indirectly detected through the analysis of junction-balanced genome graphs. Analyzing 2,319 pan-cancer WGS cases across 31 tumor types, we found that loose ends were enriched in reference repeats and in fusions of the mappable genome to repetitive or foreign sequences. Among these, we found genomic footprints of neotelomeres, which were surprisingly enriched in cancers with low telomerase expression and an alternative lengthening of telomeres phenotype. Our results also provide a rigorous upper bound on the role of non-allelic homologous recombination (NAHR) in large-scale cancer structural variation, while nominating INO80, FANCA, and ARID1A as positive modulators of somatic NAHR. Taken together, we estimate that short read WGS maps >97% of all large-scale (>10 kbp) cancer structural variation; the remainder represents loose ends that require long molecule profiling to resolve unambiguously. Our results have broad relevance for future research and clinical applications of short read WGS and delineate precise directions where long molecule studies might provide transformative insight into cancer genome structure.
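The junction-balance idea behind loose ends can be illustrated with a minimal sketch. It assumes integer copy numbers are already known for every genomic segment and for every junction (reference adjacencies plus detected rearrangements). It then reports each segment end whose copy number is not fully accounted for by the junctions attached to it. The function name `find_loose_ends`, the (segment, side) encoding and the exemption of chromosome ends are hypothetical illustration choices. This is not the authors' method, and it performs no copy-number inference of its own.

```python
from collections import defaultdict

# A segment end is (segment_id, side), where side is "L" (left) or "R" (right).
End = tuple[str, str]


def find_loose_ends(segment_cn: dict[str, int],
                    junctions: list[tuple[End, End, int]],
                    chromosome_ends: set[End]) -> dict[End, int]:
    """Hypothetical sketch: return segment ends whose copy number is not
    explained by the copy numbers of the junctions attached to them."""
    # Sum the junction copy number arriving at every segment end.
    incident: dict[End, int] = defaultdict(int)
    for end_a, end_b, cn in junctions:
        incident[end_a] += cn
        incident[end_b] += cn

    loose: dict[End, int] = {}
    for seg, cn in segment_cn.items():
        for side in ("L", "R"):
            end = (seg, side)
            if end in chromosome_ends:
                continue  # telomeric ends are not expected to join anything
            gap = cn - incident[end]
            if gap != 0:
                # Positive gap: copies of this end with no detected partner
                # (a loose end). Negative gap: inconsistent input.
                loose[end] = gap
    return loose


# A one-copy gain of s2 with no junction explaining it.
segments = {"s1": 2, "s2": 3, "s3": 2}
reference = [(("s1", "R"), ("s2", "L"), 2), (("s2", "R"), ("s3", "L"), 2)]
telomeres = {("s1", "L"), ("s3", "R")}
print(find_loose_ends(segments, reference, telomeres))
# {('s2', 'L'): 1, ('s2', 'R'): 1}

# A tandem-duplication junction joining s2's right end to its left end balances it.
duplication = [(("s2", "R"), ("s2", "L"), 1)]
print(find_loose_ends(segments, reference + duplication, telomeres))  # {}
```

In this toy graph, the unexplained gain shows up as a pair of loose ends. Once a junction of matching copy number is supplied, the graph becomes junction-balanced.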

Genomic characterization of a pathogenic isolate of Saccharomyces cerevisiae reveals an extensive and dynamic landscape of structural variation

10.1101/2021.08.20.457152 ◽

2021 ◽

Author(s):

Lydia R. Heasley ◽

Juan Lucas Argueso

Keyword(s):

Saccharomyces Cerevisiae ◽

Structural Variation ◽

Genome Structure ◽

Genomic Variation ◽

Whole Genome Sequencing Data ◽

Structural Genomic ◽

Pathogenic Isolate ◽

A Genome ◽

Long Read

The budding yeast Saccharomyces cerevisiae has been extensively characterized for many decades and is a critical resource for the study of numerous facets of eukaryotic biology. Recently, the analysis of whole genome sequencing data from over 1000 natural isolates of S. cerevisiae has provided critical insights into the evolutionary landscape of this species by revealing a population structure composed of numerous genomically diverse lineages. These survey-level analyses have been largely devoid of structural genomic information, mainly because short read sequencing is not suitable for detailed characterization of genomic architecture. Consequently, we still lack a complete perspective of the genomic variation that exists within the species. Single molecule long read sequencing technologies, such as Oxford Nanopore and PacBio, provide sequencing-based approaches with which to rigorously define the structure of a genome, and have empowered yeast geneticists to explore this poorly described realm of eukaryotic genomics. Here, we present a comprehensive genomic structural analysis of a pathogenic isolate of S. cerevisiae, YJM311. We used long read sequence analysis to construct a haplotype-phased, telomere-to-telomere length assembly of the YJM311 diploid genome and characterized the structural variations (SVs) therein. We discovered that the genome of YJM311 contains significant intragenomic structural variation, some of which has notable consequences for the genomic stability and developmental biology of the strain. Collectively, we outline a new methodology for creating accurate haplotype-phased genome assemblies and highlight how such genomic analyses can define the structural architectures of S. cerevisiae isolates. It is our hope that through continued structural characterization of S. cerevisiae genomes, such as we have reported here for YJM311, we will comprehensively advance our understanding of eukaryotic genome structure-function relationships, structural diversity, and evolution.

Genomic characterization of a wild diploid isolate of Saccharomyces cerevisiae reveals an extensive and dynamic landscape of structural variation

Genetics ◽

10.1093/genetics/iyab193 ◽

2021 ◽

Author(s):

Lydia R Heasley ◽

Juan Lucas Argueso

Keyword(s):

Saccharomyces Cerevisiae ◽

Sequence Analysis ◽

Structural Variation ◽

Genome Structure ◽

Genomic Variation ◽

Genomic Diversity ◽

Structural Genomic ◽

A Genome ◽

Long Read

Abstract The budding yeast Saccharomyces cerevisiae has been extensively characterized for many decades and is a critical resource for the study of numerous facets of eukaryotic biology. Recently, whole genome sequence analysis of over 1000 natural isolates of S. cerevisiae has provided critical insights into the evolutionary landscape of this species by revealing a population structure composed of numerous genomically diverse lineages. These survey-level analyses have been largely devoid of structural genomic information, mainly because short read sequencing is not suitable for detailed characterization of genomic architecture. Consequently, we still lack a complete perspective of the genomic variation that exists within the species. Single molecule long read sequencing technologies, such as Oxford Nanopore and PacBio, provide sequencing-based approaches with which to rigorously define the structure of a genome, and have empowered yeast geneticists to explore this poorly described realm of eukaryotic genomics. Here, we present a comprehensive genomic structural analysis of a wild diploid isolate of S. cerevisiae, YJM311. We used long read sequence analysis to construct a haplotype-phased, telomere-to-telomere length assembly of the YJM311 genome and characterized the structural variations (SVs) therein. We discovered that the genome of YJM311 contains significant intragenomic structural variation, some of which has notable consequences for the genomic stability and developmental biology of the strain. Collectively, we outline a new methodology for creating accurate haplotype-phased genome assemblies and highlight how such genomic analyses can define the structural architectures of S. cerevisiae isolates. It is our hope that continued structural characterization of S. cerevisiae genomes, such as we have reported here for YJM311, will comprehensively advance our understanding of eukaryotic genome structure-function relationships, structural genomic diversity, and evolution.

Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample

F1000Research ◽

10.12688/f1000research.9912.1 ◽

2016 ◽

Vol 5 ◽

pp. 2644 ◽

Cited By ~ 1

Author(s):

William P. Gilks ◽

Tanya M. Pennell ◽

Ilona Flis ◽

Matthew T. Webster ◽

Edward H. Morrow

Keyword(s):

Drosophila Melanogaster ◽

Complex Traits ◽

Population Sample ◽

Genomic Variation ◽

Genotype Data ◽

Whole Genome ◽

Unique Haplotype ◽

Short Read ◽

Short Read Archive ◽

Ncbi Short Read Archive

As part of a study into the molecular genetics of sexually dimorphic complex traits, we used next-generation sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6) and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole genome phased haplotypes. We used a BWA-Picard-GATK pipeline to map sequence reads to the dm6 reference genome assembly at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (Accession number SRP058502). We used Haplotype Caller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200 bp). Additionally, we detected and genotyped 167 large structural variants (1-100 kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).
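Every sequenced female carries one copy of the known dm6 reference genome. In principle, then, the allele on the unknown LHM haplotype can be read directly from each diploid genotype call. The following is a minimal sketch, assuming VCF-style GT strings in which allele 0 is the dm6 reference allele. The function name `lhm_allele` is hypothetical, and the abstract does not describe the authors' exact phasing procedure.

```python
from typing import Optional


def lhm_allele(gt: str) -> Optional[int]:
    """Hypothetical sketch: infer the allele index carried on the LHM
    haplotype from a diploid GT string, assuming the other haplotype is
    the dm6 reference (allele 0)."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None  # missing or non-diploid call
    a, b = alleles
    if a == "0":
        return int(b)  # 0/0 -> reference allele; 0/k -> alternate allele k
    if b == "0":
        return int(a)  # k/0: same case with the alleles listed in the other order
    # Neither allele is the reference (e.g. 1/1): inconsistent with a
    # hemiclone that carries the reference genome, so leave it unresolved.
    return None


calls = ["0/0", "0/1", "1|0", "1/1", "./.", "0/2"]
print([lhm_allele(gt) for gt in calls])  # [0, 1, 1, None, None, 2]
```

Calls that contradict the design (neither allele matches the reference) are returned as `None` rather than forced onto a haplotype.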

Cytogenetics and large scale structural genomic variation

Human Genetic Diversity ◽

10.1093/acprof:oso/9780199227693.003.0003 ◽

2009 ◽

pp. 85-104

Author(s):

Julian C. Knight

Keyword(s):

Large Scale ◽

Genomic Variation ◽

Structural Genomic

Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample

F1000Research ◽

10.12688/f1000research.9912.3 ◽

2016 ◽

Vol 5 ◽

pp. 2644 ◽

Cited By ~ 1

Author(s):

William P. Gilks ◽

Tanya M. Pennell ◽

Ilona Flis ◽

Matthew T. Webster ◽

Edward H. Morrow

Keyword(s):

Drosophila Melanogaster ◽

Complex Traits ◽

High Throughput Sequencing ◽

Population Sample ◽

Genomic Variation ◽

Genotype Data ◽

Whole Genome ◽

Short Read ◽

Short Read Archive ◽

Ncbi Short Read Archive

As part of a study into the molecular genetics of sexually dimorphic complex traits, we used high-throughput sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6) and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline to map sequence reads to the dm6 reference genome assembly at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (Accession number SRP058502). We used Haplotype Caller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200 bp). Additionally, we detected and genotyped 167 large structural variants (1-100 kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).

Single-molecule analysis reveals widespread structural variation in multiple myeloma

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1418577112 ◽

2015 ◽

Vol 112 (25) ◽

pp. 7689-7694 ◽

Cited By ~ 29

Author(s):

Aditya Gupta ◽

Michael Place ◽

Steven Goldstein ◽

Deepayan Sarkar ◽

Shiguo Zhou ◽

...

Keyword(s):

Multiple Myeloma ◽

Tumor Progression ◽

Single Molecule ◽

Large Scale ◽

Structural Variation ◽

Plasma Cells ◽

Optical Mapping ◽

Genome Structure ◽

Genomic Analysis ◽

Whole Genome Analysis

Multiple myeloma (MM), a malignancy of plasma cells, is characterized by widespread genomic heterogeneity and, consequently, differences in disease progression and drug response. Although recent large-scale sequencing studies have greatly improved our understanding of MM genomes, our knowledge of genomic structural variation in MM remains limited by the constraints of commonly used sequencing approaches. In this study, we present the application of optical mapping, a single-molecule, whole-genome analysis system, to discover new structural variants in a primary MM genome. Through this analysis, we identified and characterized widespread structural variation in this tumor genome. Additionally, we describe our efforts toward comprehensive characterization of genome structure and variation by integrating our findings from optical mapping with those from DNA sequencing-based genomic analysis. Finally, by studying this MM genome at two time points during tumor progression, we demonstrated an increase in mutational burden with tumor progression at all length scales of variation.

Allele-Specific Quantification of Structural Variations in Cancer Genomes

10.1101/048207 ◽

2016 ◽

Cited By ~ 1

Author(s):

Yang Li ◽

Shiguo Zhou ◽

David C. Schwartz ◽

Jian Ma

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Graphical Model ◽

Genome Structure ◽

Cancer Genome ◽

Whole Genome ◽

Structural Variations ◽

Cancer Genomes ◽

Allele Specific

Abstract One of the hallmarks of cancer genomes is aneuploidy, which results in abnormal allele copy numbers. Structural variations (SVs) can further modify aneuploid cancer genomes into a mixture of rearranged genomic segments with an extensive range of somatic copy number alterations (CNAs). Indeed, aneuploid cancer genomes have significantly higher rates of CNAs and SVs. However, although methods have been developed to identify SVs and allele-specific copy number of genome (ASCNG) separately, no existing algorithm can simultaneously analyze SVs and ASCNG. Such an integrated approach is particularly important to fully understand the complexity of cancer genomes. Here we introduce a new algorithm called Weaver to provide allele-specific quantification of SVs and CNAs in aneuploid cancer genomes. Weaver uses a probabilistic graphical model that utilizes cancer whole genome sequencing data to simultaneously estimate the digital copy number and inter-connectivity of SVs. Our simulation evaluation, comparison with single-molecule Optical Mapping analysis, and real data applications (including MCF-7, HeLa, and TCGA whole genome sequencing samples) demonstrated that Weaver is highly accurate and can greatly refine the analysis of complex cancer genome structure.

Chromosome-level assembly of the Atlantic silverside genome reveals extreme levels of sequence diversity and structural genetic variation

10.1101/2020.10.27.357293 ◽

2020 ◽

Author(s):

Anna Tigano ◽

Arne Jacobs ◽

Aryn P. Wilder ◽

Ankita Nand ◽

Ye Zhan ◽

...

Keyword(s):

Genetic Variation ◽

Structural Variation ◽

Demographic History ◽

Genome Structure ◽

Genomic Variation ◽

Adaptive Divergence ◽

Effective Population ◽

Atlantic Silverside ◽

A Genome ◽

Chromosome Level

Abstract The levels and distribution of standing genetic variation in a genome can provide a wealth of insights about the adaptive potential, demographic history, and genome structure of a population or species. As structural variants are increasingly associated with traits important for adaptation and speciation, investigating both sequence and structural variation is essential to fully realize this potential. Using a combination of shotgun sequencing, 10X Genomics linked reads and proximity-ligation data (Chicago and Hi-C), we produced and annotated a chromosome-level genome assembly for the Atlantic silverside (Menidia menidia), an established ecological model for studying the phenotypic effects of natural and artificial selection. We then examined patterns of genomic variation across two individuals sampled from different populations with divergent local adaptations. Levels of diversity varied substantially along each chromosome, being consistently highly elevated near the ends (presumably near telomeric regions) and dipping to near zero around putative centromeres. Overall, our estimate of the genome-wide average heterozygosity in the Atlantic silverside is the highest reported for a fish, or any vertebrate, to date (1.32-1.76% depending on inference method and sample). Furthermore, we found extreme levels of structural variation, affecting ~23% of the total genome sequence, including multiple large inversions (>1 Mb and up to 12.6 Mb) associated with previously identified haploblocks showing strong differentiation between locally adapted populations. These extreme levels of standing genetic variation are likely associated with large effective population sizes and may help explain the remarkable adaptive divergence among populations of the Atlantic silverside.
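A simple way to express heterozygosity is the fraction of callable sites that are heterozygous. Computing that fraction in windows exposes the kind of along-chromosome variation described above. The following is a minimal direct-count sketch. It assumes 0-based heterozygous-site positions and per-window counts of confidently genotyped bases are already available. The function name and inputs are hypothetical, and the authors note that their own estimates depend on the inference method used.

```python
from typing import Optional


def windowed_heterozygosity(het_positions: list[int],
                            callable_per_window: list[int],
                            window_size: int) -> tuple[list[Optional[float]], float]:
    """Hypothetical sketch: return (per-window heterozygosity, overall
    heterozygosity) for one chromosome by direct counting.

    het_positions: 0-based coordinates of heterozygous sites
    callable_per_window: confidently genotyped bases in each window
    """
    counts = [0] * len(callable_per_window)
    for pos in het_positions:
        window = pos // window_size
        if window < len(counts):
            counts[window] += 1
    per_window: list[Optional[float]] = [
        c / n if n > 0 else None  # windows with no callable bases are undefined
        for c, n in zip(counts, callable_per_window)
    ]
    total_callable = sum(callable_per_window)
    overall = sum(counts) / total_callable if total_callable else 0.0
    return per_window, overall


per_window, overall = windowed_heterozygosity([5, 17, 150, 230], [100, 100, 50], 100)
print(per_window, overall)  # [0.02, 0.01, 0.02] 0.016
```

Dividing by callable bases rather than window length keeps poorly covered windows, such as repetitive centromeric regions, from artificially lowering the estimate.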

Optimization of extraction of genomic DNA from archived dried blood spot (DBS): potential application in epidemiological research & bio banking

Gates Open Research ◽

10.12688/gatesopenres.12855.1 ◽

2018 ◽

Vol 2 ◽

pp. 57 ◽

Cited By ~ 2

Author(s):

Abhinendra Kumar ◽

Sharayu Mhatre ◽

Sheela Godbole ◽

Prabhat Jha ◽

Rajesh Dikshit

Keyword(s):

Genomic Dna ◽

Large Scale ◽

Venous Blood ◽

Magnetic Bead ◽

Epidemiological Studies ◽

Dried Blood Spot ◽

Future Research ◽

Prolonged Storage ◽

Whole Genome ◽

Blood Spot

Background: Limited infrastructure is available to collect, store and transport venous blood in field epidemiological studies. Dried blood spots (DBS) are a robust potential alternative sample source for epidemiological studies and biobanking. A stable source of genomic DNA (gDNA) is required for long-term storage in a biobank and for its downstream applications. Our objective was to optimize methods of gDNA extraction from stored DBS, with the aim of demonstrating their utility in large-scale epidemiological studies. Methods: The purpose of this study was to extract the maximum amount of gDNA from DBS on Whatman 903 protein saver cards. gDNA was extracted using column-based (Qiagen) and magnetic bead-based (Invitrogen) methods. Extracted gDNA was quantified with a spectrophotometer and a fluorometer, and its integrity was analyzed by agarose gel electrophoresis. Result: Large variation was observed in the quantity and purity (260/280 ratio, 1.8-2.9) of the extracted gDNA. Intact gDNA bands on the electrophoresis gel reflect the robustness of DBS as a gDNA source even after prolonged storage. The extracted gDNA concentration (2.16–24 ng/µl) is sufficient for PCR-based downstream applications, but unfortunately not for direct whole genome sequencing or genotyping. Sequencing or genotyping can be achieved after increasing the template copy number through whole genome amplification of the extracted gDNA. The obtained results create a basis for future research to develop high-throughput research and extraction methods for blood samples. Conclusion: These results show that DBS can be utilized as a potential and robust sample source for biobanking in field epidemiological studies.
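The spectrophotometric readout behind the reported concentrations and 260/280 ratios reduces to simple arithmetic. The following is a minimal sketch. It assumes the widely used conversion that an A260 of 1.0 corresponds to roughly 50 ng/µl of double-stranded DNA, and the convention that ratios near 1.8 indicate DNA largely free of protein. The function name, inputs and example absorbances are hypothetical and are not data from this study.

```python
# Conventional conversion factor: A260 = 1.0 corresponds to ~50 ng/µl dsDNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dna_qc(a260: float, a280: float, dilution_factor: float = 1.0,
           elution_volume_ul: float = 0.0) -> dict[str, float]:
    """Hypothetical sketch: estimate dsDNA concentration, purity ratio and
    total yield from absorbance readings."""
    if a280 <= 0:
        raise ValueError("A280 must be positive to compute a purity ratio")
    concentration = a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor  # ng/µl
    return {
        "ng_per_ul": concentration,
        "a260_a280": a260 / a280,  # ~1.8 is conventionally read as pure DNA
        "total_ng": concentration * elution_volume_ul,  # yield in the eluate
    }


qc = dna_qc(a260=0.048, a280=0.026, elution_volume_ul=50)
print(round(qc["ng_per_ul"], 2), round(qc["a260_a280"], 2), round(qc["total_ng"], 1))
# 2.4 1.85 120.0
```

At concentrations this low, absorbance readings are close to the noise floor of many instruments. This is one reason fluorometric quantification is commonly run alongside spectrophotometry.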

Seave: a comprehensive web platform for storing and interrogating human genomic variation

10.1101/258061 ◽

2018 ◽

Cited By ~ 3

Author(s):

Velimir Gayevskiy ◽

Tony Roscioli ◽

Marcel E Dinger ◽

Mark J Cowley

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Variant Calling ◽

Genomic Variation ◽

Whole Genome ◽

Genome Data ◽

Pathogenicity Prediction ◽

Data Scaling ◽

Human Genomic ◽

Web Platform

Abstract Capability for genome sequencing and variant calling has increased dramatically, enabling large-scale genomic interrogation of human disease. However, discovery is hindered by current limitations in genomic interpretation, which remains a complicated and disjointed process. We introduce Seave, a web platform that enables variants to be easily filtered and annotated with in silico pathogenicity prediction scores and annotations from popular disease databases. Seave stores genomic variation of all types and sizes, and allows filtering for specific inheritance patterns, quality values, allele frequencies and gene lists. Seave is open source, can be deployed locally or on a cloud computing provider, and works readily with gene panel, exome and whole genome data, scaling from single labs to multi-institution use.
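Filtering by inheritance pattern, allele frequency and gene list can be sketched as a predicate over trio genotypes. This is not Seave's code or API. The `Variant` fields, the two pattern names, the genotype encoding (alternate-allele counts 0, 1 or 2), the gene names and the 1% default frequency cutoff are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    gene: str
    child: int   # alternate-allele count in the proband (0, 1 or 2)
    mother: int
    father: int
    population_af: Optional[float]  # None if absent from population databases


def passes_filters(v: Variant, pattern: str, genes: set[str],
                   max_af: float = 0.01) -> bool:
    """Hypothetical sketch: keep a variant if it is in the gene list (when one
    is given), is rare enough, and fits the requested inheritance pattern."""
    if genes and v.gene not in genes:
        return False
    if v.population_af is not None and v.population_af > max_af:
        return False
    if pattern == "de_novo":
        # Present in the child, absent from both parents.
        return v.child >= 1 and v.mother == 0 and v.father == 0
    if pattern == "autosomal_recessive":
        # Homozygous child of two heterozygous carrier parents.
        return v.child == 2 and v.mother == 1 and v.father == 1
    raise ValueError(f"unknown inheritance pattern: {pattern}")


variants = [
    Variant("GENE_A", child=1, mother=0, father=0, population_af=None),
    Variant("GENE_B", child=2, mother=1, father=1, population_af=0.002),
    Variant("GENE_B", child=2, mother=1, father=1, population_af=0.2),
]
de_novo = [v for v in variants if passes_filters(v, "de_novo", set())]
recessive = [v for v in variants if passes_filters(v, "autosomal_recessive", {"GENE_B"})]
print(len(de_novo), len(recessive))  # 1 1
```

The common third variant is removed by the frequency filter even though its genotypes fit the recessive pattern. Cheap filters like this are typically applied before any pathogenicity annotation is consulted.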
