Resolving Complex Structural Genomic Rearrangements using a Randomized Approach

Complex chromosomal rearrangements consist of structural genomic alterations involving multiple instances of deletions, duplications, inversions, or translocations that co-occur either on the same chromosome or represent different overlapping events on homologous chromosomes. We present SVelter, an algorithm that first identifies regions of the genome suspected to harbor a complex event and then iteratively rearranges the local genome structure, in a randomized fashion, with each structure scored against characteristics of the observed sequencing data. We show that SVelter is able to accurately reconstruct these regions when compared to well-characterized genomes that have been deep sequenced with both short and long read technologies.
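The search described in this abstract (propose a random rearrangement of the local structure, score it against the observed data, keep it if it fits better) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the block representation, the three move types, the scoring function, and the greedy acceptance rule are hypothetical and are not taken from SVelter itself.

```python
import random

# Toy model: a structure is a list of (block, strand) pairs, e.g. [("a", "+"), ("b", "-")].
# All names and the scoring function are illustrative assumptions, not SVelter's own.


def random_move(structure, rng):
    """Return a copy of `structure` with one random deletion, duplication, or inversion."""
    new = list(structure)
    if not new:
        return new
    i = rng.randrange(len(new))
    move = rng.choice(["delete", "duplicate", "invert"])
    if move == "delete":
        del new[i]
    elif move == "duplicate":
        new.insert(i + 1, new[i])
    else:
        block, strand = new[i]
        new[i] = (block, "-" if strand == "+" else "+")
    return new


def score(structure, observed_copies, observed_junctions):
    """Higher is better: penalize copy-number mismatch and disagreeing junctions."""
    blocks = set(observed_copies) | {b for b, _ in structure}
    copy_penalty = 0
    for block in blocks:
        count = sum(1 for b, _ in structure if b == block)
        copy_penalty += abs(count - observed_copies.get(block, 0))
    junctions = set(zip(structure, structure[1:]))
    missing = len(observed_junctions - junctions)  # observed but not explained
    extra = len(junctions - observed_junctions)    # proposed but not observed
    return -(copy_penalty + missing + extra)


def randomized_search(reference, observed_copies, observed_junctions,
                      iterations=2000, seed=0):
    """Greedy randomized search: keep a random move only if it strictly improves the score."""
    rng = random.Random(seed)
    current = list(reference)
    current_score = score(current, observed_copies, observed_junctions)
    for _ in range(iterations):
        candidate = random_move(current, rng)
        candidate_score = score(candidate, observed_copies, observed_junctions)
        if candidate_score > current_score:
            current, current_score = candidate, candidate_score
    return current
```

With a reference of blocks a, b, c and observations consistent with a deletion of b (one copy each of a and c, zero copies of b, and a single a-to-c junction), the only improving move is the deletion of b, so the search should settle on that structure. A real scorer would draw on read depth, insert sizes, and split-read evidence rather than this hand-made penalty.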

Mapping and phasing of structural variation in patient genomes using nanopore sequencing

10.1101/129379 ◽

2017 ◽

Cited By ~ 4

Author(s):

Mircea Cretu Stancu ◽

Markus J. van Roosmalen ◽

Ivo Renkens ◽

Marleen Nieboer ◽

Sjors Middelkamp ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Structural Variants ◽

Human Genetic Disease ◽

Structural Genomic ◽

Short Read ◽

Sequencing Technologies ◽

Genome Wide ◽

Long Read ◽

Complex Structural

Abstract: Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline - NanoSV - to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.

Highly accurate long-read HiFi sequencing data for five complex genomes

Scientific Data ◽

10.1038/s41597-020-00743-4 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Ting Hon ◽

Kristin Mars ◽

Greg Young ◽

Yu-Chih Tsai ◽

Joseph W. Karalius ◽

...

Keyword(s):

Sequence Data ◽

Genome Structure ◽

Data Sets ◽

Sequencing Data ◽

Complex Samples ◽

Bioinformatic Tools ◽

Long Reads ◽

Sequencing Method ◽

Sample Data ◽

Long Read

Abstract: The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.

Chromosomal Rearrangements in Post-Chernobyl Papillary Thyroid Carcinomas: Evaluation by Spectral Karyotyping and Automated Interphase FISH

Journal of Biomedicine and Biotechnology ◽

10.1155/2011/693691 ◽

2011 ◽

Vol 2011 ◽

pp. 1-7 ◽

Cited By ~ 11

Author(s):

Ludwig Hieber ◽

Reinhard Huber ◽

Verena Bauer ◽

Quirin Schäffner ◽

Herbert Braselmann ◽

...

Keyword(s):

Chromosomal Rearrangements ◽

Primary Cultures ◽

Genomic Rearrangements ◽

Fish Analysis ◽

Spectral Karyotyping ◽

Papillary Thyroid ◽

Chromosome 11 ◽

Structural Genomic ◽

Interphase Cell ◽

Thyroid Carcinomas

Structural genomic rearrangements are frequent findings in human cancers. Therefore, papillary thyroid carcinomas (PTCs) were investigated for chromosomal aberrations and rearrangements of the RET proto-oncogene. For this purpose, primary cultures from 23 PTCs were established and metaphase preparations were analysed by spectral karyotyping (SKY). In addition, interphase cell preparations of the same cases were investigated by fluorescence in situ hybridisation (FISH) for the presence of RET/PTC rearrangements using RET-specific DNA probes. SKY analysis of PTC revealed structural aberrations of chromosome 11 and several numerical aberrations with frequent loss of chromosomes 20, 21, and 22. FISH analysis for RET/PTC rearrangements showed prevalence of this rearrangement in 72% (16 out of 22) of cases. However, only subpopulations of tumour cells exhibited this rearrangement, indicating genetic heterogeneity. The comparison of visual and automated scoring of FISH signals revealed concordant results in 19 out of 22 cases (87%), indicating reliable scoring results using the optimised scoring parameter for RET/PTC with the automated Metafer4 system. It can be concluded from this study that genomic rearrangements are frequent in PTC and therefore important events in thyroid carcinogenesis.

TSD: A computational tool to study the complex structural variants using PacBio targeted sequencing data

10.1101/474445 ◽

2018 ◽

Author(s):

Guofeng Meng ◽

Ying Tan ◽

Yue Fan ◽

Yan Wang ◽

Guang Yang ◽

...

Keyword(s):

Human Cell Line ◽

Targeted Sequencing ◽

Structural Variants ◽

Sequencing Data ◽

Rna Sequences ◽

Variant Discovery ◽

Powerful Approach ◽

Full Profile ◽

Long Read ◽

Complex Structural

Abstract: PacBio sequencing is a powerful approach for studying DNA or RNA sequences over a longer range. It is especially useful in exploring the complex structural variants generated by random integration or multiple rearrangements of internal or external sequences. However, there is still no tool designed to uncover their structural organization in the host genome. Here, we present a tool, TSD, for complex structural variant discovery using PacBio targeted sequencing data. It allows researchers to identify and visualize the genomic structures of targeted sequences by unlimited splitting, alignment and assembly of long PacBio reads. Application to the sequencing data derived from an HBV-integrated human cell line (PLC/PRF/5) indicated that TSD could recover the full profile of HBV integration events, especially for the regions with complex human-HBV genome integrations and multiple HBV rearrangements. Compared to other long-read analysis tools, TSD showed better performance in detecting complex genomic structural variants. TSD is publicly available at: https://github.com/menggf/tsd

Genomic characterization of a pathogenic isolate of Saccharomyces cerevisiae reveals an extensive and dynamic landscape of structural variation

10.1101/2021.08.20.457152 ◽

2021 ◽

Author(s):

Lydia R. Heasley ◽

Juan Lucas Argueso

Keyword(s):

Saccharomyces Cerevisiae ◽

Structural Variation ◽

Genome Structure ◽

Genomic Variation ◽

Whole Genome Sequencing Data ◽

Structural Genomic ◽

Pathogenic Isolate ◽

A Genome ◽

Long Read

The budding yeast Saccharomyces cerevisiae has been extensively characterized for many decades and is a critical resource for the study of numerous facets of eukaryotic biology. Recently, the analysis of whole genome sequencing data from over 1000 natural isolates of S. cerevisiae has provided critical insights into the evolutionary landscape of this species by revealing a population structure comprised of numerous genomically diverse lineages. These survey-level analyses have been largely devoid of structural genomic information, mainly because short read sequencing is not suitable for detailed characterization of genomic architecture. Consequently, we still lack a complete perspective of the genomic variation that exists within the species. Single molecule long read sequencing technologies, such as Oxford Nanopore and PacBio, provide sequencing-based approaches with which to rigorously define the structure of a genome, and have empowered yeast geneticists to explore this poorly described realm of eukaryotic genomics. Here, we present the comprehensive genomic structural analysis of a pathogenic isolate of S. cerevisiae, YJM311. We used long read sequence analysis to construct a haplotype-phased, telomere-to-telomere length assembly of the YJM311 diploid genome and characterized the structural variations (SVs) therein. We discovered that the genome of YJM311 contains significant intragenomic structural variation, some of which imparts notable consequences to the genomic stability and developmental biology of the strain. Collectively, we outline a new methodology for creating accurate haplotype-phased genome assemblies and highlight how such genomic analyses can define the structural architectures of S. cerevisiae isolates. It is our hope that through continued structural characterization of S. cerevisiae genomes, such as we have reported here for YJM311, we will comprehensively advance our understanding of eukaryotic genome structure-function relationships, structural diversity, and evolution.

Highly accurate long-read HiFi sequencing data for five complex genomes

10.1101/2020.05.04.077180 ◽

2020 ◽

Author(s):

Ting Hon ◽

Kristin Mars ◽

Greg Young ◽

Yu-Chih Tsai ◽

Joseph W. Karalius ◽

...

Keyword(s):

Sequence Data ◽

Genome Structure ◽

Data Sets ◽

Sequencing Data ◽

Complex Samples ◽

Bioinformatic Tools ◽

Long Reads ◽

Sequencing Method ◽

Sample Data ◽

Long Read

Abstract: The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets to both evaluate the benefits of these long accurate reads as well as for development of bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.

Accurate Haplotype-Resolved Assembly Reveals The Origin Of Structural Variants For Human Trios

Bioinformatics ◽

10.1093/bioinformatics/btab068 ◽

2021 ◽

Author(s):

Mengyang Xu ◽

Lidong Guo ◽

Xiao Du ◽

Lei Li ◽

Brock A Peters ◽

...

Keyword(s):

De Novo ◽

Substantial Improvement ◽

Supplementary Information ◽

Sequencing Data ◽

Homologous Chromosomes ◽

Assembly Method ◽

Long Reads ◽

Long Read ◽

Second Generation Sequencing ◽

Generation Sequencing

Abstract. Motivation: Achieving a near complete understanding of how the genome of an individual affects the phenotypes of that individual requires deciphering the order of variations along homologous chromosomes in species with diploid genomes. However, true diploid assembly of long-range haplotypes remains challenging. Results: To address this, we have developed Haplotype-resolved Assembly for Synthetic long reads using a Trio-binning strategy, or HAST, which uses parental information to classify reads into maternal or paternal. Once sorted, these reads are used to independently de novo assemble the parent-specific haplotypes. We applied HAST to co-barcoded second-generation sequencing data from an Asian individual, resulting in a haplotype assembly covering 94.7% of the reference genome with a scaffold N50 longer than 11 Mb. The high haplotyping precision (∼99.7%) and recall (∼95.9%) represent a substantial improvement over the commonly used tool for assembling co-barcoded reads (Supernova), and are comparable to a trio-binning-based third generation long-read based assembly method (TrioCanu) but with a significantly higher single-base accuracy (up to 99.99997% (Q65)). This makes HAST a superior tool for accurate haplotyping and future haplotype-based studies. Availability: The code of the analysis is available at https://github.com/BGI-Qingdao/HAST. Supplementary information: Supplementary data are available at Bioinformatics online.
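The trio-binning idea mentioned in this abstract (use parental information to sort the child's reads into maternal and paternal bins before assembling each bin separately) can be sketched generically with parent-specific k-mers. This is a minimal illustration under stated assumptions, not HAST's implementation: the function names, the exact-match k-mer sets, and the simple majority vote are all hypothetical, and a real pipeline would work from k-mer counts over full parental read sets and handle sequencing errors.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_seqs, paternal_seqs, k):
    """Return (maternal-only, paternal-only) k-mer sets (hypothetical helper)."""
    mat = set().union(*(kmers(s, k) for s in maternal_seqs))
    pat = set().union(*(kmers(s, k) for s in paternal_seqs))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign a read to the parent whose private k-mers it shares more of."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie
```

Reads that land in the "maternal" and "paternal" bins would then be assembled independently, which is the step the abstract describes as assembling the parent-specific haplotypes; unassigned reads carry no parent-distinguishing signal in this toy scheme.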

Genomic characterization of a wild diploid isolate of Saccharomyces cerevisiae reveals an extensive and dynamic landscape of structural variation

Genetics ◽

10.1093/genetics/iyab193 ◽

2021 ◽

Author(s):

Lydia R Heasley ◽

Juan Lucas Argueso

Keyword(s):

Saccharomyces Cerevisiae ◽

Sequence Analysis ◽

Structural Variation ◽

Genome Structure ◽

Genomic Variation ◽

Genomic Diversity ◽

Structural Genomic ◽

A Genome ◽

Long Read

Abstract: The budding yeast Saccharomyces cerevisiae has been extensively characterized for many decades and is a critical resource for the study of numerous facets of eukaryotic biology. Recently, whole genome sequence analysis of over 1000 natural isolates of S. cerevisiae has provided critical insights into the evolutionary landscape of this species by revealing a population structure comprised of numerous genomically diverse lineages. These survey-level analyses have been largely devoid of structural genomic information, mainly because short read sequencing is not suitable for detailed characterization of genomic architecture. Consequently, we still lack a complete perspective of the genomic variation that exists within the species. Single molecule long read sequencing technologies, such as Oxford Nanopore and PacBio, provide sequencing-based approaches with which to rigorously define the structure of a genome, and have empowered yeast geneticists to explore this poorly described realm of eukaryotic genomics. Here, we present the comprehensive genomic structural analysis of a wild diploid isolate of S. cerevisiae, YJM311. We used long read sequence analysis to construct a haplotype-phased, telomere-to-telomere length assembly of the YJM311 genome and characterized the structural variations (SVs) therein. We discovered that the genome of YJM311 contains significant intragenomic structural variation, some of which imparts notable consequences to the genomic stability and developmental biology of the strain. Collectively, we outline a new methodology for creating accurate haplotype-phased genome assemblies and highlight how such genomic analyses can define the structural architectures of S. cerevisiae isolates. It is our hope that continued structural characterization of S. cerevisiae genomes, such as we have reported here for YJM311, will comprehensively advance our understanding of eukaryotic genome structure-function relationships, structural genomic diversity, and evolution.

SVLR: Genome Structure Variant Detection Using Long Read Sequencing Data

Bioinformatics Research and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-030-57821-3_13 ◽

2020 ◽

pp. 140-153

Author(s):

Wenyan Gu ◽

Aizhong Zhou ◽

Lusheng Wang ◽

Shiwei Sun ◽

Xuefeng Cui ◽

...

Keyword(s):

Genome Structure ◽

Sequencing Data ◽

Long Read ◽

Variant Detection

Hybrid sequencing resolves two germline ultra-complex chromosomal rearrangements consisting of 137 breakpoint junctions in a single carrier

Human Genetics ◽

10.1007/s00439-020-02242-3 ◽

2020 ◽

Author(s):

Jesper Eisfeldt ◽

Maria Pettersson ◽

Anna Petri ◽

Daniel Nilsson ◽

Lars Feuk ◽

...

Keyword(s):

De Novo ◽

Chromosomal Rearrangements ◽

Dna Fragments ◽

Statistical Assessment ◽

Single Carrier ◽

Complex Chromosomal Rearrangements ◽

Multistep Process ◽

Significant Enrichment ◽

Active Transcription ◽

Complex Structural

Abstract: Chromoanagenesis is a genomic event responsible for the formation of complex structural chromosomal rearrangements (CCRs). Germline chromoanagenesis is rare and the majority of reported cases are associated with an affected phenotype. Here, we report a healthy female carrying two de novo CCRs involving chromosomes 4, 19, 21 and X and chromosomes 7 and 11, respectively, with a total of 137 breakpoint junctions (BPJs). We characterized the CCRs using a hybrid-sequencing approach, combining short-read sequencing, nanopore sequencing, and optical mapping. The results were validated using multiple cytogenetic methods, including fluorescence in situ hybridization, spectral karyotyping, and Sanger sequencing. We identified 137 BPJs, which to our knowledge is the highest number of reported breakpoint junctions in germline chromoanagenesis. We also performed a statistical assessment of the positioning of the breakpoints, revealing a significant enrichment of BPJ-affecting genes (96 intragenic BPJs, 26 genes, p < 0.0001), indicating that the CCRs formed during active transcription of these genes. In addition, we find that the DNA fragments are unevenly and non-randomly distributed across the derivative chromosomes, indicating a multistep process of scattering and re-joining of DNA fragments. In summary, we report a new maximum number of BPJs (137) in germline chromoanagenesis. We also show that a hybrid sequencing approach is necessary for the correct characterization of complex CCRs. Through in-depth statistical assessment, it was found that the CCRs were most likely formed through an event resembling chromoplexy, a catastrophic event caused by erroneous transcription factor binding.
