Resolving Complex Structural Genomic Rearrangements using a Randomized Approach

2015 ◽  
Author(s):  
Xuefang Zhao ◽  
Sarah B. Emery ◽  
Bridget Myers ◽  
Jeffrey M. Kidd ◽  
Ryan E. Mills

Complex chromosomal rearrangements consist of structural genomic alterations involving multiple instances of deletions, duplications, inversions, or translocations that either co-occur on the same chromosome or represent different overlapping events on homologous chromosomes. We present SVelter, an algorithm that first identifies regions of the genome suspected to harbor a complex event and then iteratively rearranges the local genome structure in a randomized fashion, scoring each candidate structure against characteristics of the observed sequencing data. We show that SVelter accurately reconstructs these regions when compared with well-characterized genomes that have been deeply sequenced with both short- and long-read technologies.
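The randomized rearrange-and-score loop described above can be illustrated with a toy randomized hill climber. This is not SVelter's scoring model. It assumes three things: the region has already been cut into numbered blocks; a structure is a tuple of signed block IDs, where a negative ID means an inverted block; and the "observed data" are reduced to a set of read-supported junctions plus a per-block copy number from read depth. All helper names are hypothetical.

```python
import random
from collections import Counter


def canon(a, b):
    """Canonical key for the junction 'end of block a -> start of block b'.

    Read from the opposite strand, the same junction is (-b, -a), so both
    spellings map to the smaller tuple.
    """
    return min((a, b), (-b, -a))


def adjacencies(structure):
    """Set of canonical junctions between consecutive blocks."""
    return {canon(a, b) for a, b in zip(structure, structure[1:])}


def score(structure, observed_junctions, observed_copies):
    """Toy score (hypothetical): reward supported junctions; penalize
    unsupported junctions, unexplained observed junctions, and copy-number
    mismatches."""
    adj = adjacencies(structure)
    hits = len(adj & observed_junctions)
    misses = len(adj - observed_junctions)
    unexplained = len(observed_junctions - adj)
    copies = Counter(abs(b) for b in structure)
    cn_error = sum(abs(copies[blk] - cn) for blk, cn in observed_copies.items())
    return hits - misses - unexplained - cn_error


def randomized_search(reference, observed_junctions, observed_copies,
                      iterations=2000, seed=0):
    """Randomized hill climbing: apply a random inversion, deletion or
    duplication and keep the candidate only if it scores strictly better."""
    rng = random.Random(seed)
    best = tuple(reference)
    best_score = score(best, observed_junctions, observed_copies)
    for _ in range(iterations):
        candidate = list(best)
        n = len(candidate)
        move = rng.choice(("invert", "delete", "duplicate"))
        if move == "invert":
            i = rng.randrange(n)
            j = rng.randrange(i + 1, n + 1)
            # Reverse the segment and flip the orientation of each block.
            candidate[i:j] = [-b for b in reversed(candidate[i:j])]
        elif move == "delete" and n > 1:
            del candidate[rng.randrange(n)]
        elif move == "duplicate":
            k = rng.randrange(n)
            candidate.insert(k + 1, candidate[k])
        s = score(tuple(candidate), observed_junctions, observed_copies)
        if s > best_score:
            best, best_score = tuple(candidate), s
    return best, best_score


# Example: reference blocks 1-2-3, with block 2 inverted in the sample.
junctions = {canon(1, -2), canon(-2, 3)}
print(randomized_search((1, 2, 3), junctions, {1: 1, 2: 1, 3: 1}))
# expected: ((1, -2, 3), 2)
```

Strict-improvement acceptance keeps the sketch simple and deterministic for a given seed. A real caller would instead score read-pair, split-read and depth likelihoods, and would use a search able to escape local optima.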

2017 ◽  
Author(s):  
Mircea Cretu Stancu ◽  
Markus J. van Roosmalen ◽  
Ivo Renkens ◽  
Marleen Nieboer ◽  
Sjors Middelkamp ◽  
...  

Structural genomic variants are a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are highly complex. Emerging long-read single-molecule sequencing technologies provide new opportunities for detecting structural variants. Here, we sequenced the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline, NanoSV, to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to the corresponding short-read data for detecting de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variants, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes, and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.
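The abstract does not spell out how NanoSV assigns a breakpoint to a parental chromosome. Read-based phasing can, however, be sketched as majority voting. The sketch assumes the parental origin of the alleles at heterozygous SNVs is already known (for example from parental genotypes) and that per-read base observations have already been extracted from the alignments. Both helper functions are hypothetical.

```python
from collections import Counter


def assign_read(read_alleles, parental_snvs):
    """Assign one read to 'paternal' or 'maternal' by majority vote over the
    heterozygous SNVs it covers; return None if there is no clear winner.

    read_alleles:  {position: base observed in the read}
    parental_snvs: {position: (paternal_base, maternal_base)}
    """
    votes = Counter()
    for pos, base in read_alleles.items():
        if pos not in parental_snvs:
            continue  # not an informative heterozygous site
        paternal, maternal = parental_snvs[pos]
        if base == paternal:
            votes["paternal"] += 1
        elif base == maternal:
            votes["maternal"] += 1
    if votes["paternal"] == votes["maternal"]:
        return None
    return "paternal" if votes["paternal"] > votes["maternal"] else "maternal"


def phase_breakpoint(supporting_reads, parental_snvs):
    """Parental origin of a breakpoint, by majority over its supporting reads."""
    calls = Counter()
    for read in supporting_reads:
        origin = assign_read(read, parental_snvs)
        if origin is not None:
            calls[origin] += 1
    if calls["paternal"] == calls["maternal"]:
        return None
    return "paternal" if calls["paternal"] > calls["maternal"] else "maternal"


# Hypothetical SNV table and three reads spanning one breakpoint.
snvs = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
reads = [{100: "A", 250: "C"}, {250: "C", 400: "G"}, {100: "G"}]
print(phase_breakpoint(reads, snvs))  # paternal
```

Long reads make this workable because a single read can span both the breakpoint and several informative SNVs.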


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ting Hon ◽  
Kristin Mars ◽  
Greg Young ◽  
Yu-Chih Tsai ◽  
Joseph W. Karalius ◽  
...  

The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can improve results for complex applications such as single-nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Sample datasets are currently needed both to evaluate the benefits of these long, accurate reads and to develop bioinformatic tools, including genome assemblers, variant callers, and haplotyping algorithms. We present deep-coverage HiFi datasets for five complex samples, including the two inbred model genomes Mus musculus and Zea mays and two complex genomes, the octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.
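Read accuracy is often quoted as a Phred quality, Q = -10 log10(1 - accuracy). The small sketch below converts the ">99.5%" figure into Q and into an expected error count per read. The 15 kb read length is only an illustrative value inside the 10–25 kb range, not a figure from the dataset.

```python
import math


def phred_from_accuracy(accuracy):
    """Phred quality Q = -10 * log10(error rate), where error rate = 1 - accuracy."""
    return -10 * math.log10(1 - accuracy)


def expected_errors(read_length, accuracy):
    """Expected number of erroneous bases in a read of the given length."""
    return read_length * (1 - accuracy)


print(round(phred_from_accuracy(0.995), 1))   # 23.0
print(round(expected_errors(15_000, 0.995)))  # 75
```

So ">99.5%" means better than about Q23, or fewer than roughly 75 erroneous bases in a 15 kb read. The same formula gives about Q65 for the 99.99997% single-base accuracy quoted for HAST elsewhere in this listing.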


2011 ◽  
Vol 2011 ◽  
pp. 1-7 ◽  
Author(s):  
Ludwig Hieber ◽  
Reinhard Huber ◽  
Verena Bauer ◽  
Quirin Schäffner ◽  
Herbert Braselmann ◽  
...  

Structural genomic rearrangements are frequent findings in human cancers. Therefore, papillary thyroid carcinomas (PTCs) were investigated for chromosomal aberrations and rearrangements of the RET proto-oncogene. For this purpose, primary cultures from 23 PTCs were established, and metaphase preparations were analysed by spectral karyotyping (SKY). In addition, interphase cell preparations of the same cases were investigated by fluorescence in situ hybridisation (FISH) for the presence of RET/PTC rearrangements using RET-specific DNA probes. SKY analysis of PTC revealed structural aberrations of chromosome 11 and several numerical aberrations, with frequent loss of chromosomes 20, 21, and 22. FISH analysis detected RET/PTC rearrangements in 72% (16 out of 22) of cases. However, only subpopulations of tumour cells exhibited this rearrangement, indicating genetic heterogeneity. Visual and automated scoring of FISH signals gave concordant results in 19 out of 22 cases (87%), indicating that the optimised RET/PTC scoring parameter yields reliable results with the automated Metafer4 system. It can be concluded from this study that genomic rearrangements are frequent in PTC and are therefore important events in thyroid carcinogenesis.
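The abstract does not state the optimised scoring parameter, so the following is only a sketch of how threshold-based case calling and visual-versus-automated concordance might be computed. It assumes a case is called RET/PTC-positive when the fraction of scored nuclei showing split RET signals reaches a cutoff. The 10% cutoff, the nucleus counts, and both function names are hypothetical.

```python
def call_rearranged(split_nuclei, scored_nuclei, cutoff):
    """Call a case RET/PTC-positive if the fraction of nuclei showing split
    RET signals reaches a (hypothetical) cutoff."""
    if scored_nuclei == 0:
        raise ValueError("no nuclei scored")
    return split_nuclei / scored_nuclei >= cutoff


def concordance(calls_a, calls_b):
    """Fraction of cases on which two scoring methods give the same call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two equally long, non-empty call lists")
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return agree / len(calls_a)


# Hypothetical split-signal counts out of 200 nuclei for three cases.
visual = [call_rearranged(s, 200, 0.10) for s in (45, 8, 30)]
automated = [call_rearranged(s, 200, 0.10) for s in (40, 25, 28)]
print(visual, automated)                        # [True, False, True] [True, True, True]
print(round(concordance(visual, automated), 2))  # 0.67
```

Because only subpopulations of tumour cells carry the rearrangement, the per-case call depends heavily on where such a cutoff is set.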


2018 ◽  
Author(s):  
Guofeng Meng ◽  
Ying Tan ◽  
Yue Fan ◽  
Yan Wang ◽  
Guang Yang ◽  
...  

PacBio sequencing is a powerful approach for studying DNA or RNA sequences over longer ranges. It is especially useful for exploring complex structural variants generated by random integration or by multiple rearrangements of internal or external sequences. However, no tool has yet been designed to uncover their structural organization in the host genome. Here, we present TSD, a tool for complex structural variant discovery using PacBio targeted sequencing data. It allows researchers to identify and visualize the genomic structures of targeted sequences through unlimited splitting, alignment, and assembly of long PacBio reads. Application to sequencing data derived from an HBV-integrated human cell line (PLC/PRF/5) indicated that TSD could recover the full profile of HBV integration events, especially in regions with complex human-HBV genome integrations and multiple HBV rearrangements. Compared with other long-read analysis tools, TSD showed better performance in detecting complex genomic structural variants. TSD is publicly available at: https://github.com/menggf/tsd
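Once a long read has been split and its pieces aligned, the integration structure can be read off by ordering the aligned pieces along the read. The sketch below is not TSD's implementation. It assumes the split alignments already exist as half-open intervals on the read and on a reference. The `Segment` record, the `read_junctions` helper, and the coordinates in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One local alignment of a piece of a long read (0-based, half-open)."""
    q_start: int  # start on the read
    q_end: int    # end on the read
    ref: str      # reference sequence name, e.g. a human chromosome or HBV
    r_start: int  # start on the reference
    r_end: int    # end on the reference
    strand: str   # "+" or "-"


def read_junctions(segments, virus_refs):
    """Order a read's segments along the read and report each junction as
    (left_ref, left_pos, right_ref, right_pos, crosses_host_virus)."""
    ordered = sorted(segments, key=lambda s: s.q_start)
    junctions = []
    for left, right in zip(ordered, ordered[1:]):
        # The read leaves a '+' segment at r_end but a '-' segment at r_start.
        left_pos = left.r_end if left.strand == "+" else left.r_start
        # It enters a '+' segment at r_start but a '-' segment at r_end.
        right_pos = right.r_start if right.strand == "+" else right.r_end
        crosses = (left.ref in virus_refs) != (right.ref in virus_refs)
        junctions.append((left.ref, left_pos, right.ref, right_pos, crosses))
    return junctions


# Hypothetical read: host sequence, then HBV, then an inverted HBV piece.
segs = [
    Segment(1000, 2500, "HBV", 100, 1600, "+"),
    Segment(0, 1000, "chr11", 5000, 6000, "+"),
    Segment(2500, 3000, "HBV", 2000, 2500, "-"),
]
for junction in read_junctions(segs, {"HBV"}):
    print(junction)
# ('chr11', 6000, 'HBV', 100, True)
# ('HBV', 1600, 'HBV', 2500, False)
```

The first junction is a host-virus integration breakpoint. The second is a virus-internal rearrangement, the kind of event the abstract describes as "multiple HBV rearrangements".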


2021 ◽  
Author(s):  
Lydia R. Heasley ◽  
Juan Lucas Argueso

The budding yeast Saccharomyces cerevisiae has been extensively characterized for many decades and is a critical resource for the study of numerous facets of eukaryotic biology. Recently, the analysis of whole genome sequencing data from over 1000 natural isolates of S. cerevisiae has provided critical insights into the evolutionary landscape of this species by revealing a population structure composed of numerous genomically diverse lineages. These survey-level analyses have been largely devoid of structural genomic information, mainly because short read sequencing is not suitable for detailed characterization of genomic architecture. Consequently, we still lack a complete perspective of the genomic variation that exists within the species. Single molecule long read sequencing technologies, such as Oxford Nanopore and PacBio, provide sequencing-based approaches with which to rigorously define the structure of a genome, and have empowered yeast geneticists to explore this poorly described realm of eukaryotic genomics. Here, we present the comprehensive genomic structural analysis of a pathogenic isolate of S. cerevisiae, YJM311. We used long read sequence analysis to construct a haplotype-phased, telomere-to-telomere length assembly of the YJM311 diploid genome and characterized the structural variations (SVs) therein. We discovered that the genome of YJM311 contains significant intragenomic structural variation, some of which imparts notable consequences to the genomic stability and developmental biology of the strain. Collectively, we outline a new methodology for creating accurate haplotype-phased genome assemblies and highlight how such genomic analyses can define the structural architectures of S. cerevisiae isolates. It is our hope that through continued structural characterization of S. cerevisiae genomes, such as we have reported here for YJM311, we will comprehensively advance our understanding of eukaryotic genome structure-function relationships, structural diversity, and evolution.


Author(s):  
Mengyang Xu ◽  
Lidong Guo ◽  
Xiao Du ◽  
Lei Li ◽  
Brock A Peters ◽  
...  

Motivation: Achieving a near-complete understanding of how an individual's genome affects that individual's phenotypes requires deciphering the order of variants along homologous chromosomes in species with diploid genomes. However, true diploid assembly of long-range haplotypes remains challenging.
Results: To address this, we developed Haplotype-resolved Assembly for Synthetic long reads using a Trio-binning strategy (HAST), which uses parental information to classify reads as maternal or paternal. Once sorted, these reads are used to de novo assemble each parent-specific haplotype independently. We applied HAST to co-barcoded second-generation sequencing data from an Asian individual, producing a haplotype assembly that covers 94.7% of the reference genome with a scaffold N50 longer than 11 Mb. The high haplotyping precision (∼99.7%) and recall (∼95.9%) represent a substantial improvement over Supernova, the commonly used tool for assembling co-barcoded reads, and are comparable to those of TrioCanu, a trio-binning-based assembler for third-generation long reads, but with significantly higher single-base accuracy (up to 99.99997%, Q65). This makes HAST a superior tool for accurate haplotyping and future haplotype-based studies.
Availability: The analysis code is available at https://github.com/BGI-Qingdao/HAST.
Supplementary information: Supplementary data are available at Bioinformatics online.
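The trio-binning idea behind HAST (and TrioCanu) can be sketched as follows: collect k-mers found in only one parent's data, then bin each child read by which parent's specific k-mers it contains more of. This is a generic sketch, not HAST's implementation. HAST works on co-barcoded reads, and real tools count k-mers with frequency filters and use much larger k. Here k = 3 and the toy "parental" sequences are hypothetical, chosen only so the result can be checked by hand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_set(seq, k):
    """Set of canonical k-mers in a sequence."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def parent_specific_kmers(paternal_seqs, maternal_seqs, k):
    """k-mers seen in only one parent's data."""
    paternal = set().union(*(kmer_set(s, k) for s in paternal_seqs))
    maternal = set().union(*(kmer_set(s, k) for s in maternal_seqs))
    return paternal - maternal, maternal - paternal


def classify(read, paternal_only, maternal_only, k):
    """Bin a child read by which parent's specific k-mers it carries more of."""
    kmers = kmer_set(read, k)
    p = len(kmers & paternal_only)
    m = len(kmers & maternal_only)
    if p > m:
        return "paternal"
    if m > p:
        return "maternal"
    return "unassigned"


pat_only, mat_only = parent_specific_kmers(["AAAAC"], ["GGGGT"], 3)
for read in ("TTAAAA", "CCCC", "TTTT", "ACGT"):
    print(read, classify(read, pat_only, mat_only, 3))
# TTAAAA paternal / CCCC maternal / TTTT paternal / ACGT unassigned
```

Canonical k-mers let a read match its parent regardless of the strand it was sequenced from ("TTTT" is binned paternal via "AAA"). Reads with no informative k-mers stay unassigned, which is one reason haplotyping recall is below 100%.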


Genetics ◽  
2021 ◽  
Author(s):  
Lydia R Heasley ◽  
Juan Lucas Argueso

The budding yeast Saccharomyces cerevisiae has been extensively characterized for many decades and is a critical resource for the study of numerous facets of eukaryotic biology. Recently, whole genome sequence analysis of over 1000 natural isolates of S. cerevisiae has provided critical insights into the evolutionary landscape of this species by revealing a population structure composed of numerous genomically diverse lineages. These survey-level analyses have been largely devoid of structural genomic information, mainly because short read sequencing is not suitable for detailed characterization of genomic architecture. Consequently, we still lack a complete perspective of the genomic variation that exists within the species. Single molecule long read sequencing technologies, such as Oxford Nanopore and PacBio, provide sequencing-based approaches with which to rigorously define the structure of a genome, and have empowered yeast geneticists to explore this poorly described realm of eukaryotic genomics. Here, we present the comprehensive genomic structural analysis of a wild diploid isolate of S. cerevisiae, YJM311. We used long read sequence analysis to construct a haplotype-phased, telomere-to-telomere length assembly of the YJM311 genome and characterized the structural variations (SVs) therein. We discovered that the genome of YJM311 contains significant intragenomic structural variation, some of which imparts notable consequences to the genomic stability and developmental biology of the strain. Collectively, we outline a new methodology for creating accurate haplotype-phased genome assemblies and highlight how such genomic analyses can define the structural architectures of S. cerevisiae isolates. It is our hope that continued structural characterization of S. cerevisiae genomes, such as we have reported here for YJM311, will comprehensively advance our understanding of eukaryotic genome structure-function relationships, structural genomic diversity, and evolution.
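A "telomere-to-telomere" contig should begin and end in telomeric repeat. S. cerevisiae telomeres consist of irregular TG1-3 repeats, so on any strand read 5' to 3' a complete chromosome starts with C1-3A-rich sequence and ends with TG1-3-rich sequence. The sketch below checks this with regular expressions. It is only an illustration: the 150 bp window, the 0.8 coverage threshold, and the function names are hypothetical and not taken from the study.

```python
import re

# Left end of a chromosome reads C1-3A on the top strand; right end reads TG1-3.
LEFT_TELOMERE = re.compile(r"(?:C{1,3}A)+")
RIGHT_TELOMERE = re.compile(r"(?:TG{1,3})+")


def repeat_fraction(window, pattern):
    """Fraction of bases in `window` covered by matches of `pattern`."""
    covered = sum(m.end() - m.start() for m in pattern.finditer(window))
    return covered / len(window) if window else 0.0


def is_telomere_to_telomere(contig, window=150, min_fraction=0.8):
    """Hypothetical check that both ends of a contig look telomeric."""
    seq = contig.upper()  # tolerate soft-masked (lowercase) bases
    left = seq[:window]
    right = seq[-window:]
    return (repeat_fraction(left, LEFT_TELOMERE) >= min_fraction
            and repeat_fraction(right, RIGHT_TELOMERE) >= min_fraction)


# Toy contig: C1-3A repeats, non-telomeric sequence, TG1-3 repeats.
contig = "CCCACCACA" * 20 + "ATGC" * 100 + "TGTGGGTGTG" * 20
print(is_telomere_to_telomere(contig))       # True
print(is_telomere_to_telomere("ATGC" * 100))  # False
```

Because the C-rich/G-rich pattern is the same on both strands, the check does not depend on the orientation in which the assembler emitted the contig.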


Author(s):  
Wenyan Gu ◽  
Aizhong Zhou ◽  
Lusheng Wang ◽  
Shiwei Sun ◽  
Xuefeng Cui ◽  
...  

2020 ◽  
Author(s):  
Jesper Eisfeldt ◽  
Maria Pettersson ◽  
Anna Petri ◽  
Daniel Nilsson ◽  
Lars Feuk ◽  
...  

Chromoanagenesis is a genomic event responsible for the formation of complex structural chromosomal rearrangements (CCRs). Germline chromoanagenesis is rare, and the majority of reported cases are associated with an affected phenotype. Here, we report a healthy female carrying two de novo CCRs, one involving chromosomes 4, 19, 21, and X and the other involving chromosomes 7 and 11, with a total of 137 breakpoint junctions (BPJs). We characterized the CCRs using a hybrid sequencing approach combining short-read sequencing, nanopore sequencing, and optical mapping. The results were validated using cytogenetic methods (fluorescence in situ hybridization and spectral karyotyping) and Sanger sequencing. To our knowledge, 137 BPJs is the highest number reported in germline chromoanagenesis. A statistical assessment of breakpoint positions revealed significant enrichment of BPJs within genes (96 intragenic BPJs affecting 26 genes, p < 0.0001), indicating that the CCRs formed during active transcription of these genes. In addition, we found that the DNA fragments are unevenly and non-randomly distributed across the derivative chromosomes, indicating a multistep process of scattering and re-joining of DNA fragments. In summary, we report a new maximum number of BPJs in germline chromoanagenesis and show that a hybrid sequencing approach is necessary for the correct characterization of complex CCRs. In-depth statistical assessment indicated that the CCRs were most likely formed through an event resembling chromoplexy, a catastrophic event caused by erroneous transcription factor binding.
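The abstract does not describe the statistical procedure used to test whether breakpoints fall in genes more often than expected. A common approach is a permutation test: count the intragenic breakpoints, repeatedly re-place the same number of breakpoints at random, and record how often chance does at least as well. The sketch below assumes uniform placement along a single linear coordinate and ignores mappability, chromosome boundaries, and local gene density. The function names and the toy genome are hypothetical.

```python
import bisect
import random


def count_intragenic(positions, genes):
    """Count positions inside genes; `genes` is a sorted list of
    non-overlapping half-open (start, end) intervals."""
    starts = [s for s, _ in genes]
    hits = 0
    for p in positions:
        i = bisect.bisect_right(starts, p) - 1  # last gene starting at or before p
        if i >= 0 and p < genes[i][1]:
            hits += 1
    return hits


def permutation_p(positions, genes, genome_length, n_perm=10_000, seed=0):
    """Observed intragenic count and empirical one-sided p-value
    (add-one corrected, so the smallest possible value is 1 / (n_perm + 1))."""
    rng = random.Random(seed)
    observed = count_intragenic(positions, genes)
    as_extreme = 0
    for _ in range(n_perm):
        simulated = [rng.randrange(genome_length) for _ in positions]
        if count_intragenic(simulated, genes) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


# Toy genome of 1 Mb in which a single gene covers the first 10%.
breakpoints = list(range(0, 100_000, 5_000))  # 20 breakpoints, all intragenic
print(permutation_p(breakpoints, [(0, 100_000)], 1_000_000, n_perm=1_000))
```

With n permutations the smallest reportable p-value is 1/(n + 1), so a threshold like p < 0.0001 requires at least about 10,000 permutations.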

