Whole genome sequencing and assembly of a Caenorhabditis elegans genome with complex genomic rearrangements using the MinION sequencing device

2017 ◽  
Author(s):  
JR Tyson ◽  
NJ O’Neil ◽  
M Jain ◽  
HE Olsen ◽  
P Hieter ◽  
...  

ABSTRACT Advances in 3rd generation sequencing have opened new possibilities for ‘benchtop’ whole genome sequencing. The MinION is a portable device that uses nanopore technology and can sequence long DNA molecules. MinION long reads are well suited for sequencing and de novo assembly of complex genomes with large repetitive elements. Long reads also facilitate the identification of complex genomic rearrangements such as those observed in tumor genomes. To assess the feasibility of the de novo assembly of large complex genomes using both MinION and Illumina platforms, we sequenced the genome of a Caenorhabditis elegans strain that contains a complex acetaldehyde-induced rearrangement and a biolistic bombardment-mediated insertion of a GFP-containing plasmid. Using ∼5.8 gigabases of MinION sequence data, we were able to assemble a C. elegans genome containing 145 contigs (N50 contig length = 1.22 Mb) that covered >99% of the 100,286,401 bp reference genome. In contrast, using ∼8.04 gigabases of Illumina sequence data, we were able to assemble a C. elegans genome in 38,645 contigs (N50 contig length = ∼26 kb) containing 117 Mb. From the MinION genome assembly we identified the complex structures of both the acetaldehyde-induced mutation and the biolistic-mediated insertion. To date, this is the largest genome to be assembled exclusively from MinION data and is the first demonstration that the long reads of MinION sequencing can be used for whole genome assembly of large (100 Mb) genomes and the elucidation of complex genomic rearrangements.
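
The N50 contig lengths quoted in this abstract (1.22 Mb for MinION, ∼26 kb for Illumina) summarize assembly contiguity: N50 is the contig length at which contigs of that length or longer account for at least half of all assembled bases. A minimal sketch of that calculation follows. The function name and the toy contig lengths are illustrative assumptions, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy lengths (hypothetical), in base pairs.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))
```

Because the statistic is weighted by length, an assembly with a few very long contigs and many short ones can still report a high N50, which is why the abstract also gives the contig count and total size.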

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4588 ◽  
Author(s):  
Märt Roosaare ◽  
Mikk Puustusmaa ◽  
Märt Möls ◽  
Mihkel Vaher ◽  
Maido Remm

Background Plasmids play an important role in the dissemination of antibiotic resistance, making their detection an important task. Using whole genome sequencing (WGS), it is possible to capture both bacterial and plasmid sequence data, but short read lengths make plasmid detection a complex problem. Results We developed a tool named PlasmidSeeker that enables the detection of plasmids from bacterial WGS data without read assembly. The PlasmidSeeker algorithm is based on k-mers and uses k-mer abundance to distinguish between plasmid and bacterial sequences. We tested the performance of PlasmidSeeker on a set of simulated and real bacterial WGS samples, resulting in 100% sensitivity and 99.98% specificity. Conclusion PlasmidSeeker enables quick detection of known plasmids and complements existing tools that assemble plasmids de novo. The PlasmidSeeker source code is stored on GitHub: https://github.com/bioinfo-ut/PlasmidSeeker.
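
The idea of using k-mer abundance without assembling reads can be illustrated with a small sketch: count every k-mer in the reads, then look up how often the k-mers of a known reference sequence occur. This is not PlasmidSeeker's actual implementation, which the abstract does not describe in detail. The function names, the choice of the median as a summary, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count k-mer occurrences across all reads (no assembly needed)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def median_abundance(reference, counts, k):
    """Median read-derived count of the k-mers found in a reference."""
    return median(counts.get(kmer, 0) for kmer in kmers(reference, k))


if __name__ == "__main__":
    # Hypothetical toy reads: one sequence sampled twice, another once.
    reads = ["ACGTAC", "ACGTAC", "TTTTGG"]
    counts = count_kmers(reads, k=4)
    print(median_abundance("ACGTAC", counts, k=4))
    print(median_abundance("TTTTGG", counts, k=4))
```

In this toy setting, a reference whose k-mers are seen more often than those of another sequence in the same sample is present at a higher relative abundance, which is the kind of signal an abundance-based method can exploit.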


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tatiana Maroilley ◽  
Xiao Li ◽  
Matthew Oldach ◽  
Francesca Jean ◽  
Susan J. Stasiuk ◽  
...  

Abstract Genomic rearrangements cause congenital disorders, cancer, and complex diseases in humans. Yet, they are still understudied in rare diseases because their detection is challenging, despite the advent of whole genome sequencing (WGS) technologies. Short-read (srWGS) and long-read WGS approaches are regularly compared, and the latter is commonly recommended in studies focusing on genomic rearrangements. However, srWGS is currently the most economical, accurate, and widely supported technology. In Caenorhabditis elegans (C. elegans), such variants, induced by various mutagenesis processes, have been used for decades to balance large genomic regions by preventing chromosomal crossover events and allowing the maintenance of lethal mutations. Interestingly, those chromosomal rearrangements have rarely been characterized on a molecular level. To evaluate the ability of srWGS to detect various types of complex genomic rearrangements, we sequenced three balancer strains using short-read Illumina technology. As we experimentally validated the breakpoints uncovered by srWGS, we showed that, by combining several types of analyses, srWGS enables the detection of a reciprocal translocation (eT1), a free duplication (sDp3), a large deletion (sC4), and chromoanagenesis events. Thus, applying srWGS to decipher real complex genomic rearrangements in model organisms may help design efficient bioinformatics pipelines with systematic detection of complex rearrangements in human genomes.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8914 ◽  
Author(s):  
Wen Wang ◽  
Fang Wang ◽  
Rongkai Hao ◽  
Aizhen Wang ◽  
Kirill Sharshov ◽  
...  

Background The bar-headed goose (Anser indicus) mainly inhabits the plateau wetlands of Asia. As a specialized high-altitude species, bar-headed geese migrate between South and Central Asia and fly over the Himalayan mountains twice a year along the central Asian flyway. The physiological, biochemical and behavioral adaptations of bar-headed geese to high-altitude living and flying have raised much interest. However, to date, there is still no genome assembly information publicly available for bar-headed geese. Methods In this study, we present the first de novo whole genome sequencing and assembly of the bar-headed goose, along with gene prediction and annotation. Results 10X Genomics sequencing produced a total of 124 Gb of sequencing data, which covers the estimated genome size of the bar-headed goose 103 times (average coverage). The genome assembly comprised 10,528 scaffolds, with a total length of 1.143 Gb and a scaffold N50 of 10.09 Mb. Annotation of the bar-headed goose genome assembly identified a total of 102 Mb (8.9%) of repetitive sequences, 16,428 protein-coding genes, and 282 tRNAs. In total, we determined that there were 63 expanded and 20 contracted gene families in the bar-headed goose compared with the other 15 vertebrates. We also performed a positive selection analysis between the bar-headed goose and the closely related low-altitude goose, swan goose (Anser cygnoides), to uncover its genetic adaptations to the Qinghai-Tibetan Plateau. Conclusion We report the most complete genome sequence of the bar-headed goose to date. Our assembly will provide a valuable resource to enhance further studies of the gene functions of the bar-headed goose. The data will also be valuable for facilitating studies of the evolution, population genetics and high-altitude adaptations of bar-headed geese at the genomic level.
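
The coverage and repeat figures in this abstract can be checked with simple arithmetic, assuming average coverage is total sequenced bases divided by genome size. Under that assumption, 124 Gb at 103-fold coverage implies an estimated genome size of roughly 1.2 Gb, and 102 Mb of repeats in a 1.143 Gb assembly is about 8.9%, consistent with the reported percentage. The function names below are illustrative, not taken from the paper.

```python
def implied_genome_size_gb(total_bases_gb, coverage):
    """Genome size implied by total sequenced bases and average coverage."""
    return total_bases_gb / coverage


def percent_of_assembly(feature_mb, assembly_gb):
    """Share of an assembly (given in Gb) occupied by a feature (given in Mb)."""
    return 100.0 * feature_mb / (assembly_gb * 1000.0)


if __name__ == "__main__":
    # Values quoted in the abstract.
    print(round(implied_genome_size_gb(124, 103), 2))  # estimated genome size, Gb
    print(round(percent_of_assembly(102, 1.143), 1))   # repeat content, percent
```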


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Brent S. Pedersen ◽  
Joe M. Brown ◽  
Harriet Dashnow ◽  
Amelia D. Wallace ◽  
Matt Velinder ◽  
...  

Abstract In studies of families with rare disease, it is common to screen for de novo mutations, as well as recessive or dominant variants that explain the phenotype. However, the filtering strategies and software used to prioritize high-confidence variants vary from study to study. In an effort to establish recommendations for rare disease research, we explore effective guidelines for variant (SNP and INDEL) filtering and report the expected number of candidates for de novo dominant, recessive, and autosomal dominant modes of inheritance. We derived these guidelines using two large family-based cohorts that underwent whole-genome sequencing, as well as two family cohorts with whole-exome sequencing. The filters are applied to common attributes, including genotype-quality, sequencing depth, allele balance, and population allele frequency. The resulting guidelines yield ~10 candidate SNP and INDEL variants per exome, and 18 per genome for recessive and de novo dominant modes of inheritance, with substantially more candidates for autosomal dominant inheritance. For family-based, whole-genome sequencing studies, this number includes an average of three de novo, ten compound heterozygous, one autosomal recessive, four X-linked variants, and roughly 100 candidate variants following autosomal dominant inheritance. The slivar software we developed to establish and rapidly apply these filters to VCF files is available at https://github.com/brentp/slivar under an MIT license, and includes documentation and recommendations for best practices for rare disease analysis.
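
To make the four filter attributes concrete (genotype quality, sequencing depth, allele balance, population allele frequency), here is a minimal sketch of a de novo candidate check for one parent-child trio. It is plain Python, not slivar syntax, and every threshold below is a hypothetical placeholder rather than a value recommended in the paper. The `Genotype` class and function names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Genotype:
    alt_alleles: int  # 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate
    gq: int           # genotype quality
    depth: int        # total reads covering the site
    alt_reads: int    # reads supporting the alternate allele

    @property
    def allele_balance(self):
        """Fraction of reads supporting the alternate allele."""
        return self.alt_reads / self.depth if self.depth else 0.0


def is_candidate_de_novo(kid, mom, dad, pop_af,
                         min_gq=20, min_depth=10,
                         ab_low=0.2, ab_high=0.8, max_pop_af=0.001):
    """Apply illustrative (hypothetical) thresholds to one trio at one site."""
    # Every sample must be confidently genotyped with adequate depth.
    for sample in (kid, mom, dad):
        if sample.gq < min_gq or sample.depth < min_depth:
            return False
    # The child must be heterozygous with a balanced allele fraction.
    if kid.alt_alleles != 1 or not (ab_low <= kid.allele_balance <= ab_high):
        return False
    # Neither parent may carry or show any read evidence for the allele.
    if any(p.alt_alleles != 0 or p.alt_reads > 0 for p in (mom, dad)):
        return False
    # The allele must be rare in the reference population.
    return pop_af <= max_pop_af


if __name__ == "__main__":
    kid = Genotype(alt_alleles=1, gq=50, depth=30, alt_reads=14)
    mom = Genotype(alt_alleles=0, gq=45, depth=28, alt_reads=0)
    dad = Genotype(alt_alleles=0, gq=48, depth=32, alt_reads=0)
    print(is_candidate_de_novo(kid, mom, dad, pop_af=0.0))
```

Requiring zero alternate reads in both parents is a deliberately strict choice in this sketch; a real pipeline would tune that and the other thresholds against validated calls.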


Author(s):  
Seyoung Mun ◽  
Songmi Kim ◽  
Wooseok Lee ◽  
Keunsoo Kang ◽  
Thomas J. Meyer ◽  
...  

Abstract Advances in next-generation sequencing (NGS) technology have made personal genome sequencing possible, and indeed, many individual human genomes have now been sequenced. Comparisons of these individual genomes have revealed substantial genomic differences between human populations as well as between individuals from closely related ethnic groups. Transposable elements (TEs) are known to be one of the major sources of these variations and act through various mechanisms, including de novo insertion, insertion-mediated deletion, and TE–TE recombination-mediated deletion. In this study, we carried out de novo whole-genome sequencing of one Korean individual (KPGP9) via multiple insert-size libraries. The de novo whole-genome assembly resulted in 31,305 scaffolds with a scaffold N50 size of 13.23 Mb. Furthermore, through computational data analysis and experimental verification, we revealed that 182 TE-associated structural variation (TASV) insertions and 89 TASV deletions contributed 64,232 bp in sequence gain and 82,772 bp in sequence loss, respectively, in the KPGP9 genome relative to the hg19 reference genome. We also verified structural differences associated with TASVs by comparative analysis with TASVs in recent genomes (AK1 and TCGA genomes) and reported their details. Here, we constructed a new Korean de novo whole-genome assembly and provide the first study, to our knowledge, focused on the identification of TASVs in an individual Korean genome. Our findings again highlight the role of TEs as a major driver of structural variations in human individual genomes.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ho-Yon Hwang ◽  
Jiou Wang

Abstract Genetic mapping is used in forward genetics to narrow the list of candidate mutations and genes corresponding to the mutant phenotype of interest. Even with modern advances in biology such as efficient identification of candidate mutations by whole-genome sequencing, mapping remains critical in pinpointing the responsible mutation. Here we describe a simple, fast, and affordable mapping toolkit that is particularly suitable for mapping in Caenorhabditis elegans. This mapping method uses easily detected insertion-deletion polymorphisms (indels), instead of single nucleotide polymorphisms, in the commonly used Hawaiian CB4856 mapping strain. The materials and methods were optimized so that mapping could be performed using a tiny amount of genetic material without growing many large populations of mutants for DNA purification. We performed mapping of previously known and unknown mutations to show the strengths and weaknesses of this method and to present examples of completed mapping. For situations where Hawaiian CB4856 is unsuitable, we provide an annotated list of indels as a basis for fast and easy mapping using other wild isolates. Finally, we provide a rationale for using this mapping method over other alternatives as part of a comprehensive strategy also involving whole-genome sequencing and other methods.


Author(s):  
Guangtu Gao ◽  
Susana Magadan ◽  
Geoffrey C Waldbieser ◽  
Ramey C Youngblood ◽  
Paul A Wheeler ◽  
...  

Abstract There is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line de novo genome assembly from high-coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with an N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.

