Improved contiguity of the threespine stickleback genome using long-read sequencing

2021 ◽  
Vol 11 (2) ◽  
Author(s):  
Shivangi Nath ◽  
Daniel E Shaw ◽  
Michael A White

Abstract While the cost and time for assembling a genome have drastically decreased, it still remains a challenge to assemble a highly contiguous genome. These challenges are rapidly being overcome by the integration of long-read sequencing technologies. Here, we use long-read sequencing to improve the contiguity of the threespine stickleback fish (Gasterosteus aculeatus) genome, a prominent genetic model species. Using Pacific Biosciences sequencing, we assembled a highly contiguous genome of a freshwater fish from Paxton Lake. Using contigs from this genome, we were able to fill over 76.7% of the gaps in the existing reference genome assembly, improving contiguity over fivefold. Our gap filling approach was highly accurate, validated by 10X Genomics long-distance linked-reads. In addition to closing a majority of gaps, we were able to assemble segments of telomeres and centromeres throughout the genome. This highlights the power of using long sequencing reads to assemble highly repetitive and difficult-to-assemble regions of genomes. This latest genome build has been released through a newly designed community genome browser that aims to consolidate the growing number of genomics datasets available for the threespine stickleback fish.
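
Gaps in a scaffolded reference are conventionally stored as runs of N, so the share of gaps closed can be estimated by counting those runs before and after filling. The following is a minimal sketch under that assumption only; the function names and the toy sequences are hypothetical, and it is not the authors' gap-filling or validation pipeline.

```python
import re


def gap_spans(sequence):
    """Return (start, end) spans for every run of N/n in a scaffold sequence."""
    return [match.span() for match in re.finditer(r"[Nn]+", sequence)]


def fraction_closed(before, after):
    """Fraction of gaps (counted as runs, not bases) removed between two builds."""
    gaps_before = len(gap_spans(before))
    gaps_after = len(gap_spans(after))
    if gaps_before == 0:
        return 0.0
    return (gaps_before - gaps_after) / gaps_before


# Toy scaffolds (made up): three gaps in the old build, one left after filling.
old_scaffold = "ACGTNNNNNGGCANNNTTGANNNNCCGT"
new_scaffold = "ACGTACGTAGGCATTATTGANNNNCCGT"

if __name__ == "__main__":
    print(gap_spans(old_scaffold))
    print(f"gaps closed: {fraction_closed(old_scaffold, new_scaffold):.1%}")
```

Counting runs rather than N bases matches a "percentage of gaps filled" figure; a per-base measure would weight long gaps more heavily.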

Author(s):  
Shivangi Nath ◽  
Daniel E. Shaw ◽  
Michael A. White

Abstract While the cost and time for assembling a genome have drastically reduced, it still remains a challenge to assemble a highly contiguous genome. These challenges are rapidly being overcome by the integration of long-read sequencing technologies. Here, we use long sequencing reads to improve the contiguity of the threespine stickleback fish (Gasterosteus aculeatus) genome, a prominent genetic model species. Using Pacific Biosciences sequencing, we were able to fill over 76% of the gaps in the genome, improving contiguity over five-fold. Our approach was highly accurate, validated by 10X Genomics long-distance linked-reads. In addition to closing a majority of gaps, we were able to assemble segments of telomeres and centromeres throughout the genome. This highlights the power of using long sequencing reads to assemble highly repetitive and difficult to assemble regions of genomes. This latest genome build has been released through a newly designed community genome browser that aims to consolidate the growing number of genomics datasets available for the threespine stickleback fish.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Edwin A. Solares ◽  
Yuan Tao ◽  
Anthony D. Long ◽  
Brandon S. Gaut

Abstract Background Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analysis such as scaffolding. Results Here we illustrate a new method, which we call HapSolo, that identifies secondary contigs and defines a primary assembly based on multiple pairwise contig alignment metrics. HapSolo evaluates candidate primary assemblies using BUSCO scores and then distinguishes among candidate assemblies using a cost function. The cost function can be defined by the user but by default considers the number of missing, duplicated and single BUSCO genes within the assembly. HapSolo performs hill climbing to minimize cost over thousands of candidate assemblies. We illustrate the performance of HapSolo on genome data from three species: the Chardonnay grape (Vitis vinifera), with a genome of 490 Mb, a mosquito (Anopheles funestus; 200 Mb) and the thorny skate (Amblyraja radiata; 2650 Mb). Conclusions HapSolo rapidly identified candidate assemblies that yield improvements in assembly metrics, including decreased genome size and improved N50 scores. Contig N50 scores improved by 35%, 9% and 9% for Chardonnay, mosquito and the thorny skate, respectively, relative to unreduced primary assemblies. The benefits of HapSolo were amplified by downstream analyses, which we illustrated by scaffolding with Hi-C data. We found, for example, that prior to the application of HapSolo, only 52% of the Chardonnay genome was captured in the largest 19 scaffolds, corresponding to the number of chromosomes. After the application of HapSolo, this value increased to ~84%. The improvements for the mosquito’s largest three scaffolds, representing the number of chromosomes, were from 61 to 86%, and the improvement was even more pronounced for thorny skate. We compared the scaffolding results to assemblies that were based on PurgeDups for identifying secondary contigs, with generally superior results for HapSolo.
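
The search described in this abstract, hill climbing over candidate primary assemblies scored by a BUSCO-based cost, can be illustrated with a toy example. This is a minimal sketch and not HapSolo's code: the contig table, the two alignment thresholds (`min_identity`, `min_coverage`), the step size and the cost formula `(missing + duplicated) / single` are all assumptions chosen for illustration, since the abstract only says the default cost considers missing, duplicated and single BUSCO genes.

```python
import random

# Toy contigs (made up): name, BUSCO genes carried, and the best alignment
# identity / coverage of the contig against a larger contig.
CONTIGS = [
    ("c1", {"g1", "g2", "g3"}, 0.00, 0.00),
    ("c2", {"g4", "g5"}, 0.00, 0.00),
    ("c3", {"g1", "g2"}, 0.97, 0.95),  # looks like an alternative copy of c1
    ("c4", {"g4"}, 0.92, 0.90),        # looks like an alternative copy of c2
    ("c5", {"g6"}, 0.60, 0.30),        # weak alignment, carries a unique gene
]
ALL_BUSCOS = {"g1", "g2", "g3", "g4", "g5", "g6", "g7"}


def busco_counts(primary):
    """Count single-copy, duplicated and missing BUSCO genes in a contig set."""
    copies = {gene: 0 for gene in ALL_BUSCOS}
    for _, genes, _, _ in primary:
        for gene in genes:
            copies[gene] += 1
    single = sum(1 for n in copies.values() if n == 1)
    duplicated = sum(1 for n in copies.values() if n > 1)
    missing = sum(1 for n in copies.values() if n == 0)
    return single, duplicated, missing


def cost(thresholds):
    """Assumed cost: (missing + duplicated) / single, lower is better."""
    min_identity, min_coverage = thresholds
    # A contig is treated as secondary when both metrics reach the thresholds.
    primary = [c for c in CONTIGS
               if not (c[2] >= min_identity and c[3] >= min_coverage)]
    single, duplicated, missing = busco_counts(primary)
    return (missing + duplicated) / max(single, 1)


def hill_climb(steps=500, seed=0):
    """Perturb the thresholds at random and keep moves that do not raise cost."""
    rng = random.Random(seed)
    best = (rng.random(), rng.random())
    best_cost = cost(best)
    for _ in range(steps):
        candidate = tuple(min(1.0, max(0.0, t + rng.uniform(-0.1, 0.1)))
                          for t in best)
        candidate_cost = cost(candidate)
        if candidate_cost <= best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


if __name__ == "__main__":
    print("cost with nothing removed:", cost((1.0, 1.0)))
    print("hill-climbing result:", hill_climb())
```

In this toy landscape, removing only c3 and c4 gives the lowest cost, while thresholds that also remove c5 lose a unique gene and score worse. That is the trade-off between duplicated and missing BUSCOs that a cost of this kind is meant to balance.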


Author(s):  
Clément Schneider ◽  
Christian Woehle ◽  
Carola Greve ◽  
Cyrille A. D’Haese ◽  
Magnus Wolf ◽  
...  

ABSTRACT Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in evolutionary sciences, ecology, systematics and in biodiversity-related applied fields such as environmental management and natural product research. Advances in DNA sequencing technologies make genome sequencing feasible for many non-genetic model species. However, genome sequencing today relies on large quantities of high quality, high molecular weight (HMW) DNA which is mostly obtained from fresh tissues. This is problematic for biodiversity genomics of Metazoa as most species are small and yield minute amounts of DNA. Furthermore, bringing living specimens to the lab bench is not realistic for the majority of species. Here we overcome those difficulties by sequencing two species of springtails (Collembola) from single specimens preserved in ethanol. We used a newly developed, genome-wide amplification-based protocol to generate PacBio libraries for HiFi long-read sequencing. The assembled genomes were highly continuous. They can be considered complete as we recovered over 95% of BUSCOs. Genome-wide amplification does not seem to bias genome recovery. The presence of almost complete copies of the mitochondrial genome in the nuclear genome was a pitfall for automatic assemblers. The genomes fit well into an existing phylogeny of springtails. A neotype is designated for one of the species, blending genome sequencing and creation of taxonomic references. Our study shows that it is possible to obtain high quality genomes from small, field-preserved sub-millimeter metazoans, thus making their vast diversity accessible to the fields of genomics.


2017 ◽  
Author(s):  
David Porubsky ◽  
Shilpa Garg ◽  
Ashley D. Sanders ◽  
Jan O. Korbel ◽  
Victor Guryev ◽  
...  

ABSTRACT The diploid nature of the genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. Many important biological phenomena such as compound heterozygosity and epistatic effects between enhancers and target genes, however, can only be studied when haplotype-resolved genomes are available. This lack of haplotype-level analyses can be explained by a dearth of methods to produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. Our experiments provide comprehensive guidance on favorable combinations of Strand-seq libraries and sequencing coverages to obtain complete and genome-wide haplotypes of a single individual genome (NA12878) at manageable costs. We were able to reliably assign > 95% of alleles to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different sequencing technologies represents an attractive solution to chart the unique genetic variation of diploid genomes.


2021 ◽  
Author(s):  
Louise Aigrain

Since the publication of the first draft of the human genome 20 years ago, several novel sequencing technologies have emerged. Whilst some drive the cost of DNA sequencing down, others address the difficult parts of the genome which remained inaccessible so far. But the next-generation sequencing (NGS) landscape is a fast-changing environment and one can easily get lost between second- and third- generation sequencers, or the pros and cons of short- versus long-read technologies. In this beginner’s guide to NGS, we will review the main NGS technologies available in 2021. We will compare sample preparation protocols and sequencing methods, highlighting the requirements and advantages of each technology.


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Valentine Murigneux ◽  
Subash Kumar Rai ◽  
Agnelo Furtado ◽  
Timothy J C Bruxner ◽  
Wei Tian ◽  
...  

Abstract Background Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. Results Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. Conclusions The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.


2012 ◽  
Vol 90 (12) ◽  
pp. 1386-1393 ◽  
Author(s):  
Kyrre Grøtan ◽  
Kjartan Østbye ◽  
Annette Taugbøl ◽  
L. Asbjørn Vøllestad

Marine threespine stickleback (Gasterosteus aculeatus L., 1758) have repeatedly colonized Holarctic freshwater environments after the retreat of the Pleistocene glaciers, and based on their ability to move rapidly between salinities have apparently retained a robust osmoregulatory apparatus that can cope with both short- and long-term exposure to non-native salinity environments. Standard metabolic rate (SMR), measured as oxygen consumption at rest, can be used as an indicator of the cost of osmoregulation when fish are exposed to new environmental conditions. Following freshwater colonization, reduction in the number of lateral plates, an antipredator defence structure, is common. Completely plated fish dominate in the sea, low-plated fish dominate in fresh water, and partially plated fish often dominate in brackish water environments. In a laboratory experiment, we estimated SMR in locally adapted populations from salt, brackish, and freshwater environments at three different salinities (0, 15, and 30 practical salinity units (PSU)). In addition, we tested for correlations between SMR and lateral plate number and lateral plate genotype at the Ectodysplasin locus for stickleback originating from the brackish water population. Contrary to our expectations, no differences were found in SMR between any of the experimental groups in our experiment. Apparently, the threespine stickleback is able to move among salinity environments without large short-term metabolic costs, irrespective of their environment of origin.


GigaScience ◽  
2021 ◽  
Vol 10 (5) ◽  
Author(s):  
Clément Schneider ◽  
Christian Woehle ◽  
Carola Greve ◽  
Cyrille A D'Haese ◽  
Magnus Wolf ◽  
...  

Abstract Background Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in biological sciences and in biodiversity-related applied fields such as environmental management and natural product research. Advances in long-read DNA sequencing make it feasible to generate high-quality genomes for many non–genetic model species. However, long-read sequencing today relies on sizable quantities of high-quality, high molecular weight DNA, which is mostly obtained from fresh tissues. This is a challenge for biodiversity genomics of most metazoan species, which are tiny and need to be preserved immediately after collection. Here we present de novo genomes of 2 species of submillimeter Collembola. For each, we prepared the sequencing library from high molecular weight DNA extracted from a single specimen and using a novel ultra-low input protocol from Pacific Biosciences. This protocol requires a DNA input of only 5 ng, permitted by a whole-genome amplification step. Results The 2 assembled genomes have N50 values >5.5 and 8.5 Mb, respectively, and both contain ∼96% of BUSCO genes. Thus, they are highly contiguous and complete. The genomes are supported by an integrative taxonomy approach including placement in a genome-based phylogeny of Collembola and designation of a neotype for 1 of the species. Higher heterozygosity values are recorded in the more mobile species. Both species are devoid of the biosynthetic pathway for β-lactam antibiotics known in several Collembola, confirming the tight correlation of antibiotic synthesis with the species' way of life. Conclusions It is now possible to generate high-quality genomes from single specimens of minute, field-preserved metazoans, exceeding the minimum contig N50 (1 Mb) required by the Earth BioGenome Project.
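
The contig N50 quoted in this abstract is the length of the contig at which the running total of contig lengths, taken longest first, reaches half of the assembly size. The following is a minimal sketch of that standard calculation; the contig lengths are made up for illustration, and the 1 Mb threshold is the Earth BioGenome Project minimum cited in the abstract.

```python
def n50(lengths):
    """Length of the contig at which the cumulative sum, longest first,
    reaches at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


EBP_MIN_CONTIG_N50 = 1_000_000  # 1 Mb, as cited in the abstract

# Made-up contig lengths in base pairs, for illustration only.
toy_lengths = [9_000_000, 8_500_000, 6_000_000, 1_200_000, 300_000]

if __name__ == "__main__":
    value = n50(toy_lengths)
    print(f"N50 = {value / 1e6:.1f} Mb, meets 1 Mb minimum: {value >= EBP_MIN_CONTIG_N50}")
```

Because N50 depends on total assembly size, removing redundant contigs can raise it even when no contig gets longer.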


Author(s):  
Catherine L. Peichel ◽  
Shaugnessy R. McCann ◽  
Joseph A. Ross ◽  
Alice F. S. Naftaly ◽  
James R. Urton ◽  
...  

Abstract Heteromorphic sex chromosomes have evolved repeatedly across diverse species. Suppression of recombination between X and Y chromosomes leads to rapid degeneration of the Y chromosome. However, these early stages of degeneration are not well understood, as complete Y chromosome sequence assemblies have only been generated across a handful of taxa with ancient sex chromosomes. Here we describe the assembly of the threespine stickleback (Gasterosteus aculeatus) Y chromosome, which is less than 26 million years old. Our previous work identified that the non-recombining region between the X and the Y spans ∼17.5 Mb on the X chromosome. Here, we combined long-read PacBio sequencing with a Hi-C-based proximity guided assembly to generate a 15.87 Mb assembly of the Y chromosome. Our assembly is concordant with cytogenetic maps and Sanger sequences of over 90 Y chromosome clones from a bacterial artificial chromosome (BAC) library. We found three evolutionary strata on the Y chromosome, consistent with the three inversions identified by our previous cytogenetic analyses. The young threespine stickleback Y shows convergence with older sex chromosomes in the retention of haploinsufficient genes and the accumulation of genes with testis-biased expression, many of which are recent duplicates. However, we found no evidence for large amplicons found in other sex chromosome systems. We also report an excellent candidate for the master sex-determination gene: a translocated copy of Amh (Amhy). Together, our work shows that the same evolutionary forces shaping older sex chromosomes can cause remarkably rapid changes in the overall genetic architecture on young Y chromosomes.


2017 ◽  
Author(s):  
Daniel W. Bellott ◽  
Ting-Jan Cho ◽  
Jennifer F. Hughes ◽  
Helen Skaletsky ◽  
David C. Page

Abstract Reference sequence of structurally complex regions can only be obtained through highly accurate clone-based approaches. We and others have successfully employed Single-Haplotype Iterative Mapping and Sequencing (SHIMS 1.0) to assemble structurally complex regions across the sex chromosomes of several vertebrate species and in targeted improvements to the reference sequences of human autosomes. However, SHIMS 1.0 was expensive and time consuming, requiring the resources that only a genome center could command. Here we introduce SHIMS 2.0, an improved SHIMS protocol to allow even a small laboratory to generate high-quality reference sequence from complex genomic regions. Using a streamlined and parallelized library preparation protocol, and taking advantage of high-throughput, inexpensive, short-read sequencing technologies, a small group can sequence and assemble hundreds of clones in a week. Relative to SHIMS 1.0, SHIMS 2.0 reduces the cost and time required by two orders of magnitude, while preserving high sequencing accuracy.

