Improved contiguity of the threespine stickleback genome using long-read sequencing

Abstract While the cost and time for assembling a genome have been drastically reduced, it still remains a challenge to assemble a highly contiguous genome. These challenges are rapidly being overcome by the integration of long-read sequencing technologies. Here, we use long sequencing reads to improve the contiguity of the threespine stickleback fish (Gasterosteus aculeatus) genome, a prominent genetic model species. Using Pacific Biosciences sequencing, we were able to fill over 76% of the gaps in the genome, improving contiguity over five-fold. Our approach was highly accurate, validated by 10X Genomics long-distance linked-reads. In addition to closing a majority of gaps, we were able to assemble segments of telomeres and centromeres throughout the genome. This highlights the power of using long sequencing reads to assemble highly repetitive and difficult-to-assemble regions of genomes. This latest genome build has been released through a newly designed community genome browser that aims to consolidate the growing number of genomics datasets available for the threespine stickleback fish.

Improved contiguity of the threespine stickleback genome using long-read sequencing

G3 Genes|Genome|Genetics ◽ 10.1093/g3journal/jkab007 ◽ 2021 ◽ Vol 11 (2)

Author(s): Shivangi Nath ◽ Daniel E Shaw ◽ Michael A White

Keyword(s): Gasterosteus Aculeatus ◽ Genetic Model ◽ Threespine Stickleback ◽ Long Distance ◽ Sequencing Technologies ◽ Reference Genome Assembly ◽ A Genome ◽ Long Read ◽ The Cost ◽ Stickleback Genome

Abstract While the cost and time for assembling a genome have drastically decreased, it still remains a challenge to assemble a highly contiguous genome. These challenges are rapidly being overcome by the integration of long-read sequencing technologies. Here, we use long-read sequencing to improve the contiguity of the threespine stickleback fish (Gasterosteus aculeatus) genome, a prominent genetic model species. Using Pacific Biosciences sequencing, we assembled a highly contiguous genome of a freshwater fish from Paxton Lake. Using contigs from this genome, we were able to fill over 76.7% of the gaps in the existing reference genome assembly, improving contiguity over fivefold. Our gap-filling approach was highly accurate, validated by 10X Genomics long-distance linked-reads. In addition to closing a majority of gaps, we were able to assemble segments of telomeres and centromeres throughout the genome. This highlights the power of using long sequencing reads to assemble highly repetitive and difficult-to-assemble regions of genomes. This latest genome build has been released through a newly designed community genome browser that aims to consolidate the growing number of genomics datasets available for the threespine stickleback fish.

HapSolo: an optimization approach for removing secondary haplotigs during diploid genome assembly and scaffolding

BMC Bioinformatics ◽ 10.1186/s12859-020-03939-y ◽ 2021 ◽ Vol 22 (1)

Author(s): Edwin A. Solares ◽ Yuan Tao ◽ Anthony D. Long ◽ Brandon S. Gaut

Keyword(s): Cost Function ◽ Anopheles Funestus ◽ Hill Climbing ◽ Optimization Approach ◽ Sequencing Technology ◽ Genome Data ◽ A Genome ◽ Long Read ◽ Downstream Analysis ◽ The Cost

Abstract Background Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analysis such as scaffolding. Results Here we illustrate a new method, which we call HapSolo, that identifies secondary contigs and defines a primary assembly based on multiple pairwise contig alignment metrics. HapSolo evaluates candidate primary assemblies using BUSCO scores and then distinguishes among candidate assemblies using a cost function. The cost function can be defined by the user but by default considers the number of missing, duplicated and single BUSCO genes within the assembly. HapSolo performs hill climbing to minimize cost over thousands of candidate assemblies. We illustrate the performance of HapSolo on genome data from three species: the Chardonnay grape (Vitis vinifera), with a genome of 490 Mb, a mosquito (Anopheles funestus; 200 Mb) and the Thorny Skate (Amblyraja radiata; 2650 Mb). Conclusions HapSolo rapidly identified candidate assemblies that yield improvements in assembly metrics, including decreased genome size and improved N50 scores. Contig N50 scores improved by 35%, 9% and 9% for Chardonnay, mosquito and the thorny skate, respectively, relative to unreduced primary assemblies. The benefits of HapSolo were amplified by downstream analyses, which we illustrated by scaffolding with Hi-C data. We found, for example, that prior to the application of HapSolo, only 52% of the Chardonnay genome was captured in the largest 19 scaffolds, corresponding to the number of chromosomes. After the application of HapSolo, this value increased to ~84%. The improvements for the mosquito’s largest three scaffolds, representing the number of chromosomes, were from 61 to 86%, and the improvement was even more pronounced for thorny skate. We compared the scaffolding results to assemblies that were based on PurgeDups for identifying secondary contigs, with generally superior results for HapSolo.
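The search described in this abstract (score each candidate assembly from its BUSCO counts, then hill climb to reduce the cost) can be sketched roughly as follows. This is a minimal illustration and not HapSolo's actual code: the cost formula, the `toy_busco_counts` stand-in for running BUSCO, and the idea that the parameters are alignment thresholds scaled to [0, 1] are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, Tuple

Counts = Dict[str, int]      # keys: "missing", "duplicated", "single"
Params = Tuple[float, ...]   # hypothetical alignment thresholds in [0, 1]


def cost(counts: Counts) -> float:
    """Hypothetical cost: penalize missing and duplicated BUSCO genes
    relative to single-copy ones (not necessarily HapSolo's default)."""
    return (counts["missing"] + counts["duplicated"]) / max(counts["single"], 1)


def hill_climb(evaluate: Callable[[Params], Counts], start: Params,
               step: float = 0.05, iterations: int = 1000,
               seed: int = 0) -> Tuple[Params, float]:
    """Randomly perturb the thresholds and keep a candidate only if it
    lowers the cost of the resulting candidate assembly."""
    rng = random.Random(seed)
    best, best_cost = start, cost(evaluate(start))
    for _ in range(iterations):
        candidate = tuple(min(1.0, max(0.0, p + rng.uniform(-step, step)))
                          for p in best)
        candidate_cost = cost(evaluate(candidate))
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


def toy_busco_counts(params: Params) -> Counts:
    """Toy stand-in for building an assembly and running BUSCO on it:
    a loose threshold leaves duplicates, a strict one loses genes."""
    t = params[0]
    duplicated = int(200 * t)
    missing = int(200 * (1.0 - t) ** 2)
    return {"missing": missing, "duplicated": duplicated,
            "single": 1000 - missing - duplicated}


start = (0.9,)
best_params, best_cost = hill_climb(toy_busco_counts, start)
```

In a real pipeline `evaluate` would be the expensive step, since each candidate requires rebuilding the primary assembly and rescoring it, which is why the number of iterations matters.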

Towards population genomics in non-model species with large genomes: a case study of the marine zooplankton Calanus finmarchicus

Royal Society Open Science ◽ 10.1098/rsos.180608 ◽ 2019 ◽ Vol 6 (2) ◽ pp. 180608 ◽ Cited By ~ 11

Author(s): Marvin Choquet ◽ Irina Smolina ◽ Anusha K. S. Dhanasiri ◽ Leocadio Blanco-Bercial ◽ Martina Kopp ◽ ...

Keyword(s): Population Genomics ◽ Single Copy ◽ Calanus Finmarchicus ◽ Model Species ◽ Reduced Representation ◽ Marine Copepod ◽ Marine Zooplankton ◽ Sequencing Technologies ◽ A Genome ◽ Large Genomes

Advances in next-generation sequencing technologies and the development of genome-reduced representation protocols have opened the way to genome-wide population studies in non-model species. However, species with large genomes remain challenging, hampering the development of genomic resources for a number of taxa including marine arthropods. Here, we developed a genome-reduced representation method for the ecologically important marine copepod Calanus finmarchicus (haploid genome size of 6.34 Gbp). We optimized a capture enrichment-based protocol based on 2656 single-copy genes, yielding a total of 154 087 high-quality SNPs in C. finmarchicus including 62 372 in common among the three locations tested. The set of capture probes was also successfully applied to the congeneric C. glacialis. Preliminary analyses of these markers revealed similar levels of genetic diversity between the two Calanus species, while populations of C. glacialis showed stronger genetic structure compared to C. finmarchicus. Using this powerful set of markers, we did not detect any evidence of hybridization between C. finmarchicus and C. glacialis. Finally, we propose a shortened version of our protocol, offering a promising solution for population genomics studies in non-model species with large genomes.

Biodiversity genomics of small metazoans: high quality de novo genomes from single specimens of field-collected and ethanol-preserved springtails

10.1101/2020.08.10.244541 ◽ 2020 ◽ Cited By ~ 2

Author(s): Clément Schneider ◽ Christian Woehle ◽ Carola Greve ◽ Cyrille A. D’Haese ◽ Magnus Wolf ◽ ...

Keyword(s): Genome Sequencing ◽ De Novo ◽ Genetic Model ◽ Nuclear Genome ◽ High Quality ◽ Sequencing Technologies ◽ Natural Product Research ◽ Genome Wide ◽ Long Read ◽ Hmw Dna

ABSTRACT Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in evolutionary sciences, ecology, systematics and in biodiversity-related applied fields such as environmental management and natural product research. Advances in DNA sequencing technologies make genome sequencing feasible for many non-genetic model species. However, genome sequencing today relies on large quantities of high quality, high molecular weight (HMW) DNA which is mostly obtained from fresh tissues. This is problematic for biodiversity genomics of Metazoa as most species are small and yield minute amounts of DNA. Furthermore, bringing living specimens to the lab bench is not realistic for the majority of species. Here we overcome those difficulties by sequencing two species of springtails (Collembola) from single specimens preserved in ethanol. We used a newly developed, genome-wide amplification-based protocol to generate PacBio libraries for HiFi long-read sequencing. The assembled genomes were highly continuous. They can be considered complete as we recovered over 95% of BUSCOs. Genome-wide amplification does not seem to bias genome recovery. The presence of almost complete copies of the mitochondrial genome in the nuclear genome was a pitfall for automatic assemblers. The genomes fit well into an existing phylogeny of springtails. A neotype is designated for one of the species, blending genome sequencing and creation of taxonomic references. Our study shows that it is possible to obtain high quality genomes from small, field-preserved sub-millimeter metazoans, thus making their vast diversity accessible to the fields of genomics.

Developing informative microsatellite markers for non-model species using reference mapping against a model species’ genome

Scientific Reports ◽ 10.1038/srep23087 ◽ 2016 ◽ Vol 6 (1) ◽ Cited By ~ 5

Author(s): Chih-Ming Hung ◽ Ai-Yun Yu ◽ Yu-Ting Lai ◽ Pei-Jen L. Shaner

Keyword(s): Microsatellite Markers ◽ Rodent Species ◽ Background Information ◽ Model Species ◽ Breeding Programs ◽ Sequencing Technologies ◽ A Genome ◽ Wide Range ◽ Genomic Locations ◽ Traditional Approaches

Abstract Microsatellites have a wide range of applications, from behavioral biology and evolution to agriculture-based breeding programs. The recent progress in next-generation sequencing technologies and the rapidly increasing number of published genomes may greatly enhance the current applications of microsatellites by turning them from anonymous to informative markers. Here we developed an approach to anchor microsatellite markers of any target species in a genome of a related model species, through which the genomic locations of the markers, along with any functional genes potentially linked to them, can be revealed. We mapped the shotgun sequence reads of a non-model rodent species, Apodemus semotus, against the genome of a model species, Mus musculus, and presented 24 polymorphic microsatellite markers with detailed background information for A. semotus in this study. The developed markers can be used in other rodent species, especially those that are closely related to A. semotus or M. musculus. Compared to the traditional approaches based on DNA cloning, our approach is likely to yield more loci for the same cost. This study is a timely demonstration of how a research team can efficiently generate informative (neutral or function-associated) microsatellite markers for their study species and unique biological questions.

Dense and accurate whole-chromosome haplotyping of individual genomes

10.1101/126136 ◽ 2017 ◽ Cited By ~ 1

Author(s): David Porubsky ◽ Shilpa Garg ◽ Ashley D. Sanders ◽ Jan O. Korbel ◽ Victor Guryev ◽ ...

Keyword(s): Target Genes ◽ Chromosome Length ◽ Single Individual ◽ Sequencing Data ◽ Individual Genome ◽ Sequencing Technologies ◽ Biological Phenomena ◽ Genome Wide ◽ A Genome ◽ Long Read

ABSTRACT The diploid nature of the genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. Many important biological phenomena such as compound heterozygosity and epistatic effects between enhancers and target genes, however, can only be studied when haplotype-resolved genomes are available. This lack of haplotype-level analyses can be explained by a dearth of methods to produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. Our experiments provide comprehensive guidance on favorable combinations of Strand-seq libraries and sequencing coverages to obtain complete and genome-wide haplotypes of a single individual genome (NA12878) at manageable costs. We were able to reliably assign > 95% of alleles to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different sequencing technologies represents an attractive solution to chart the unique genetic variation of diploid genomes.

Beginner’s guide to next-generation sequencing

The Biochemist ◽ 10.1042/bio_2021_135 ◽ 2021

Author(s): Louise Aigrain

Keyword(s): Next Generation Sequencing ◽ Sample Preparation ◽ Third Generation ◽ Next Generation ◽ Sequencing Technologies ◽ Long Read ◽ Pros And Cons ◽ The Cost ◽ Next Generation Sequencing Ngs ◽ Generation Sequencing

Since the publication of the first draft of the human genome 20 years ago, several novel sequencing technologies have emerged. Whilst some drive the cost of DNA sequencing down, others address the difficult parts of the genome which have remained inaccessible so far. But the next-generation sequencing (NGS) landscape is a fast-changing environment and one can easily get lost between second- and third-generation sequencers, or the pros and cons of short- versus long-read technologies. In this beginner’s guide to NGS, we will review the main NGS technologies available in 2021. We will compare sample preparation protocols and sequencing methods, highlighting the requirements and advantages of each technology.

Comparison of long-read methods for sequencing and assembly of a plant genome

GigaScience ◽ 10.1093/gigascience/giaa146 ◽ 2020 ◽ Vol 9 (12)

Author(s): Valentine Murigneux ◽ Subash Kumar Rai ◽ Agnelo Furtado ◽ Timothy J C Bruxner ◽ Wei Tian ◽ ...

Keyword(s): De Novo ◽ Cost Effective ◽ Genome Project ◽ Plant Genome ◽ Sequencing Data ◽ Pacific Biosciences ◽ Sequencing Technologies ◽ Oxford Nanopore ◽ Long Read ◽ The Cost

Abstract Background Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. Results Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. Conclusions The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.

No short-term effect of salinity on oxygen consumption in threespine stickleback (Gasterosteus aculeatus) from fresh, brackish, and salt water

Canadian Journal of Zoology ◽ 10.1139/cjz-2012-0121 ◽ 2012 ◽ Vol 90 (12) ◽ pp. 1386-1393 ◽ Cited By ~ 17

Author(s): Kyrre Grøtan ◽ Kjartan Østbye ◽ Annette Taugbøl ◽ L. Asbjørn Vøllestad

Keyword(s): Oxygen Consumption ◽ Brackish Water ◽ Gasterosteus Aculeatus ◽ Salt Water ◽ Standard Metabolic Rate ◽ Threespine Stickleback ◽ Short Term ◽ Lateral Plate ◽ The Cost ◽ Freshwater Environments

Marine threespine stickleback (Gasterosteus aculeatus L., 1758) have repeatedly colonized Holarctic freshwater environments after the retreat of the Pleistocene glaciers, and based on their ability to move rapidly between salinities have apparently retained a robust osmoregulatory apparatus that can cope with both short- and long-term exposure to non-native salinity environments. Standard metabolic rate (SMR), measured as oxygen consumption at rest, can be used as an indicator of the cost of osmoregulation when fish are exposed to new environmental conditions. Following freshwater colonization, reduction in the number of lateral plates, an antipredator defence structure, is common. Completely plated fish dominate in the sea, low-plated fish dominate in fresh water, and partially plated fish often dominate in brackish water environments. In a laboratory experiment, we estimated SMR in locally adapted populations from salt, brackish, and freshwater environments at three different salinities (0, 15, and 30 practical salinity units (PSU)). In addition, we tested for correlations between SMR and lateral plate number and lateral plate genotype at the Ectodysplasin locus for stickleback originating from the brackish water population. Contrary to our expectations, no differences were found in SMR between any of the experimental groups in our experiment. Apparently, the threespine stickleback is able to move among salinity environments without large short-term metabolic costs, irrespective of their environment of origin.

Two high-quality de novo genomes from single ethanol-preserved specimens of tiny metazoans (Collembola)

GigaScience ◽ 10.1093/gigascience/giab035 ◽ 2021 ◽ Vol 10 (5) ◽ Cited By ~ 1

Author(s): Clément Schneider ◽ Christian Woehle ◽ Carola Greve ◽ Cyrille A D'Haese ◽ Magnus Wolf ◽ ...

Keyword(s): Molecular Weight ◽ High Molecular Weight ◽ De Novo ◽ Genetic Model ◽ High Quality ◽ High Molecular Weight Dna ◽ Mobile Species ◽ Natural Product Research ◽ A Genome ◽ Long Read

Abstract Background Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in biological sciences and in biodiversity-related applied fields such as environmental management and natural product research. Advances in long-read DNA sequencing make it feasible to generate high-quality genomes for many non–genetic model species. However, long-read sequencing today relies on sizable quantities of high-quality, high molecular weight DNA, which is mostly obtained from fresh tissues. This is a challenge for biodiversity genomics of most metazoan species, which are tiny and need to be preserved immediately after collection. Here we present de novo genomes of 2 species of submillimeter Collembola. For each, we prepared the sequencing library from high molecular weight DNA extracted from a single specimen and using a novel ultra-low input protocol from Pacific Biosciences. This protocol requires a DNA input of only 5 ng, permitted by a whole-genome amplification step. Results The 2 assembled genomes have N50 values >5.5 and 8.5 Mb, respectively, and both contain ∼96% of BUSCO genes. Thus, they are highly contiguous and complete. The genomes are supported by an integrative taxonomy approach including placement in a genome-based phylogeny of Collembola and designation of a neotype for 1 of the species. Higher heterozygosity values are recorded in the more mobile species. Both species are devoid of the biosynthetic pathway for β-lactam antibiotics known in several Collembola, confirming the tight correlation of antibiotic synthesis with the species' way of life. Conclusions It is now possible to generate high-quality genomes from single specimens of minute, field-preserved metazoans, exceeding the minimum contig N50 (1 Mb) required by the Earth BioGenome Project.
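For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal sketch of how it is usually computed: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The function names and the example contig lengths are illustrative assumptions; only the 1 Mb threshold comes from the abstract.

```python
from typing import Iterable

EBP_MIN_CONTIG_N50 = 1_000_000  # 1 Mb minimum, as stated in the abstract


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the contig N50: the length L such that contigs of
    length >= L together cover at least half of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return lengths[-1]  # unreachable for positive lengths


def meets_ebp_minimum(contig_lengths: Iterable[int]) -> bool:
    """Check an assembly against the 1 Mb contig N50 minimum."""
    return n50(contig_lengths) >= EBP_MIN_CONTIG_N50


# Made-up contig lengths (in bp) for illustration only.
example = [8_000_000, 6_000_000, 3_000_000, 500_000, 250_000]
example_n50 = n50(example)
```

Because N50 depends only on the length distribution, it says nothing about correctness; that is why the abstracts above pair it with completeness measures such as BUSCO.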
