An improved genome assembly uncovers prolific tandem repeats in Atlantic cod

10.1101/060921 ◽

2016 ◽

Cited By ~ 5

Author(s):

Ole K. Tørresen ◽

Bastiaan Star ◽

Sissel Jentoft ◽

William B. Reinar ◽

Harald Grove ◽

...

Keyword(s):

Genome Assembly ◽

Gadus Morhua ◽

Tandem Repeats ◽

Atlantic Cod ◽

Genomic Variation ◽

Promoter Regions ◽

Sequencing Technologies ◽

Combining Data ◽

Genome Assemblies ◽

Multiple Assembly

Abstract

Background: The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies.

Results: By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21 % of the TRs across the assembly, 19 % in the promoter regions and 12 % in the coding sequences are heterozygous in the sequenced individual.

Conclusions: The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies

10.21203/rs.3.rs-712747/v1 ◽

2021 ◽

Author(s):

Arang Rhie ◽

Ann Mc Cartney ◽

Kishwar Shafin ◽

Michael Alonge ◽

Andrey Bzikadze ◽

...

Keyword(s):

Genome Assembly ◽

Tandem Repeats ◽

Hydatidiform Mole ◽

Segmental Duplications ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Human Genome Assembly ◽

Long Read ◽

Genome Assemblies ◽

Oxford Nanopore Technologies

Abstract Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first Telomere-to-Telomere (T2T) human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Though derived from highly accurate sequencing, evaluation revealed that the initial T2T draft assembly had evidence of small errors and structural misassemblies. To correct these errors, we designed a novel repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly QV to 73.9. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both PacBio HiFi and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
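
To put the reported assembly QV of 73.9 in perspective, the short sketch below converts between a quality value and a per-base error rate. It assumes the usual Phred-scaled convention, QV = -10 log10(error rate); the abstract does not state how the QV was estimated, and the function names are illustrative only.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability (0 < p <= 1) to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


# QV quoted in the abstract; the Phred interpretation is an assumption.
reported_qv = 73.9
rate = qv_to_error_rate(reported_qv)
bases_per_error = 1 / rate

print(f"QV {reported_qv} -> error rate {rate:.3e}")
print(f"about one error per {bases_per_error / 1e6:.1f} million bases")
```

Under that assumption, QV 73.9 corresponds to roughly one error per 24.5 million bases.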

Genomic architecture of codfishes featured by expansions of innate immune genes and short tandem repeats

10.1101/163949 ◽

2017 ◽

Author(s):

Ole K. Tørresen ◽

Marine S. O. Brieuc ◽

Monica H. Solbakken ◽

Elin Sørhus ◽

Alexander J. Nederbragt ◽

...

Keyword(s):

Immune System ◽

Genome Assembly ◽

Short Tandem Repeats ◽

Tandem Repeats ◽

Atlantic Cod ◽

High Density ◽

Innate Immune ◽

Immune Gene ◽

Genome Assemblies ◽

Short Tandem

Abstract

Background: Increased availability of genome assemblies for non-model organisms has resulted in invaluable biological and genomic insight into numerous vertebrates including teleosts. The sequencing and assembly of the Atlantic cod (Gadus morhua) genome and the genomes of many of its relatives (Gadiformes) demonstrated a shared loss 100 million years ago of the major histocompatibility complex (MHC) II genes. The recent publication of an improved version of the Atlantic cod genome assembly reported an extreme density of tandem repeats compared to other vertebrate genome assemblies. Highly contiguous genome assemblies are needed to further investigate the unusual immune system of the Gadiformes, and the high density of tandem repeats in this group.

Results: Here, we have sequenced and assembled the genome of haddock (Melanogrammus aeglefinus) - a relative of Atlantic cod - using a combination of PacBio and Illumina reads. Comparative analyses uncover that the haddock genome contains an even higher density of tandem repeats outside and within protein coding sequences than Atlantic cod. Further, both species show an elevated number of tandem repeats in genes mainly involved in signal transduction compared to other teleosts. An in-depth characterization of the immune gene repertoire demonstrates a substantial expansion of MHCI in Atlantic cod compared to haddock. In contrast, the Toll-like receptors show a similar pattern of gene losses and expansions. For another gene family associated with the innate immune system, the NOD-like receptors (NLRs), we find a large expansion common to all teleosts, with possible lineage-specific expansions in zebrafish, stickleback and the codfishes.

Conclusions: The generation of a highly contiguous genome assembly of haddock revealed that the high density of short tandem repeats as well as expanded immune gene families is not unique to Atlantic cod, but most likely a feature common to all codfishes. A shared expansion of NLR genes in teleosts suggests that the NLRs have a more substantial role in the innate immunity of teleosts than other vertebrates. Moreover, we find that high copy number genes combined with variable genome assembly qualities may impede complete characterization, i.e. the number of NLRs might be underestimated in the different teleost species.

Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies

10.1101/2021.07.02.450803 ◽

2021 ◽

Author(s):

Ann M Mc Cartney ◽

Kishwar Shafin ◽

Michael Alonge ◽

Andrey V Bzikadze ◽

Giulio Formenti ◽

...

Keyword(s):

Genome Assembly ◽

Tandem Repeats ◽

Hydatidiform Mole ◽

Segmental Duplications ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Human Genome Assembly ◽

Long Read ◽

Genome Assemblies ◽

Oxford Nanopore Technologies

Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first Telomere-to-Telomere (T2T) human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Though derived from highly accurate sequencing, evaluation revealed that the initial T2T draft assembly had evidence of small errors and structural misassemblies. To correct these errors, we designed a novel repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly QV to 73.9. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both PacBio HiFi and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.

Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise

10.1101/2019.12.19.882399 ◽

2019 ◽

Cited By ~ 5

Author(s):

Valentina Peona ◽

Mozes P.K. Blom ◽

Luohao Xu ◽

Reto Burri ◽

Shawn Sullivan ◽

...

Keyword(s):

Dark Matter ◽

Genome Assembly ◽

Sex Chromosome ◽

De Novo ◽

Model Organism ◽

Technology Choice ◽

High Quality ◽

Sequencing Technologies ◽

Downstream Analysis ◽

Genome Assemblies

Abstract Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies have opened up a whole new world of genomic biodiversity. Although these technologies generate high-quality genome assemblies, there are still genomic regions difficult to assemble, like repetitive elements and GC-rich regions (genomic “dark matter”). In this study, we compare the efficiency of currently used sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter starting from the same sample. By adopting different de-novo assembly strategies, we were able to compare each individual draft assembly to a curated multiplatform one and identify the nature of the previously missing dark matter with a particular focus on transposable elements, multi-copy MHC genes, and GC-rich regions. Thanks to this multiplatform approach, we demonstrate the feasibility of producing a high-quality chromosome-level assembly for a non-model organism (paradise crow) for which only suboptimal samples are available. Our approach was able to reconstruct complex chromosomes like the repeat-rich W sex chromosome and several GC-rich microchromosomes. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects around the completeness of both the coding and non-coding parts of the genomes.

A Deep Dive into Genome Assemblies of Non-vertebrate Animals

10.20944/preprints202111.0170.v1 ◽

2021 ◽

Author(s):

Nadège Guiglielmoni ◽

Ramón Rivera-Vicéns ◽

Romain Koszul ◽

Jean-François Flot

Keyword(s):

Genome Assembly ◽

Current Knowledge ◽

Genome Structure ◽

Deep Dive ◽

Sequencing Technologies ◽

Current State ◽

Animal Diversity ◽

And Function ◽

Genome Assemblies ◽

Genome Projects

Non-vertebrate species represent about 95% of known metazoan (animal) diversity. They remain to this day relatively unexplored genetically, but understanding their genome structure and function is pivotal for expanding our current knowledge of evolution, ecology and biodiversity. Following the continuous improvements and decreasing costs of sequencing technologies, many genome assembly tools have been released, leading to a significant number of genome projects being completed in recent years. In this review, we examine the current state of genome projects of non-vertebrate animal species. We present an overview of available sequencing technologies, assembly approaches, as well as pre- and post-processing steps, genome assembly evaluation methods, and their application to non-vertebrate animal genomes.

NucMerge: Genome assembly quality improvement assisted by alternative assemblies and paired-end Illumina reads

10.1101/483701 ◽

2018 ◽

Author(s):

Ksenia Khelik ◽

Alexander Johan Nederbragt ◽

Geir Kjetil Sandve ◽

Torbjørn Rognes

Keyword(s):

Genome Assembly ◽

Second Generation ◽

Sequencing Technologies ◽

Assembly Accuracy ◽

Alternative Approach ◽

Second Generation Sequencing ◽

Genome Assemblies ◽

And Inversion ◽

Generation Sequencing ◽

Genome Projects

Abstract

Background: In spite of the major breakthroughs in the second-generation sequencing technologies and the developments of a plethora of assemblers over the last ten years, the resulting genome assemblies may still be fragmented and contain errors. It is typical in genome projects with second-generation reads involved to run multiple assemblers with different parameters and choose the best assembly. However, such an approach is always a trade-off between the strengths and weaknesses of the assemblies. To exploit the advantages of different assemblers, an alternative approach that combines the best parts of several assemblies into one may be applied. The existing tools based on such an approach assist in elongation of assembly fragments and/or improvement of assembly accuracy. Though there has been progress with such a strategy, there is still room for improvement of the existing tools.

Results: We present NucMerge, a tool for improving genome assembly accuracy by incorporating information derived from an alternative assembly and paired-end Illumina reads from the same genome. The tool corrects insertion, deletion, substitution, and inversion errors and locates different inter- and intra-chromosomal rearrangement errors. NucMerge was compared to two existing alternatives, namely Metassembler and GAM-NGS.

Conclusions: The benchmarking results show that NucMerge has generally better performance than the other tools tested, providing accuracy improvement of more assemblies. NucMerge is freely available at https://github.com/uio-bmi/NucMerge under the MPL license.

A nanopore based chromosome-level assembly representing Atlantic cod from the Celtic Sea

10.1101/852145 ◽

2019 ◽

Cited By ~ 1

Author(s):

Tina Graceline Kirubakaran ◽

Øivind Andersen ◽

Michel Moser ◽

Mariann Arnyasi ◽

Philip McGinnity ◽

...

Keyword(s):

Gadus Morhua ◽

Barents Sea ◽

Atlantic Cod ◽

Population Based ◽

Linkage Groups ◽

Long Read ◽

Study Population ◽

Celtic Sea ◽

Genome Assemblies ◽

Inversion Breakpoints

Abstract Currently available genome assemblies for Atlantic cod (Gadus morhua) have been constructed using DNA from fish belonging to the Northeast Arctic Cod (NEAC) population; a migratory population feeding in the cold Barents Sea. These assemblies have been crucial for the development of genetic markers which have been used to study population differentiation and adaptive evolution in Atlantic cod, pinpointing four discrete islands of genomic divergence located on linkage groups 1, 2, 7 and 12. In this paper, we present a high-quality reference genome from a male Atlantic cod representing a southern population inhabiting the Celtic Sea. Structurally, the genome assembly (gadMor_Celtic) was produced from long-read nanopore data and has a combined contig size of 686 Mb with an N50 of 10 Mb. Integrating contigs with genetic linkage mapping information enabled us to construct 23 chromosome sequences which mapped with high confidence to the latest NEAC population assembly (gadMor3) and allowed us to characterize in detail large chromosomal inversions on linkage groups 1, 2, 7 and 12. In most cases, inversion breakpoints could be located within single nanopore contigs. Our results suggest the presence of inversions in Celtic cod on linkage groups 6, 11 and 21, although these remain to be confirmed. Further, we identified a specific repetitive element that is relatively enriched at predicted centromeric regions. Our gadMor_Celtic assembly provides a resource representing a ‘southern’ cod population which is complementary to the existing ‘northern’ population based genome assemblies and represents the first step towards developing pan-genomic resources for Atlantic cod.
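
The N50 quoted in this abstract is a length-weighted contiguity statistic. The sketch below uses the common definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The abstract does not say which tool computed the value, and the contig lengths in the example are made up for illustration.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Hypothetical toy assembly (lengths in arbitrary units), not real cod data.
toy_contigs = [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(n50(toy_contigs))
```

Because the statistic is weighted by length, a handful of very long contigs can dominate it, which is why it is usually reported alongside the total assembly size, as it is here.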

Heteroplasmy of short tandem repeats in mitochondrial DNA of Atlantic cod, Gadus morhua.

Genetics ◽

10.1093/genetics/132.1.211 ◽

1992 ◽

Vol 132 (1) ◽

pp. 211-220 ◽

Cited By ~ 2

Author(s):

E Arnason ◽

D M Rand

Keyword(s):

Mitochondrial Dna ◽

Copy Number ◽

Gadus Morhua ◽

Tandem Repeats ◽

Atlantic Cod ◽

Length Variation ◽

The North ◽

Loop Region ◽

D Loop ◽

Displacement Model

Abstract The mitochondrial DNA of the Atlantic cod (Gadus morhua) contains a tandem array of 40-bp repeats in the D-loop region of the molecule. Variation among molecules in the copy number of these repeats results in mtDNA length variation and heteroplasmy (the presence of more than one form of mtDNA in an individual). In a sample of fish collected from different localities around Iceland and off George's Bank, each individual was heteroplasmic for two or more mtDNAs ranging in repeat copy number from two (common) to six (rare). An earlier report on mtDNA heteroplasmy in sturgeon (Acipenser transmontanus) presented a competitive displacement model for length mutations in mtDNAs containing tandem arrays and the cod data deviate from this model. Depending on the nature of putative secondary structures and the location of D-loop strand termination, additional mechanisms of length mutation may be needed to explain the range of mtDNA length variants maintained in these populations. The balance between genetic drift and mutation in maintaining this length polymorphism is estimated through a hierarchical analysis of diversity of mtDNA length variation in the Iceland samples. Eighty percent of the diversity lies within individuals, 8% among individuals and 12% among localities. An estimate of $\theta = 2N_{eo}\mu > 1$ indicates that this system is characterized by a high mutation rate and is governed primarily by deterministic dynamics. The sequences of repeat arrays from fish collected in Norway, Iceland and George's Bank show no nucleotide variation, suggesting that there is very little substructuring to the North Atlantic cod population.

An improved genome assembly uncovers prolific tandem repeats in Atlantic cod

BMC Genomics ◽

10.1186/s12864-016-3448-x ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 77

Author(s):

Ole K. Tørresen ◽

Bastiaan Star ◽

Sissel Jentoft ◽

William B. Reinar ◽

Harald Grove ◽

...

Keyword(s):

Genome Assembly ◽

Tandem Repeats ◽

Atlantic Cod

Estimating catch-at-age by combining data from different sources

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/f05-026 ◽

2005 ◽

Vol 62 (6) ◽

pp. 1377-1385 ◽

Cited By ~ 9

Author(s):

David Hirst ◽

Geir Storvik ◽

Magne Aldrin ◽

Sondre Aanes ◽

Ragnar Bang Huseby

Keyword(s):

Gadus Morhua ◽

Ad Hoc ◽

Atlantic Cod ◽

Data Types ◽

Commercial Fish ◽

Bayesian Hierarchical ◽

Length Relationship ◽

Setting Process ◽

Combining Data ◽

Almost All

Estimating the catch-at-age of commercial fish species is an important part of the quota-setting process for many different species and almost all countries with a fishing fleet. Current procedures are usually very time-consuming and somewhat ad hoc, and the estimates have no measure of uncertainty. We previously developed a method for catch-at-age of Norwegian Atlantic cod (Gadus morhua), but this only considered aged fish sampled randomly from random hauls. In most countries, the sampling scheme is not so simple. There are usually a very large number of length-only samples from which the age must be estimated using an age-length relationship, and often some or all of the age samples are collected from data that are first stratified by length. This adds considerably to the difficulties in the estimation. In this paper, we model the three different kinds of data simultaneously using a development of our earlier Bayesian hierarchical model. This enables us to obtain estimates of the catch-at-age with appropriate uncertainty and also to provide advice on how best to sample data in the future. The data types are random samples of age, length, and weight; age and weight stratified by length; and length only.