Reconstruction of proto-vertebrate, proto-cyclostome and proto-gnathostome genomes provides new insights into early vertebrate evolution

Ancient polyploidization events have had a lasting impact on vertebrate genome structure, organization and function. Some key questions regarding the number of ancient polyploidization events and their timing in relation to the cyclostome-gnathostome divergence have remained contentious. Here we generate de novo long-read-based chromosome-scale genome assemblies for the Japanese lamprey and elephant shark. Using these and other representative genomes and developing algorithms for the probabilistic macrosynteny model, we reconstruct high-resolution proto-vertebrate, proto-cyclostome and proto-gnathostome genomes. Our reconstructions resolve key questions regarding the early evolutionary history of vertebrates. First, cyclostomes diverged from the lineage leading to gnathostomes after a shared tetraploidization (1R) but before a gnathostome-specific tetraploidization (2R). Second, the cyclostome lineage experienced an additional hexaploidization. Third, 2R in the gnathostome lineage was an allotetraploidization event, and biased gene loss from one of the subgenomes shaped the gnathostome genome by giving rise to remarkably conserved microchromosomes. Thus, our reconstructions reveal the major evolutionary events and offer new insights into the origin and evolution of vertebrate genomes.

Evolution of a chordate-specific mechanism for myoblast fusion

10.1101/2021.07.24.453587 ◽

2021 ◽

Author(s):

Haifeng Zhang ◽

Renjie Shang ◽

Kwantae Kim ◽

Wei Zheng ◽

Christopher J. Johnson ◽

...

Keyword(s):

Evolutionary History ◽

De Novo ◽

Myoblast Fusion ◽

Last Common Ancestor ◽

Functional Tests ◽

Evolutionary Origins ◽

New Genes ◽

History Of ◽

Early Vertebrates ◽

And Function

The size of an animal is determined by the size of its musculoskeletal system. Myoblast fusion is an innovative mechanism that allows for multinucleated muscle fibers to compound the size and strength of individual mononucleated cells. However, the evolutionary history of the control mechanism underlying this important process is currently unknown. The phylum Chordata hosts closely related groups that span distinct myoblast fusion states: no fusion in cephalochordates, restricted fusion and multinucleation in tunicates, and extensive, obligatory fusion in vertebrates. To elucidate how these differences may have evolved, we studied the evolutionary origins and function of membrane-coalescing agents Myomaker and Myomixer in various groups of chordates. Here we report that Myomaker likely arose through gene duplication in the last common ancestor of tunicates and vertebrates, while Myomixer appears to have evolved de novo in early vertebrates. Functional tests revealed an unexpectedly complex evolutionary history of myoblast fusion in chordates. A pre-vertebrate phase of muscle multinucleation driven by Myomaker was followed by the later emergence of Myomixer that enables the highly efficient fusion system of vertebrates. Thus, our findings reveal the evolutionary origins of chordate-specific fusogens and illustrate how new genes can shape the emergence of novel morphogenetic traits and mechanisms.

Contiguity: Contig adjacency graph construction and visualisation

10.7287/peerj.preprints.1037v1 ◽

2015 ◽

Cited By ~ 8

Author(s):

Mitchell J Sullivan ◽

Nouri L Ben Zakour ◽

Brian M Forde ◽

Mitchell Stanton-Cook ◽

Scott A Beatson

Keyword(s):

De Novo ◽

Reference Sequence ◽

De Bruijn Graph ◽

Interactive Software ◽

Graph Exploration ◽

Adjacency Graph ◽

Highly Sensitive ◽

Long Read ◽

Genome Assemblies ◽

Adjacency Graphs

Contiguity is interactive software for the visualization and manipulation of de novo genome assemblies. Contiguity creates and displays information on contig adjacency, which is contextualized by the simultaneous display of a comparison between assembled contigs and a reference sequence. Where scaffolders allow unambiguous connections between contigs to be resolved into a single scaffold, Contiguity allows the user to create all potential scaffolds in ambiguous regions of the genome. This enables the resolution of novel sequence or structural variants from the assembly. In addition, Contiguity provides a sequencing- and assembly-agnostic approach for the creation of contig adjacency graphs. To maximize the number of contig adjacencies determined, Contiguity combines information from read pair mappings, sequence overlap and De Bruijn graph exploration. We demonstrate how highly sensitive graphs can be achieved using this method. Contig adjacency graphs allow the user to visualize potential arrangements of contigs in unresolvable areas of the genome. By combining adjacency information with comparative genomics, Contiguity provides an intuitive approach for exploring and improving sequence assemblies. It is also useful in guiding manual closure of long read sequence assemblies. Contiguity is an open source application, implemented in Python with the Tkinter GUI package, that can run on Unix, OS X and Windows operating systems. It has been designed and optimized for bacterial assemblies. Contiguity is available at http://mjsull.github.io/Contiguity .
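To make the idea of a contig adjacency graph concrete, here is a minimal Python sketch that uses only one of the three evidence sources named above, sequence overlap. It is not Contiguity's implementation: the function names, the toy contigs and the `min_len` threshold are all hypothetical, and reverse complements, read pair mappings and De Bruijn graph exploration are ignored.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def adjacency_graph(contigs: dict, min_len: int = 3) -> dict:
    """Map each contig name to a list of (successor name, overlap length)."""
    graph = {name: [] for name in contigs}
    for a_name, a_seq in contigs.items():
        for b_name, b_seq in contigs.items():
            if a_name == b_name:
                continue
            n = suffix_prefix_overlap(a_seq, b_seq, min_len)
            if n:
                graph[a_name].append((b_name, n))
    return graph


# Hypothetical toy contigs: c1 ends in "GGT", and both c2 and c3 start with it.
contigs = {
    "c1": "AAACCGGT",
    "c2": "GGTTTACA",
    "c3": "GGTCATGC",
}
graph = adjacency_graph(contigs, min_len=3)

# A contig with more than one possible successor marks an ambiguous region.
ambiguous = [name for name, edges in graph.items() if len(edges) > 1]
print(graph)
print("ambiguous:", ambiguous)
```

In this toy case `c1` can be followed by either `c2` or `c3`, which is the kind of ambiguity a scaffolder would leave unresolved and an adjacency graph keeps visible.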

A Deep Dive into Genome Assemblies of Non-vertebrate Animals

10.20944/preprints202111.0170.v1 ◽

2021 ◽

Author(s):

Nadège Guiglielmoni ◽

Ramón Rivera-Vicéns ◽

Romain Koszul ◽

Jean-François Flot

Keyword(s):

Genome Assembly ◽

Current Knowledge ◽

Genome Structure ◽

Deep Dive ◽

Sequencing Technologies ◽

Current State ◽

Animal Diversity ◽

And Function ◽

Genome Assemblies ◽

Genome Projects

Non-vertebrate species represent about 95% of known metazoan (animal) diversity. They remain to this day relatively unexplored genetically, but understanding their genome structure and function is pivotal for expanding our current knowledge of evolution, ecology and biodiversity. Following the continuous improvements and decreasing costs of sequencing technologies, many genome assembly tools have been released, leading to a significant number of genome projects being completed in recent years. In this review, we examine the current state of genome projects of non-vertebrate animal species. We present an overview of available sequencing technologies, assembly approaches, pre- and post-processing steps, and genome assembly evaluation methods, as well as their application to non-vertebrate animal genomes.

Genomic insights into the origin, domestication and diversification of Brassica juncea

Nature Genetics ◽

10.1038/s41588-021-00922-y ◽

2021 ◽

Vol 53 (9) ◽

pp. 1392-1402

Author(s):

Lei Kang ◽

Lunwen Qian ◽

Ming Zheng ◽

Liyang Chen ◽

Hao Chen ◽

...

Keyword(s):

Brassica Juncea ◽

De Novo ◽

Gene Mutations ◽

Sequencing Analysis ◽

Genome Wide ◽

Long Read ◽

History Of ◽

Crop Types ◽

Allotetraploid Species ◽

New Crop

Despite early domestication around 3000 BC, the evolutionary history of the ancient allotetraploid species Brassica juncea (L.) Czern & Coss remains uncertain. Here, we report a chromosome-scale de novo assembly of a yellow-seeded B. juncea genome by integrating long-read and short-read sequencing, optical mapping and Hi-C technologies. Nuclear and organelle phylogenies of 480 accessions worldwide indicate that B. juncea most likely has a single origin in West Asia, 8,000–14,000 years ago, via natural interspecific hybridization. Subsequently, new crop types evolved through spontaneous gene mutations and introgressions along three independent routes of eastward expansion. Selective sweeps, genome-wide trait associations and tissue-specific RNA-sequencing analysis shed light on the domestication history of flowering time and seed weight, and on human selection for morphological diversification in this versatile species. Our data provide comprehensive insight into the origin and domestication of B. juncea and a foundation for its genomics-based breeding.

A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system

GigaScience ◽

10.1093/gigascience/giz122 ◽

2019 ◽

Vol 8 (10) ◽

Cited By ~ 12

Author(s):

Sarah B Kingan ◽

Julie Urban ◽

Christine C Lambert ◽

Primo Baybayan ◽

Anna K Childers ◽

...

Keyword(s):

Invasive Species ◽

Genome Assembly ◽

De Novo ◽

Fragment Size ◽

High Quality ◽

De Novo Genome Assembly ◽

Lycorma Delicatula ◽

Long Read ◽

Genome Assemblies ◽

High Quality Genome

Background: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region.

Results: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig.

Conclusions: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
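The ∼36× coverage figure quoted in the results follows from simple arithmetic: sequenced bases divided by genome size. The short Python sketch below reproduces it under one stated assumption, namely that the 2.3 Gb assembly size is a reasonable stand-in for the true genome size; the function name is illustrative only.

```python
def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Mean depth of coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sequenced_bases / genome_size


GB = 1e9  # bases per gigabase

# Assumption: the 2.3 Gb assembly size approximates the genome size.
genome_size = 2.3 * GB

# 82 Gb came from unique library molecules; 132 Gb is the raw total.
unique_cov = fold_coverage(82 * GB, genome_size)
raw_cov = fold_coverage(132 * GB, genome_size)

print(f"coverage from unique molecules: {unique_cov:.1f}x")
print(f"coverage from all sequenced bases: {raw_cov:.1f}x")
```

The unique-molecule value rounds to the 36× reported in the abstract; the raw-total value is shown only for comparison and is not a figure the authors report.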

High contiguity long read assembly of Brassica nigra allows localization of active centromeres and provides insights into the ancestral Brassica genome

10.1101/2020.02.03.932665 ◽

2020 ◽

Cited By ~ 5

Author(s):

Sampath Perumal ◽

Chu Shin Koh ◽

Lingling Jin ◽

Miles Buchwaldt ◽

Erin Higgins ◽

...

Keyword(s):

De Novo ◽

Low Complexity ◽

Error Rates ◽

Brassica Nigra ◽

Genome Integrity ◽

Ancestral Genome ◽

Genomic Distance ◽

Long Read ◽

Genome Assemblies ◽

Technology Comparison

High-quality nanopore genome assemblies were generated for two genotypes (Ni100 and CN115125) of Brassica nigra, a member of the agronomically important Brassica species. The N50 contig lengths for the two assemblies were 17.1 Mb (58 contigs) and 0.29 Mb (963 contigs), respectively, reflecting recent improvements in the technology. Comparison with a de novo short read assembly for Ni100 corroborated genome integrity and quantified sequence-related error rates (0.002%). The contiguity and coverage allowed unprecedented access to low complexity regions of the genome. Pericentromeric regions and coincidence of hypo-methylation enabled localization of active centromeres and identified a novel centromere-associated ALE class I element which appears to have proliferated through relatively recent nested transposition events (<1 million years ago). Computational abstraction was used to define a post-triplication Brassica-specific ancestral genome and to calculate the extensive rearrangements that define the genomic distance separating B. nigra from its diploid relatives.
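For readers unfamiliar with the N50 statistic used to compare the two assemblies: it is the contig length such that contigs of that length or longer contain at least half of all assembled bases. The Python sketch below computes it for a made-up list of contig lengths; the numbers are hypothetical and are not taken from the B. nigra assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (in Mb) for illustration only.
toy_lengths = [8, 6, 4, 3, 2, 1]
toy_n50 = n50(toy_lengths)
print("N50:", toy_n50)
```

Because N50 depends on how the total is distributed across contigs, an assembly with a few very long contigs scores far higher than one with many short contigs of the same total size, which is the contrast the abstract draws between 58 and 963 contigs.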

Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies

10.1101/2020.03.15.992941 ◽

2020 ◽

Cited By ~ 15

Author(s):

Arang Rhie ◽

Brian P. Walenz ◽

Sergey Koren ◽

Adam M. Phillippy

Keyword(s):

De Novo ◽

High Accuracy ◽

Link Type ◽

Base Level ◽

Project Home Page ◽

Set Operations ◽

Assembly Evaluation ◽

Long Read ◽

Genome Assemblies ◽

Reference Genomes

Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, can be generated for evaluation. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation.

Availability of data and material
Project name: Merqury
Project home page: https://github.com/marbl/merqury, https://github.com/marbl/meryl
Archived version: https://github.com/marbl/merqury/releases/tag/v1.0
Operating system(s): Platform independent
Programming language: C++, Java, Perl
Other requirements: gcc 4.8 or higher, java 1.6 or higher
License: Public domain (see https://github.com/marbl/merqury/blob/master/README.license)
Any restrictions to use by non-academics: No restrictions applied
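The k-mer comparison described in the Merqury abstract can be illustrated with a small Python sketch. The abstract does not give the formulas, so this sketch assumes a k-mer survival estimate: if a fraction s of the assembly's k-mers is also present in the reads, the per-base probability of being correct is taken as s^(1/k), and the quality value is the Phred-scaled error. Completeness is taken as the fraction of distinct read k-mers found in the assembly. The function names and toy sequences are hypothetical, no filtering of erroneous read k-mers is done, and none of this uses Merqury's or meryl's actual code or command line.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer (with multiplicity) in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def qv_and_completeness(assembly, reads, k):
    """Estimate (QV, completeness) from assembly and read k-mers."""
    asm = Counter()
    for contig in assembly:
        asm.update(kmer_counts(contig, k))
    read = Counter()
    for r in reads:
        read.update(kmer_counts(r, k))
    if not asm or not read:
        raise ValueError("sequences are shorter than k")

    total = sum(asm.values())
    # Assembly k-mers that are supported by at least one read.
    shared = sum(count for kmer, count in asm.items() if kmer in read)

    # Assumed model: a k-mer survives only if all k of its bases are correct.
    p_correct = (shared / total) ** (1 / k)
    error = 1 - p_correct
    qv = math.inf if error == 0 else -10 * math.log10(error)

    completeness = sum(1 for kmer in read if kmer in asm) / len(read)
    return qv, completeness


# Hypothetical toy data: two reads covering an 11-base "genome".
reads = ["ACGTACGG", "TACGGTCA"]
perfect = ["ACGTACGGTCA"]
one_error = ["ACGTACGATCA"]  # single substitution, G to A

qv_perfect, comp_perfect = qv_and_completeness(perfect, reads, k=4)
qv_error, comp_error = qv_and_completeness(one_error, reads, k=4)
print(qv_perfect, comp_perfect)
print(round(qv_error, 2), comp_error)
```

A single substitution removes every k-mer that spans it, which is why one wrong base lowers both the quality value and the completeness estimate in the second toy assembly.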

Origin and evolution of the Yangtze River reconstructed from the largest molecular phylogeny of Cyprinidae

10.21203/rs.3.rs-145035/v3 ◽

2021 ◽

Author(s):

Feng Chen ◽

Ge Xue ◽

Yeke Wang ◽

Hucai Zhang ◽

Peter D. Clift ◽

...

Keyword(s):

Molecular Phylogeny ◽

East Asia ◽

Time Resolution ◽

Yangtze River ◽

Evolutionary History ◽

East Asian ◽

The Yangtze River ◽

Origin And Evolution ◽

History Of ◽

Lacustrine Ecosystem

The Yangtze River is the longest river in Asia, but its evolutionary history has long been debated. So far, no robust biological evidence has been found to resolve this question. Here we reconstruct spatiotemporal and diversification dynamics of endemic East Asian cyprinids based on the largest molecular phylogeny of Cyprinidae, including 1420 species, and show that their ancestors, which laid adhesive eggs, were distributed in southern East Asia before ~24 Ma and subsequently dispersed to the Yangtze River to spawn semi-buoyant eggs at ~19 Ma. This indicates that the Yangtze River diverted eastward around the Oligocene-Miocene boundary. Some of these cyprinids evolved again into fishes producing adhesive eggs at ~13 Ma, together with a peak in net diversification rate, indicating that the river formed a potamo-lacustrine ecosystem during the Mid-Miocene. Our reconstruction of the history of the Yangtze River has higher time resolution and much better continuity than those derived from geological studies.

Phylogeny of the Varidnaviria Morphogenesis Module: Congruence and Incongruence With the Tree of Life and Viral Taxonomy

Frontiers in Microbiology ◽

10.3389/fmicb.2021.704052 ◽

2021 ◽

Vol 12 ◽

Author(s):

Anthony C. Woo ◽

Morgan Gaia ◽

Julien Guglielmini ◽

Violette Da Cunha ◽

Patrick Forterre

Keyword(s):

Evolutionary History ◽

International Committee ◽

Tree Of Life ◽

Dna Viruses ◽

Origin And Evolution ◽

Double Stranded Dna ◽

Domains Of Life ◽

History Of ◽

Recent Classification ◽

Jelly Roll

Double-stranded DNA viruses of the realm Varidnaviria (formerly PRD1-adenovirus lineage) are characterized by homologous major capsid proteins (MCPs) containing one (kingdom: Helvetiavirae) or two β-barrel domains (kingdom: Bamfordvirae) known as the jelly roll folds. Most of them also share homologous packaging ATPases (pATPases). Remarkably, Varidnaviria infect hosts from the three domains of life, suggesting that these viruses could be very ancient and share a common ancestor. Here, we analyzed the evolutionary history of Varidnaviria based on single and concatenated phylogenies of their MCPs and pATPases. We excluded Adenoviridae from our analysis as their MCPs and pATPases are too divergent. Sphaerolipoviridae, the only family in the kingdom Helvetiavirae, exhibit a complex history: their MCPs are very divergent from those of other Varidnaviria, as expected, but their pATPases group them with Bamfordvirae. In single and concatenated trees, Bamfordvirae infecting archaea were grouped with those infecting bacteria, in contradiction with the cellular tree of life, whereas those infecting eukaryotes were organized into three monophyletic groups: the Nucleocytoviricota phylum, formerly known as the Nucleo-Cytoplasmic Large DNA Viruses (NCLDVs), Lavidaviridae (virophages) and Polintoviruses. Although our analysis mostly supports the recent classification proposed by the International Committee on Taxonomy of Viruses (ICTV), it also raises questions, such as the validity of the Adenoviridae and Helvetiavirae ranking. Based on our phylogeny, we discuss current hypotheses on the origin and evolution of Varidnaviria and suggest new ones to reconcile the viral and cellular trees.

Purge Haplotigs: Synteny Reduction for Third-gen Diploid Genome Assemblies

10.1101/286252 ◽

2018 ◽

Cited By ~ 7

Author(s):

Michael J Roach ◽

Simon Schmidt ◽

Anthony R Borneman

Keyword(s):

De Novo ◽

Haplotype Reconstruction ◽

Minimal Impact ◽

Variant Discovery ◽

Rapid Release ◽

Long Read ◽

Recent Developments ◽

Reference Quality ◽

Downstream Analysis ◽

Genome Assemblies

Recent developments in third-gen long read sequencing and diploid-aware assemblers have resulted in the rapid release of numerous reference-quality assemblies for diploid genomes. However, assembly of highly heterozygous genomes still faces a major problem: where the two haplotypes for a region are highly polymorphic, the synteny is not recognised during assembly. This causes issues with downstream analysis, for example variant discovery using the haploid assembly, or haplotype reconstruction using the diploid assembly. A new pipeline, Purge Haplotigs, was developed specifically for third-gen assemblies to identify and reassign the duplicate contigs. The pipeline takes a draft haplotype-fused assembly or a diploid assembly, together with read alignments, to produce an improved assembly. The pipeline was tested on a simulated dataset and on four recent diploid (phased) de novo assemblies from third-generation long-read sequencing. All assemblies after processing with Purge Haplotigs were less duplicated with minimal impact on genome completeness. The software is available at https://bitbucket.org/mroachawri/purge_haplotigs under a permissive MIT licence.
