scholarly journals AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication

2021 ◽  
Vol 119 (1) ◽  
pp. e2113075119
Author(s):  
Baoxing Song ◽  
Santiago Marco-Sola ◽  
Miquel Moreto ◽  
Lynn Johnson ◽  
Edward S. Buckler ◽  
...  

Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication–informed collinear anchor identification between genomes and performs base pair–resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor–binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.

2021 ◽  
Author(s):  
Baoxing Song ◽  
Santiago Marco-Sola ◽  
Miquel Moreto ◽  
Lynn Johnson ◽  
Edward S. Buckler ◽  
...  

Millions of species are currently being sequenced and their genomes are being compared. Many of them have more complex genomes than model systems and raised novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication or polyploidy levels. Here we introduce AnchorWave, which performs whole-genome duplication informed collinear anchor identification between genomes and performs base-pair resolution global alignments for collinear blocks using the wavefront algorithm and a 2-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multi-kilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs between two maize lines. By contrast, other genome alignment tools showed almost zero power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome than the closest competitive approach, when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor binding sites (TFBSs) at a rate of 1.05-74.85 fold higher than other tools, while with significantly lower false positive alignments. AnchorWave shows obvious improvement when applied to genomes with dispersed repeats, active transposable elements, high sequence diversity and whole-genome duplication variation.


Genetics ◽  
2000 ◽  
Vol 156 (3) ◽  
pp. 1249-1257
Author(s):  
Ilya Ruvinsky ◽  
Lee M Silver ◽  
Jeremy J Gibson-Brown

Abstract The duplication of preexisting genes has played a major role in evolution. To understand the evolution of genetic complexity it is important to reconstruct the phylogenetic history of the genome. A widely held view suggests that the vertebrate genome evolved via two successive rounds of whole-genome duplication. To test this model we have isolated seven new T-box genes from the primitive chordate amphioxus. We find that each amphioxus gene generally corresponds to two or three vertebrate counterparts. A phylogenetic analysis of these genes supports the idea that a single whole-genome duplication took place early in vertebrate evolution, but cannot exclude the possibility that a second duplication later took place. The origin of additional paralogs evident in this and other gene families could be the result of subsequent, smaller-scale chromosomal duplications. Our findings highlight the importance of amphioxus as a key organism for understanding evolution of the vertebrate genome.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Gareth B. Gillard ◽  
Lars Grønvold ◽  
Line L. Røsæg ◽  
Matilde Mengkrog Holen ◽  
Øystein Monsen ◽  
...  

Abstract Background Whole genome duplication (WGD) events have played a major role in eukaryotic genome evolution, but the consequence of these extreme events in adaptive genome evolution is still not well understood. To address this knowledge gap, we used a comparative phylogenetic model and transcriptomic data from seven species to infer selection on gene expression in duplicated genes (ohnologs) following the salmonid WGD 80–100 million years ago. Results We find rare cases of tissue-specific expression evolution but pervasive expression evolution affecting many tissues, reflecting strong selection on maintenance of genome stability following genome doubling. Ohnolog expression levels have evolved mostly asymmetrically, by diverting one ohnolog copy down a path towards lower expression and possible pseudogenization. Loss of expression in one ohnolog is significantly associated with transposable element insertions in promoters and likely driven by selection on gene dosage including selection on stoichiometric balance. We also find symmetric expression shifts, and these are associated with genes under strong evolutionary constraints such as ribosome subunit genes. This possibly reflects selection operating to achieve a gene dose reduction while avoiding accumulation of “toxic mutations”. Mechanistically, ohnolog regulatory divergence is dictated by the number of bound transcription factors in promoters, with transposable elements being one likely source of novel binding sites driving tissue-specific gains in expression. Conclusions Our results imply pervasive adaptive expression evolution following WGD to overcome the immediate challenges posed by genome doubling and to exploit the long-term genetic opportunities for novel phenotype evolution.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Amit Rai ◽  
Hideki Hirakawa ◽  
Ryo Nakabayashi ◽  
Shinji Kikuchi ◽  
Koki Hayashi ◽  
...  

AbstractPlant genomes remain highly fragmented and are often characterized by hundreds to thousands of assembly gaps. Here, we report chromosome-level reference and phased genome assembly of Ophiorrhiza pumila, a camptothecin-producing medicinal plant, through an ordered multi-scaffolding and experimental validation approach. With 21 assembly gaps and a contig N50 of 18.49 Mb, Ophiorrhiza genome is one of the most complete plant genomes assembled to date. We also report 273 nitrogen-containing metabolites, including diverse monoterpene indole alkaloids (MIAs). A comparative genomics approach identifies strictosidine biogenesis as the origin of MIA evolution. The emergence of strictosidine biosynthesis-catalyzing enzymes precede downstream enzymes’ evolution post γ whole-genome triplication, which occurred approximately 110 Mya in O. pumila, and before the whole-genome duplication in Camptotheca acuminata identified here. Combining comparative genome analysis, multi-omics analysis, and metabolic gene-cluster analysis, we propose a working model for MIA evolution, and a pangenome for MIA biosynthesis, which will help in establishing a sustainable supply of camptothecin.


2018 ◽  
Vol 30 (11) ◽  
pp. 2741-2760 ◽  
Author(s):  
Zhicheng Zhang ◽  
Heleen Coenen ◽  
Philip Ruelens ◽  
Rashmi R. Hazarika ◽  
Tareq Al Hindi ◽  
...  

2019 ◽  
Vol 35 (1) ◽  
pp. 42-54 ◽  
Author(s):  
Ximena Escalera-Fanjul ◽  
Héctor Quezada ◽  
Lina Riego-Ruiz ◽  
Alicia González

Sign in / Sign up

Export Citation Format

Share Document