Rapid Genome Evolution and Adaptation of Thlaspi arvense Mediated by Recurrent RNA-Based and Tandem Gene Duplications

Retrotransposons are the most abundant group of transposable elements (TEs) in plants and provide an extraordinarily versatile source of genetic variation. Thlaspi arvense, a close relative of the model plant Arabidopsis thaliana with a worldwide distribution, thrives from sea level to above 4,000 m elevation on the Qinghai-Tibet Plateau (QTP), China. Its strong adaptability makes it an ideal model system for studying plant adaptation to extreme environments. However, how retrotransposons affect genome evolution and adaptation in T. arvense is largely unknown. We report a high-quality chromosome-scale genome assembly of T. arvense with a scaffold N50 of 59.10 Mb. Long terminal repeat retrotransposons (LTR-RTs) account for 56.94% of the genome assembly, and the Gypsy superfamily is the most abundant TE group. The amplification of LTR-RTs in the last six million years was the primary contributor to genome size expansion in T. arvense. We identified 351 retrogenes and 303 genes flanked by LTRs. A comparative analysis showed that orthogroups containing these retrogenes and LTR-flanked genes include a higher percentage of significantly expanded orthogroups (SEOs), and that these SEOs possess more recent tandem-duplicated genes. Together, these results indicate that RNA-based gene duplication (retroduplication) accelerated the subsequent tandem duplication of homologous genes, resulting in family expansions. The expanded gene families are implicated in plant growth, development, and stress responses, and their expansion was one of the pivotal factors in T. arvense’s adaptation to the harsh environment of the QTP. In conclusion, the high-quality assembly of the T. arvense genome provides insights into the retroduplication-mediated mechanism of plant adaptation to extreme environments.

Genome of Crucihimalaya himalaica, a close relative of Arabidopsis, shows ecological adaptation to high altitude

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1817580116 ◽

2019 ◽

Vol 116 (14) ◽

pp. 7137-7146 ◽

Cited By ~ 20

Author(s):

Ticao Zhang ◽

Qin Qiao ◽

Polina Yu. Novikova ◽

Qia Wang ◽

Jipei Yue ◽

...

Keyword(s):

Dna Repair ◽

Positive Selection ◽

Genome Sequence ◽

Draft Genome ◽

Extreme Environments ◽

Gene Families ◽

Ecological Adaptation ◽

Tibet Plateau ◽

Close Relative ◽

Arabidopsis Lyrata

Crucihimalaya himalaica, a close relative of Arabidopsis and Capsella, grows on the Qinghai–Tibet Plateau (QTP) about 4,000 m above sea level and represents an attractive model system for studying speciation and ecological adaptation in extreme environments. We assembled a draft genome sequence of 234.72 Mb encoding 27,019 genes and investigated its origin and adaptive evolutionary mechanisms. Phylogenomic analyses based on 4,586 single-copy genes revealed that C. himalaica is most closely related to Capsella (estimated divergence 8.8 to 12.2 Mya), whereas both species form a sister clade to Arabidopsis thaliana and Arabidopsis lyrata, from which they diverged between 12.7 and 17.2 Mya. LTR retrotransposons in C. himalaica proliferated shortly after the dramatic uplift and climatic change of the Himalayas from the Late Pliocene to Pleistocene. Compared with closely related species, C. himalaica showed significant contraction and pseudogenization in gene families associated with disease resistance and also significant expansion in gene families associated with ubiquitin-mediated proteolysis and DNA repair. We identified hundreds of genes involved in DNA repair, ubiquitin-mediated proteolysis, and reproductive processes with signs of positive selection. Gene families showing dramatic changes in size and genes showing signs of positive selection are likely candidates for C. himalaica’s adaptation to intense radiation, low temperature, and pathogen-depauperate environments in the QTP. Loss of function at the S-locus, the reason for the transition to self-fertilization of C. himalaica, might have enabled its QTP occupation. Overall, the genome sequence of C. himalaica provides insights into the mechanisms of plant adaptation to extreme environments.

High-quality genome assembly, annotation and evolutionary analysis of the mungbean (Vigna radiata) genome

10.22541/au.160587196.63922177/v1 ◽

2020 ◽

Author(s):

Qiang Yan ◽

Qiong Wang ◽

Cheng Xuzhen ◽

Lixia Wang ◽

Prakit Somta ◽

...

Keyword(s):

Genome Assembly ◽

Vigna Radiata ◽

Crop Improvement ◽

Repetitive Sequences ◽

Gene Families ◽

Close Relative ◽

Specific Gene ◽

Evolutionary Analysis ◽

High Quality ◽

High Quality Genome

Mungbean (Vigna radiata [L.]) is an important economic crop grown in South and East Asia. The low contiguity of the current V. radiata genome assembly has limited its application. Here, we report a high-quality chromosome-scale genome assembly of V. radiata to facilitate the investigation of its genome characteristics and evolution. By combining Nanopore long reads, Illumina short reads and Hi-C data, we generated a high-quality genome assembly of V. radiata, with 473.67 megabases assembled into 11 chromosomes and with contig N50 and scaffold N50 of 11.3 and 42.4 megabases, respectively. A total of 52.8% of the genome was annotated as repetitive sequences, among which LTRs (long terminal repeats) were predominant (33.9%). The genome of V. radiata was predicted to contain 33,924 genes, 32,470 (95.7%) of which could be functionally annotated. Evolutionary analysis estimated that V. radiata diverged from its close relative V. angularis ~11.66 million years ago. In addition, 277 V. radiata-specific gene families and 18 positively selected genes were detected and functionally annotated. This high-quality mungbean genome will provide valuable resources for further genetic analysis and crop improvement of mungbean and other legume species.

A high-quality genome assembly of Morinda officinalis, a famous native southern herb in the Lingnan region of southern China

Horticulture Research ◽

10.1038/s41438-021-00551-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Jihua Wang ◽

Shiqiang Xu ◽

Yu Mei ◽

Shike Cai ◽

Yan Gu ◽

...

Keyword(s):

Genome Evolution ◽

Genome Assembly ◽

Large Scale ◽

Active Component ◽

Southern China ◽

Repetitive Sequences ◽

Gene Families ◽

Comparative Genomic ◽

Edible Plant ◽

High Quality

Morinda officinalis is a well-known medicinal and edible plant that is widely cultivated in the Lingnan region of southern China. Its dried roots (called bajitian in traditional Chinese medicine) are broadly used to treat various diseases, such as impotence and rheumatism. Here, we report a high-quality chromosome-scale genome assembly of M. officinalis using Nanopore single-molecule sequencing and Hi-C technology. The assembled genome size was 484.85 Mb with a scaffold N50 of 40.97 Mb, and 90.77% of the assembled sequences were anchored on eleven pseudochromosomes. The genome includes 27,698 protein-coding genes, and most of the assembly consists of repetitive sequences. Genome evolution analysis revealed that M. officinalis underwent core eudicot γ genome triplication events but no recent whole-genome duplication (WGD). Likewise, comparative genomic analysis showed no large-scale structural variation after species divergence between M. officinalis and Coffea canephora. Moreover, gene family analysis indicated that gene families associated with plant–pathogen interactions and sugar metabolism were significantly expanded in M. officinalis. Furthermore, we identified many candidate genes involved in the biosynthesis of major active components such as anthraquinones, iridoids and polysaccharides. We also found that the DHQS, GGPPS, TPS-Clin, TPS04, sacA, and UGDH gene families, which include the critical genes for active component biosynthesis, were expanded in M. officinalis. This study provides a valuable resource for understanding M. officinalis genome evolution and active component biosynthesis. This work will facilitate genetic improvement and molecular breeding of this commercially important plant.

A high‐quality carabid genome assembly provides insights into beetle genome evolution and cold adaptation

Molecular Ecology Resources ◽

10.1111/1755-0998.13409 ◽

2021 ◽

Author(s):

Yi‐Ming Weng ◽

Charlotte B. Francoeur ◽

Cameron R. Currie ◽

David H. Kavanaugh ◽

Sean D. Schoville

Keyword(s):

Genome Evolution ◽

Genome Assembly ◽

Cold Adaptation ◽

High Quality

A Chromosome-Scale Genome Assembly Resource for Myriosclerotinia sulcatula Infecting Sedge Grass (Carex sp.)

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-03-20-0060-a ◽

2020 ◽

Vol 33 (7) ◽

pp. 880-883

Author(s):

Stefan Kusch ◽

Heba M. M. Ibrahim ◽

Catherine Zanchetta ◽

Celine Lopez-Roques ◽

Cecile Donnadieu ◽

...

Keyword(s):

Host Range ◽

Sclerotinia Sclerotiorum ◽

Genome Assembly ◽

Plant Pathogens ◽

Reference Genome ◽

Close Relative ◽

High Quality ◽

Protein Coding ◽

Protein Coding Genes ◽

Reference Genome Assembly

The fungus Myriosclerotinia sulcatula is a close relative of the notorious polyphagous plant pathogens Botrytis cinerea and Sclerotinia sclerotiorum but exhibits a host range restricted to plants from the Carex genus (Cyperaceae family). To date, there are no genomic resources available for fungi in the Myriosclerotinia genus. Here, we present a chromosome-scale reference genome assembly for M. sulcatula. The assembly contains 24 contigs with a total length of 43.53 Mbp, with scaffold N50 of 2,649.7 kbp and N90 of 1,133.1 kbp. BRAKER-predicted gene models were manually curated using WebApollo, resulting in 11,275 protein-coding genes that we functionally annotated. We provide a high-quality reference genome assembly and annotation for M. sulcatula as a resource for studying evolution and pathogenicity in fungi from the Sclerotiniaceae family.
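
The scaffold N50 and N90 values quoted above follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly size. The Python sketch below illustrates that definition only. The function name `nx` and the toy lengths are illustrative assumptions, not values or tooling from the paper.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic (N50 when fraction=0.5, N90 when fraction=0.9).

    lengths  -- iterable of sequence lengths (contigs or scaffolds)
    fraction -- share of the total assembly length that must be covered
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one sequence length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")

    threshold = sum(lengths) * fraction
    running_total = 0
    # Walk from the longest sequence down until the threshold is covered.
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
n50 = nx(toy_lengths, 0.5)
n90 = nx(toy_lengths, 0.9)
```

In the toy assembly the two longest sequences already cover half of the total, so N50 equals the length of the second one, while N90 has to reach much further down the sorted list. This is why N90 is always less than or equal to N50 for the same assembly.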

A high-quality genome assembly and annotation of the gray mangrove, Avicennia marina

10.1101/2020.05.30.124800 ◽

2020 ◽

Author(s):

Guillermo Friis ◽

Joel Vizueta ◽

David R. Nelson ◽

Basel Khraiwesh ◽

Enas Qudeimat ◽

...

Keyword(s):

Genome Assembly ◽

Extreme Environments ◽

Avicennia Marina ◽

Phenotypic Traits ◽

High Quality ◽

Mangrove Species ◽

Protein Coding ◽

Proximity Ligation ◽

Adaptive Processes ◽

West Indian Ocean

The gray mangrove [Avicennia marina (Forsk.) Vierh.] is the most widely distributed mangrove species, ranging throughout the Indo-West Pacific. It presents remarkable levels of geographic variation both in phenotypic traits and habitat, often occupying extreme environments at the edges of its distribution. However, subspecific evolutionary relationships and adaptive mechanisms remain understudied, especially across populations of the West Indian Ocean. High-quality genomic resources accounting for such variability are also sparse. Here we report the first chromosome-level assembly of the genome of A. marina. We used a previously released draft assembly and the Chicago and Dovetail HiC proximity ligation libraries for scaffolding, producing a 456,526,188 bp long genome. The largest 32 scaffolds (22.4 Mb to 10.5 Mb) accounted for 98% of the genome assembly, with the remaining 2% distributed among 3,759 much shorter scaffolds (62.4 Kb to 1 Kb). We annotated 23,331 protein-coding genes using tissue-specific RNA-seq data, of which 13,312 were associated with GO terms. The genome assembly and the annotated gene set yield completeness scores of 96.7% and 92.3%, respectively, when compared with the eudicots BUSCO dataset. Furthermore, an FST survey based on resequencing data successfully identified a set of candidate genes potentially involved in local adaptation, and revealed patterns of adaptive variability correlating with a temperature gradient in Arabian mangrove populations. Our A. marina genomic assembly provides a highly valuable resource for genome evolution analysis, as well as for identifying functional genes involved in adaptive processes and speciation.

Chromosome-Level Genome Assembly Reveals Significant Gene Expansion in the Toll and IMD Signaling Pathways of Dendrolimus kikuchii

Frontiers in Genetics ◽

10.3389/fgene.2021.728418 ◽

2021 ◽

Vol 12 ◽

Author(s):

Jielong Zhou ◽

Peifu Wu ◽

Zhongping Xiong ◽

Naiyong Liu ◽

Ning Zhao ◽

...

Keyword(s):

Genome Assembly ◽

Phylogenetic Analyses ◽

Repetitive Sequences ◽

Gene Families ◽

Thaumetopoea Pityocampa ◽

High Quality ◽

Protein Coding ◽

Peptidoglycan Recognition Protein ◽

Recognition Protein ◽

Chromosome Level

A high-quality genome is of significant value when seeking to control forest pests such as Dendrolimus kikuchii, a destructive member of the order Lepidoptera that is widespread in China. Herein, a high-quality, chromosome-level reference genome for D. kikuchii based on Nanopore and PacBio HiFi sequencing and the Hi-C capture system is presented. Overall, a final genome assembly of 705.51 Mb with contig and scaffold N50 values of 20.89 and 24.73 Mb, respectively, was obtained. Of these contigs, 95.89% had unique locations on 29 chromosomes. In silico analysis revealed that the genome contained 15,323 protein-coding genes and 63.44% repetitive sequences. Phylogenetic analyses indicated that D. kikuchii may have diverged from the common ancestor of Thaumetopoea pityocampa, Thaumetopoea ni, Heliothis virescens, Hyphantria armigera, Spodoptera frugiperda, and Spodoptera litura approximately 122.05 million years ago. Many gene families were expanded in the D. kikuchii genome, particularly those of the Toll and IMD signaling pathways, which included 10 peptidoglycan recognition protein genes, 19 MODSP genes, and 11 Toll genes. The findings from this study will help to elucidate the mechanisms that protect D. kikuchii against foreign substances and pathogens, and may highlight a potential route to controlling this pest.

A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant

Horticulture Research ◽

10.1038/s41438-020-00391-0 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Qingzhen Wei ◽

Jinglei Wang ◽

Wuhong Wang ◽

Tianhua Hu ◽

Haijiao Hu ◽

...

Keyword(s):

Genome Assembly ◽

Reference Genome ◽

Repetitive Sequences ◽

Gene Families ◽

Specific Gene ◽

High Quality ◽

Total Size ◽

Protein Coding ◽

Fruit Length ◽

Protein Coding Genes

Eggplant (Solanum melongena L.) is an economically important vegetable crop in the Solanaceae family, with extensive diversity among landraces and close relatives. Here, we report a high-quality reference genome for the eggplant inbred line HQ-1315 (S. melongena-HQ) using a combination of Illumina, Nanopore and 10X Genomics sequencing technologies and Hi-C technology for genome assembly. The assembled genome has a total size of ~1.17 Gb and 12 chromosomes, with a contig N50 of 5.26 Mb, and contains 36,582 protein-coding genes. Repetitive sequences comprise 70.09% (811.14 Mb) of the eggplant genome, most of which are long terminal repeat (LTR) retrotransposons (65.80%), followed by long interspersed nuclear elements (LINEs, 1.54%) and DNA transposons (0.85%). The S. melongena-HQ eggplant genome carries a total of 563 accession-specific gene families containing 1009 genes. In total, 73 expanded gene families (892 genes) and 34 contracted gene families (114 genes) were functionally annotated. Comparative analysis of different eggplant genomes identified three types of variations, including single-nucleotide polymorphisms (SNPs), insertions/deletions (indels) and structural variants (SVs). Asymmetric SV accumulation was found in potential regulatory regions of protein-coding genes among the different eggplant genomes. Furthermore, we performed QTL-seq for eggplant fruit length using the S. melongena-HQ reference genome and detected a QTL interval of 71.29–78.26 Mb on chromosome E03. The gene Smechr0301963, which belongs to the SUN gene family, is predicted to be a key candidate gene for eggplant fruit length regulation. Moreover, we anchored a total of 210 linkage markers associated with 71 traits to the eggplant chromosomes and finally obtained 26 QTL hotspots. The eggplant HQ-1315 genome assembly can be accessed at http://eggplant-hq.cn. In conclusion, the eggplant genome presented herein provides a global view of genomic divergence at the whole-genome level and powerful tools for the identification of candidate genes for important traits in eggplant.
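
The repeat figures in this abstract can be cross-checked with simple arithmetic: if 811.14 Mb is 70.09% of the genome, the implied total is about 1,157 Mb, which is in the same range as the rounded ~1.17 Gb assembly size. The Python sketch below shows that check. The helper name `implied_total_mb` is a hypothetical illustration, and the small gap to 1.17 Gb may simply reflect rounding or a different denominator in the original analysis.

```python
def implied_total_mb(part_mb, percent):
    """Return the total size (Mb) implied by a part and its percentage share."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the interval (0, 100]")
    return part_mb / (percent / 100.0)


# Values quoted in the abstract above.
repeat_mb = 811.14
repeat_percent = 70.09

total_mb = implied_total_mb(repeat_mb, repeat_percent)
# Relative difference between the implied total and the rounded 1.17 Gb figure.
relative_gap = abs(total_mb - 1170.0) / 1170.0

print(f"implied total: {total_mb:.1f} Mb, gap to 1.17 Gb: {relative_gap:.1%}")
```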

A high-quality genome assembly for the endangered golden snub-nosed monkey (Rhinopithecus roxellana)

GigaScience ◽

10.1093/gigascience/giz098 ◽

2019 ◽

Vol 8 (8) ◽

Cited By ~ 5

Author(s):

Lu Wang ◽

Jinwei Wu ◽

Xiaomei Liu ◽

Dandan Di ◽

Yuhong Liang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

Gene Families ◽

Rhinopithecus Roxellana ◽

High Quality ◽

Chromosome Conformation ◽

Protein Coding ◽

A Genome ◽

Close Relationship ◽

High Quality Genome

Background: The golden snub-nosed monkey (Rhinopithecus roxellana) is an endangered colobine species endemic to China that has several distinct traits, including a unique social structure. Although a genome assembly for R. roxellana is available, it is incomplete and fragmented because it was constructed using short-read sequencing technology. Thus, important information such as genome structural variation and repeat sequences may be absent. Findings: To obtain a high-quality chromosomal assembly for R. roxellana qinlingensis, we used 5 methods: Pacific Biosciences single-molecule real-time sequencing, Illumina paired-end sequencing, BioNano optical maps, 10X Genomics linked-reads, and high-throughput chromosome conformation capture. The assembled genome was ∼3.04 Gb, with a contig N50 of 5.72 Mb and a scaffold N50 of 144.56 Mb. This represented a 100-fold improvement over the previously published genome. In the new genome, 22,497 protein-coding genes were predicted, of which 22,053 were functionally annotated. Gene family analysis showed that 993 and 2,745 gene families were expanded and contracted, respectively. The reconstructed phylogeny recovered a close relationship between R. roxellana and Macaca mulatta, and these 2 species diverged ∼13.4 million years ago. Conclusion: We constructed a high-quality genome assembly of the Qinling golden snub-nosed monkey; it had superior continuity and accuracy, which might be useful for future genetic studies in this species and as a new standard reference genome for colobine primates. In addition, the updated genome assembly might improve our understanding of this species and could assist conservation efforts.

High-quality genome assembly-based and functional analyses reveal the pathogenesis mechanisms and evolutionary landscape of wheat sharp eyespot Rhizoctonia cerealis

10.21203/rs.3.rs-152320/v1 ◽

2021 ◽

Author(s):

Lin Lu ◽

Feilong Guo ◽

Zhichao Zhang ◽

Lijun Pan ◽

Yu Hao ◽

...

Keyword(s):

Genome Assembly ◽

Control Strategies ◽

Gene Families ◽

Secretory Proteins ◽

High Quality ◽

Sharp Eyespot ◽

Functional Analyses ◽

Genome Scale ◽

High Quality Genome ◽

Rhizoctonia Cerealis

Wheat (Triticum aestivum) is one of the most important staple crops. The necrotrophic binucleate fungus Rhizoctonia cerealis is the causal agent of the devastating disease wheat sharp eyespot and of additional diseases of other agricultural crops and bioenergy plants. In this study, we present the first high-quality genome assembly of R. cerealis Rc207, a highly aggressive strain isolated from wheat. The genome encodes expanded and diverse sets of virulence-related proteins, especially secreted effectors, carbohydrate-active enzymes (CAZymes), metalloproteases, Cytochrome P450 (CYP450), and secondary metabolite-associated enzymes. Many of these genes, in particular those encoding secretory proteins and CYP450, were markedly up-regulated during infection in wheat. Of 831 candidate secretory effectors, ten up-regulated secretory proteins, such as CAZymes, metalloproteases and antigens, were functionally validated as virulence factors required for fungal infection in wheat. Further intra-species and inter-species comparative genomics analyses showed that repeat sequences, accounting for 17.87% of the genome, are the major driving force for genome evolution, and that frequent intraspecific gene duplication contributes to the expansion of pathogenicity-related gene families. This is the first genome-scale investigation elucidating the pathogenesis mechanisms and evolutionary landscape of R. cerealis. Our results provide essential tools for further development of effective disease control strategies.
