High-quality genome and methylomes illustrate features underlying evolutionary success of oaks

2021 ◽  
Author(s):  
Victoria L Sork ◽  
Shawn Cokus ◽  
Sorel T. Fitz-Gibbon ◽  
Alexey V. Zimin ◽  
Daniela Puiu ◽  
...  

The genus Quercus, which emerged ~55 million years ago during globally warm temperatures, diversified into ~450 species. We present a high-quality de novo genome assembly of a California endemic oak, Quercus lobata, revealing features consistent with oak evolutionary success. Effective population size remained large throughout history despite declining since the early Miocene. Analysis of 39,373 mapped protein-coding genes outlined copious duplications consistent with genetic and phenotypic diversity, both by retention of genes created during the ancient γ whole genome hexaploid duplication event and by tandem duplication within families, including the numerous resistance genes and also unexpected candidate genes for an incompatibility system involving multiple non-self-recognition genes. An additional surprising finding is that subcontext-specific patterns of DNA methylation associated with transposable elements reveal broadly-distributed heterochromatin in intergenic regions, similar to grasses (another highly successful taxon). Collectively, these features promote genetic and phenotypic variation that would facilitate adaptability to changing environments.

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Yun Li ◽  
Hairong Wei ◽  
Jun Yang ◽  
Kang Du ◽  
Jiang Li ◽  
...  

Abstract We report the acquisition of a high-quality haploid chromosome-scale genome assembly for the first time in a tree species, Eucommia ulmoides, which is known for its rubber biosynthesis and medicinal applications. The assembly was obtained by applying PacBio and Hi–C technologies to a haploid that we specifically generated. Compared to the initial genome release, this one has significantly improved assembly quality. The scaffold N50 (53.15 Mb) increased 28-fold, and the repetitive sequence content (520 Mb) increased by 158.24 Mb, whereas the number of gaps decreased from 104,772 to 128. A total of 92.87% of the 26,001 predicted protein-coding genes identified with multiple strategies were anchored to the 17 chromosomes. A new whole-genome duplication event was superimposed on the earlier γ paleohexaploidization event, and the expansion of long terminal repeats contributed greatly to the evolution of the genome. The more primitive rubber biosynthesis of this species, as opposed to that in Hevea brasiliensis, relies on the methylerythritol-phosphate (MEP) pathway rather than the mevalonate pathway to synthesize isoprenyl diphosphate, as the MEP pathway operates predominantly in trans-polyisoprene-containing leaves and central peels. Chlorogenic acid biosynthesis pathway enzymes were preferentially expressed in leaves rather than in bark. This assembly with higher sequence contiguity can foster not only studies on genome structure and evolution, gene mapping, epigenetic analysis and functional genomics but also efforts to improve E. ulmoides for industrial and medical uses through genetic engineering.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Rashmi Jain ◽  
Jerry Jenkins ◽  
Shengqiang Shu ◽  
Mawsheng Chern ◽  
Joel A. Martin ◽  
...  

Abstract Background The availability of thousands of complete rice genome sequences from diverse varieties and accessions has laid the foundation for in-depth exploration of the rice genome. One drawback to these collections is that most of these rice varieties have long life cycles, and/or low transformation efficiencies, which limits their usefulness as model organisms for functional genomics studies. In contrast, the rice variety Kitaake has a rapid life cycle (9 weeks seed to seed) and is easy to transform and propagate. For these reasons, Kitaake has emerged as a model for studies of diverse monocotyledonous species. Results Here, we report the de novo genome sequencing and analysis of Oryza sativa ssp. japonica variety KitaakeX, a Kitaake plant carrying the rice XA21 immune receptor. Our KitaakeX sequence assembly contains 377.6 Mb, consisting of 33 scaffolds (476 contigs) with a contig N50 of 1.4 Mb. Complementing the assembly are detailed gene annotations of 35,594 protein coding genes. We identified 331,335 genomic variations between KitaakeX and Nipponbare (ssp. japonica), and 2,785,991 variations between KitaakeX and Zhenshan97 (ssp. indica). We also compared Kitaake resequencing reads to the KitaakeX assembly and identified 219 small variations. The high-quality genome of the model rice plant KitaakeX will accelerate rice functional genomics. Conclusions The high quality, de novo assembly of the KitaakeX genome will serve as a useful reference genome for rice and will accelerate functional genomics studies of rice and other species.
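Several of the abstracts on this page summarize assembly contiguity with an N50 value (for example, the contig N50 of 1.4 Mb reported above). As a reminder of what that statistic measures, here is a minimal sketch in Python. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from any of the listed papers.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units, not real data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about base-level accuracy or completeness, which is why the abstracts pair it with other measures such as conserved-gene recovery.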


GigaScience ◽  
2019 ◽  
Vol 8 (10) ◽  
Author(s):  
Sarah B Kingan ◽  
Julie Urban ◽  
Christine C Lambert ◽  
Primo Baybayan ◽  
Anna K Childers ◽  
...  

ABSTRACT Background A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
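The ∼36× coverage figure in this abstract follows from dividing the unique sequenced bases by the assembled genome size. The short Python sketch below reproduces that arithmetic from the numbers quoted in the abstract; the helper name `fold_coverage` is a hypothetical label chosen for illustration.

```python
def fold_coverage(sequenced_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size


GENOME_SIZE = 2.3e9    # 2.3 Gb assembly, as stated in the abstract
UNIQUE_BASES = 82e9    # 82 Gb from unique library molecules
TOTAL_BASES = 132e9    # 132 Gb of long reads in total

# Coverage from unique molecules, which is what the abstract reports.
print(round(fold_coverage(UNIQUE_BASES, GENOME_SIZE)))
# Coverage if every sequenced base were counted instead.
print(round(fold_coverage(TOTAL_BASES, GENOME_SIZE)))
```

The first value matches the rounded coverage quoted in the abstract; the second is shown only to illustrate how much the estimate would change if non-unique reads were included.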


2017 ◽  
Author(s):  
Matthieu Legendre ◽  
Elisabeth Fabre ◽  
Olivier Poirot ◽  
Sandra Jeudy ◽  
Audrey Lartigue ◽  
...  

Abstract With DNA genomes up to 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting Pandoraviruses remained the most spectacular viruses since their description in 2013. Our isolation of three new strains from distant locations and environments allowed us to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes, combining transcriptomic, proteomic, and bioinformatic analyses, led to the discovery of many non-coding transcripts while significantly reducing the former set of predicted protein-coding genes. We found that the Pandoraviridae exhibit an open pan genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation is a strong component in the evolution of the giant Pandoravirus genomes.


Author(s):  
Xinhai Ye ◽  
Yi Yang ◽  
Zhaoyang Tian ◽  
Le Xu ◽  
Kaili Yu ◽  
...  

Abstract Sequencing and assembling a genome from a single individual has several advantages, such as lower heterozygosity and easier sample preparation. However, the amount of genomic DNA obtainable from some small-sized organisms might not meet the standard DNA input requirement of current sequencing pipelines. Although a few studies have sequenced a single small insect with about 100 ng of DNA as input, it may still be challenging to obtain such an amount of DNA from a single individual of many small organisms. Here, we use 20 ng of DNA as input and present a high-quality genome assembly for a single haploid male parasitoid wasp (Habrobracon hebetor) using Nanopore and Illumina. Because of the low input DNA, a whole genome amplification (WGA) method is used before sequencing. The assembled genome size is 131.6 Mb with a contig N50 of 1.63 Mb. A total of 99% of Benchmarking Universal Single-Copy Orthologs are detected, suggesting a high level of completeness of the genome assembly. Genome comparison between H. hebetor and its relative Bracon brevicornis shows high-level genome synteny, indicating that the genome of H. hebetor is highly accurate and contiguous. Our study provides an example of de novo assembly of a genome from ultra-low input DNA, and will be useful for sequencing projects of small-sized species and rare samples, haploid genomics, and population genetics of small-sized species.


2019 ◽  
Author(s):  
Rashmi Jain ◽  
Jerry Jenkins ◽  
Shengqiang Shu ◽  
Mawsheng Chern ◽  
Joel A. Martin ◽  
...  

Abstract Here, we report the de novo genome sequencing and analysis of Oryza sativa ssp. japonica variety KitaakeX, a Kitaake plant carrying the rice XA21 immune receptor. Our KitaakeX sequence assembly contains 377.6 Mb, consisting of 33 scaffolds (476 contigs) with a contig N50 of 1.4 Mb. Complementing the assembly are detailed gene annotations of 35,594 protein coding genes. We identified 331,335 genomic variations between KitaakeX and Nipponbare (ssp. japonica), and 2,785,991 variations between KitaakeX and Zhenshan97 (ssp. indica). We also compared Kitaake resequencing reads to the KitaakeX assembly and identified 219 small variations. The high-quality genome of the model rice plant KitaakeX will accelerate rice functional genomics.


2018 ◽  
Author(s):  
Matthieu Legendre ◽  
Jean-Marie Alempic ◽  
Nadège Philippe ◽  
Audrey Lartigue ◽  
Sandra Jeudy ◽  
...  

Abstract With genomes of up to 2.7 Mb propagated in µm-long oblong particles and initially predicted to encode more than 2000 proteins, members of the Pandoraviridae family display the most extreme features of the known viral world. The mere existence of such giant viruses raises fundamental questions about their origin and the processes governing their evolution. A previous analysis of six newly available isolates, independently confirmed by a study including 3 others, established that the Pandoraviridae pan-genome is open, meaning that each new strain exhibits protein-coding genes not previously identified in other family members. With an average increment of about 60 proteins, the gene repertoire shows no sign of reaching a limit and remains largely coding for proteins without recognizable homologs in other viruses or cells (ORFans). To explain these results, we proposed that most new protein-coding genes were created de novo, from pre-existing non-coding regions of the G+C rich pandoravirus genomes. The comparison of the gene content of a new isolate, P. celtis, closely related (96% identical genome) to the previously described P. quercus is now used to test this hypothesis by studying genomic changes in a microevolution range. Our results confirm that the differences between these two similar gene contents mostly consist of protein-coding genes without known homologs (ORFans), with statistical signatures close to that of intergenic regions. These newborn proteins are under slight negative selection, perhaps to maintain stable folds and prevent protein aggregation pending the eventual emergence of fitness-increasing functions. Our study also unraveled several insertion events mediated by a transposase of the hAT family, 3 copies of which are found in P. celtis and are presumably active. Members of the Pandoraviridae are presently the first viruses known to encode this type of transposase.


2018 ◽  
Author(s):  
Jolene T. Sutton ◽  
Martin Helmkampf ◽  
Cynthia C. Steiner ◽  
M. Renee Bellinger ◽  
Jonas Korlach ◽  
...  

Abstract Genome-level data can provide researchers with unprecedented precision to examine the causes and genetic consequences of population declines, and to apply these results to conservation management. Here we present a high-quality, long-read, de novo genome assembly for one of the world’s most endangered bird species, the Alala. As the only remaining native crow species in Hawaii, the Alala survived solely in a captive breeding program from 2002 until 2016, at which point a long-term reintroduction program was initiated. The high-quality genome assembly was generated to lay the foundation for both comparative genomics studies, and the development of population-level genomic tools that will aid conservation and recovery efforts. We illustrate how the quality of this assembly places it amongst the very best avian genomes assembled to date, comparable to intensively studied model systems. We describe the genome architecture in terms of repetitive elements and runs of homozygosity, and we show that compared with more outbred species, the Alala genome is substantially more homozygous. We also provide annotations for a subset of immunity genes that are likely to be important for conservation applications, and we discuss how this genome is currently being used as a roadmap for downstream conservation applications.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Nikolaos Vakirlis ◽  
Omer Acar ◽  
Brian Hsu ◽  
Nelson Castilho Coelho ◽  
S. Branden Van Oss ◽  
...  

Abstract Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection.


GigaScience ◽  
2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Yujing Suo ◽  
Peng Sun ◽  
Huihui Cheng ◽  
Weijuan Han ◽  
Songfeng Diao ◽  
...  

Abstract Background Diospyros oleifera Cheng, of the family Ebenaceae, is an economically important tree. Phylogenetic analyses indicate that D. oleifera is closely related to Diospyros kaki Thunb. and could be used as a model plant for studies of D. kaki. Therefore, development of genomic resources of D. oleifera will facilitate auxiliary assembly of the hexaploid persimmon genome and elucidate the molecular mechanisms of important traits. Findings The D. oleifera genome was assembled with 443.6 Gb of raw reads using the Pacific Bioscience Sequel and Illumina HiSeq X Ten platforms. The final draft genome was ∼812.3 Mb and had a high level of continuity with N50 of 3.36 Mb. Fifteen scaffolds corresponding to the 15 chromosomes were assembled to a final size of 721.5 Mb using 332 scaffolds, accounting for 88.81% of the genome. Repeat sequences accounted for 54.8% of the genome. By de novo sequencing and analysis of homology with other plant species, 30,530 protein-coding genes with an average transcript size of 7,105.40 bp were annotated; of these, 28,580 protein-coding genes (93.61%) had conserved functional motifs or terms. In addition, 171 candidate genes involved in tannin synthesis and deastringency in persimmon were identified; of these, chalcone synthase (CHS) genes were expanded in the D. oleifera genome compared with Diospyros lotus, Camellia sinensis, and Vitis vinifera. Moreover, 186 positively selected genes were identified, including the chalcone isomerase (CHI) gene, which encodes a key enzyme in the flavonoid-anthocyanin pathway. Phylogenetic tree analysis indicated that the split of D. oleifera and D. lotus likely occurred 9.0 million years ago. In addition to the ancient γ event, a second whole-genome duplication event occurred in D. oleifera and D. lotus. Conclusions We generated a high-quality chromosome-level draft genome for D. oleifera, which will facilitate assembly of the hexaploid persimmon genome and further studies of major economic traits in the genus Diospyros.

