Assembly of the Boechera retrofracta Genome and Evolutionary Analysis of Apomixis-Associated Genes

Author(s):  
Sergei Kliver ◽  
Mike Rayko ◽  
Alexey Komissarov ◽  
Evgeny Bakin ◽  
Daria Zhernakova ◽  
...  

Closely related to the model plant Arabidopsis thaliana, the genus Boechera is known to contain both sexual and apomictic species or accessions. Boechera retrofracta is a diploid, sexually reproducing species and is thought to be an ancestral parent species of the apomictic species Boechera divaricarpa. Here we report the de novo assembly of the B. retrofracta genome using short Illumina and Roche reads from one paired-end and three mate-pair libraries. The distribution of 23-mers from the paired-end library indicated a low level of heterozygosity and the presence of detectable duplications and triplications. The genome size was estimated at 227 Mb. The N50 of the assembled scaffolds was 2.3 Mb. A total of 27,048 protein-coding genes were predicted using a hybrid approach that combines homology-based and de novo methods. Repeats, tRNA genes, and rRNA genes were also annotated. Finally, genes of B. retrofracta and six other Brassicaceae species were used for phylogenetic tree reconstruction, and the evolution of the apomixis-associated APOLLO locus was analyzed in detail. The assembled genome of B. retrofracta will help in the challenging assembly of the highly heterozygous genomes of hybrid apomictic species such as B. divaricarpa.
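A genome size estimate from a k-mer distribution (here, 23-mers) usually rests on a simple relation: the total number of k-mers divided by the depth of the main peak of the k-mer frequency histogram approximates the number of distinct genomic positions. The sketch below illustrates that relation only; it assumes error-free toy reads and canonical (strand-independent) k-mers, the function names are hypothetical, and it is not the tool or exact procedure used in the study, which the abstract does not name.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])

def kmer_histogram(reads, k):
    """Count canonical k-mers, then return a histogram: depth -> number of distinct k-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[canonical(kmer)] += 1
    return Counter(counts.values())

def estimate_genome_size(hist, min_depth=2):
    """Genome size ~ total k-mers / depth of the main peak.

    Depths below min_depth are ignored because, in real data,
    low-count k-mers are dominated by sequencing errors.
    """
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    peak_depth = max(usable, key=lambda d: usable[d])
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth

# Toy data (hypothetical): a random 5,000-bp "genome" tiled by 100-bp reads at every offset.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(5000))
reads = [genome[s:s + 100] for s in range(len(genome) - 100 + 1)]
print(round(estimate_genome_size(kmer_histogram(reads, k=23))))  # slightly below 5000 (edge effects)
```

With real reads, heterozygosity adds a secondary peak at half the main depth, while duplications and triplications add k-mers at roughly twice and three times that depth. These are the features that reveal heterozygosity and copy number from the same histogram.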

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8450 ◽  
Author(s):  
Sunan Huang ◽  
Xuejun Ge ◽  
Asunción Cano ◽  
Betty Gaby Millán Salazar ◽  
Yunfei Deng

The genus Dicliptera (Justicieae, Acanthaceae) consists of approximately 150 species distributed throughout the tropical and subtropical regions of the world. In this study, we report newly obtained chloroplast genomes (cp genomes) for five species of Dicliptera (D. acuminata, D. peruviana, D. montana, D. ruiziana, and D. mucronata). These circular cp genomes are 150,689–150,811 bp long and have a quadripartite organization comprising a large single-copy region (LSC, 82,796–82,919 bp), a small single-copy region (SSC, 17,084–17,092 bp), and a pair of inverted repeat regions (IRs, 25,401–25,408 bp). The guanine-cytosine (GC) content is 37.9%–38.0%. The complete cp genomes contain 114 unique genes, including 80 protein-coding genes, 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes. Comparative analyses of nucleotide variability (Pi) revealed the five most variable regions (trnY-GUA-trnE-UUC, trnG-GCC, psbZ-trnG-GCC, petN-psbM, and rps4-trnL-UUA), which may be used as molecular markers in future taxonomic identification and phylogenetic analyses of Dicliptera. A total of 55–58 simple sequence repeats (SSRs) and 229 long repeats were identified in the cp genomes of the five Dicliptera species. Phylogenetic analysis identified a close relationship between D. ruiziana and D. montana, followed by D. acuminata, D. peruviana, and D. mucronata. Evolutionary analysis of orthologous protein-coding genes within the family Acanthaceae revealed only one gene, ycf15, to be under positive selection; this finding may inform future studies of its adaptive evolution. The complete genomes are useful for future research on species identification, phylogenetic relationships, and the adaptive evolution of Dicliptera species.
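Nucleotide variability (Pi) is commonly defined as the average proportion of differing sites over all pairs of aligned sequences, and hotspots are located by scanning the alignment in sliding windows. The sketch below assumes a pre-computed alignment of equal-length strings, skips any column containing a gap (a simplifying choice), and uses window/step values of 600/200 bp as illustrative assumptions, not settings reported in the abstract. The function names are hypothetical, and this is not the specific software used by the authors.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Pi: mean proportion of differing sites over all pairs of aligned sequences.

    Columns with a gap ('-') in any sequence are skipped.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    cols = [i for i in range(length) if all(s[i] != "-" for s in seqs)]
    if not cols:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return diffs / (len(pairs) * len(cols))

def sliding_window_pi(seqs, window=600, step=200):
    """Return (window_start, Pi) for successive windows along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        block = [s[start:start + window] for s in seqs]
        results.append((start, nucleotide_diversity(block)))
    return results

# Three toy aligned sequences: pairwise differences are 1, 2, and 1 over 4 sites.
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 / (3 pairs * 4 sites) = 0.333...
```

Windows with the highest Pi values are the candidates for variable markers such as the intergenic regions named above.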


2020 ◽  
Vol 12 (3) ◽  
pp. 185-202
Author(s):  
Xia Han ◽  
Jindan Guo ◽  
Erli Pang ◽  
Hongtao Song ◽  
Kui Lin

Abstract How have genes evolved within a well-known genome phylogeny? Many protein-coding genes are expected to have evolved as a whole at the gene level, and some partly through fragments at the subgene level. To explore such complex homologous relationships comprehensively and better understand gene family evolution, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. The approach is built on de novo-identified modules, that is, subgene units that can consecutively cover the proteins within a set of closely related species. Compared with two other popular methods for gene family construction, our approach improved practical gene family classification with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications, mainly generating single-module genes through partial gene duplication. This suggests that subgene rearrangement might be pervasive in the evolution of protein-coding gene families. Using a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide a more comprehensive picture of how genes evolve within a well-known genome phylogeny.


2019 ◽  
Author(s):  
Thomas Hackl ◽  
Roman Martin ◽  
Karina Barenhoff ◽  
Sarah Duponchel ◽  
Dominik Heider ◽  
...  

Abstract The heterotrophic stramenopile Cafeteria roenbergensis is a globally distributed, marine, bacterivorous protist. This unicellular flagellate is host to the giant DNA virus CroV and the virophage mavirus. We sequenced the genomes of four cultured C. roenbergensis strains and generated 23.53 Gb of Illumina MiSeq data (99–282× coverage per strain) and 5.09 Gb of PacBio RSII data (13–54× coverage). Using the Canu assembler and customized curation procedures, we obtained high-quality draft genome assemblies with a total length of 34–36 Mbp per strain and contig N50 lengths of 148–464 kbp. The C. roenbergensis genome has a GC content of ~70% and a repeat content of ~28%, and is predicted to contain approximately 7,857–8,483 protein-coding genes based on a combination of de novo, homology-based, and transcriptome-supported annotation. These first high-quality genome assemblies of a bicosoecid fill an important gap among sequenced stramenopile representatives and enable a more detailed evolutionary analysis of heterotrophic protists.
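Contig and scaffold N50 values, quoted in several of these abstracts, follow the standard definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of the assembly length. A minimal sketch (the function name is illustrative):

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty list of lengths")

# Toy assembly: total 290 bp, half = 145; 100 + 80 = 180 >= 145, so N50 = 80.
print(n50([100, 80, 50, 30, 20, 10]))  # 80
```

The same function gives a contig N50 or a scaffold N50, depending on whether it is fed contig or scaffold lengths.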


2019 ◽  
Vol 8 (32) ◽  
Author(s):  
Nicolas E. Gaultier ◽  
Ana Carolina M. Junqueira ◽  
Akira Uchida ◽  
Rikky W. Purbojati ◽  
James N. I. Houghton ◽  
...  

Nissabacter sp. strain SGAir0207 was isolated from a tropical air sample collected in Singapore. Its genome was assembled using a hybrid approach with long and short reads, resulting in one chromosome of 3.9 Mb and 7 plasmids. The complete genome consists of 4,403 protein-coding, 84 tRNA, and 22 rRNA genes.


2021 ◽  
Vol 11 ◽  
Author(s):  
Kaihui Zhao ◽  
Lianqiang Li ◽  
Hong Quan ◽  
Junbo Yang ◽  
Zhirong Zhang ◽  
...  

Zanthoxylum L. is a crop genus with a long history of cultivation and domestication and has important economic, ecological, and medicinal value. To resolve the classification problems caused by the morphological similarity of Zanthoxylum species and to establish reliable phylogenetic relationships, we sequenced and annotated six Zanthoxylum chloroplast (cp) genomes (Z. piasezkii, Z. armatum, Z. motuoense, Z. oxyphyllum, Z. multijugum, and Z. calcicola) and combined them with previously published genomes of other Zanthoxylum species. We used bioinformatics methods to analyze the genomic characteristics; the contraction and expansion of the inverted repeat (IR) regions; differences in simple sequence repeats (SSRs) and long repeat sequences; pairwise Ka/Ks ratios between species; divergence hotspots; and the phylogenetic relationships of the 14 Zanthoxylum species. The results revealed that Zanthoxylum cp genomes range in size from 158,071 to 158,963 bp and contain 87 protein-coding, 37 tRNA, and 8 rRNA genes. Seven mutational hotspots were identified as candidate DNA barcode sequences to distinguish Zanthoxylum species. The phylogenetic analysis strongly supported treating the genus Fagara as a subgenus of Zanthoxylum and suggested a possible new subgenus within Zanthoxylum. These genomes will provide valuable information for species identification, molecular breeding, and evolutionary analysis of Zanthoxylum.
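SSR surveys of cp genomes typically search for perfect tandem repeats of 1–6 bp motifs that exceed a minimum copy number for each motif length. The sketch below is a simple left-to-right scanner that prefers the shortest qualifying motif at each position. The thresholds (10, 5, 4, 3, 3, 3 copies for motif lengths 1–6) are illustrative assumptions, since the abstracts do not state the settings used; compound and imperfect SSRs are not handled, and the function names are hypothetical.

```python
# Minimum number of motif copies per motif length (assumed, illustrative thresholds).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect SSRs found by a left-to-right scan."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for unit in sorted(min_repeats):
            motif = seq[i:i + unit]
            if len(motif) < unit or not set(motif) <= set("ACGT") or not is_primitive(motif):
                continue
            copies = 1
            # Extend while the next unit-sized block repeats the motif exactly.
            while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                copies += 1
            if copies >= min_repeats[unit]:
                found = (i, motif, copies)
                break
        if found:
            hits.append(found)
            i += found[2] * len(found[1])  # continue after the reported SSR
        else:
            i += 1
    return hits

print(find_ssrs("GC" + "A" * 12 + "GC" + "AT" * 6 + "G"))  # [(2, 'A', 12), (16, 'AT', 6)]
```

Checking `is_primitive` keeps a run such as `AAAAAAAAAAAA` from also being reported as a dinucleotide `AA` repeat.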


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yuliya A. Putintseva ◽  
Eugeniya I. Bondar ◽  
Evgeniy P. Simonov ◽  
Vadim V. Sharov ◽  
Natalya V. Oreshkova ◽  
...  

Abstract Background Plant mitochondrial genomes (mitogenomes) can be structurally complex, and their size varies from ~222 kbp in Brassica napus to 11.3 Mbp in Silene conica. To date, relative to the number of plant species, only a few plant mitogenomes have been sequenced and released, particularly for conifers (the Pinaceae family). Conifers are an ancient group of land plants that includes about 600 species of great ecological and economic value. Among them, Siberian larch (Larix sibirica Ledeb.) is one of the keystone species in Siberian boreal forests. Yet, despite its importance for evolutionary and population studies, the mitogenome of Siberian larch has not yet been assembled and studied. Results Two sources of DNA sequences were used to search for mitochondrial DNA (mtDNA) sequences: mtDNA-enriched samples and reads generated in the de novo whole-genome sequencing project. The assembly of the Siberian larch mitogenome contained nine contigs, with the shortest and the longest contigs being 24,767 bp and 4,008,762 bp, respectively. The total size of the genome was estimated at 11.7 Mbp. In total, 40 protein-coding, 34 tRNA, and 3 rRNA genes and numerous repetitive elements (REs) were annotated in this mitogenome. We found 864 C-to-U RNA editing sites in 38 of the 40 protein-coding genes. The immense size of this genome, currently the largest reported, can be partly explained by variable numbers of mobile genetic elements and introns, but probably not by plasmid-related sequences: the few plasmid-like insertions we found represent only 0.11% of the entire Siberian larch mitogenome. Conclusions Our study showed that the Siberian larch mitogenome is much larger than those of other gymnosperms studied so far and in the same range as that of the annual flowering plant Silene conica (11.3 Mbp). Similar to other species, the Siberian larch mitogenome contains relatively few genes, and despite its huge size, repeated and low-complexity regions cover only 14.46% of the mitogenome sequence.
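Percentages such as the 14.46% covered by repeated and low-complexity regions, or the 0.11% occupied by plasmid-like insertions, are fractions of the sequence covered by annotated intervals. When annotations overlap, each base should be counted only once. A minimal sketch, assuming half-open `[start, end)` coordinates on a single sequence (the function name is hypothetical), merges overlapping intervals before summing:

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a sequence covered by [start, end) intervals, counting overlaps once."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length

# (0, 10) and (5, 20) merge into (0, 20); with (30, 40) that is 30 of 100 bases.
print(covered_fraction([(0, 10), (5, 20), (30, 40)], 100))  # 0.3
```

For a multi-contig assembly such as this one, intervals would be merged per contig and the covered bases summed before dividing by the total assembly length.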


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Indrani Sarkar ◽  
Prateek Dey ◽  
Sanjeev Kumar Sharma ◽  
Swapna Devi Ray ◽  
Venkata Hanumat Sastry Kochiganti ◽  
...  

Abstract The mitochondrial genome provides useful information about a species' evolution and phylogenetics. We used high-throughput next-generation sequencing to sequence the complete mitogenome of the Yellow-billed babbler (Turdoides affinis), a species endemic to Peninsular India and Sri Lanka. We performed both reference-based and de novo assemblies of the mitogenome and found the de novo assembly to be the most appropriate. The complete de novo-assembled mitogenome of the yellow-billed babbler is 17,672 bp long, with an AT content of 53.2%. Thirteen protein-coding genes, two rRNAs, and 22 tRNAs were detected. The arrangement of these genes is conserved among Leiothrichidae mitogenomes. Duplicated control regions were found in the newly sequenced mitogenome. Downstream bioinformatic analysis revealed the effects of translational efficiency and purifying selection on the thirteen protein-coding genes of the yellow-billed babbler mitogenome. Ka/Ks analysis indicated the highest synonymous substitution rate in the nad6 gene. Evolutionary analysis showed that all the protein-coding genes are conserved across Leiothrichidae mitogenomes. Our phylogenetic results, though limited, placed T. affinis in a separate group, sister to Garrulax. Overall, our results provide useful information for future studies on the evolutionary and adaptive mechanisms of birds belonging to the Leiothrichidae family.


ZooKeys ◽  
2020 ◽  
Vol 995 ◽  
pp. 67-80
Author(s):  
Guolei Sun ◽  
Chao Zhao ◽  
Tian Xia ◽  
Qinguo Wei ◽  
Xiufeng Yang ◽  
...  

Mitochondrial DNA is a useful molecular marker for phylogenetic and evolutionary analysis. In the current study, we determined the complete mitochondrial genome of Eophona personata, the Japanese Grosbeak, and inferred the phylogenetic relationships of E. personata and 16 other species of the family Fringillidae based on the sequences of 12 mitochondrial protein-coding genes. The mitochondrial genome of E. personata consists of 16,771 base pairs and contains 13 protein-coding genes, 22 transfer RNA (tRNA) genes, 2 ribosomal RNA (rRNA) genes, and one control region. Analysis of the base composition revealed an A+T bias, a positive AT skew, and a negative GC skew. The mitochondrial gene order of E. personata is similar to the typical avian mitochondrial gene arrangement. Phylogenetic analysis of 17 species of Fringillidae, based on Bayesian inference and Maximum Likelihood (ML) estimation, showed that the genera Coccothraustes and Hesperiphona are closely related to the genus Eophona, and further showed a sister-group relationship between E. personata and E. migratoria.
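The base-composition statistics mentioned above are usually computed with the standard skew formulas, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), on the strand being reported. A small sketch follows; the function name is illustrative, and returning 0.0 when a denominator is zero is a choice made here for convenience.

```python
def base_composition(seq):
    """Return (AT content, AT skew, GC skew) for one strand of a sequence."""
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    at_content = (a + t) / len(seq)
    at_skew = (a - t) / (a + t) if a + t else 0.0  # > 0 means A is more frequent than T
    gc_skew = (g - c) / (g + c) if g + c else 0.0  # < 0 means C is more frequent than G
    return at_content, at_skew, gc_skew

# "AATCCG": A=2, T=1, G=1, C=2 gives AT content 0.5, AT skew +1/3, GC skew -1/3.
print(base_composition("AATCCG"))
```

A positive AT skew together with a negative GC skew, as reported for E. personata, corresponds to a strand that is richer in A than in T and richer in C than in G.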


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Robert M. Nowak ◽  
Jan P. Jastrzębski ◽  
Wiktor Kuśmirek ◽  
Rusłan Sałamatin ◽  
Małgorzata Rydzanicz ◽  
...  

Abstract Despite the use of Hymenolepis diminuta as a model organism in experimental parasitology, a full genome description has not yet been published. Here we present a hybrid de novo genome assembly based on complementary sequencing technologies and methods. The combination of Illumina paired-end, Illumina mate-pair, and Oxford Nanopore Technology reads greatly improved the assembly of the H. diminuta genome. Our results indicate that the hybrid sequencing approach is the method of choice for obtaining high-quality data. The final genome assembly is 177 Mbp, with a contig N50 of 75 kbp and a scaffold N50 of 2.3 Mbp. We obtained one of the most complete cestode genome assemblies and annotated 15,169 potential protein-coding genes. The data may help explain cestode gene function and clarify the evolution of cestode gene families, and thus the adaptive features that evolved during millennia of co-evolution with their hosts.


2019 ◽  
Author(s):  
Fen Zhang ◽  
Wei Li ◽  
Cheng-wen Gao ◽  
Li-zhi Gao

Abstract Tea is the most popular non-alcoholic, caffeine-containing beverage in the world, and the oldest. Despite the enormous industrial, cultural, and medicinal value of tea, the chloroplast (cp) and mitochondrial (mt) genomes of Camellia sinensis var. assamica have not been available. In this study, we de novo assembled the cp genome of C. sinensis var. assamica into a circular contig of 157,100 bp with an overall GC content of 37.29%, comprising a large single-copy region (LSC, 86,649 bp) and a small single-copy region (SSC, 18,285 bp) separated by a pair of inverted repeats (IRs, 26,083 bp). We annotated a total of 141 cp genes, of which 87 are protein-coding genes, 46 are tRNA genes, and eight are rRNA genes. We also de novo assembled the mt genome of C. sinensis var. assamica into two complete circular scaffolds (702,253 bp and 178,082 bp) with overall GC contents of 45.63% and 45.81%, respectively. We annotated a total of 71 mt genes, including 44 protein-coding genes, 24 tRNAs, and 3 rRNAs. Comparative analysis suggests that the mt genome is more repeat-rich than the cp genome: it contains 37,878 bp of long repeat sequences versus 149 bp in the cp genome, and 665 SSRs versus 214. We also detected 478 RNA-editing sites in 42 protein-coding mt genes, ∼4.4-fold more per gene than the 54 RNA-editing sites detected in 21 protein-coding cp genes. The high-quality cp and mt genomes of C. sinensis var. assamica presented in this study will be an invaluable resource for a range of genetic, functional, evolutionary, and comparative genomic studies in the tea tree and other Camellia species of the family Theaceae.
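Two of the figures above can be checked with simple arithmetic. A quadripartite cp genome contains the inverted repeat twice, so LSC + SSC + 2 × IR should equal the total length. The ∼4.4-fold RNA-editing difference appears to be a per-gene ratio rather than a ratio of raw counts, since 478/54 is closer to 8.9. This reading is an inference from the numbers, not something the abstract states.

```python
# Region lengths (bp) and RNA-editing counts as reported for C. sinensis var. assamica.
LSC, SSC, IR = 86_649, 18_285, 26_083
total_cp = LSC + SSC + 2 * IR  # the inverted repeat occurs in two copies

mt_sites, mt_genes = 478, 42
cp_sites, cp_genes = 54, 21
raw_ratio = mt_sites / cp_sites  # ratio of total editing sites
per_gene_ratio = (mt_sites / mt_genes) / (cp_sites / cp_genes)  # ratio of sites per gene

print(total_cp)                  # 157100
print(round(raw_ratio, 1))       # 8.9
print(round(per_gene_ratio, 1))  # 4.4
```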

