Comparison of different annotation tools for characterization of the complete chloroplast genome of Corylus avellana cv Tombul

BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Kadriye Kahraman ◽  
Stuart James Lucas

Abstract. Background: Several bioinformatics tools have been designed for the assembly and annotation of chloroplast (cp) genomes, making it difficult to decide which is most useful and applicable to a specific case. The increasing number of sequenced plant genomes provides an opportunity to accurately obtain cp genomes from whole genome shotgun (WGS) sequences. Due to the limited genetic information available for European hazelnut (Corylus avellana L.), and as part of a genome sequencing project, we analyzed the complete chloroplast genome of the cultivar ‘Tombul’ with multiple annotation tools. Results: Three different annotation strategies were tested, and the complete cp genome of C. avellana cv Tombul was constructed; it was 161,667 bp in length and had a typical quadripartite structure. A large single copy (LSC) region of 90,198 bp and a small single copy (SSC) region of 18,733 bp were separated by a pair of inverted repeat (IR) regions of 26,368 bp each. In total, 125 predicted functional genes were annotated, including 76 protein-coding, 25 tRNA, and 4 rRNA unique genes. Comparative genomics indicated that the cp genome sequences were relatively highly conserved in species belonging to the same order. However, there were still some variations, especially in intergenic regions, that could be used as molecular markers for analyses of phylogeny and plant identification. Simple sequence repeat (SSR) analysis showed that there were 83 SSRs in the cp genome of cv Tombul. Phylogenetic analysis suggested that C. avellana cv Tombul had a close affinity to the sister group of C. fargesii and C. chinensis, and a closer evolutionary relationship with the Betulaceae family than with other species of Fagales. Conclusion: In this study, the complete cp genome of Corylus avellana cv Tombul, the most widely cultivated variety in Turkey, was obtained and annotated, and phylogenetic relationships among Fagales species were predicted. Our results suggest that a very accurate chloroplast genome assembly can be obtained from next-generation whole genome shotgun (WGS) sequences. Enhanced taxon sampling in Corylus species provides genomic insights for phylogenetic analyses. The nucleotide sequence of the cv Tombul cp genome can provide comprehensive genetic insight into the evolution of the genus Corylus.
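The region sizes reported in the Results can be cross-checked with simple arithmetic: in a quadripartite cp genome, the LSC, the SSC, and two copies of the IR should sum to the total length. A minimal Python sketch using only the figures quoted in the abstract; the helper name `quadripartite_length` is hypothetical and is not part of any annotation tool mentioned here.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length (bp) of a cp genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported for C. avellana cv Tombul in the abstract.
tombul_total = quadripartite_length(lsc=90_198, ssc=18_733, ir=26_368)
print(tombul_total)  # 161667, matching the reported 161,667 bp
```

The same check can be applied to the region sizes in any abstract that reports all three values, as a quick way to spot transcription errors.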

Plants ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1354
Author(s):  
Slimane Khayi ◽  
Fatima Gaboun ◽  
Stacy Pirro ◽  
Tatiana Tatusova ◽  
Abdelhamid El Mousadik ◽  
...  

Argania spinosa (Sapotaceae), an important endemic Moroccan oil tree, is a primary source of argan oil, which has numerous dietary and medicinal properties. The plant species occupies the mid-western part of Morocco and provides great environmental and socioeconomic benefits. The complete chloroplast (cp) genome of A. spinosa was sequenced, assembled, and analyzed in comparison with those of two Sapotaceae members. The A. spinosa cp genome is 158,848 bp long, with an average GC content of 36.8%. The cp genome exhibits a typical quadripartite and circular structure consisting of a pair of inverted repeat (IR) regions of 25,945 bp in length separating small single-copy (SSC) and large single-copy (LSC) regions of 18,591 and 88,367 bp, respectively. The annotation of the A. spinosa cp genome predicted 130 genes, including 85 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, and 37 transfer RNA (tRNA) genes. A total of 44 long repeats and 88 simple sequence repeats (SSRs), divided into mononucleotides (76), dinucleotides (7), trinucleotides (3), tetranucleotides (1), and hexanucleotides (1), were identified in the A. spinosa cp genome. Phylogenetic analyses using the maximum likelihood (ML) method were performed based on 69 protein-coding genes from 11 species of Ericales. The results confirmed the close position of A. spinosa to the Sideroxylon genus, supporting the revisiting of its taxonomic status. The complete chloroplast genome sequence will be valuable for further studies on the conservation and breeding of this medicinally and culinarily important species and will also contribute to clarifying the phylogenetic position of the species within Sapotaceae.


2021 ◽  
Vol 51 (3) ◽  
pp. 337-344
Author(s):  
Yongsung KIM ◽  
Hong XI ◽  
Jongsun PARK

The chloroplast genome of Limonium tetragonum (Thunb.) Bullock, a halophytic species, was sequenced to understand genetic differences based on its geographical distribution. The cp genome of L. tetragonum was 154,689 bp long (GC ratio is 37.0%) and had four subregions: 84,572 bp of large single-copy (35.3%) and 12,813 bp of small single-copy (31.5%) regions were separated by 28,562 bp of inverted repeat (40.9%) regions. It contained 128 genes (83 protein-coding genes, eight rRNAs, and 37 tRNAs). Thirty-five single-nucleotide polymorphisms and 33 INDEL regions (88 bp in length) were identified. Maximum-likelihood and Bayesian inference phylogenetic trees showed that L. tetragonum formed a sister group with L. aureum, which is incongruent with certain previous studies, including a phylogenetic analysis.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e6032 ◽  
Author(s):  
Zhenyu Zhao ◽  
Xin Wang ◽  
Yi Yu ◽  
Subo Yuan ◽  
Dan Jiang ◽  
...  

Dioscorea L., the largest genus of the family Dioscoreaceae with over 600 species, is not only an important food but also a medicinal plant. The identification and classification of Dioscorea L. is a rather difficult task. In this study, we sequenced five Dioscorea chloroplast genomes and analyzed them together with four other chloroplast genomes of Dioscorea species from GenBank. The Dioscorea chloroplast genomes displayed the typical quadripartite structure of angiosperms, which consisted of a pair of inverted repeats separated by a large single-copy region and a small single-copy region. The location and distribution of repeat sequences and microsatellites were determined, and the rapidly evolving chloroplast genome regions (trnK-trnQ, trnS-trnG, trnC-petN, trnE-trnT, petG-trnW-trnP, ndhF, trnL-rpl32, and ycf1) were detected. Phylogenetic relationships of Dioscorea inferred from chloroplast genomes obtained high support even at the shortest internodes. Thus, chloroplast genome sequences provide potential molecular markers and genomic resources for phylogeny and species identification.


2019 ◽  
Vol 2019 ◽  
pp. 1-17 ◽  
Author(s):  
Samaila S. Yaradua ◽  
Dhafer A. Alzahrani ◽  
Enas J. Albokhary ◽  
Abidina Abba ◽  
Abubakar Bello

The complete chloroplast genome of J. flava, an endangered medicinal plant in Saudi Arabia, was sequenced and compared with the cp genomes of three Acanthaceae species to characterize the cp genome, identify SSRs, and detect variation among the cp genomes of the sampled Acanthaceae. NOVOPlasty was used to assemble the complete chloroplast genome from the whole genome data. The cp genome of J. flava was 150,888 bp in length with a GC content of 38.2%, and has a quadripartite structure; the genome harbors one pair of inverted repeats (IRa and IRb, 25,500 bp each) separated by a large single copy (LSC, 82,995 bp) and a small single copy (SSC, 16,893 bp) region. There are 132 genes in the genome, including 80 protein-coding genes, 30 tRNA, and 4 rRNA; 113 are unique while the remaining 19 are duplicated in the IR regions. The repeat analysis indicates that the genome contained all types of repeats, with palindromic repeats occurring more frequently; the analysis also identified a total of 98 simple sequence repeats (SSRs), the majority of which are A/T mononucleotides located in intergenic spacers. The comparative analysis with the other cp genomes sampled indicated that the inverted repeat regions are more conserved than the single copy regions and that the noncoding regions show a higher rate of variation than the coding regions. All the genomes have the ndhF and ycf1 genes at the border junction of IRb and SSC. Sequence divergence analysis of the protein-coding genes showed that seven genes (petB, atpF, psaI, rpl32, rpl16, ycf1, and clpP) are under positive selection. The phylogenetic analysis revealed that Justiceae is sister to Ruellieae. This study reported the first cp genome of the largest genus in Acanthaceae and provided resources for studying the genetic diversity of J. flava as well as resolving phylogenetic relationships within the core Acanthaceae.
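SSR identification of the kind summarized above amounts to scanning a sequence for short motifs repeated in tandem. The abstract does not say which tool or thresholds were used, so the following is only an illustrative Python sketch: the function `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions, and it detects only perfect mono-, di-, and trinucleotide repeats.

```python
import re

# Minimum number of tandem copies per motif length. These thresholds are an
# assumption for illustration, not the settings used in the study.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # Reject motifs that are a single base repeated (e.g. "AA"), so that
        # homopolymer runs are reported once, as mononucleotide SSRs.
        not_homopolymer = r"(?!([ACGT])\1{%d})" % (k - 1) if k > 1 else r"()"
        # Group 2 captures the motif; \2{n-1,} requires n or more tandem copies.
        pattern = re.compile(not_homopolymer + r"([ACGT]{%d})\2{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group(2), len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence: a run of 12 A's and six copies of "AT", with short flanks.
example = "CG" + "A" * 12 + "CGC" + "AT" * 6 + "GG"
print(find_ssrs(example))  # [(2, 'A', 12), (17, 'AT', 6)]
```

Dedicated SSR tools also handle longer motifs, compound repeats, and annotation of whether each repeat falls in a coding or intergenic region, none of which this sketch attempts.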


2020 ◽  
Vol 18 (1) ◽  
pp. 87-102
Author(s):  
Nguyen Thanh Diem ◽  
Le Thi Ly ◽  
Nguyen Huu Thuan Anh ◽  
Nguyen Thanh Cong ◽  
Vu Thi Huyen Trang

Chloroplasts and mitochondria are organelles that have their own genomes within the cell. The chloroplast genome provides information on evolutionary relationships and species identification, as well as valuable markers for transgenic plants and plant cloning. The application of Next Generation Sequencing has improved chloroplast genome sequencing. However, the assembly of a chloroplast genome is quite complicated because it requires several complex bioinformatics tools and a high-specification computer, and is laborious. Here we configured the process of assembling the chloroplast genome of Paphiopedilum delenatii. The assembled chloroplast genome was 160,955 bp in length, including a large and a small single copy region (LSC, SSC) separated by a pair of inverted repeats (IR). A total of 130 genes were identified, and the GC content was 35.6%. Genome data was mapped and registered in GenBank under accession number MK463585. The optimal parameters for genome assembly were recommended. This study not only provided information for the conservation of the Vietnam endemic species Paphiopedilum delenatii but also supported genome assembly research that could be applied to other subjects.


2020 ◽  
Vol 2020 ◽  
pp. 1-13 ◽  
Author(s):  
Lu Wang ◽  
Na He ◽  
Yao Li ◽  
Yanming Fang ◽  
Feilong Zhang

Chinese lacquer tree (Toxicodendron vernicifluum) is an important commercial arbor species widely cultivated in East Asia for producing highly durable lacquer. Here, we sequenced and analyzed the complete chloroplast (cp) genome of T. vernicifluum and reconstructed the phylogeny of Sapindales based on 52 cp genomes of six families. The plastome of T. vernicifluum is 159,571 bp in length, including a pair of inverted repeats (IRs) of 26,511 bp, separated by a large single-copy (LSC) region of 87,475 bp and a small single-copy (SSC) region of 19,074 bp. A total of 126 genes were identified, of which 81 are protein-coding genes, 37 are transfer RNA genes, and eight are ribosomal RNA genes. Forty-nine mononucleotide microsatellites, one dinucleotide microsatellite, two complex microsatellites, and 49 long repeats were determined. Structural differences such as inversion variation in the LSC and gene loss in the IR were detected across cp genomes of the six genera in Anacardiaceae. Phylogenetic analyses revealed that the genus Toxicodendron is closely related to Pistacia and Rhus. The phylogenetic relationships of the six families in Sapindales were well resolved. Overall, the complete cp genome resources provided by this study will be beneficial for determining potential molecular markers and evolutionary patterns of T. vernicifluum and its closely related species.


2021 ◽  
Author(s):  
Jianjian Li ◽  
Junqin Zong ◽  
Haoran Wang ◽  
Jingjing Wang ◽  
Hailin Guo ◽  
...  

Abstract Background: Chloroplast (cp) genome sequence data could provide valuable information for molecular taxonomy and phylogenetic reconstruction among plant species and individuals. However, although centipedegrass (Eremochloa ophiuroides) is one of the most important warm-season turfgrasses widely used in the USA and China, its cp genome characteristics and phylogenetic position were poorly understood. Results: In this study, we determined the complete chloroplast genome sequence of E. ophiuroides using high-throughput Illumina sequencing technology. The circular pseudomolecule for the E. ophiuroides cp genome is 139,107 bp in length, and has a typical quadripartite structure consisting of a pair of inverted repeat (IR) regions of 22,230 bp each separated by a large single copy (LSC) region of 82,081 bp and a small single copy (SSC) region of 12,566 bp. The nucleotide composition of the E. ophiuroides cp genome is asymmetric, with an overall A + T content of 61.60%. It encodes a total of 131 gene species, composed of 20 duplicated genes within the IR regions and 111 unique genes including 77 protein-coding genes (PCGs), 30 transfer RNA (tRNA) genes and four ribosomal RNA (rRNA) genes. Analysis of the repetitive sequences revealed that the E. ophiuroides cp genome contains 51 tandem repeats including 29 forward, 20 palindromic and 2 reverse repeats, and 197 simple sequence repeats (SSRs) which were mainly composed of adenine (A) and thymine (T) bases. Comparison of the E. ophiuroides complete cp genome with the genomes of seven other Gramineae species showed a high degree of collinearity among Gramineae plants. Phylogenetic analysis showed that E. ophiuroides was closely related to E. ciliaris and E. eriopoda, and was placed in a clade with the two Eremochloa species and Mnesithea helferi within the subtribe Rottboelliinae, which clarified the evolutionary status of E. ophiuroides in tribe Andropogoneae and also authenticated the current taxonomy of the tribe Andropogoneae. Conclusions: The present study provides the complete structure of the E. ophiuroides cp genome, and preliminarily ascertains the phylogenetic position of E. ophiuroides in tribe Andropogonodae. This will be of value to grass taxa identification, phylogenetic resolution, population structure and biodiversity, novel gene discovery and functional genomic studies for the genus Eremochloa.


2021 ◽  
Author(s):  
Junqin Zong ◽  
Haoran Wang ◽  
Jingjing Wang ◽  
Hailin Guo ◽  
Jingbo Chen ◽  
...  

Abstract Background: Chloroplast (cp) genome sequence data could provide valuable information for molecular taxonomy and phylogenetic reconstruction among plant species and individuals. However, although centipedegrass (Eremochloa ophiuroides) is one of the most important warm-season turfgrasses widely used in the USA and China, its cp genome characteristics and phylogenetic position were poorly understood. Results: In this study, we determined the complete chloroplast genome sequence of E. ophiuroides using high-throughput Illumina sequencing technology. The circular pseudomolecule for the E. ophiuroides cp genome is 139,107 bp in length, and has a typical quadripartite structure consisting of a pair of inverted repeat (IR) regions of 22,230 bp each separated by a large single copy (LSC) region of 82,081 bp and a small single copy (SSC) region of 12,566 bp. The nucleotide composition of the E. ophiuroides cp genome is asymmetric, with an overall A + T content of 61.60%. It encodes a total of 131 gene species, composed of 20 duplicated genes within the IR regions and 111 unique genes including 77 protein-coding genes (PCGs), 30 transfer RNA (tRNA) genes and four ribosomal RNA (rRNA) genes. Analysis of the repetitive sequences revealed that the E. ophiuroides cp genome contains 51 tandem repeats including 29 forward, 20 palindromic and 2 reverse repeats, and 197 simple sequence repeats (SSRs) which were mainly composed of adenine (A) and thymine (T) bases. Comparison of the E. ophiuroides complete cp genome with the genomes of seven other Gramineae species showed a high degree of collinearity among Gramineae plants. Phylogenetic analysis showed that E. ophiuroides was closely related to E. ciliaris and E. eriopoda, and was placed in a clade with the two Eremochloa species and Mnesithea helferi within the subtribe Rottboelliinae, which clarified the evolutionary status of E. ophiuroides in tribe Andropogoneae and also authenticated the current taxonomy of the tribe Andropogoneae. Conclusions: The present study provides the complete structure of the E. ophiuroides cp genome, and preliminarily ascertains the phylogenetic position of E. ophiuroides in tribe Andropogonodae. This will be of value to grass taxa identification, phylogenetic resolution, population structure and biodiversity, novel gene discovery and functional genomic studies for the genus Eremochloa.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Abdul Latif Khan ◽  
Sajjad Asaf ◽  
Lubna ◽  
Ahmed Al-Rawahi ◽  
Ahmed Al-Harrasi

Abstract. Background: Salvadora persica L. (Toothbrush tree – Miswak; family Salvadoraceae) grows in the arid-land ecosystem and possesses economic and medicinal importance. The species, genus and family have no genomic datasets available, specifically on chloroplast (cp) genomics and taxonomic evolution. Herein, we have sequenced the complete chloroplast genome of S. persica for the first time and compared it with the cp genomes of 11 related species from the order Brassicales. Results: The S. persica cp genome was 153,379 bp in length, containing a large single-copy (LSC) region of 83,818 bp separated from the small single-copy (SSC) region of 17,683 bp by two inverted repeats (IRs) of 25,939 bp each. Among these genomes, the largest cp genome size (160,600 bp) was found in M. oleifera, while that of S. persica was the smallest (153,379 bp). The cp genome of S. persica encoded 131 genes, including 37 tRNA genes, eight rRNA genes and 86 protein-coding genes. In addition, S. persica contains 27 forward, 36 tandem and 19 palindromic repeats. The S. persica cp genome had 154 SSRs, with the highest number in the LSC region. Complete cp genome comparisons showed an overall high degree of sequence resemblance between S. persica and related cp genomes. Some divergence was observed in the intergenic spaces of other species. Phylogenomic analyses of 60 shared genes indicated that S. persica formed a single clade with A. tetracantha with high bootstrap values. The family Salvadoraceae is more closely related to Capparaceae and Pentadiplandraceae than to Bataceae and Koeberliniaceae. Conclusion: The current genomic datasets provide pivotal genetic resources to determine the phylogenetic relationships, genome evolution and future genetic diversity-related studies of S. persica in complex angiosperm families.


Molecules ◽  
2018 ◽  
Vol 23 (11) ◽  
pp. 2917 ◽  
Author(s):  
Xin Zhang ◽  
Chunxiao Rong ◽  
Ling Qin ◽  
Chuanyuan Mo ◽  
Lu Fan ◽  
...  

Malus hupehensis belongs to the Malus genus (Rosaceae) and is an indigenous wild crabapple of China. This species has received increasing attention due to its important medicinal value and its excellent ornamental and economic value. In this study, the whole chloroplast (cp) genome of Malus hupehensis, sequenced on a Hiseq X Ten platform, is reported. The M. hupehensis cp genome is 160,065 bp in size, containing a large single copy region (LSC) of 88,166 bp and a small single copy region (SSC) of 19,193 bp, separated by a pair of inverted repeats (IRs) of 26,353 bp. It contains 112 genes, including 78 protein-coding genes (PCGs), 30 transfer RNA genes (tRNAs), and four ribosomal RNA genes (rRNAs). The overall nucleotide composition is 36.6% GC. A total of 96 simple sequence repeats (SSRs) were identified, most of which were mononucleotide repeats composed of A/T. In addition, a total of 49 long repeats were identified, including 24 forward repeats, 21 palindromic repeats, and four reverse repeats. Comparisons of the IR boundaries of nine complete Malus chloroplast genomes revealed slight variations at the IR/SC boundary regions. A phylogenetic analysis, based on 26 chloroplast genomes using the maximum likelihood (ML) method, indicates that M. hupehensis clustered more closely with M. baccata, M. micromalus, and M. prunifolia than with M. tschonoskii. The complete chloroplast genome reported here provides reliable genetic information for future exploration of the taxonomy and phylogenetic evolution of Malus and related species.
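The GC percentage quoted above is the share of G and C bases among all bases in the assembled sequence. A minimal Python sketch of that calculation, assuming a plain nucleotide string as input; the function name `gc_content` is hypothetical, and ignoring ambiguous bases such as N is a choice made here for illustration, not something stated in the abstract.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    # Ambiguous symbols (e.g. N) are excluded from the denominator by assumption.
    return 100.0 * gc / (gc + at)


# Toy sequence with 4 G/C bases out of 10.
print(gc_content("GGCCATATAT"))  # 40.0
```

Applied to a full cp genome sequence, the same function would return the kind of overall figure reported in these abstracts.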

