Amplicon sequencing of single-copy protein-coding genes reveals accurate diversity for sequence-discrete microbiome populations

An in-depth understanding of microbial function and the division of ecological niches requires accurate delineation and identification of microbes at a fine taxonomic resolution. Microbial phylotypes are typically defined using a 97% small subunit (16S) rRNA threshold. However, increasing evidence has demonstrated the ubiquitous presence of taxonomic units of distinct functions within phylotypes. These so-called sequence-discrete populations (SDPs) have mainly been delineated by disjunct sequence similarity at the whole-genome level. However, gene markers that can accurately identify and quantify SDPs are lacking in microbial community studies. Here we developed a pipeline to screen single-copy protein-coding genes that can accurately characterize SDP diversity via amplicon sequencing of microbial communities. Fifteen candidate marker genes were evaluated using three criteria (extent of sequence divergence, phylogenetic accuracy, and conservation of primer regions), and the selected genes were then tested for their efficiency in differentiating SDPs within Gilliamella, a core honeybee gut microbial phylotype, as a proof of concept. The results showed that the 16S V4 region failed to report accurate SDP diversities due to low taxonomic resolution and variable copy numbers. In contrast, the single-copy genes recommended by our pipeline successfully quantified Gilliamella SDPs in both mock samples and honeybee guts, with results highly consistent with those of metagenomics. The pipeline developed in this study is expected to identify single-copy protein-coding genes capable of accurately quantifying diverse bacterial communities at the SDP level.
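To illustrate why a marker whose copy number differs among populations can misreport SDP proportions, the following is a minimal sketch, not the authors' pipeline. The population names, cell counts, and copy numbers are hypothetical, and amplicon reads are assumed to be exactly proportional to gene copies (no PCR or primer bias).

```python
def relative_abundance(counts):
    """Convert raw counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


def copy_number_corrected(read_counts, copies_per_genome):
    """Divide reads by marker copies per genome, then renormalize."""
    per_genome = {
        name: read_counts[name] / copies_per_genome[name]
        for name in read_counts
    }
    return relative_abundance(per_genome)


# Hypothetical community: two populations with equal cell numbers
# but different numbers of marker copies per genome.
cells = {"SDP_A": 100, "SDP_B": 100}
marker_copies = {"SDP_A": 4, "SDP_B": 8}

# Idealized amplicon reads: proportional to cells x copies per genome.
amplicon_reads = {name: cells[name] * marker_copies[name] for name in cells}

raw = relative_abundance(amplicon_reads)
corrected = copy_number_corrected(amplicon_reads, marker_copies)
```

In this toy case the uncorrected profile assigns two thirds of the community to SDP_B although both populations have the same number of cells. A single-copy marker has one copy in every genome, so the correction step is unnecessary.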

Draft Genome Sequence of Bacillus sp. Strain IGA-FME-2, Isolated from the Bulk Soil of Soybean (Glycine max L.) in Northeast China

Microbiology Resource Announcements ◽ 10.1128/mra.00004-21 ◽ 2021 ◽ Vol 10 (16)

Author(s): Zhenhua Yu ◽ Sergio de los Santos-Villalobos ◽ Yansheng Li ◽ Jian Jin ◽ Fannie Isela Parra Cota ◽ ...

Keyword(s): Glycine Max ◽ Draft Genome ◽ GC Content ◽ Single Copy ◽ Bulk Soil ◽ 23S rRNA ◽ Protein Coding ◽ Content Type ◽ Protein Coding Genes ◽ Glycine Max L

ABSTRACT Here, we present the draft genome of Bacillus sp. strain IGA-FME-2. This strain was isolated from the bulk soil of soybean (Glycine max L.). Its genome consists of 3,810 protein-coding genes, 44 tRNAs, two 16S rRNAs, and a single copy of 23S rRNA, with a GC content of 46.4%.

Comparative analysis of chloroplast genomes for five Dicliptera species (Acanthaceae): molecular structure, phylogenetic relationships, and adaptive evolution

PeerJ ◽ 10.7717/peerj.8450 ◽ 2020 ◽ Vol 8 ◽ pp. e8450 ◽ Cited By ~ 2

Author(s): Sunan Huang ◽ Xuejun Ge ◽ Asunción Cano ◽ Betty Gaby Millán Salazar ◽ Yunfei Deng

Keyword(s): Adaptive Evolution ◽ Phylogenetic Relationships ◽ Single Copy ◽ rRNA Genes ◽ tRNA Genes ◽ Evolutionary Analysis ◽ Protein Coding ◽ Variable Regions ◽ Protein Coding Genes ◽ Chloroplast Genomes

The genus Dicliptera (Justicieae, Acanthaceae) consists of approximately 150 species distributed throughout the tropical and subtropical regions of the world. Newly obtained chloroplast genomes (cp genomes) are reported for five species of Dicliptera (D. acuminata, D. peruviana, D. montana, D. ruiziana and D. mucronata) in this study. These cp genomes have circular structures of 150,689–150,811 bp and exhibit quadripartite organizations made up of a large single copy region (LSC, 82,796–82,919 bp), a small single copy region (SSC, 17,084–17,092 bp), and a pair of inverted repeat regions (IRs, 25,401–25,408 bp). Guanine-Cytosine (GC) content makes up 37.9%–38.0% of the total content. The complete cp genomes contain 114 unique genes, including 80 protein-coding genes, 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes. Comparative analyses of nucleotide variability (Pi) reveal the five most variable regions (trnY-GUA-trnE-UUC, trnG-GCC, psbZ-trnG-GCC, petN-psbM, and rps4-trnL-UUA), which may be used as molecular markers in future taxonomic identification and phylogenetic analyses of Dicliptera. A total of 55–58 simple sequence repeats (SSRs) and 229 long repeats were identified in the cp genomes of the five Dicliptera species. Phylogenetic analysis identified a close relationship between D. ruiziana and D. montana, followed by D. acuminata, D. peruviana, and D. mucronata. Evolutionary analysis of orthologous protein-coding genes within the family Acanthaceae revealed only one gene, ycf15, to be under positive selection, which may contribute to future studies of its adaptive evolution. The completed genomes are useful for future research on species identification, phylogenetic relationships, and the adaptive evolution of the Dicliptera species.
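Nucleotide variability (Pi) is commonly computed as the average number of pairwise differences per aligned site. The sketch below shows that calculation on a toy alignment; it is not the software used in the study, the sequences are invented, and for simplicity every mismatching column (including gap characters) is counted as a difference, which dedicated tools may handle differently.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site across an alignment."""
    n = len(aligned_seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    total_differences = 0
    for seq_a, seq_b in combinations(aligned_seqs, 2):
        # Count mismatching columns for this pair of sequences.
        total_differences += sum(1 for x, y in zip(seq_a, seq_b) if x != y)

    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


# Invented three-sequence alignment, 8 columns long.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
pi = nucleotide_diversity(toy_alignment)  # 4 differences / (3 pairs * 8 sites)
```

Applying the same function to successive windows of a genome alignment would yield a per-region profile from which the most variable regions could be ranked.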

Phylogenomic analysis of 2556 single-copy protein-coding genes resolves most evolutionary relationships for the major clades in the most diverse group of lichen-forming fungi

Fungal Diversity ◽ 10.1007/s13225-018-0407-7 ◽ 2018 ◽ Vol 92 (1) ◽ pp. 31-41 ◽ Cited By ~ 8

Author(s): David Pizarro ◽ Pradeep K. Divakar ◽ Felix Grewe ◽ Steven D. Leavitt ◽ Jen-Pan Huang ◽ ...

Keyword(s): Single Copy ◽ Evolutionary Relationships ◽ Phylogenomic Analysis ◽ Diverse Group ◽ Protein Coding ◽ Protein Coding Genes

Phylogeny of the Styracaceae Revisited Based on Whole Plastome Sequences, Including Novel Plastome Data from Parastyrax

Systematic Botany ◽ 10.1600/036364421x16128061189576 ◽ 2021 ◽ Vol 46 (1) ◽ pp. 162-174

Author(s): Ming-Hui Yan ◽ Chun-Yang Li ◽ Peter W. Fritsch ◽ Jie Cai ◽ Heng-Chang Wang

Keyword(s): Phylogenetic Relationships ◽ Phylogenetic Signal ◽ Strong Support ◽ Single Copy ◽ rRNA Genes ◽ tRNA Genes ◽ Protein Coding ◽ Protein Coding Genes ◽ The Family ◽ Small Single Copy

Abstract: The phylogenetic relationships among 11 out of the 12 genera of the angiosperm family Styracaceae have been largely resolved with DNA sequence data based on all protein-coding genes of the plastome. The only genus that has not been phylogenomically investigated in the family with molecular data is the monotypic genus Parastyrax, which is extremely rare in the wild and difficult to collect. To complete the sampling of the genera comprising the Styracaceae, examine the plastome composition of Parastyrax, and further explore the phylogenetic relationships of the entire family, we sequenced the whole plastome of P. lacei and incorporated it into the Styracaceae dataset for phylogenetic analysis. Similar to most others in the family, the plastome is 158189 bp in length and contains a large single-copy region of 88085 bp and a small single-copy region of 18540 bp separated by two inverted-repeat regions of 25781 bp each. A total of 113 genes were predicted, including 79 protein-coding genes, 30 tRNA genes, and four rRNA genes. Phylogenetic relationships among all 12 genera of the family were constructed with 79 protein-coding genes. Consistent with a previous study, Styrax, Huodendron, and a clade of Alniphyllum + Bruinsmia were successively sister to the remainder of the family. Parastyrax was strongly supported as sister to an internal clade comprising seven other genera of the family, whereas Halesia and Pterostyrax were both recovered as polyphyletic, as in prior studies. However, when we employed either the whole plastome or the large or small single-copy regions as datasets, Pterostyrax was resolved as monophyletic with 100% support, consistent with expectations based on morphology and indicating that non-coding regions of the Styracaceae plastome contain informative phylogenetic signal. Conversely, Halesia was still resolved as polyphyletic but with novel strong support.

Draft Genome Sequence of Bacillus sp. Strain SPB7, Isolated from the Marine Sponge Spongia officinalis

Microbiology Resource Announcements ◽ 10.1128/mra.00358-20 ◽ 2020 ◽ Vol 9 (30)

Author(s): Dhruba Bhattacharya ◽ Sergio de los Santos Villalobos ◽ Valeria Valenzuela Ruiz ◽ Joseph Selvin ◽ Joydeep Mukherjee

Keyword(s): Genome Sequence ◽ Marine Sponge ◽ Draft Genome ◽ Single Copy ◽ 5S rRNA ◽ Bacillus Sp ◽ Protein Coding ◽ Content Type ◽ Protein Coding Genes ◽ Spongia Officinalis

ABSTRACT The draft genome of Bacillus sp. SPB7, which was isolated from the marine sponge Spongia officinalis, is presented. This bacterium is a producer of an antimicrobial cyclic diketopiperazine, (3S,6S)-3,6-diisobutylpiperazine-2,5-dione. The genome consists of 4,511 protein-coding genes, 63 tRNAs, two 16S rRNAs, three 23S rRNAs, and a single copy of 5S rRNA.

Complete Chloroplast Genome of Argania spinosa: Structural Organization and Phylogenetic Relationships in Sapotaceae

Plants ◽ 10.3390/plants9101354 ◽ 2020 ◽ Vol 9 (10) ◽ pp. 1354

Author(s): Slimane Khayi ◽ Fatima Gaboun ◽ Stacy Pirro ◽ Tatiana Tatusova ◽ Abdelhamid El Mousadik ◽ ...

Keyword(s): Chloroplast Genome ◽ Single Copy ◽ rRNA Genes ◽ tRNA Genes ◽ Protein Coding ◽ Important Species ◽ Complete Chloroplast Genome ◽ Argania Spinosa ◽ Protein Coding Genes ◽ Cp Genome

Argania spinosa (Sapotaceae), an important endemic Moroccan oil tree, is a primary source of argan oil, which has numerous dietary and medicinal properties. The plant species occupies the mid-western part of Morocco and provides great environmental and socioeconomic benefits. The complete chloroplast (cp) genome of A. spinosa was sequenced, assembled, and analyzed in comparison with those of two Sapotaceae members. The A. spinosa cp genome is 158,848 bp long, with an average GC content of 36.8%. The cp genome exhibits a typical quadripartite and circular structure consisting of a pair of inverted repeat regions (IR) of 25,945 bp in length separating small single-copy (SSC) and large single-copy (LSC) regions of 18,591 and 88,367 bp, respectively. The annotation of the A. spinosa cp genome predicted 130 genes, including 85 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, and 37 transfer RNA (tRNA) genes. A total of 44 long repeats and 88 simple sequence repeats (SSR) divided into mononucleotides (76), dinucleotides (7), trinucleotides (3), tetranucleotides (1), and hexanucleotides (1) were identified in the A. spinosa cp genome. Phylogenetic analyses using the maximum likelihood (ML) method were performed based on 69 protein-coding genes from 11 species of Ericales. The results confirmed the close position of A. spinosa to the Sideroxylon genus, supporting the revisiting of its taxonomic status. The complete chloroplast genome sequence will be valuable for further studies on the conservation and breeding of this medicinally and culinarily important species and will also contribute to clarifying the phylogenetic position of the species within Sapotaceae.
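The figures reported for A. spinosa can be cross-checked with simple arithmetic: a quadripartite genome should total LSC + SSC + 2 × IR, and the SSR and gene categories should sum to their stated totals. The short sketch below performs these checks; the function and variable names are illustrative only, and all numbers are taken from the abstract above.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a genome made of LSC, SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes in bp, as reported for the A. spinosa cp genome.
total_bp = quadripartite_length(lsc=88_367, ssc=18_591, ir=25_945)
# 88,367 + 18,591 + 2 * 25,945 = 158,848

# Simple sequence repeats by motif length, as reported.
ssr_by_motif = {"mono": 76, "di": 7, "tri": 3, "tetra": 1, "hexa": 1}
ssr_total = sum(ssr_by_motif.values())

# Annotated genes by category, as reported.
genes_by_type = {"CDS": 85, "rRNA": 8, "tRNA": 37}
gene_total = sum(genes_by_type.values())
```

All three totals agree with the abstract: 158,848 bp, 88 SSRs, and 130 genes.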

The complete chloroplast genome of Saxifraga sinomontana (Saxifragaceae) and comparative analysis with other Saxifragaceae species

Revista Brasileira de Botânica ◽ 10.1007/s40415-019-00561-y ◽ 2019 ◽ Vol 42 (4) ◽ pp. 601-611 ◽ Cited By ~ 1

Author(s): Yan Li ◽ Liukun Jia ◽ Zhihua Wang ◽ Rui Xing ◽ Xiaofeng Chi ◽ ...

Keyword(s): Comparative Analysis ◽ Chloroplast Genome ◽ Phylogenetic Relationships ◽ De Novo ◽ Single Copy ◽ Bootstrap Support ◽ Protein Coding ◽ Complete Chloroplast Genome ◽ Protein Coding Genes ◽ Chloroplast Genomes

Abstract Saxifraga sinomontana J.-T. Pan & Gornall belongs to Saxifraga sect. Ciliatae subsect. Hirculoideae, a lineage containing ca. 110 species whose phylogenetic relationships are largely unresolved due to recent rapid radiations. Analyses of complete chloroplast genomes have the potential to significantly improve the resolution of phylogenetic relationships in this young plant lineage. The complete chloroplast genome of S. sinomontana was sequenced and assembled de novo, and then compared with those of six other Saxifragaceae species. The S. sinomontana chloroplast genome is 147,240 bp in length with a typical quadripartite structure, including a large single-copy region of 79,310 bp and a small single-copy region of 16,874 bp separated by a pair of inverted repeats (IRs) of 25,528 bp each. The chloroplast genome contains 113 unique genes, including 79 protein-coding genes, four rRNAs and 30 tRNAs, with 18 duplicates in the IRs. The gene content and organization are similar to those of other Saxifragaceae chloroplast genomes. Sixty-one simple sequence repeats were identified in the S. sinomontana chloroplast genome, mostly represented by mononucleotide repeats of polyadenine or polythymine. Comparative analysis revealed 12 highly divergent regions in the intergenic spacers, as well as in the coding genes matK, ndhK, accD, cemA, rpoA, rps19, ndhF, ccsA, ndhD and ycf1. Phylogenetic reconstruction of seven Saxifragaceae species based on 66 protein-coding genes received high bootstrap support values for nearly all identified nodes, suggesting a promising opportunity to resolve infrasectional relationships of the most species-rich section Ciliatae of Saxifraga.

Identification of a novel transcription unit in the human insulin-like growth factor-II gene

Biochemical Journal ◽ 10.1042/bj2800439 ◽ 1991 ◽ Vol 280 (2) ◽ pp. 439-444 ◽ Cited By ~ 14

Author(s): K Ikejiri ◽ T Wasada ◽ K Haruki ◽ N Hizuka ◽ Y Hirata ◽ ...

Keyword(s): Growth Factor ◽ Human Insulin ◽ Sequence Similarity ◽ Single Copy ◽ Transcription Unit ◽ Insulin Like Growth Factor ◽ Protein Coding ◽ Consensus Sequences ◽ Genomic Location ◽ Factor II

The human insulin-like growth factor-II (hIGF-II) gene has until now been thought to be composed of eight exons, including three independent leader exons. In the present study two additional exons, one leader exon and one alternatively used ordinate exon, have been newly identified. They were abundantly expressed in human histiocytoma tissue, generating mRNA species of about 5.0 kb in length. The new leader exon shows significant sequence similarity with the rE1 exon, previously reported to be transcribed only in the rat, and is mapped at nearly the same genomic location as in the rat. On the other hand, sequence similarity with another exon in the corresponding region of the rat genome was also found. It was, however, obvious that the rat sequence would not work as an active exon, since both the splice acceptor and donor sites deviated considerably from the consensus sequences. It has thus become apparent that the complex transcription unit of the single-copy hIGF-II gene comprises at least 10 exons, including four leader exons, one alternative exon and three common protein-coding exons.

Complete Chloroplast Genome Sequence of Erigeron breviscapus and Characterization of Chloroplast Regulatory Elements

Frontiers in Plant Science ◽ 10.3389/fpls.2021.758290 ◽ 2021 ◽ Vol 12

Author(s): Yifan Yu ◽ Zhen Ouyang ◽ Juan Guo ◽ Wen Zeng ◽ Yujun Zhao ◽ ...

Keyword(s): Chloroplast Genome ◽ Single Copy ◽ Regulatory Elements ◽ rRNA Genes ◽ Expression Vectors ◽ Protein Coding ◽ Protein Coding Genes ◽ Flanking Sequences ◽ Erigeron Breviscapus ◽ Cp Genome

Erigeron breviscapus is a famous medicinal plant. However, the limited chloroplast genome information for E. breviscapus, especially the lack of chloroplast DNA sequence resources, has hindered the study of E. breviscapus chloroplast genome transformation. Here, the complete chloroplast (cp) genome of E. breviscapus is reported. The genome is 152,164 bp in length, has a GC content of 37.2%, and is structurally arranged into two 24,699 bp inverted repeats (IRs) and two single-copy regions. The sizes of the large single-copy region and the small single-copy region are 84,657 and 18,109 bp, respectively. The E. breviscapus cp genome consists of 127 coding genes, including 83 protein-coding genes, 36 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. Of those genes, 95 are single-copy genes and 16 are duplicated in the two inverted repeats, comprising seven tRNAs, four rRNAs, and five protein-coding genes. Genomic DNA of E. breviscapus was then used as a template, and the endogenous 5' and 3' flanking sequences of the trnI gene and trnA gene were selected as homologous recombinant fragments in vector construction and cloned through PCR. The endogenous 5' flanking sequences of the psbA gene and rrn16S gene, the endogenous 3' flanking sequences of the psbA gene, rbcL gene, and rps16 gene, and one sequence element from the psbN-psbH chloroplast operon were cloned, and certain chloroplast regulatory elements were identified. Two homologous recombination fragments and all of these elements were constructed into the cloning vector pBluescript SK (+) to yield a series of chloroplast expression vectors, which harbored the reporter gene EGFP and the selectable marker gene aadA. After identification, the chloroplast expression vectors were transformed into Escherichia coli, and the function of the predicted regulatory elements was confirmed by a spectinomycin resistance test and fluorescence intensity measurement. The results indicated that the aadA and EGFP genes were efficiently expressed under the regulation of the predicted regulatory elements and that the chloroplast expression vector had been successfully constructed, thereby providing a solid foundation for establishing a subsequent E. breviscapus chloroplast transformation system and for the genetic improvement of E. breviscapus.

A standardized, extensible framework for optimizing classification improves marker-gene taxonomic assignments

10.7287/peerj.preprints.934v1 ◽ 2015 ◽ Cited By ~ 1

Author(s): Nicholas A Bokulich ◽ Jai Ram Rideout ◽ Evguenia Kopylova ◽ Evan Bolyen ◽ Jessica Patnode ◽ ...

Keyword(s): Marker Gene ◽ Amplicon Sequencing ◽ Evaluation Framework ◽ Taxonomic Resolution ◽ Marker Genes ◽ Classification Methods ◽ New Methods ◽ Taxonomic Assignments ◽ Different Levels

Background: Taxonomic classification of marker-gene (i.e., amplicon) sequences represents an important step for molecular identification of microorganisms. Results: We present three advances in our ability to assign and interpret taxonomic classifications of short marker gene sequences: two new methods for taxonomy assignment, which reduce runtime up to two-fold and achieve high precision genus-level assignments; an evaluation of classification methods that highlights differences in performance with different marker genes and at different levels of taxonomic resolution; and an extensible framework for evaluating and optimizing new classification methods, which we hope will serve as a model for standardized and reproducible bioinformatics methods evaluations. Conclusions: Our new methods are accessible in QIIME 1.9.0, and our evaluation framework will support ongoing optimization of classification methods to complement rapidly evolving short-amplicon sequencing and bioinformatics technologies. Static versions of all of the analysis notebooks generated with this framework, which contain all code and analysis results, can be viewed at http://bit.ly/srta-010.
