Amplicon sequencing of single-copy protein-coding genes reveals accurate diversity for sequence-discrete microbiome populations

2021 ◽  
Author(s):  
Chengfeng Yang ◽  
Qinzhi Su ◽  
Min Tang ◽  
Shiqi Luo ◽  
Hao Zheng ◽  
...  

An in-depth understanding of microbial function and the division of ecological niches requires accurate delineation and identification of microbes at a fine taxonomic resolution. Microbial phylotypes are typically defined using a 97% small subunit (16S) rRNA threshold. However, increasing evidence has demonstrated the ubiquitous presence of functionally distinct taxonomic units within phylotypes. These so-called sequence-discrete populations (SDPs) have mainly been delineated by disjunct sequence similarity at the whole-genome level. However, gene markers that can accurately identify and quantify SDPs are lacking in microbial community studies. Here we developed a pipeline to screen single-copy protein-coding genes that can accurately characterize SDP diversity via amplicon sequencing of microbial communities. Fifteen candidate marker genes were evaluated using three criteria (extent of sequence divergence, phylogenetic accuracy, and conservation of primer regions), and the selected genes were then tested for their efficiency in differentiating SDPs within Gilliamella, a core honeybee gut microbial phylotype, as a proof of concept. The results showed that the 16S V4 region failed to report accurate SDP diversities because of low taxonomic resolution and variable copy numbers. In contrast, the single-copy genes recommended by our pipeline successfully quantified Gilliamella SDPs in both mock samples and honeybee guts, with results highly consistent with those of metagenomics. The pipeline developed in this study is expected to identify single-copy protein-coding genes capable of accurately quantifying diverse bacterial communities at the SDP level.
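The copy-number point in the abstract above can be illustrated with a small sketch. It is not the authors' pipeline: the population names, cell counts, and copy numbers below are hypothetical, and the model ignores PCR and primer bias. It only shows why a marker whose copy number differs between populations skews read-based proportions, while a single-copy marker does not.

```python
def expected_read_fractions(cell_counts, copies_per_genome):
    """Expected fraction of amplicon reads per population.

    Assumes reads are proportional to template copies (cells x gene copies)
    and ignores PCR and primer bias.
    """
    templates = {
        name: cell_counts[name] * copies_per_genome[name] for name in cell_counts
    }
    total = sum(templates.values())
    return {name: count / total for name, count in templates.items()}


# Hypothetical mock community: two populations with equal cell numbers.
cells = {"SDP_A": 100, "SDP_B": 100}

# Hypothetical copy numbers for a multi-copy marker and a single-copy marker.
multi_copy = {"SDP_A": 4, "SDP_B": 8}
single_copy = {"SDP_A": 1, "SDP_B": 1}

multi_copy_fractions = expected_read_fractions(cells, multi_copy)
single_copy_fractions = expected_read_fractions(cells, single_copy)

print("multi-copy marker: ", multi_copy_fractions)
print("single-copy marker:", single_copy_fractions)
```

Under these assumed numbers the multi-copy marker over-represents the population with more gene copies, whereas the single-copy marker recovers the equal cell proportions.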

2021 ◽  
Vol 10 (16) ◽  
Author(s):  
Zhenhua Yu ◽  
Sergio de los Santos-Villalobos ◽  
Yansheng Li ◽  
Jian Jin ◽  
Fannie Isela Parra Cota ◽  
...  

ABSTRACT Here, we present the draft genome of Bacillus sp. strain IGA-FME-2. This strain was isolated from the bulk soil of soybean (Glycine max L.). Its genome consists of 3,810 protein-coding genes, 44 tRNAs, two 16S rRNAs, and a single copy of 23S rRNA, with a GC content of 46.4%.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8450 ◽  
Author(s):  
Sunan Huang ◽  
Xuejun Ge ◽  
Asunción Cano ◽  
Betty Gaby Millán Salazar ◽  
Yunfei Deng

The genus Dicliptera (Justicieae, Acanthaceae) consists of approximately 150 species distributed throughout the tropical and subtropical regions of the world. Newly obtained chloroplast genomes (cp genomes) are reported for five species of Dicliptera (D. acuminata, D. peruviana, D. montana, D. ruiziana and D. mucronata) in this study. These cp genomes have circular structures of 150,689–150,811 bp and exhibit quadripartite organizations made up of a large single copy region (LSC, 82,796–82,919 bp), a small single copy region (SSC, 17,084–17,092 bp), and a pair of inverted repeat regions (IRs, 25,401–25,408 bp). Guanine-Cytosine (GC) content makes up 37.9%–38.0% of the total content. The complete cp genomes contain 114 unique genes, including 80 protein-coding genes, 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes. Comparative analyses of nucleotide variability (Pi) reveal the five most variable regions (trnY-GUA-trnE-UUC, trnG-GCC, psbZ-trnG-GCC, petN-psbM, and rps4-trnL-UUA), which may be used as molecular markers in future taxonomic identification and phylogenetic analyses of Dicliptera. A total of 55–58 simple sequence repeats (SSRs) and 229 long repeats were identified in the cp genomes of the five Dicliptera species. Phylogenetic analysis identified a close relationship between D. ruiziana and D. montana, followed by D. acuminata, D. peruviana, and D. mucronata. Evolutionary analysis of orthologous protein-coding genes within the family Acanthaceae revealed only one gene, ycf15, to be under positive selection, which may contribute to future studies of its adaptive evolution. The completed genomes are useful for future research on species identification, phylogenetic relationships, and the adaptive evolution of the Dicliptera species.


2021 ◽  
Vol 46 (1) ◽  
pp. 162-174
Author(s):  
Ming-Hui Yan ◽  
Chun-Yang Li ◽  
Peter W. Fritsch ◽  
Jie Cai ◽  
Heng-Chang Wang

Abstract: The phylogenetic relationships among 11 out of the 12 genera of the angiosperm family Styracaceae have been largely resolved with DNA sequence data based on all protein-coding genes of the plastome. The only genus that has not been phylogenomically investigated in the family with molecular data is the monotypic genus Parastyrax, which is extremely rare in the wild and difficult to collect. To complete the sampling of the genera comprising the Styracaceae, examine the plastome composition of Parastyrax, and further explore the phylogenetic relationships of the entire family, we sequenced the whole plastome of P. lacei and incorporated it into the Styracaceae dataset for phylogenetic analysis. Similar to most others in the family, the plastome is 158189 bp in length and contains a large single-copy region of 88085 bp and a small single-copy region of 18540 bp separated by two inverted-repeat regions of 25781 bp each. A total of 113 genes were predicted, including 79 protein-coding genes, 30 tRNA genes, and four rRNA genes. Phylogenetic relationships among all 12 genera of the family were constructed with 79 protein-coding genes. Consistent with a previous study, Styrax, Huodendron, and a clade of Alniphyllum + Bruinsmia were successively sister to the remainder of the family. Parastyrax was strongly supported as sister to an internal clade comprising seven other genera of the family, whereas Halesia and Pterostyrax were both recovered as polyphyletic, as in prior studies. However, when we employed either the whole plastome or the large or small single-copy regions as datasets, Pterostyrax was resolved as monophyletic with 100% support, consistent with expectations based on morphology and indicating that non-coding regions of the Styracaceae plastome contain informative phylogenetic signal. Conversely, Halesia was still resolved as polyphyletic but with novel strong support.


2020 ◽  
Vol 9 (30) ◽  
Author(s):  
Dhruba Bhattacharya ◽  
Sergio de los Santos Villalobos ◽  
Valeria Valenzuela Ruiz ◽  
Joseph Selvin ◽  
Joydeep Mukherjee

ABSTRACT The draft genome of Bacillus sp. SPB7, which was isolated from the marine sponge Spongia officinalis, is presented. This bacterium is a producer of an antimicrobial cyclic diketopiperazine, (3S,6S)-3,6-diisobutylpiperazine-2,5-dione. The genome consists of 4,511 protein-coding genes, 63 tRNAs, 2 16S rRNAs, 3 23S rRNAs, and a single copy of 5S rRNA.


Plants ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1354
Author(s):  
Slimane Khayi ◽  
Fatima Gaboun ◽  
Stacy Pirro ◽  
Tatiana Tatusova ◽  
Abdelhamid El Mousadik ◽  
...  

Argania spinosa (Sapotaceae), an important endemic Moroccan oil tree, is a primary source of argan oil, which has numerous dietary and medicinal properties. The plant species occupies the mid-western part of Morocco and provides great environmental and socioeconomic benefits. The complete chloroplast (cp) genome of A. spinosa was sequenced, assembled, and analyzed in comparison with those of two Sapotaceae members. The A. spinosa cp genome is 158,848 bp long, with an average GC content of 36.8%. The cp genome exhibits a typical quadripartite and circular structure consisting of a pair of inverted repeat regions (IR) of 25,945 bp in length separating small single-copy (SSC) and large single-copy (LSC) regions of 18,591 and 88,367 bp, respectively. The annotation of the A. spinosa cp genome predicted 130 genes, including 85 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, and 37 transfer RNA (tRNA) genes. A total of 44 long repeats and 88 simple sequence repeats (SSR) divided into mononucleotides (76), dinucleotides (7), trinucleotides (3), tetranucleotides (1), and hexanucleotides (1) were identified in the A. spinosa cp genome. Phylogenetic analyses using the maximum likelihood (ML) method were performed based on 69 protein-coding genes from 11 species of Ericales. The results confirmed the close position of A. spinosa to the Sideroxylon genus, supporting the revisiting of its taxonomic status. The complete chloroplast genome sequence will be valuable for further studies on the conservation and breeding of this medicinally and culinarily important species and will also contribute to clarifying the phylogenetic position of the species within Sapotaceae.
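As a quick consistency check on the quadripartite structure described above, the region lengths should sum to the total genome length, with the inverted repeat counted twice. The sketch below uses only the A. spinosa figures stated in the abstract; the function and variable names are illustrative.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length in bp: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported in the abstract for Argania spinosa.
argania_total = quadripartite_length(lsc=88_367, ssc=18_591, ir=25_945)

# Fraction of the genome occupied by the two inverted repeat copies.
argania_ir_fraction = 2 * 25_945 / argania_total

print(argania_total)
print(round(argania_ir_fraction, 3))
```

The computed total matches the reported genome length of 158,848 bp.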


2019 ◽  
Vol 42 (4) ◽  
pp. 601-611 ◽  
Author(s):  
Yan Li ◽  
Liukun Jia ◽  
Zhihua Wang ◽  
Rui Xing ◽  
Xiaofeng Chi ◽  
...  

Abstract Saxifraga sinomontana J.-T. Pan & Gornall belongs to Saxifraga sect. Ciliatae subsect. Hirculoideae, a lineage containing ca. 110 species whose phylogenetic relationships are largely unresolved due to recent rapid radiations. Analyses of complete chloroplast genomes have the potential to significantly improve the resolution of phylogenetic relationships in this young plant lineage. The complete chloroplast genome of S. sinomontana was de novo sequenced, assembled and then compared with those of six other Saxifragaceae species. The S. sinomontana chloroplast genome is 147,240 bp in length with a typical quadripartite structure, including a large single-copy region of 79,310 bp and a small single-copy region of 16,874 bp separated by a pair of inverted repeats (IRs) of 25,528 bp each. The chloroplast genome contains 113 unique genes, including 79 protein-coding genes, four rRNAs and 30 tRNAs, with 18 duplicates in the IRs. The gene content and organization are similar to those of other Saxifragaceae chloroplast genomes. Sixty-one simple sequence repeats were identified in the S. sinomontana chloroplast genome, mostly represented by mononucleotide repeats of polyadenine or polythymine. Comparative analysis revealed 12 highly divergent regions in the intergenic spacers, as well as in the coding genes matK, ndhK, accD, cemA, rpoA, rps19, ndhF, ccsA, ndhD and ycf1. Phylogenetic reconstruction of seven Saxifragaceae species based on 66 protein-coding genes received high bootstrap support values for nearly all identified nodes, suggesting a promising opportunity to resolve infrasectional relationships of the most species-rich section Ciliatae of Saxifraga.


1991 ◽  
Vol 280 (2) ◽  
pp. 439-444 ◽  
Author(s):  
K Ikejiri ◽  
T Wasada ◽  
K Haruki ◽  
N Hizuka ◽  
Y Hirata ◽  
...  

The human insulin-like growth factor-II (hIGF-II) gene has until now been thought to be composed of eight exons, including three independent leader exons. In the present study two additional exons, one leader exon and one alternatively used ordinate exon, have been newly identified. They were abundantly expressed in human histiocytoma tissue, generating mRNA species of about 5.0 kb in length. The new leader exon shows significant sequence similarity with the rE1 exon, previously reported to be transcribed only in the rat, and is mapped at nearly the same genomic location as in the rat. On the other hand, sequence similarity with another exon in the corresponding region of the rat genome was also found. It was, however, obvious that the rat sequence would not work as an active exon, since both the splice acceptor and donor sites deviated considerably from the consensus sequences. It has thus become apparent that the complex transcription unit of a single-copy hIGF-II gene comprises at least 10 exons, including four leader exons, one alternative exon and three common protein-coding exons.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yifan Yu ◽  
Zhen Ouyang ◽  
Juan Guo ◽  
Wen Zeng ◽  
Yujun Zhao ◽  
...  

Erigeron breviscapus is a famous medicinal plant. However, the limited chloroplast genome information for E. breviscapus, especially the lack of chloroplast DNA sequence resources, has hindered the study of E. breviscapus chloroplast genome transformation. Here, the complete chloroplast (cp) genome of E. breviscapus is reported. The genome was 152,164 bp in length, had a GC content of 37.2%, and was structurally arranged into two 24,699 bp inverted repeats (IRs) and two single-copy regions. The sizes of the large single-copy region and the small single-copy region were 84,657 and 18,109 bp, respectively. The E. breviscapus cp genome consisted of 127 genes, including 83 protein-coding genes, 36 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. Of these, 95 were single-copy genes and 16 were duplicated in the two inverted repeat regions (seven tRNA genes, four rRNA genes, and five protein-coding genes). Then, genomic DNA of E. breviscapus was used as a template, and the endogenous 5' and 3' flanking sequences of the trnI gene and trnA gene were selected as homologous recombinant fragments in vector construction and cloned through PCR. The endogenous 5' flanking sequences of the psbA gene and rrn16S gene, the endogenous 3' flanking sequences of the psbA gene, rbcL gene, and rps16 gene and one sequence element from the psbN-psbH chloroplast operon were cloned, and certain chloroplast regulatory elements were identified. Two homologous recombination fragments and all of these elements were constructed into the cloning vector pBluescript SK (+) to yield a series of chloroplast expression vectors, which harbored the reporter gene EGFP and the selectable marker gene aadA. After identification, the chloroplast expression vectors were transformed into Escherichia coli and the function of the predicted regulatory elements was confirmed by a spectinomycin resistance test and fluorescence intensity measurement. The results indicated that the aadA and EGFP genes were efficiently expressed under the regulation of the predicted regulatory elements and that the chloroplast expression vector had been successfully constructed, thereby providing a solid foundation for establishing a subsequent E. breviscapus chloroplast transformation system and for genetic improvement of E. breviscapus.


Author(s):  
Nicholas A Bokulich ◽  
Jai Ram Rideout ◽  
Evguenia Kopylova ◽  
Evan Bolyen ◽  
Jessica Patnode ◽  
...  

Background: Taxonomic classification of marker-gene (i.e., amplicon) sequences represents an important step for molecular identification of microorganisms. Results: We present three advances in our ability to assign and interpret taxonomic classifications of short marker gene sequences: two new methods for taxonomy assignment, which reduce runtime up to two-fold and achieve high precision genus-level assignments; an evaluation of classification methods that highlights differences in performance with different marker genes and at different levels of taxonomic resolution; and an extensible framework for evaluating and optimizing new classification methods, which we hope will serve as a model for standardized and reproducible bioinformatics methods evaluations. Conclusions: Our new methods are accessible in QIIME 1.9.0, and our evaluation framework will support ongoing optimization of classification methods to complement rapidly evolving short-amplicon sequencing and bioinformatics technologies. Static versions of all of the analysis notebooks generated with this framework, which contain all code and analysis results, can be viewed at http://bit.ly/srta-010.

