SINAPS: Prediction of microbial traits from marker gene sequences

2017 ◽  
Author(s):  
Robert C. Edgar

Abstract Microbial communities are often studied by sequencing marker genes such as 16S ribosomal RNA. Marker gene sequences can be used to assess diversity and taxonomy, but do not directly measure functions arising from other genes in the community metagenome. Such functions can be predicted by algorithms that associate marker genes with experimentally determined traits in well-studied species. Typically, such methods use ancestral state reconstruction. Here I describe SINAPS, a new algorithm that predicts traits for marker gene sequences using a fast, simple word-counting algorithm that does not require alignments or trees. A measure of prediction confidence is obtained by bootstrapping. I tested SINAPS predictions from 16S V4 query sequences for traits including energy metabolism, Gram-positive staining, presence of a flagellum, V4 primer mismatches, and 16S copy number. Accuracy was >90% except for copy number, where a large majority of predictions were within ±2 of the true value.
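The word-counting and bootstrapping idea described in this abstract can be illustrated with a minimal Python sketch. This is not the SINAPS implementation: the abstract gives no parameters, so the word length (`k=8`), the number of words sampled per iteration (`subsample=32`), the number of iterations (`iterations=100`), and the function names `kmers` and `predict_trait` are all assumptions made for illustration.

```python
import random
from collections import Counter


def kmers(seq, k=8):
    """Return the set of distinct k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_trait(query, references, k=8, subsample=32, iterations=100, seed=0):
    """Predict a trait by word counting, with bootstrap support.

    references: list of (sequence, trait) pairs with known traits.
    Returns (trait, support), where support is the fraction of bootstrap
    iterations that voted for the returned trait.
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    ref_words = [(kmers(seq, k), trait) for seq, trait in references]
    query_words = sorted(kmers(query, k))
    if not query_words or not ref_words:
        raise ValueError("query must contain at least one word and references must be non-empty")

    votes = Counter()
    for _ in range(iterations):
        # Sample query words with replacement (the bootstrap step).
        sample = set(rng.choices(query_words, k=min(subsample, len(query_words))))
        # The reference sharing the most sampled words casts the vote.
        _, best_trait = max(ref_words, key=lambda rw: len(sample & rw[0]))
        votes[best_trait] += 1

    trait, count = votes.most_common(1)[0]
    return trait, count / iterations


if __name__ == "__main__":
    # Toy references with hypothetical trait labels; no alignment or tree is needed.
    refs = [
        ("ACCACAACCCACAACACCAACCCAACACACCA", "aerobic"),
        ("GTTGTGGTTTGTGGTGTTGGTTTGGTGTGTTG", "anaerobic"),
    ]
    print(predict_trait("ACCACAACCCACAACACCAACCCAACACACCA", refs))
```

A low support value would flag a query whose words are split across references with different traits, which is the role the abstract assigns to the bootstrap.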

GigaScience ◽  
2020 ◽  
Vol 9 (3) ◽  
Author(s):  
Haris Zafeiropoulos ◽  
Ha Quoc Viet ◽  
Katerina Vasileiadou ◽  
Antonis Potirakis ◽  
Christos Arvanitidis ◽  
...  

Abstract Background Environmental DNA and metabarcoding allow the identification of a mixture of species and launch a new era in bio- and eco-assessment. Many steps are required to obtain taxonomically assigned matrices from raw data. For most of these, a plethora of tools are available; each tool's execution parameters need to be tailored to reflect each experiment's idiosyncrasy. Adding to this complexity, the computation capacity of high-performance computing systems is frequently required for such analyses. To address the difficulties, bioinformatic pipelines need to combine state-of-the-art technologies and algorithms with an easy to get-set-use framework, allowing researchers to tune each study. Software containerization technologies ease the sharing and running of software packages across operating systems; thus, they strongly facilitate pipeline development and usage. Likewise, programming languages specialized for big data pipelines incorporate features like roll-back checkpoints and on-demand partial pipeline execution. Findings PEMA is a containerized assembly of key metabarcoding analysis tools that requires low effort in setting up, running, and customizing to researchers’ needs. Based on third-party tools, PEMA performs read pre-processing, (molecular) operational taxonomic unit clustering, amplicon sequence variant inference, and taxonomy assignment for 16S and 18S ribosomal RNA, as well as ITS and COI marker gene data. Owing to its simplified parameterization and checkpoint support, PEMA allows users to explore alternative algorithms for specific steps of the pipeline without the need for a complete re-execution. PEMA was evaluated against both mock communities and previously published datasets and achieved results of comparable quality. Conclusions A high-performance computing–based approach was used to develop PEMA; however, it can be used on personal computers as well. PEMA's time-efficient performance and good results will allow it to be used for accurate environmental DNA metabarcoding analysis, thus enhancing the applicability of next-generation biodiversity assessment studies.


2021 ◽  
Vol 20 (7) ◽  
pp. 889-904
Author(s):  
M. Prieto ◽  
Javier Etayo ◽  
I. Olariaga

Abstract The class Eurotiomycetes (Ascomycota, Pezizomycotina) comprises important fungi used for medical, agricultural, industrial and scientific purposes. Eurotiomycetes is a morphologically and ecologically diverse monophyletic group. Within the Eurotiomycetes, different ascoma morphologies are found including cleistothecia and perithecia but also apothecia or stromatic forms. Mazaediate representatives (with a distinct structure in which loose masses of ascospores accumulate to be passively disseminated) have evolved independently several times. Here we describe a new mazaediate species belonging to the Eurotiomycetes. The multigene phylogeny produced (7 gene regions: nuLSU, nuSSU, 5.8S nuITS, mtSSU, RPB1, RPB2 and MCM7) placed the new species in a lineage sister to Eurotiomycetidae. Based on the evolutionary relationships and morphology, a new subclass, a new order, family and genus are described to place the new species: Cryptocalicium blascoi. This calicioid species occurs on the inner side of loose bark strips of Cupressaceae (Cupressus, Juniperus). Morphologically, C. blascoi is characterized by having minute apothecioid stalked ascomata producing mazaedia, clavate bitunicate asci with hemiamyloid reaction, presence of hamathecium and an apothecial external surface with dark violet granules that becomes turquoise green in KOH. The ancestral state reconstruction analyses support a common ancestor with open ascomata for all deep nodes in Eurotiomycetes and the evolution of closed ascomata (cleistothecioid in Eurotiomycetidae and perithecioid in Chaetothyriomycetidae) from apothecioid ancestors. The appropriateness of the description of a new subclass for this fungus is also discussed.


2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Abstract Background In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature or gene selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm exploiting knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases with applications in specific epidemiology for a particular population. Results In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with no or minimal compromise to sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The results obtained show that the proposed method outperforms the compared methods. Reported results are also validated through a biological significance test and heatmap plotting.


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Haris Zafeiropoulos ◽  
Ha Quoc Viet ◽  
Katerina Vasileiadou ◽  
Antonis Potirakis ◽  
Christos Arvanitidis ◽  
...  

Author(s):  
Bennett J Kapili ◽  
Anne E Dekas

Abstract Motivation Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale. Results We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood are: (1) taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches at the cost of fewer total inferences. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes. Availability PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167. Supplementary information Supplementary data are available at Bioinformatics online.
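The two-condition rule stated in this abstract (neighbors must be taxonomically consistent and sufficiently similar to the query) can be sketched as a small decision function. This is an illustration in Python, not PPIT's R interface: the function name `infer_taxon`, the input shape (a list of taxon and identity pairs for the references in the query's phylogenetic neighborhood), the requirement that every neighbor meet the cutoff, and the 0.9 identity threshold are all assumptions.

```python
from typing import List, Optional, Tuple


def infer_taxon(
    neighbors: List[Tuple[str, float]],
    min_identity: float = 0.9,  # hypothetical cutoff, not taken from PPIT
) -> Optional[str]:
    """Return a taxon only when both conditions from the abstract hold.

    neighbors: (taxon, pairwise identity to the query) for each reference
    in the query's phylogenetic neighborhood. Returns None when no
    inference should be drawn.
    """
    if not neighbors:
        return None

    # Condition 1: the neighborhood must be taxonomically consistent.
    taxa = {taxon for taxon, _ in neighbors}
    if len(taxa) != 1:
        return None

    # Condition 2: neighbors must share sufficient identity with the query
    # (here, assumed to mean every neighbor meets the cutoff).
    if any(identity < min_identity for _, identity in neighbors):
        return None

    return taxa.pop()


if __name__ == "__main__":
    print(infer_taxon([("Deltaproteobacteria", 0.95), ("Deltaproteobacteria", 0.93)]))
    print(infer_taxon([("Deltaproteobacteria", 0.95), ("Firmicutes", 0.94)]))
```

Declining to answer when either check fails mirrors the trade-off the abstract reports: fewer total inferences in exchange for a higher proportion of correct ones.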


1996 ◽  
Vol 31 (1) ◽  
pp. 23-29 ◽  
Author(s):  
Gary W. Saunders ◽  
Isabelle M. Strachan ◽  
John A. West ◽  
Gerald T. Kraft

2002 ◽  
Vol 23 (2) ◽  
pp. 288-292 ◽  
Author(s):  
Martı́n Garcı́a-Varela ◽  
Michael P Cummings ◽  
Gerardo Pérez-Ponce de León ◽  
Scott L Gardner ◽  
Juan P Laclette

1985 ◽  
Vol 5 (9) ◽  
pp. 2265-2271
Author(s):  
S Chakrabarti ◽  
S Joffe ◽  
M M Seidman

Shuttle vector plasmids were constructed with directly repeated sequences flanking a marker gene. African green monkey kidney (AGMK) cells were infected with the constructions, and after a period of replication, the progeny plasmids were recovered and introduced into bacteria. Those colonies with plasmids that had lost the marker gene were identified, and the individual plasmids were purified and characterized by restriction enzyme digestion. Recombination between the repeated elements generated a plasmid with a precise deletion and a characteristic restriction pattern, which distinguished the recombined molecules from those with other defects in the marker gene. Recombination among the following different sequences was measured in this assay: (i) the simian virus 40 origin and enhancer region, (ii) the AGMK Alu sequence, and (iii) a sequence from plasmid pBR322. Similar frequencies of recombination among these sequences were found. Recombination occurred more frequently in Cos1 cells than in CV1 cells. In these experiments, the plasmid population with defective marker genes consisted of the recombined molecules and of the spontaneous deletion-insertion mutants described earlier. The frequency of the latter class was unaffected by the presence of the option for recombination represented by the direct repeats. Both recombination and deletion-insertion mutagenesis were stimulated by double-strand cleavage between the repeated sequences and adjacent to the marker, and the frequency of the deletion-insertion mutants in this experiment was again independent of the presence of the direct repeats. We concluded that although recombination and deletion-insertion mutagenesis were both stimulated by double-strand cleavage, the molecules which underwent the two types of change were drawn from separate pools.

