Why we missed it? Computational analysis reveals distribution patterns of Malassezia furfur, the etiological agent of Pityriasis versicolor, in skin metagenomes

Abstract Malassezia furfur is the main causative species in pityriasis versicolor infections and is now widely believed to be part of the skin microbiota, yet it was systematically missed in early skin metagenomic studies. Here, we curated a specific set of M. furfur sequences and used them to reanalyze publicly available skin metagenomes, computationally investigating the distribution of M. furfur and its relative abundance at different skin sites. To this end, we used BLASTN to align these marker genes against the selected metagenomic datasets and estimated M. furfur relative abundance as the number of BLASTN hits per million metagenomic reads. We found a relative enrichment of M. furfur in the retroauricular crease, the antecubital fossa, and the forehead. Among skin categories, sebaceous areas were the most significantly enriched in M. furfur, while in terms of exposure/occlusion, exposed areas had the highest abundance. This work will facilitate the re-estimation and correction of past estimates of this important fungal species in shotgun metagenomic data sets.
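
The normalization described in the abstract (BLASTN hits per million metagenomic reads) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the input is assumed to be BLAST tabular output (-outfmt 6) with marker genes as queries and reads as subjects, and counting each read at most once is an assumption rather than something the abstract states.

```python
from typing import Iterable


def count_hit_reads(blast_lines: Iterable[str], read_column: int = 1) -> int:
    """Count distinct read IDs in BLAST tabular (-outfmt 6) lines.

    Assumption: reads are the BLAST subjects, so the read ID sits in
    column index 1 (sseqid). Pass read_column=0 if reads were the queries.
    """
    reads = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        reads.add(line.split("\t")[read_column])
    return len(reads)


def hits_per_million(n_hits: int, n_reads: int) -> float:
    """Normalize a raw hit count to hits per million metagenomic reads."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return n_hits * 1_000_000 / n_reads


if __name__ == "__main__":
    example = [
        "marker1\tread_7\t98.5\t100",
        "marker2\tread_7\t97.0\t90",   # same read, counted once
        "marker1\tread_9\t99.0\t100",
    ]
    n = count_hit_reads(example)
    print(n, hits_per_million(n, 4_000_000))
```

Normalizing by total read count makes samples of different sequencing depth comparable, which is what allows the per-site comparison reported above.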

Mega- and meta-analyses of fecal metagenomic studies assessing response to immune checkpoint inhibitors

10.1101/2021.04.27.441693 ◽ 2021

Author(s): Alya Heirali ◽ Bo Chen ◽ Matthew Wong ◽ Pierre HH Schneeberger ◽ Victor Rey ◽ ...

Keyword(s): Relative Abundance ◽ Immune Checkpoint ◽ Immune Checkpoint Inhibitors ◽ Checkpoint Inhibitors ◽ Bacterial Species ◽ Metagenomic Data ◽ Data Sets ◽ Meta Analyses ◽ Clostridium Bolteae ◽ Modelling Approach

Abstract Purpose Gut microbiota have been associated with response to immune checkpoint inhibitors (ICI), including anti-PD-1 and anti-CTLA-4 antibodies. However, inter-study differences in design, patient cohorts and data analysis pose challenges to identifying species consistently associated with response to ICI or lack thereof. Experimental Design We uniformly processed and analyzed data from three studies of microbial metagenomes in cancer immunotherapy response (four distinct data sets) to identify species consistently associated with response or non-response (n=190 patient samples). Metagenomic data were processed and analyzed using Metaphlan v2.0. Meta- and mega-analyses were performed using a two-part modelling approach of species present in at least 20% of samples to account for both prevalence and relative abundance differences between responders and non-responders. Results Meta- and mega-analyses identified five species that were concordantly significantly different between responders and non-responders. Amongst them, Bacteroides thetaiotaomicron and Clostridium bolteae relative abundance (RA) were independently predictive of non-response to immunotherapy when data sets were combined and analyzed using mega-analyses (AUC 0.59, 95% CI 0.51-0.68 and AUC 0.61, 95% CI 0.52-0.69, respectively). Conclusions Meta- and mega-analysis of published metagenomic studies identified bacterial species both positively and negatively associated with immunotherapy responsiveness across four published cohorts.
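
The 20% prevalence filter and the two quantities a two-part model looks at (presence and abundance when present) can be illustrated with a short sketch. This is only a descriptive summary under stated assumptions: the function names and data layout are hypothetical, and the statistical tests the authors actually fitted for each part are not given in the abstract, so none are implemented here.

```python
from statistics import median
from typing import Dict, List, Optional


def filter_by_prevalence(table: Dict[str, List[float]],
                         min_prevalence: float = 0.20) -> Dict[str, List[float]]:
    """Keep species detected (abundance > 0) in at least min_prevalence of samples.

    table maps a species name to its relative abundances, one value per sample.
    """
    kept = {}
    for species, values in table.items():
        prevalence = sum(v > 0 for v in values) / len(values)
        if prevalence >= min_prevalence:
            kept[species] = values
    return kept


def two_part_summary(values: List[float],
                     is_responder: List[bool]) -> Dict[str, Dict[str, Optional[float]]]:
    """Summarize, per group, the two parts a two-part model separates:
    (1) prevalence and (2) median abundance among samples where the species is present.
    """
    summary = {}
    for group, name in ((True, "responder"), (False, "non_responder")):
        group_values = [v for v, r in zip(values, is_responder) if r == group]
        nonzero = [v for v in group_values if v > 0]
        summary[name] = {
            "prevalence": len(nonzero) / len(group_values) if group_values else 0.0,
            "median_nonzero": median(nonzero) if nonzero else None,
        }
    return summary
```

Separating the two parts matters because a species can differ between groups by being detected more often, by being more abundant when detected, or both; a single test on zero-inflated abundances can blur these effects.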

Global Distribution Patterns and Pangenomic Diversity of the Candidate Phylum “Latescibacteria” (WS3)

Applied and Environmental Microbiology ◽ 10.1128/aem.00521-17 ◽ 2017 ◽ Vol 83 (10) ◽ Cited By ~ 35

Author(s): Ibrahim F. Farag ◽ Noha H. Youssef ◽ Mostafa S. Elshahed

Keyword(s): 16S rRNA ◽ Sampling Site ◽ Distribution Patterns ◽ Global Distribution ◽ Metagenomic Data ◽ rRNA Gene ◽ Data Sets ◽ Aquatic Habitats ◽ Content Type ◽ Wide Range

ABSTRACT We investigated the global distribution patterns and pangenomic diversity of the candidate phylum “Latescibacteria” (WS3) in 16S rRNA gene as well as metagenomic data sets. We document distinct distribution patterns for various “Latescibacteria” orders in 16S rRNA gene data sets, with prevalence of orders sediment_1 in terrestrial, PBSIII_9 in groundwater and temperate freshwater, and GN03 in pelagic marine, saline-hypersaline, and wastewater habitats. Using a fragment recruitment approach, we identified 68.9 Mb of “Latescibacteria”-affiliated contigs in publicly available metagenomic data sets comprising 73,079 proteins. Metabolic reconstruction suggests a prevalent saprophytic lifestyle in all “Latescibacteria” orders, with marked capacities for the degradation of proteins, lipids, and polysaccharides predominant in plant, bacterial, fungal/crustacean, and eukaryotic algal cell walls. As well, extensive transport and central metabolic pathways for the metabolism of imported monomers were identified. Interestingly, genes and domains suggestive of the production of a cellulosome—e.g., protein-coding genes harboring dockerin I domains attached to a glycosyl hydrolase and scaffoldin-encoding genes harboring cohesin I and CBM37 domains—were identified in order PBSIII_9, GN03, and MSB-4E2 fragments recovered from four anoxic aquatic habitats; hence extending the cellulosomal production capabilities in Bacteria beyond the Gram-positive Firmicutes. In addition to fermentative pathways, a complete electron transport chain with terminal cytochrome c oxidases Caa3 (for operation under high oxygen tension) and Cbb3 (for operation under low oxygen tension) were identified in PBSIII_9 and GN03 fragments recovered from oxygenated and partially/seasonally oxygenated aquatic habitats. 
Our metagenomic recruitment effort hence represents a comprehensive pangenomic view of this yet-uncultured phylum and provides insights broader than and complementary to those gained from genome recovery initiatives focusing on a single or few sampled environments. IMPORTANCE Our understanding of the phylogenetic diversity, metabolic capabilities, and ecological roles of yet-uncultured microorganisms is rapidly expanding. However, recent efforts mainly have been focused on recovering genomes of novel microbial lineages from a specific sampling site, rather than from a wide range of environmental habitats. To comprehensively evaluate the genomic landscape, putative metabolic capabilities, and ecological roles of yet-uncultured candidate phyla, efforts that focus on the recovery of genomic fragments from a wide range of habitats and that adequately sample the intraphylum diversity within a specific target lineage are needed. Here, we investigated the global distribution patterns and pangenomic diversity of the candidate phylum “Latescibacteria.” Our results document the preference of specific “Latescibacteria” orders to specific habitats, the prevalence of plant polysaccharide degradation abilities within all “Latescibacteria” orders, the occurrence of all genes/domains necessary for the production of cellulosomes within three “Latescibacteria” orders (GN03, PBSIII_9, and MSB-4E2) in data sets recovered from anaerobic locations, and the identification of the components of an aerobic respiratory chain, as well as occurrence of multiple O2-dependent metabolic reactions in “Latescibacteria” orders GN03 and PBSIII_9 recovered from oxygenated habitats. 
The results demonstrate the value of phylocentric pangenomic surveys for understanding the global ecological distribution and panmetabolic abilities of yet-uncultured microbial lineages since they provide broader and more complementary insights than those gained from single-cell genomic and/or metagenomic-enabled genome recovery efforts focusing on a single sampling site.

Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BMC Bioinformatics ◽ 10.1186/s12859-020-03810-0 ◽ 2020 ◽ Vol 21 (S18)

Author(s): Sudipta Acharya ◽ Laizhong Cui ◽ Yi Pan

Keyword(s): Gene Expression ◽ Feature Selection ◽ Gene Selection ◽ Marker Gene ◽ Biological Data ◽ Protein Interaction Data ◽ Marker Genes ◽ Data Sets ◽ Gene Markers ◽ Multi Objective

Abstract Background In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm exploiting knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases with applications in specific epidemiology for a particular population. Results In the current article, we design the feature selection and marker gene detection as a multi-view multi-objective clustering problem. Regarding that, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce gene space without/minimally compromising the sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. Obtained results reveal the supremacy of the proposed method. Reported results are also validated through a proper biological significance test and heatmap plotting.

Anti-Malassezia furfur activity of essential oils against causal agent of Pityriasis versicolor disease

African Journal of Pharmacy and Pharmacology ◽ 10.5897/ajpp12.097 ◽ 2012 ◽ Vol 6 (13)

Author(s): Richa Sharma

Keyword(s): Essential Oils ◽ Pityriasis Versicolor ◽ Causal Agent ◽ Malassezia Furfur

Screening Metagenomic Data for Viruses Using the E-Probe Diagnostic Nucleic Acid Assay

Phytopathology ◽ 10.1094/phyto-11-13-0310-r ◽ 2014 ◽ Vol 104 (10) ◽ pp. 1125-1129 ◽ Cited By ~ 11

Author(s): A. H. Stobbe ◽ W. L. Schneider ◽ P. R. Hoyt ◽ U. Melcher

Keyword(s): Nucleic Acid ◽ Mosaic Virus ◽ Yellow Mosaic Virus ◽ Metagenomic Data ◽ Data Sets ◽ Virus Species ◽ Data Set ◽ Golden Yellow ◽ Nucleic Acid Assay ◽ NGS Data

Next-generation sequencing (NGS) is not commonly used in diagnostics, in part because of the large amount of time and computational power needed to identify the taxonomic origin of each sequence in an NGS data set. By using the unassembled NGS data sets as the target for searches, pathogen-specific sequences, termed e-probes, could be used as queries to enable detection of specific viruses or organisms in plant sample metagenomes. This method, designated the e-probe diagnostic nucleic acid assay, was first tested with mock sequence databases and then with NGS data sets generated from plants infected with a DNA virus (Bean golden yellow mosaic virus, BGYMV) or an RNA virus (Plum pox virus, PPV). In addition, the ability to detect and differentiate among strains of a single virus species, PPV, was examined by using strain-specific probe sets. The use of probe sets for multiple viruses determined that one sample was dually infected with BGYMV and Bean golden mosaic virus.
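
The core idea of querying unassembled reads with pathogen-specific probes can be sketched as below. This is a deliberately simplified toy under stated assumptions: it uses exact substring matching on both strands instead of the similarity search used by a real assay, it assumes each probe is shorter than a read, and the function names and sequences are hypothetical.

```python
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_probe_hits(probes: Dict[str, str], reads: Iterable[str]) -> Dict[str, int]:
    """Count, per probe, the reads containing the probe on either strand.

    Exact matching only: a stand-in for the alignment-based search a real
    e-probe assay would run.
    """
    hits = {name: 0 for name in probes}
    prepared = {name: (p.upper(), reverse_complement(p.upper()))
                for name, p in probes.items()}
    for read in reads:
        read = read.upper()
        for name, (forward, reverse) in prepared.items():
            if forward in read or reverse in read:
                hits[name] += 1
    return hits
```

Because only the probe set is searched, the cost scales with the number of probes rather than with a full taxonomic classification of every read, which is the time saving the abstract points to.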

MetaDEGalaxy: Galaxy workflow for differential abundance analysis of 16s metagenomic data

F1000Research ◽ 10.12688/f1000research.18866.2 ◽ 2019 ◽ Vol 8 ◽ pp. 726

Author(s): Mike W.C. Thang ◽ Xin-Yi Chua ◽ Gareth Price ◽ Dominique Gorse ◽ Matt A. Field

Keyword(s): Microbial Communities ◽ Sequence Data ◽ Metagenomic Data ◽ Marker Genes ◽ Metagenomic Sequencing ◽ Differential Analysis ◽ Biomedical Sciences ◽ Metagenomic Sequence ◽ Differential Abundance ◽ Differential Abundance Analysis

Metagenomic sequencing is an increasingly common tool in environmental and biomedical sciences. While software for detailing the composition of microbial communities using 16S rRNA marker genes is relatively mature, researchers are increasingly interested in identifying changes exhibited within microbial communities under differing environmental conditions. In order to gain maximum value from metagenomic sequence data, we must improve the existing analysis environment by providing accessible and scalable computational workflows able to generate reproducible results. Here we describe a complete end-to-end open-source metagenomics workflow running within Galaxy for 16S differential abundance analysis. The workflow accepts 454 or Illumina sequence data (either overlapping or non-overlapping paired-end reads) and outputs lists of the operational taxonomic units (OTUs) exhibiting the greatest change under differing conditions. A range of analysis steps and graphing options are available, giving users a high level of control over their data and analyses. Additionally, users are able to input complex sample-specific metadata, which can be incorporated into differential analysis and used for grouping and colouring within graphs. Detailed tutorials containing sample data and existing workflows are available for three different input types: overlapping and non-overlapping read pairs as well as pre-generated Biological Observation Matrix (BIOM) files. Using the Galaxy platform we developed MetaDEGalaxy, a complete metagenomics differential abundance analysis workflow. MetaDEGalaxy is designed for bench scientists working with 16S data who are interested in comparative metagenomics. MetaDEGalaxy builds on momentum within the wider Galaxy metagenomics community with the hope that more tools will be added as existing methods mature.

Microbiome Search Engine 2: a Platform for Taxonomic and Functional Search of Global Microbiomes on the Whole-Microbiome Level

mSystems ◽ 10.1128/msystems.00943-20 ◽ 2021 ◽ Vol 6 (1)

Author(s): Gongchao Jing ◽ Lu Liu ◽ Zengbin Wang ◽ Yufeng Zhang ◽ Li Qian ◽ ...

Keyword(s): Big Data ◽ User Interface ◽ Search Engine ◽ Functional Similarity ◽ Metagenomic Data ◽ Data Sets ◽ Data Space ◽ Link Type ◽ Database Platform ◽ Microbiome Data

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.

Bayesian Classification of Microbial Communities Based on 16S rRNA Metagenomic Data

10.1101/340653 ◽ 2018 ◽ Cited By ~ 1

Author(s): Arghavan Bahadorinejad ◽ Ivan Ivanov ◽ Johanna W Lampe ◽ Meredith AJ Hullar ◽ Robert S Chapkin ◽ ...

Keyword(s): 16S rRNA ◽ Sample Size ◽ Microbial Communities ◽ State Of The Art ◽ Metagenomic Data ◽ Data Sets ◽ Sequencing Data ◽ Sample Data

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method can vastly outperform the others. Author summary Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
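
To give a feel for closed-form Bayesian classification of OTU count profiles, the sketch below uses a plain Dirichlet-multinomial model. It is a simplification swapped in for illustration, not the paper's Poisson-Dirichlet-Multinomial model or its Optimal Bayesian Classifier: it assumes one shared OTU proportion vector per class, a symmetric Dirichlet prior, and equal class priors, and all function names are hypothetical.

```python
from math import lgamma
from typing import Dict, Hashable, List, Sequence


def dm_log_likelihood(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log Dirichlet-multinomial probability of an OTU count vector given alpha."""
    n, a = sum(counts), sum(alpha)
    ll = lgamma(n + 1) + lgamma(a) - lgamma(a + n)
    for x, al in zip(counts, alpha):
        ll += lgamma(al + x) - lgamma(al) - lgamma(x + 1)
    return ll


def fit_class_alphas(samples: Sequence[Sequence[int]],
                     labels: Sequence[Hashable],
                     prior: float = 1.0) -> Dict[Hashable, List[float]]:
    """Posterior Dirichlet parameters per class: symmetric prior plus pooled counts.

    Pooling assumes all samples of a class share one proportion vector.
    """
    alphas: Dict[Hashable, List[float]] = {}
    for counts, label in zip(samples, labels):
        alpha = alphas.setdefault(label, [prior] * len(counts))
        for i, x in enumerate(counts):
            alpha[i] += x
    return alphas


def classify(counts: Sequence[int], alphas: Dict[Hashable, List[float]]) -> Hashable:
    """Pick the class with the highest posterior predictive probability
    (equal class priors assumed)."""
    return max(alphas, key=lambda label: dm_log_likelihood(counts, alphas[label]))
```

The prior pseudo-counts keep every OTU probability nonzero even when a class has few training samples, which is one reason Bayesian approaches of this kind remain usable when samples are scarce relative to the number of OTUs.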

METAnnotatorX2: a Comprehensive Tool for Deep and Shallow Metagenomic Data Set Analyses

mSystems ◽ 10.1128/msystems.00583-21 ◽ 2021 ◽ Vol 6 (3)

Author(s): Christian Milani ◽ Gabriele Andrea Lugli ◽ Federico Fontana ◽ Leonardo Mancabelli ◽ Giulia Alessandri ◽ ...

Keyword(s): Sensitivity And Specificity ◽ Metagenomic Data ◽ Data Sets ◽ Data Set

We developed a novel tool, i.e., METAnnotatorX2, that includes a number of new advanced features for analysis of deep and shallow metagenomic data sets and is accompanied by (regularly updated) customized databases for archaea, bacteria, fungi, protists, and viruses. Both software and databases were developed so as to maximize sensitivity and specificity while including support for shallow metagenomic data sets.

Mumame: a software tool for quantifying gene-specific point-mutations in shotgun metagenomic data

Metabarcoding and Metagenomics ◽ 10.3897/mbmg.3.36236 ◽ 2019 ◽ Vol 3 ◽ Cited By ~ 1

Author(s): Shruthi Magesh ◽ Viktor Jonsson ◽ Johan Bengtsson-Palme

Keyword(s): Microbial Communities ◽ Point Mutations ◽ Software Tool ◽ Metagenomic Data ◽ Data Sets ◽ Resistance Mutations ◽ Shotgun Metagenomics ◽ Key Factor ◽ Detection Of Mutations ◽ And Function

Metagenomics has emerged as a central technique for studying the structure and function of microbial communities. Often the functional analysis is restricted to classification into broad functional categories. However, important phenotypic differences, such as resistance to antibiotics, are often the result of just one or a few point mutations in otherwise identical sequences. Bioinformatic methods for metagenomic analysis have generally been poor at accounting for this fact, resulting in a somewhat limited picture of important aspects of microbial communities. Here, we address this problem by providing a software tool called Mumame, which can distinguish between wildtype and mutated sequences in shotgun metagenomic data and quantify their relative abundances. We demonstrate the utility of the tool by quantifying antibiotic resistance mutations in several publicly available metagenomic data sets. We also found that sequencing depth is a key factor in detecting rare mutations. Therefore, much larger numbers of sequences may be required for reliable detection of mutations than for most other applications of shotgun metagenomics. Mumame is freely available online (http://microbiology.se/software/mumame).
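
Why depth matters for rare mutations can be illustrated with a simple binomial calculation. This is a back-of-the-envelope sketch, not part of Mumame: it assumes each read covering the site independently carries the mutation with a fixed probability, and the function names and the 95% target are hypothetical choices.

```python
from math import comb


def detection_probability(mutant_fraction: float, depth: int, min_reads: int = 1) -> float:
    """Probability of seeing at least min_reads mutant reads among `depth`
    reads covering the site, under a binomial model."""
    p_fewer = sum(
        comb(depth, k) * mutant_fraction ** k * (1 - mutant_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


def depth_for_detection(mutant_fraction: float, target: float = 0.95,
                        min_reads: int = 1) -> int:
    """Smallest coverage at the site that reaches the target detection probability."""
    if not 0 < mutant_fraction <= 1 or not 0 < target < 1:
        raise ValueError("mutant_fraction must be in (0, 1] and target in (0, 1)")
    depth = min_reads
    while detection_probability(mutant_fraction, depth, min_reads) < target:
        depth += 1
    return depth
```

Note that `depth` here is the number of reads covering the mutated position, which is typically a very small fraction of all reads in a shotgun metagenome, so the total sequencing effort required is far larger than this number alone suggests.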
