Ananke: Temporal clustering reveals ecological dynamics of microbial communities

Taxonomic markers such as the 16S ribosomal RNA gene are widely used in microbial community analysis. A common first step in marker-gene analysis is grouping genes into clusters to reduce data sets to a more manageable size and potentially mitigate the effects of sequencing error. Instead of clustering based on sequence identity, marker-gene data sets collected over time can be clustered based on temporal correlation to reveal ecologically meaningful associations. We present Ananke, a free and open-source algorithm and software package that clusters marker-gene data based on time-series profiles and provides interactive visualization of clusters. Ananke is able to cluster distinct temporal patterns from simulations of multiple ecological patterns, such as periodic seasonal dynamics and organism appearances/disappearances. We apply our algorithm to two longitudinal marker gene data sets: faecal communities from the human gut of an individual sampled over one year, and communities from a freshwater lake sampled over eleven years. Within the gut, the segregation of the bacterial community around a food-poisoning event was immediately clear. In the freshwater lake, we found that high sequence identity between marker genes does not guarantee similar temporal dynamics, and Ananke time-series clusters revealed patterns obscured by clustering based on sequence identity or taxonomy. Ananke is free and open-source software available at https://github.com/beiko-lab/ananke.


Ananke: temporal clustering reveals ecological dynamics of microbial communities

PeerJ ◽

10.7717/peerj.3812 ◽

2017 ◽

Vol 5 ◽

pp. e3812 ◽

Cited By ~ 11

Author(s):

Michael W. Hall ◽

Robin R. Rohwer ◽

Jonathan Perrie ◽

Katherine D. McMahon ◽

Robert G. Beiko

Keyword(s):

Time Series ◽

Open Source ◽

Temporal Dynamics ◽

Marker Gene ◽

Freshwater Lake ◽

Sequencing Error ◽

Marker Genes ◽

Data Sets ◽

Sequence Identity ◽

Gene Data

Taxonomic markers such as the 16S ribosomal RNA gene are widely used in microbial community analysis. A common first step in marker-gene analysis is grouping genes into clusters to reduce data sets to a more manageable size and potentially mitigate the effects of sequencing error. Instead of clustering based on sequence identity, marker-gene data sets collected over time can be clustered based on temporal correlation to reveal ecologically meaningful associations. We present Ananke, a free and open-source algorithm and software package that complements existing sequence-identity-based clustering approaches by clustering marker-gene data based on time-series profiles and provides interactive visualization of clusters, including highlighting of internal OTU inconsistencies. Ananke is able to cluster distinct temporal patterns from simulations of multiple ecological patterns, such as periodic seasonal dynamics and organism appearances/disappearances. We apply our algorithm to two longitudinal marker gene data sets: faecal communities from the human gut of an individual sampled over one year, and communities from a freshwater lake sampled over eleven years. Within the gut, the segregation of the bacterial community around a food-poisoning event was immediately clear. In the freshwater lake, we found that high sequence identity between marker genes does not guarantee similar temporal dynamics, and Ananke time-series clusters revealed patterns obscured by clustering based on sequence identity or taxonomy. Ananke is free and open-source software available at https://github.com/beiko-lab/ananke.
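The idea of grouping sequences by temporal correlation rather than sequence identity can be illustrated with a small sketch. This is not Ananke's implementation: the abstract does not specify the distance measure or clustering algorithm, so the sketch assumes Pearson correlation and simple single-linkage grouping. The function names, the `min_r` threshold, and the toy abundance profiles are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0


def cluster_by_correlation(profiles, min_r=0.9):
    """Group sequences whose time-series profiles correlate at >= min_r.

    Assumed approach: single-linkage grouping via union-find. Two sequences
    share a cluster if a chain of pairwise correlations >= min_r links them.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative (path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy data (hypothetical): two periodic profiles and two that disappear
# after the third time point.
profiles = {
    "seq_A": [0, 5, 10, 5, 0, 5, 10, 5],
    "seq_B": [1, 11, 21, 11, 1, 11, 21, 11],
    "seq_C": [9, 9, 9, 0, 0, 0, 0, 0],
    "seq_D": [20, 18, 19, 1, 0, 0, 0, 0],
}
clusters = cluster_by_correlation(profiles)
print(clusters)
```

In this toy input the periodic pair and the disappearing pair end up in separate clusters regardless of how similar their sequences might be, which is the kind of pattern the abstract describes as obscured by identity-based clustering.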


Ananke: Temporal clustering reveals ecological dynamics of microbial communities

10.7287/peerj.preprints.2879v1 ◽

2017 ◽

Cited By ~ 1

Author(s):

Michael W Hall ◽

Robin R Rohwer ◽

Jonathan Perrie ◽

Katherine D McMahon ◽

Robert G Beiko

Keyword(s):

Time Series ◽

Open Source ◽

Temporal Dynamics ◽

Marker Gene ◽

Freshwater Lake ◽

Sequencing Error ◽

Marker Genes ◽

Data Sets ◽

Sequence Identity ◽

Gene Data


Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BMC Bioinformatics ◽

10.1186/s12859-020-03810-0 ◽

2020 ◽

Vol 21 (S18) ◽

Author(s):

Sudipta Acharya ◽

Laizhong Cui ◽

Yi Pan

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Marker Gene ◽

Biological Data ◽

Protein Interaction Data ◽

Marker Genes ◽

Data Sets ◽

Gene Markers ◽

Multi Objective

Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population. Results: In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. We propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with little or no compromise in sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion: A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The results reveal the supremacy of the proposed method. Reported results are also validated through a biological significance test and heatmap plotting.


Connections between freshwater carbon and nutrient cycles revealed through reconstructed population genomes

10.1101/365627 ◽

2018 ◽

Cited By ~ 1

Author(s):

Alexandra M. Linz ◽

Shaomei He ◽

Sarah L. R. Stevens ◽

Karthik Anantharaman ◽

Robin R. Rohwer ◽

...

Keyword(s):

Time Series ◽

Nitrogen Fixation ◽

Nutrient Cycling ◽

Marker Gene ◽

Functional Marker ◽

Glycoside Hydrolases ◽

Marker Genes ◽

Nutrient Cycles ◽

Lake Mendota ◽

Genes Encoding

Metabolic processes at the microbial scale influence ecosystem functions because microbes are responsible for much of the carbon and nutrient cycling in freshwater. One approach to predict the metabolic capabilities of microbial communities is to search for functional marker genes in metagenomes. However, this approach does not provide context about co-occurrence with other metabolic traits within an organism or detailed taxonomy about those organisms. Here, we combine a functional marker gene analysis with metabolic pathway prediction of microbial population genomes (MAGs) assembled from metagenomic time series in eutrophic Lake Mendota and humic Trout Bog to identify how carbon and nutrient cycles are connected in freshwater. We found that phototrophy, carbon fixation, and nitrogen fixation pathways co-occurred in Cyanobacteria MAGs in Lake Mendota and in Chlorobiales MAGs in Trout Bog. Cyanobacteria MAGs also had strong temporal correlations to functional marker genes for nitrogen fixation in several years. Genes encoding steps in the nitrogen and sulfur cycles varied in abundance and taxonomy by lake, potentially reflecting the availability and composition of inorganic nutrients in these systems. We were also able to identify which populations contained the greatest density and diversity of genes encoding glycoside hydrolases. Populations with many glycoside hydrolases also encoded pathways for sugar degradation. By using both MAGs and marker genes, we were better able to link functions to specific taxonomic groups in our metagenomic time series, enabling a more detailed understanding of freshwater microbial carbon and nutrient cycling.


High-resolution behavioral time series of Japanese quail within their social environment

Scientific Data ◽

10.1038/s41597-019-0299-8 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Jorge Martín Caliva ◽

Rocio Soledad Alcala ◽

Diego Alberto Guzmán ◽

Raúl Héctor Marin ◽

Jackelyn Melissa Kembro

Keyword(s):

Time Series ◽

High Resolution ◽

Japanese Quail ◽

Temporal Dynamics ◽

Original Video ◽

Data Sets ◽

Behavioral Tests ◽

Individual Level ◽

Future Data ◽

Precise Quantification

The behavioral dynamics within a social group may depend not only on the individual traits and social experience of each member but, more importantly, emerge from inter-individual interactions over time. Herein, we first present a dataset, as well as the corresponding original video recordings, of the results of 4 behavioral tests associated with fear and aggressive response performed on 106 Japanese quail. In a second stage, birds were housed with conspecifics that performed similarly in the behavioral tests, in groups of 2 females and 1 male. By continuously monitoring each bird in these small social groups, we obtained time series of social and reproductive behavior, and high-resolution locomotor time series. This approach provides the opportunity to perform precise quantification of the temporal dynamics of behavior at an individual level within different social scenarios, including when an individual showing continued aggressive behaviors is present. These unique datasets and videos are publicly available in Figshare and can be used in further analysis, or for comparison with existing or future data sets or mathematical models across different taxa.


Identification of marker genes in Alzheimer's disease using a machine-learning model

Bioinformation ◽

10.6026/97320630017363 ◽

2021 ◽

Vol 17 (2) ◽

pp. 363-368

Author(s):

Inamul Hasan Madar ◽

Keyword(s):

Machine Learning ◽

Alzheimer's Disease ◽

Marker Gene ◽

The Elderly ◽

Tissue Expression ◽

Marker Genes ◽

Data Sets ◽

Machine Learning Classification ◽

Machine Learning Model

Alzheimer's Disease (AD) is one of the most common causes of dementia, mostly affecting the elderly population. Currently, there is no proper diagnostic tool or method available for the detection of AD. The present study used two distinct data sets of AD genes, which could be potential biomarkers in the diagnosis. The differentially expressed genes (DEGs) curated from both datasets were used for machine learning classification, tissue expression annotation and co-expression analysis. Further, CNPY3, GPR84, HIST1H2AB, HIST1H2AE, IFNAR1, LMO3, MYO18A, N4BP2L1, PML, SLC4A4, ST8SIA4, TLE1 and N4BP2L1 were identified as highly significant DEGs and exhibited co-expression with other query genes. Moreover, a tissue expression study found that these genes are also expressed in brain tissue. Building on earlier studies of marker gene identification, we considered a different set of machine learning classifiers to improve the accuracy of the analysis. Among the six classification algorithms, J48 emerged as the best classifier, which could be used for differentiating healthy and diseased samples. SMO/SVM and Logit Boost followed J48 in classification accuracy.


Interpretation of the Chemical and Physical Time-Series Retrieved from Sentik Glacier, Ladakh Himalaya, India

Journal of Glaciology ◽

10.3189/s0022143000008509 ◽

1984 ◽

Vol 30 (104) ◽

pp. 66-76 ◽

Cited By ~ 2

Author(s):

Paul A. Mayewski ◽

W. Berry Lyons ◽

N. Ahmad ◽

Gordon Smith ◽

M. Pourchet

Keyword(s):

Time Series ◽

Chemical Species ◽

Data Sets ◽

Reactive Iron ◽

Physical Time ◽

Ladakh Himalaya ◽

Data Density ◽

Mass Circulation ◽

The Himalaya ◽

Analysis Of Time Series

Spectral analysis of time series of a c. 17 ± 0.3 year core, calibrated for total β activity, recovered from Sentik Glacier (4908 m), Ladakh, Himalaya, yields several recognizable periodicities including subannual, annual, and multi-annual. The time series include both chemical data (chloride, sodium, reactive iron, reactive silicate, reactive phosphate, ammonium, δD, δ¹⁸O and pH) and physical data (density, debris and ice-band locations, and microparticles in size grades 0.50 to 12.70 μm). Source areas for the chemical species investigated and the general air-mass circulation defined from chemical and physical time series are discussed to demonstrate the potential of such studies in the development of paleometeorological data sets from remote high-alpine glacierized sites such as the Himalaya.


The mutL Gene as a Genome-Wide Taxonomic Marker for High Resolution Discrimination of Lactiplantibacillus plantarum and Its Closely Related Taxa

Microorganisms ◽

10.3390/microorganisms9081570 ◽

2021 ◽

Vol 9 (8) ◽

pp. 1570

Author(s):

Chien-Hsun Huang ◽

Chih-Chieh Chen ◽

Yu-Chun Lin ◽

Chia-Hsuan Chen ◽

Ai-Yun Lee ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Target Genes ◽

Marker Genes ◽

rRNA Gene ◽

Accurate Identification ◽

Discrimination Power ◽

Sequence Identity ◽

Genome Wide ◽

A Genome

The current taxonomy of the Lactiplantibacillus plantarum group comprises 17 closely related species that are indistinguishable from each other by the commonly used 16S rRNA gene sequencing. In this study, a whole-genome-based analysis was carried out to explore highly distinguishing target genes whose interspecific sequence identity is significantly lower than that of 16S rRNA or conventional housekeeping genes. In silico analyses of 774 core genes by the cano-wgMLST_BacCompare analytics platform indicated that the csbB, morA, murI, mutL, ntpJ, rutB, trmK, ydaF, and yhhX genes were the most promising candidates. Subsequently, the mutL gene was selected, and its discrimination power was further evaluated using Sanger sequencing. Among the type strains, mutL exhibited a clearly superior sequence identity (61.6–85.6%; average: 66.6%) to the 16S rRNA gene (96.7–100%; average: 98.4%) and the conventional phylogenetic marker genes (e.g., dnaJ, dnaK, pheS, recA, and rpoA), and could be used to separate the tested strains into various species clusters. Consequently, species-specific primers were developed for fast and accurate identification of L. pentosus, L. argentoratensis, L. plantarum, and L. paraplantarum. During this study, one strain (BCRC 06B0048, L. pentosus) exhibited not only a relatively low mutL sequence identity (97.0%) but also a low digital DNA–DNA hybridization value (78.1%) with the type strain DSM 20314T, signifying that it has potential for reclassification as a novel subspecies. Our data demonstrate that mutL can be a genome-wide target for identifying and classifying the L. plantarum group species and for differentiating novel taxa from known species.


Metabolic pathways inferred from a bacterial marker gene illuminate ecological changes across South Pacific frontal boundaries

Nature Communications ◽

10.1038/s41467-021-22409-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Eric J. Raes ◽

Kristen Karsh ◽

Swan L. S. Sow ◽

Martin Ostrowski ◽

Mark V. Brown ◽

...

Keyword(s):

16S rRNA ◽

Metabolic Pathways ◽

Low Cost ◽

Marker Gene ◽

South Pacific ◽

rRNA Gene ◽

South Pacific Ocean ◽

Bacterial Marker ◽

Gene 16S rRNA ◽

Gene Data

Global oceanographic monitoring initiatives originally measured abiotic essential ocean variables but are currently incorporating biological and metagenomic sampling programs. There is, however, a large knowledge gap on how to infer bacterial functions, the information sought by biogeochemists, ecologists, and modelers, from the bacterial taxonomic information produced by bacterial marker gene surveys. Here, we provide a correlative understanding of how a bacterial marker gene (16S rRNA) can be used to infer latitudinal trends for metabolic pathways in global monitoring campaigns. From a transect spanning 7000 km in the South Pacific Ocean we infer ten metabolic pathways from 16S rRNA gene sequences and 11 corresponding metagenome samples, which relate to metabolic processes of primary productivity, temperature-regulated thermodynamic effects, coping strategies for nutrient limitation, energy metabolism, and organic matter degradation. This study demonstrates that low-cost, high-throughput bacterial marker gene data can be used to infer shifts in metabolic strategies at the community scale.


Discriminating between JCPyV and BKPyV in Urinary Virome Data Sets

Viruses ◽

10.3390/v13061041 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1041

Author(s):

Rita Mormando ◽

Alan J. Wolfe ◽

Catherine Putonti

Keyword(s):

JC Virus ◽

Sequence Similarity ◽

BK Virus ◽

Data Sets ◽

Metagenomic Sequencing ◽

Significant Sequence Similarity ◽

Sequence Identity ◽

Shotgun Metagenomic Sequencing ◽

Urinary Microbiome ◽

Six Genes

Polyomaviruses are abundant in the human body. The polyomaviruses JC virus (JCPyV) and BK virus (BKPyV) are common viruses in the human urinary tract. Prior studies have estimated that JCPyV infects between 20 and 80% of adults and that BKPyV infects between 65 and 90% of individuals by age 10. However, these two viruses encode the same six genes and share 75% nucleotide sequence identity across their genomes. While prior urinary virome studies have repeatedly reported the presence of JCPyV, we were interested in seeing how JCPyV prevalence compares to BKPyV. We retrieved all publicly available shotgun metagenomic sequencing reads from urinary microbiome and virome studies (n = 165). While one third of the data sets produced hits to JCPyV, upon further investigation we were able to determine that the majority of these were in fact BKPyV. This distinction was made by specifically mining for JCPyV and BKPyV and considering uniform coverage across the genome. This approach provides confidence in taxon calls, even between closely related viruses with significant sequence similarity.
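The reasoning behind "considering uniform coverage across the genome" can be sketched as a breadth-of-coverage check: reads that pile up on one conserved region give a hit but cover little of the reference, whereas reads from the true source spread across it. The abstract does not describe the authors' tooling or thresholds, so the function name, the interval format (0-based, half-open), and the toy reference length and alignments below are all hypothetical.

```python
def coverage_summary(genome_length, alignments):
    """Summarise per-base coverage of a reference genome.

    alignments: iterable of (start, end) read intervals, 0-based half-open.
    Returns the fraction of positions covered at least once (breadth)
    and the mean per-base depth.
    """
    depth = [0] * genome_length
    for start, end in alignments:
        # Clip each interval to the reference before counting.
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return {
        "breadth": covered / genome_length,
        "mean_depth": sum(depth) / genome_length,
    }


# Toy example (hypothetical numbers): similar read counts, very different
# breadth. ref_x receives reads only in one shared region; ref_y is tiled
# end to end.
ref_x = coverage_summary(100, [(10, 30)] * 5)
ref_y = coverage_summary(100, [(0, 25), (20, 45), (40, 65), (60, 85), (80, 100)])
print(ref_x, ref_y)
```

Here both references attract five reads, but only the second is covered along its whole length, which is the kind of evidence that would support assigning the reads to that taxon.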
