On the complexity of haplotyping a microbial community

Author(s):  
Samuel M Nicholls ◽  
Wayne Aubrey ◽  
Kurt De Grave ◽  
Leander Schietgat ◽  
Christopher J Creevey ◽  
...  

Abstract Motivation Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes), but for an unknown number of individuals and haplotypes. Results The problem of single individual haplotyping (SIH) was first formalised by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of “haplotyping” metagenomic samples, with a new formalisation of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalises the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping (MIH) problem. We also provide software implementations for a pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm. Availability and implementation Our reference implementations of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) are open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.
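The idea of a pairwise SNV co-occurrence matrix combined with a greedy traversal can be illustrated with a minimal Python sketch. This is not the Hansel or Gretel API: the function names, the read representation (a dict mapping SNV site index to observed base) and the restriction to adjacent sites are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def pairwise_counts(reads):
    """Count how often base a at site i is seen with base b at site i + 1.

    `reads` is a list of dicts {site_index: base}; this layout is a
    hypothetical simplification, not the format used by Hansel.
    """
    counts = defaultdict(Counter)
    for read in reads:
        for site, base in read.items():
            next_base = read.get(site + 1)
            if next_base is not None:
                counts[site][(base, next_base)] += 1
    return counts


def greedy_haplotype(reads, n_sites):
    """Walk the sites left to right, always taking the best-supported base."""
    counts = pairwise_counts(reads)
    first = Counter(read[0] for read in reads if 0 in read)
    if not first:
        return ""
    # sorted() makes tie-breaking deterministic (alphabetical order).
    current = max(sorted(first), key=first.get)
    path = [current]
    for site in range(n_sites - 1):
        options = {b: c for (a, b), c in counts[site].items() if a == current}
        if not options:
            break  # no read links the current base to the next site
        current = max(sorted(options), key=options.get)
        path.append(current)
    return "".join(path)


reads = [
    {0: "A", 1: "C"}, {0: "A", 1: "C"},
    {1: "C", 2: "G"}, {1: "C", 2: "G"},
    {0: "T", 1: "G"}, {1: "G", 2: "T"},
]
print(greedy_haplotype(reads, n_sites=3))
```

A single greedy pass like this yields only the best-supported path; recovering several haplotypes from a community sample needs more than one traversal, which this sketch does not attempt.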

2020 ◽  
Author(s):  
Samuel M. Nicholls ◽  
Wayne Aubrey ◽  
Kurt De Grave ◽  
Leander Schietgat ◽  
Christopher J. Creevey ◽  
...  

Abstract Motivation Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes), but for an unknown number of individuals. Results The problem of single individual haplotyping (SIH) was first formalised by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of “haplotyping” metagenomic samples, with a new formalisation of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalises the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping (MIH) problem. We also provide software implementations of our proposed pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm. Availability and implementation Our reference implementations of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) are open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.


BioScience ◽  
2020 ◽  
Vol 70 (7) ◽  
pp. 589-596 ◽  
Author(s):  
Laurence J Clarke ◽  
Penelope J Jones ◽  
Hans Ammitzboll ◽  
Leon A Barmuta ◽  
Martin F Breed ◽  
...  

Abstract Bacteria, fungi, and other microorganisms in the environment (i.e., environmental microbiomes) provide vital ecosystem services and affect human health. Despite their importance, public awareness of environmental microbiomes has lagged behind that of human microbiomes. A key problem has been a scarcity of research demonstrating the microbial connections across environmental biomes (e.g., marine, soil) and between environmental and human microbiomes. We show in the present article, through analyses of almost 10,000 microbiome papers and three global data sets, that there are significant taxonomic similarities in microbial communities across biomes, but very little cross-biome research exists. This disconnect may be hindering advances in microbiome knowledge and translation. In this article, we highlight current and potential applications of environmental microbiome research and the benefits of an interdisciplinary, cross-biome approach. Microbiome scientists need to engage with each other, government, industry, and the public to ensure that research and applications proceed ethically, maximizing the potential benefits to society.


1973 ◽  
Vol 10 (4) ◽  
pp. 739-747 ◽  
Author(s):  
P. J. Brockwell ◽  
W. H. Kuo

A supercritical age-dependent branching process is considered in which the lifespan of each individual is composed of four phases whose durations have joint probability density $f(x_1, x_2, x_3, x_4)$. Starting with a single individual of age zero at time zero we consider the asymptotic behaviour as $t \to \infty$ of the random variable $Z^{(4)}(a_0, \dots, a_n, t)$ defined as the number of individuals in phase 4 at time $t$ for which the elapsed phase durations $Y_{01}, \dots, Y_{04}, \dots, Y_{i1}, \dots, Y_{i4}, \dots, Y_{n4}$ of the individual itself and its first $n$ ancestors satisfy the inequalities $Y_{ij} \le a_{ij}$, $i = 0, \dots, n$, $j = 1, \dots, 4$. The application of the results to the analysis of cell-labelling experiments is described. Finally we state an analogous result which defines (conditional on eventual non-extinction of the population) the asymptotic joint distribution of the phase and elapsed phase durations of an individual drawn at random from the population and the phase durations of its ancestors.


2020 ◽  
Author(s):  
Andrew Omame ◽  
Celestine Uchenna Nnanna ◽  
Simeon Chioma Inyama

In this work, a co-infection model for human papillomavirus (HPV) and Chlamydia trachomatis with cost-effectiveness optimal control analysis is developed and analyzed. The disease-free equilibrium of the co-infection model is shown not to be globally asymptotically stable when the associated reproduction number is less than unity. It is proven that the model undergoes the phenomenon of backward bifurcation when the associated reproduction number is less than unity. It is also shown that HPV re-infection ($\varepsilon_{p} \neq 0$) induces the phenomenon of backward bifurcation. Numerical simulations of the optimal control model showed that: (i) focusing on the HPV intervention strategy alone (HPV prevention and screening), in the absence of Chlamydia trachomatis control, leads to a positive population-level impact on the total number of individuals singly infected with Chlamydia trachomatis; (ii) concentrating on Chlamydia trachomatis intervention controls alone (Chlamydia trachomatis prevention and treatment), in the absence of HPV intervention strategies, leads to a positive population-level impact on the total number of individuals singly infected with HPV. Moreover, the strategy that combines and implements HPV and Chlamydia trachomatis prevention controls is the most cost-effective of all the control strategies in combating the co-infections of HPV and Chlamydia trachomatis.


2020 ◽  
Author(s):  
Sonja Mathias ◽  
Adrien Coulier ◽  
Anass Bouchnita ◽  
Andreas Hellander

Abstract Centre-based, or cell-centre, models are a framework for the computational study of multicellular systems with widespread use in cancer modelling and computational developmental biology. At the core of these models are the numerical method used to update cell positions and the force functions that encode the pairwise mechanical interactions of cells. For the latter there are multiple choices that could potentially affect both the biological behaviour captured, and the robustness and efficiency of simulation. For example, available open-source software implementations of centre-based models rely on different force functions for their default behaviour and it is not straightforward for a modeller to know if these are interchangeable. Our study addresses this problem and contributes to the understanding of the potential and limitations of three popular force functions from a numerical perspective. We show empirically that choosing the force parameters such that the relaxation time for two cells after cell division is consistent between different force functions results in good agreement of the population radius of a growing monolayer. Furthermore, we report that numerical stability is not sufficient to prevent unphysical cell trajectories following cell division, and consequently, that too large time steps can cause geometrical differences at the population level.


2020 ◽  
Vol 82 (10) ◽  
Author(s):  
Sonja Mathias ◽  
Adrien Coulier ◽  
Anass Bouchnita ◽  
Andreas Hellander

Abstract Centre-based or cell-centre models are a framework for the computational study of multicellular systems with widespread use in cancer modelling and computational developmental biology. At the core of these models are the numerical method used to update cell positions and the force functions that encode the pairwise mechanical interactions of cells. For the latter, there are multiple choices that could potentially affect both the biological behaviour captured, and the robustness and efficiency of simulation. For example, available open-source software implementations of centre-based models rely on different force functions for their default behaviour and it is not straightforward for a modeller to know if these are interchangeable. Our study addresses this problem and contributes to the understanding of the potential and limitations of three popular force functions from a numerical perspective. We show empirically that choosing the force parameters such that the relaxation time for two cells after cell division is consistent between different force functions results in good agreement of the population radius of a two-dimensional monolayer relaxing mechanically after intense cell proliferation. Furthermore, we report that numerical stability is not sufficient to prevent unphysical cell trajectories following cell division, and consequently, that too large time steps can cause geometrical differences at the population level.
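The two-cell relaxation after division described in the abstract can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the abstract does not name its three force functions, so a plain linear spring is used, the motion is one-dimensional and overdamped, and the update is forward Euler. The names `relax_pair`, `mu` and `rest_length` are hypothetical.

```python
def relax_pair(mu, dt, steps, rest_length=1.0, initial_separation=0.3):
    """Relax two daughter cells in 1D with a linear spring and forward Euler.

    Assumed model (not taken from the paper): overdamped motion, so each
    cell's velocity equals the pairwise force mu * (rest_length - r).
    Returns the list of separations after each step.
    """
    x1 = -initial_separation / 2.0
    x2 = initial_separation / 2.0
    separations = [x2 - x1]
    for _ in range(steps):
        r = x2 - x1
        force = mu * (rest_length - r)  # positive value pushes the cells apart
        x1 -= dt * force
        x2 += dt * force
        separations.append(x2 - x1)
    return separations


small_step = relax_pair(mu=1.0, dt=0.1, steps=200)
large_step = relax_pair(mu=1.0, dt=0.8, steps=200)
print(max(small_step), max(large_step))
```

For this linear model the separation error is multiplied by `1 - 2 * mu * dt` each step, so the scheme stays stable for `dt < 1 / mu` but only approaches the rest length without overshoot for `dt <= 1 / (2 * mu)`. The second run is therefore stable yet oscillates past the rest length, a simple instance of a stable time step still producing a trajectory that the continuous model would not.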


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dawit A. Yohannes ◽  
Katri Kaukinen ◽  
Kalle Kurppa ◽  
Päivi Saavalainen ◽  
Dario Greco

Abstract Background Deep immune receptor sequencing, RepSeq, provides unprecedented opportunities for identifying and studying condition-associated T-cell clonotypes, represented by T-cell receptor (TCR) CDR3 sequences. However, due to the immense diversity of the immune repertoire, identification of condition-relevant TCR CDR3s from total repertoires has mostly been limited to either “public” CDR3 sequences or to comparisons of CDR3 frequencies observed in a single individual. A methodology for the identification of condition-associated TCR CDR3s by direct population-level comparison of RepSeq samples is currently lacking. Results We present a method for direct population-level comparison of RepSeq samples using immune repertoire sub-units (or sub-repertoires) that are shared across individuals. The method first performs unsupervised clustering of CDR3s within each sample. It then finds matching clusters across samples, called immune sub-repertoires, and performs statistical differential abundance testing at the level of the identified sub-repertoires. It finally ranks CDR3s in differentially abundant sub-repertoires for relevance to the condition. We applied the method to total TCR CDR3β RepSeq datasets of celiac disease patients, as well as to public datasets of yellow fever vaccination. The method successfully identified celiac disease-associated CDR3β sequences, as evidenced by considerable agreement of TRBV-gene and positional amino acid usage patterns in the detected CDR3β sequences with previously known CDR3βs specific to gluten in celiac disease. It also successfully recovered significantly higher numbers of previously known CDR3β sequences relevant to each condition than would be expected by chance. Conclusion We conclude that immune sub-repertoires of similar immuno-genomic features shared across unrelated individuals can serve as viable units of immune repertoire comparison, serving as a proxy for identification of condition-associated CDR3s.


Author(s):  
Niamh E. Redmond ◽  
Grace P. McCormack

Sequences of the ribosomal internal transcribed spacer regions 1 and 2 (ITS-1 and ITS-2) were employed to investigate relationships between putatively very closely related species of marine haplosclerids and to investigate the species status of Haliclona cinerea. Results indicate that intra-genomic and intra-specific levels of diversity are equivalent, and sequences from multiple clones from a number of individuals of a single species could not be separated on phylogenetic trees. As a result, the ITS regions are not suitable markers for population level studies in marine haplosclerids. Sequences of these regions were highly species specific, and large differences were found between species. ITS sequences from three Callyspongia and three Haliclona species could not be aligned successfully and therefore this locus could not be used to investigate relationships between these putative close relatives. However, ITS sequences retrieved from one H. cinerea were very different from sequences generated from other H. cinerea individuals indicating that this species comprises more than one taxon.


2014 ◽  
Author(s):  
Matthew D MacManes ◽  
Michael B Eisen

As a direct result of intense heat and aridity, deserts are thought to be among the harshest of environments, particularly for their mammalian inhabitants. Given that osmoregulation can be challenging for these animals, with failure resulting in death, strong selection should be observed on genes related to the maintenance of water and solute balance. One such animal, Peromyscus eremicus, is native to the desert regions of the southwest United States and may live its entire life without oral fluid intake. As a first step toward understanding the genetics that underlie this phenotype, we present a characterization of the P. eremicus transcriptome. We assay four tissues (kidney, liver, brain, testes) from a single individual and supplement this with population-level renal transcriptome sequencing from 15 additional animals. We identified a set of transcripts undergoing both purifying and balancing selection based on estimates of Tajima's D. In addition, we used the branch-site test to identify a transcript, Slc2a9, likely related to desert osmoregulation, undergoing enhanced selection in P. eremicus relative to a set of related non-desert rodents.


2016 ◽  
Vol 53 (1) ◽  
pp. 203-215 ◽  
Author(s):  
Frank Ball ◽  
Tom Britton ◽  
Peter Neal

Abstract We study continuous-time birth–death type processes, where individuals have independent and identically distributed lifetimes, according to a random variable Q, with E[Q] = 1, and where the birth rate if the population is currently in state (has size) n is α(n). We focus on two important examples, namely α(n) = λ n being a branching process, and α(n) = λn(N - n) / N which corresponds to an SIS (susceptible → infective → susceptible) epidemic model in a homogeneously mixing community of fixed size N. The processes are assumed to start with a single individual, i.e. in state 1. Let T, An, C, and S denote the (random) time to extinction, the total time spent in state n, the total number of individuals ever alive, and the sum of the lifetimes of all individuals in the birth–death process, respectively. We give expressions for the expectation of all these quantities and show that these expectations are insensitive to the distribution of Q. We also derive an asymptotic expression for the expected time to extinction of the SIS epidemic, but now starting at the endemic state, which is not independent of the distribution of Q. The results are also applied to the household SIS epidemic, showing that, in contrast to the household SIR (susceptible → infective → recovered) epidemic, its threshold parameter R* is insensitive to the distribution of Q.
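The two birth rates named in the abstract can be explored with a small event-driven simulation in Python. This is a sketch under a strong assumption: lifetimes are taken to be exponential with mean 1, which is only one admissible choice of Q, and the function names below are hypothetical. It tracks the time to extinction T and the total number of individuals ever alive C, starting from a single individual.

```python
import random


def simulate_to_extinction(birth_rate, rng, start=1):
    """Run a birth-death process until the population hits zero.

    Assumes exponential lifetimes with mean 1 (death rate n in state n).
    `birth_rate(n)` gives alpha(n). May run very long, or never stop,
    if the process is supercritical, so keep lam < 1 when experimenting.
    Returns (T, C): extinction time and total individuals ever alive.
    """
    n = start
    t = 0.0
    total_ever_alive = start
    while n > 0:
        b = birth_rate(n)
        total_rate = b + float(n)
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if rng.random() < b / total_rate:
            n += 1
            total_ever_alive += 1
        else:
            n -= 1
    return t, total_ever_alive


def branching(lam):
    """alpha(n) = lam * n, the branching-process case."""
    return lambda n: lam * n


def sis(lam, N):
    """alpha(n) = lam * n * (N - n) / N, the SIS epidemic in a community of size N."""
    return lambda n: lam * n * (N - n) / N


rng = random.Random(1)
t_sis, c_sis = simulate_to_extinction(sis(0.5, 50), rng)
totals = [simulate_to_extinction(branching(0.5), rng)[1] for _ in range(5000)]
mean_total = sum(totals) / len(totals)
print(t_sis, c_sis, mean_total)
```

In the subcritical branching case each individual has on average λ·E[Q] = λ offspring, so the expected total number ever alive is 1/(1 − λ), which is 2 for λ = 0.5. The simulated mean should land close to that value; checking that it stays there under other lifetime distributions with mean 1 would require replacing the exponential assumption, which this sketch does not do.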

