branch lengths
Recently Published Documents


2022 ◽  
Author(s):  
Theo Tricou ◽  
Eric Tannier ◽  
Damien M de Vienne

Introgression, endosymbiosis and gene transfer, i.e. Horizontal Gene Flow (HGF), are primordial sources of innovation in all domains of life. Our knowledge of HGF relies on detection methods that exploit some of the signatures it leaves on extant genomes. One of them is the effect of HGF on branch lengths of constructed phylogenies. This signature has been formalized in statistical tests for HGF detection, and used for example to detect massive adaptive gene flows in malaria vectors or to order evolutionary events involved in eukaryogenesis. However, these studies rely on the assumption that ghost lineages (all unsampled extant and extinct taxa) have little influence. We demonstrate here, with simulations and data re-analysis, that under the more realistic condition that unsampled taxa vastly outnumber sampled ones, the conclusions of these studies become unfounded or even reversed. This illustrates the necessity to recognize the existence of ghosts in evolutionary studies.


2022 ◽  
Author(s):  
Avika Dixit ◽  
Anju Kagal ◽  
Yasha Ektefaie ◽  
Luca Freschi ◽  
Rajesh Karyakarte ◽  
...  

Background: Mycobacterium tuberculosis (Mtb) transmissibility may vary between lineages (or variants) and this may contribute to the slow decline of tuberculosis (TB) incidence. The objective of our study was to compare transmissibility across four major lineages (L1-4) of Mtb among participants from two cohort studies in Pune, India. Methods: We performed whole-genome sequencing (WGS) of Mtb sputum culture-positive isolates from participants in two prospective cohort studies of adults with pulmonary TB seeking care at public treatment centers in Pune, Maharashtra. We performed genotypic susceptibility prediction for both first- and second-line drugs using a previously validated random forest model. We used single nucleotide substitutions (SNS) and maximum likelihood estimation to build isolate phylogenies by lineage. We used Bayesian molecular dating to estimate ancestral node ages and compared tree characteristics using a two-sample Kolmogorov-Smirnov (KS) test. Results: Of the 642 isolates from distinct study participants that underwent WGS, 612 met sequence quality criteria. The median age of the 612 participants was 31 years (IQR 24.4-44.2), the majority were male (64.7%) and sputum smear-positive (83.3%), and 6.7% had co-infection with HIV. Most isolates belonged to L3 (44.6%). The majority (61.1%) of multidrug-resistant isolates (MDR, resistant to isoniazid and rifampin) belonged to L2 (P < 0.001 [Fisher's Exact]). There was no significant difference in host characteristics between participants infected with the four major lineages. In phylogenetic analysis, we measured shorter terminal branch lengths in the L2 tree than in the L1 and L3 trees, indicating less time elapsing between transmission and sampling and higher transmissibility (median branch lengths: L2, 3.3; L3, 7.8; p < 0.001). Branching times for L2 and L4 were more recent than for L1 and L3, indicating recent introduction into the region (p < 0.01 [KS test]).
Conclusion: Modern Mtb lineages (L2 and L4) were more recently introduced in western India, compared to older lineages (L1 and L3). L2 shows a higher frequency of drug-resistance and higher transmissibility. Our findings highlight the need for contact tracing around cases of TB due to L2, and heightened surveillance of TB antibiotic resistance in India.
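
The two-sample Kolmogorov-Smirnov comparison of branch-length distributions described in this abstract can be illustrated with a minimal sketch. The function name and the branch-length values below are hypothetical and are not taken from the study; the code computes only the KS statistic D (the largest gap between the two empirical distribution functions), not a p-value. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used to obtain both.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Return D = max |F_a(x) - F_b(x)| over all observed values x."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical terminal branch lengths (illustrative values only, not study data).
tbl_lineage_x = [1.0, 2.0, 3.0, 3.5, 4.0]
tbl_lineage_y = [5.0, 6.5, 7.8, 9.0, 12.0]
d_stat = ks_two_sample_statistic(tbl_lineage_x, tbl_lineage_y)
print(f"KS statistic D = {d_stat:.2f}")
```

A value of D near 1 means the two distributions barely overlap; a value near 0 means they are nearly indistinguishable.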


2022 ◽  
Author(s):  
Fabrizio Menardo

Detecting factors associated with transmission is important to understand disease epidemics, and to design effective public health measures. Clustering and terminal branch lengths (TBL) analyses are commonly applied to genomic data sets of Mycobacterium tuberculosis (MTB) to identify sub-populations with increased transmission. Here, I used a simulation-based approach to investigate what epidemiological processes influence the results of clustering and TBL analyses, and whether differences in transmission can be detected with these methods. I simulated MTB epidemics with different dynamics (latency, infectious period, transmission rate, basic reproductive number R0, sampling proportion, and molecular clock), and found that all these factors, except the length of the infectious period and R0, affect the results of clustering and TBL distributions. I show that standard interpretations of this type of analysis ignore two main caveats: 1) clustering results and TBL depend on many factors that have nothing to do with transmission, 2) clustering results and TBL say nothing about whether the epidemic is stable, growing, or shrinking. An important consequence is that the optimal SNP threshold for clustering depends on the epidemiological conditions, and that sub-populations with different epidemiological characteristics should not be analyzed with the same threshold. Finally, these results suggest that the different clustering rates and TBL distributions that are found consistently between different MTB lineages are probably due to intrinsic bacterial factors, and do not necessarily indicate differences in transmission or evolutionary success.
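
To make the role of the SNP threshold concrete, here is a minimal sketch of threshold-based clustering. It assumes single-linkage clustering on pairwise SNP distances between aligned sequences, which the abstract does not specify; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def cluster_by_threshold(seqs, threshold):
    """Single-linkage clusters: isolates within `threshold` SNPs are linked."""
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


def clustering_rate(clusters):
    """Fraction of isolates that fall in a cluster of size two or more."""
    total = sum(len(c) for c in clusters)
    clustered = sum(len(c) for c in clusters if len(c) > 1)
    return clustered / total if total else 0.0


# Toy alignment (hypothetical, for illustration only).
isolates = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGTTCGA", "D": "TTTTTTTT"}
for snp_threshold in (0, 1):
    found = cluster_by_threshold(isolates, snp_threshold)
    print(snp_threshold, found, clustering_rate(found))
```

Even in this toy case the clustering rate moves from 0 to 0.75 when the threshold changes from 0 to 1 SNP, which shows how sensitive the summary is to the chosen cutoff.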


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S783-S784
Author(s):  
Avika Dixit ◽  
Anju Kagal ◽  
Yasha Ektefaie ◽  
Luca Freschi ◽  
Rahul Lokhande ◽  
...  

Abstract Background: Mycobacterium tuberculosis (Mtb) transmissibility may vary between lineages (or variants) and this may contribute to the slow decline of tuberculosis incidence. The objective of our study was to compare transmissibility across four major lineages (L1-4) of Mtb in Pune, India. Methods: We performed whole-genome sequencing (WGS) of Mtb isolated from sputum culture of adult patients with pulmonary TB. We performed genotypic susceptibility testing for both first- and second-line drugs using a previously validated random forest predictor. We identified single nucleotide polymorphisms and generated a multiple sequence alignment excluding drug resistance conferring mutations to avoid skewing the phylogeny due to convergent evolution in these regions. We used Bayesian molecular dating to generate phylogenies and compared tree characteristics using a two-sample Kolmogorov-Smirnov (KS) test. Results: Of the 642 isolates from distinct study participants that underwent WGS, 612 met quality criteria. The median age of participants was 31 years (range 18-74), the majority were male (64.7%) and sputum smear-positive (83.3%), and 6.7% had co-infection with HIV (Table 1). There was no significant difference in baseline characteristics between lineages. The majority of isolates belonged to L3 (44.6%). The majority (61.1%) of multidrug-resistant (MDR, resistant to isoniazid and rifampin) isolates belonged to L2. In phylogenetic analysis, we found evidence of higher transmissibility of L2 as indicated by shorter branch lengths (i.e., less time had elapsed between transmission and sampling) and more genetic similarity (smaller pairwise single nucleotide polymorphism [SNP] distances) among L2 isolates as compared to other lineages (Figure 1). Branching times for L2 and L4 were smaller than L1 and L3 indicating recent introduction into the region (p < 0.001 [KS test]).
Figure 1: Lineage-wise distribution of A) phylogenetic tree branch lengths (log) and B) pairwise single nucleotide polymorphism (SNP) distance, using 612 tuberculosis isolates from Pune, India. P values calculated using two-sample Kolmogorov-Smirnov test.
Table 1: Demographic characteristics of study participants included in the study, by lineage.
Conclusion: Modern Mtb lineages (L2 and L4) were relatively recently introduced in western India, as compared to older lineages (L1 and L3), with the more drug-resistant L2 showing higher transmissibility. These findings highlight the need for early detection and treatment initiation to interrupt transmission with important implications for antimicrobial stewardship and heightened surveillance of TB resistance rates.
Disclosures: All Authors: No reported disclosures


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Luca Freschi ◽  
Roger Vargas ◽  
Ashaque Husain ◽  
S. M. Mostofa Kamal ◽  
Alena Skrahina ◽  
...  

Abstract Mycobacterium tuberculosis is a clonal pathogen proposed to have co-evolved with its human host for millennia, yet our understanding of its genomic diversity and biogeography remains incomplete. Here we use a combination of phylogenetics and dimensionality reduction to reevaluate the population structure of M. tuberculosis, providing an in-depth analysis of the ancient Indo-Oceanic Lineage 1 and the modern Central Asian Lineage 3, and expanding our understanding of Lineages 2 and 4. We assess sub-lineages using genomic sequences from 4939 pan-susceptible strains, and find 30 new genetically distinct clades that we validate in a dataset of 4645 independent isolates. We find a consistent geographically restricted or unrestricted pattern for 20 groups, including three groups of Lineage 1. The distribution of terminal branch lengths across the M. tuberculosis phylogeny supports the hypothesis of a higher transmissibility of Lineages 2 and 4, in comparison with Lineages 3 and 1, on a global scale. We define an expanded barcode of 95 single nucleotide substitutions that allows rapid identification of 69 M. tuberculosis sub-lineages and 26 additional internal groups. Our results paint a higher resolution picture of the M. tuberculosis phylogeny and biogeography.


2021 ◽  
Author(s):  
Sergey Bocharov ◽  
Simon Harris ◽  
Emma Kominek ◽  
Arne O Mooers ◽  
Mike Steel

In the simplest phylodynamic model (the pure-birth Yule process), lineages split independently at a constant rate λ for time t. The length of a randomly chosen edge (either interior or pendant) in the resulting tree has an expected value that rapidly converges to 1/(2λ) as t grows, and thus is essentially independent of t. However, the length L of the longest pendant edge behaves remarkably differently: L converges to t/2 as the expected number of leaves grows. Extending this model to allow an extinction rate μ (where μ < λ), we also establish a similar result for birth-death trees, except that t/2 is replaced by (t/2)(1 - μ/λ). This 'complete' tree may contain subtrees that have died out before time t; for the 'reduced tree' that just involves the leaves present at time t and their direct ancestors, the longest pendant edge length L again converges to t/2. Thus, there is likely to be at least one extant species whose associated pendant branch attaches to the tree approximately half-way back in time to the origin of the entire clade. We also briefly consider the length of the shortest edges. Our results are relevant to phylogenetic diversity indices in biodiversity conservation, and to questions concerning the length of aligned sequences required to correctly infer a tree. We compare our theoretical results with simulations, and with the branch lengths from a recent phylogenetic tree of all mammals.
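
The two quantities discussed in this abstract can be explored with a small simulation. The sketch below is not the authors' code: it assumes a tree that starts from a single lineage at time 0, counts the initial stem as an edge, and pools edges over several replicate trees. The function name, the seed and the parameter values are arbitrary choices made for illustration.

```python
import random


def simulate_yule(rate, t_max, rng):
    """Simulate one pure-birth (Yule) tree grown from a single lineage.

    Returns (interior_lengths, pendant_lengths) for the tree at time t_max.
    """
    starts = [0.0]   # start time of the edge leading to each living lineage
    interior = []    # lengths of edges that ended in a split
    now = 0.0
    while True:
        # With n living lineages, the next split arrives at total rate n * rate.
        now += rng.expovariate(rate * len(starts))
        if now >= t_max:
            break
        i = rng.randrange(len(starts))   # each lineage is equally likely to split
        interior.append(now - starts[i])
        starts[i] = now                  # first daughter edge
        starts.append(now)               # second daughter edge
    pendant = [t_max - s for s in starts]
    return interior, pendant


rng = random.Random(1)
rate, t_max = 1.0, 7.0
all_edges, longest_pendant = [], []
for _ in range(20):
    interior, pendant = simulate_yule(rate, t_max, rng)
    all_edges.extend(interior + pendant)
    longest_pendant.append(max(pendant))

mean_edge = sum(all_edges) / len(all_edges)
mean_longest = sum(longest_pendant) / len(longest_pendant)
print(f"pooled mean edge length: {mean_edge:.3f} (1/(2*rate) = {1 / (2 * rate):.3f})")
print(f"mean longest pendant edge: {mean_longest:.3f} (t/2 = {t_max / 2:.3f})")
```

The pooled mean edge length should sit close to 1/(2λ), while the longest pendant edge grows with t. The t/2 value is an asymptotic result, so at finite t individual trees can deviate noticeably from it.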


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0255978
Author(s):  
Daniel F. Marchán ◽  
Sergio Jiménez ◽  
Thibaud Decaëns ◽  
Jorge Domínguez

The Massif Central in France could potentially harbor numerous ancient endemic lineages owing to its long history of continuous geological stability. Several endemic earthworm species inhabit the area, with Allolobophora (Gatesona) chaetophora, Helodrilus (Acystodrilus) and Avelona ligra showing hints of a common evolutionary origin. However, the phylogenetic relationships and taxonomic status of the species remain to be studied through integrative molecular and morphological methods. To this end, eight species including most of the known species and subspecies of All. (Gatesona), Helodrilus (Acystodrilus) musicus, and Avelona ligra were sequenced for a set of five molecular markers. The species were grouped on the basis of the molecular findings in a phylogenetic framework. All. (Gatesona) was included within the same clade as Helodrilus (Acystodrilus) and Avelona, separated from Allolobophora sensu stricto, supporting its status as a good genus. Branch lengths and average pairwise genetic distances suggested the subspecies of All. (Gatesona) chaetophora examined should be considered species-level taxa. Thus, a generic diagnosis for Gatesona stat. nov. is provided, along with redescriptions of Gatesona chaetophora comb. nov., Gatesona rutena comb. nov. stat. nov., Gatesona lablacherensis comb. nov. stat. nov. and Gatesona serninensis comb. nov. stat. nov. The study findings highlight the need for further sampling of earthworm diversity in the Massif Central (and Southern France), in addition to an increased focus on the Eastern European species of Helodrilus.


Genetics ◽  
2021 ◽  
Author(s):  
Gertjan Bisschop ◽  
Konrad Lohse ◽  
Derek Setter

Abstract Current methods of identifying positively selected regions in the genome are limited in two key ways: the underlying models cannot account for the timing of adaptive events and the comparison between models of selective sweeps and sequence data is generally made via simple summaries of genetic diversity. Here we develop a tractable method of describing the effect of positive selection on the genealogical histories in the surrounding genome, explicitly modeling both the timing and context of an adaptive event. In addition, our framework allows us to go beyond analyzing polymorphism data via the site frequency spectrum or summaries thereof and instead leverage information contained in patterns of linked variants. Tests on both simulations and a human data example, as well as a comparison to SweepFinder2, show that even with very small sample sizes, our analytic framework has higher power to identify old selective sweeps and to correctly infer both the time and strength of selection. Finally, we derived the marginal distribution of genealogical branch lengths at a locus affected by selection acting at a linked site. This provides a much-needed link between our analytic understanding of the effects of sweeps on sequence variation and recent advances in simulation and heuristic inference procedures that allow researchers to examine the sequence of genealogical histories along the genome.


2021 ◽  
Author(s):  
Helmut E Simon ◽  
Gavin A Huttley

The site frequency spectrum (SFS) is a commonly used statistic to summarize genetic variation in a sample of genomic sequences from a population. Such a genomic sample is associated with an imputed genealogical history with attributes such as branch lengths, coalescence times and the time to the most recent common ancestor (TMRCA) as well as topological and combinatorial properties. We present a Bayesian model for sampling from the joint posterior distribution of coalescence times conditional on the SFS associated with a sample of sequences in the absence of selection. In this model, the combinatorial properties of a genealogy, which is represented as a coalescent tree, are expressed as matrices. This facilitates the calculation of likelihoods and the effective sampling of the entire space of tree structures according to the Equal Rates Markov (or Yule-type) measure. Unlike previous methods, assumptions as to the type of stochastic process that generated the genealogical tree are not required. Novel approaches to defining both uninformative and informative prior distributions are employed. The uncertainty in inference due to the stochastic nature of mutation and the unknown tree structure is expressed by the shape of the posterior distributions. The method is implemented using the general purpose Markov Chain Monte Carlo software PyMC3. From the sampled posterior distribution of coalescence times, one can also infer related quantities such as the number of ancestors of a sample at a given time in the past (ancestral distribution) and the probability of specific relationships between branch lengths (for example, that the most recent branch is longer than all the others). The performance of the method is evaluated against simulated data and is also applied to historic mitochondrial data from the Nuu-Chah-Nulth people of North America. The method can be used to obtain estimates of the TMRCA of the sample. 
The relationship of these estimates to those given by "Thomson's estimator" is explored. Keywords: coalescent theory; Bayesian inference; time to most recent common ancestor; site frequency spectrum

