Bacterial Phylogenetic Reconstruction from Whole Genomes Is Robust to Recombination but Demographic Inference Is Not

Jessica Hedge; Daniel J. Wilson

doi:10.1128/mbio.02158-14

mBio ◽

10.1128/mbio.02158-14 ◽

2014 ◽

Vol 5 (6) ◽

Cited By ~ 63

Author(s):

Jessica Hedge ◽

Daniel J. Wilson

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Analyses ◽

Phylogenetic Reconstruction ◽

Branch Length ◽

Great Accuracy ◽

Population History ◽

Genome Sequences ◽

Whole Genomes ◽

Branch Lengths ◽

The Impact

ABSTRACT Phylogenetic inference in bacterial genomics is fundamental to understanding problems such as population history, antimicrobial resistance, and transmission dynamics. The field has been plagued by an apparent state of contradiction since the distorting effects of recombination on phylogeny were discovered more than a decade ago. Researchers persist with detailed phylogenetic analyses while simultaneously acknowledging that recombination seriously misleads inference of population dynamics and selection. Here we resolve this paradox by showing that phylogenetic tree topologies based on whole genomes robustly reconstruct the clonal frame topology but that branch lengths are badly skewed. Surprisingly, removing recombining sites can exacerbate branch length distortion caused by recombination.

IMPORTANCE Phylogenetic tree reconstruction is a popular approach for understanding the relatedness of bacteria in a population from differences in their genome sequences. However, bacteria frequently exchange regions of their genomes by a process called homologous recombination, which violates a fundamental assumption of phylogenetic methods. Since many researchers continue to use phylogenetics for recombining bacteria, it is important to understand how recombination affects the conclusions drawn from these analyses. We find that whole-genome sequences afford great accuracy in reconstructing evolutionary relationships despite concerns surrounding the presence of recombination, but the branch lengths of the phylogenetic tree are indeed badly distorted. Surprisingly, methods to reduce the impact of recombination on branch lengths can exacerbate the problem.

Predicting the Impact of Describing New Species on Phylogenetic Patterns

Integrative Organismal Biology ◽

10.1093/iob/obz028 ◽

2019 ◽

Vol 1 (1) ◽

Cited By ~ 1

Author(s):

D C Blackburn ◽

G Giribet ◽

D E Soltis ◽

E L Stanley

Keyword(s):

New Species ◽

Phylogenetic Trees ◽

Branch Length ◽

Length Variation ◽

Tree Shape ◽

Branch Lengths ◽

Taxonomic History ◽

Ecological Patterns ◽

The Impact ◽

Incomplete Sampling

Abstract Although our inventory of Earth’s biodiversity remains incomplete, we still require analyses using the Tree of Life to understand evolutionary and ecological patterns. Because incomplete sampling may bias our inferences, we must evaluate how future additions of newly discovered species might impact analyses performed today. We describe an approach that uses taxonomic history and phylogenetic trees to characterize the impact of past species discoveries on phylogenetic knowledge using patterns of branch-length variation, tree shape, and phylogenetic diversity. This provides a framework for assessing the relative completeness of taxonomic knowledge of lineages within a phylogeny. To demonstrate this approach, we use recent large phylogenies for amphibians, reptiles, flowering plants, and invertebrates. Well-known clades exhibit a decline in the mean and range of branch lengths that are added each year as new species are described. With increased taxonomic knowledge over time, deep lineages of well-known clades become known such that most recently described new species are added close to the tips of the tree, reflecting changing tree shape over the course of taxonomic history. The same analyses reveal other clades to be candidates for future discoveries that could dramatically impact our phylogenetic knowledge. Our work reveals that species are often added non-randomly to the phylogeny over multiyear time-scales in a predictable pattern of taxonomic maturation. Our results suggest that we can make informed predictions about how new species will be added across the phylogeny of a given clade, thus providing a framework for accommodating unsampled undescribed species in evolutionary analyses.

Faculty Opinions recommendation of Inference of human population history from individual whole-genome sequences.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.11961958.13084059 ◽

2011 ◽

Author(s):

Stephen Wright

Keyword(s):

Human Population ◽

Population History ◽

Whole Genome ◽

Genome Sequences

Predicting pathogenic non-coding SVs disrupting the 3D genome in 1646 whole cancer genomes using multiple instance learning

Scientific Reports ◽

10.1038/s41598-021-93917-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Marleen M. Nieboer ◽

Luan Nguyen ◽

Jeroen de Ridder

Keyword(s):

Multiple Instance Learning ◽

Cancer Diagnostics ◽

Common Mechanism ◽

Open Chromatin ◽

Driver Genes ◽

3D Genome ◽

Whole Genomes ◽

Cancer Genomes ◽

Cancer Types ◽

The Impact

Abstract Over the past years, large consortia have been established to fuel the sequencing of whole genomes of many cancer patients. Despite the increased abundance in tools to study the impact of SNVs, non-coding SVs have been largely ignored in these data. Here, we introduce svMIL2, an improved version of our Multiple Instance Learning-based method to study the effect of somatic non-coding SVs disrupting boundaries of TADs and CTCF loops in 1646 cancer genomes. We demonstrate that svMIL2 predicts pathogenic non-coding SVs with an average AUC of 0.86 across 12 cancer types, and identifies non-coding SVs affecting well-known driver genes. The disruption of active (super) enhancers in open chromatin regions appears to be a common mechanism by which non-coding SVs exert their pathogenicity. Finally, our results reveal that the contribution of pathogenic non-coding SVs as opposed to driver SNVs may highly vary between cancers, with notably high numbers of genes being disrupted by pathogenic non-coding SVs in ovarian and pancreatic cancer. Taken together, our machine learning method offers a potent way to prioritize putatively pathogenic non-coding SVs and leverage non-coding SVs to identify driver genes. Moreover, our analysis of 1646 cancer genomes demonstrates the importance of including non-coding SVs in cancer diagnostics.

Molecular Epidemiology and Whole-Genome Analysis of Bovine Foamy Virus in Japan

Viruses ◽

10.3390/v13061017 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1017

Author(s):

Hirohisa Mekata ◽

Tomohiro Okagawa ◽

Satoru Konnai ◽

Takayuki Miyazawa

Keyword(s):

Phylogenetic Analyses ◽

Foamy Virus ◽

Pcr Analysis ◽

Whole Genome ◽

Genome Sequences ◽

Whole Genome Analysis ◽

Virus Family ◽

Bovine Foamy Virus ◽

Novel Genotype

Bovine foamy virus (BFV) is a member of the foamy virus family in cattle. Information on the epidemiology, transmission routes, and whole-genome sequences of BFV is still limited. To understand the characteristics of BFV, this study included a molecular survey in Japan and the determination of the whole-genome sequences of 30 BFV isolates. A total of 30 (3.4%, 30/884) cattle were infected with BFV according to PCR analysis. Cattle less than 48 months old were scarcely infected with this virus, and older animals had a significantly higher rate of infection. To reveal the possibility of vertical transmission, we additionally surveyed 77 pairs of dams and 3-month-old calves in a farm already confirmed to have BFV. We confirmed that one of the calves born from a dam with BFV was infected. Phylogenetic analyses revealed that a novel genotype was spread in Japan. In conclusion, the prevalence of BFV in Japan is relatively low and three genotypes, including a novel genotype, are spread in Japan.

Phylogenetic Analysis: Basic Concepts and Its Use as a Tool for Virology and Molecular Epidemiology

Acta Scientiae Veterinariae ◽

10.22456/1679-9216.81158 ◽

2018 ◽

Vol 44 (1) ◽

pp. 20

Author(s):

Eloiza Teles Caldart ◽

Helena Mata ◽

Cláudio Wageck Canal ◽

Ana Paula Ravazzolo

Keyword(s):

Phylogenetic Analysis ◽

Amino Acid ◽

Molecular Epidemiology ◽

Phylogenetic Analyses ◽

Phylogenetic Reconstruction ◽

Evolutionary Process ◽

Amino Acid Sequences ◽

Evolutionary Models ◽

Reconstruction Methods ◽

Basic Concepts

Background: Phylogenetic analyses are an essential part of the exploratory assessment of nucleic acid and amino acid sequences. Particularly in virology, they are able to delineate the evolution and epidemiology of disease etiologic agents and/or the evolutionary path of their hosts. The objective of this review is to help researchers who want to use phylogenetic analyses as a tool in virology and molecular epidemiology studies, presenting the most commonly used methodologies, describing the importance of the different techniques, their peculiar vocabulary and some examples of their use in virology.

Review: This article starts by presenting basic concepts of molecular epidemiology and molecular evolution, emphasizing their relevance in the context of viral infectious diseases. It presents a section on the vocabulary relevant to the subject, bringing readers to a minimum level of knowledge needed throughout this literature review. Within its main subject, the text explains what a molecular phylogenetic analysis is, starting from a multiple alignment of nucleotide or amino acid sequences. The different software used to perform multiple alignments may apply different algorithms. To build a phylogeny based on amino acid or nucleotide sequences it is necessary to produce a data matrix based on a model for nucleotide or amino acid replacement, also called evolutionary model. There are a number of evolutionary models available, varying in complexity according to the number of parameters (transition, transversion, GC content, nucleotide position in the codon, among others). Some papers presented herein provide techniques that can be used to choose evolutionary models. After the model is chosen, the next step is to opt for a phylogenetic reconstruction method that best fits the available data and the selected model. Here we present the most common reconstruction methods currently used, describing their principles, advantages and disadvantages. Distance methods, for example, are simpler and faster; however, they do not provide reliable estimations when the sequences are highly divergent. The accuracy of the analysis with probabilistic models (neighbour joining, maximum likelihood and Bayesian inference) strongly depends on the adherence of the actual data to the chosen development model. Finally, we also explore topology confidence tests, especially the most used one, the bootstrap. To assist the reader, this review presents figures to explain specific situations discussed in the text and numerous examples of previously published scientific articles in virology that demonstrate the importance of the techniques discussed herein, as well as their judicious use.

Conclusion: The DNA sequence is not only a record of phylogeny and divergence times, but also keeps signs of how the evolutionary process has shaped its history and also the elapsed time in the evolutionary process of the population. Analyses of genomic sequences by molecular phylogeny have demonstrated a broad spectrum of applications. It is important to note that for the different available data and different purposes of phylogenies, reconstruction methods and evolutionary models should be wisely chosen. This review provides theoretical basis for the choice of evolutionary models and phylogenetic reconstruction methods best suited to each situation. In addition, it presents examples of diverse applications of molecular phylogeny in virology.

Evidence for Range Expansion and Origins of an Invasive Hornet Vespa bicolor (Hymenoptera, Vespidae) in Taiwan, with Notes on Its Natural Status

Insects ◽

10.3390/insects12040320 ◽

2021 ◽

Vol 12 (4) ◽

pp. 320

Author(s):

Sheng-Shan Lu ◽

Junichi Takahashi ◽

Wen-Chi Yeh ◽

Ming-Lun Lu ◽

Jing-Yi Huang ◽

...

Keyword(s):

Range Expansion ◽

Ecological Niche Modeling ◽

Phylogenetic Analyses ◽

Current Status ◽

Mountain Areas ◽

Colony Losses ◽

Short Branch ◽

Branch Lengths ◽

The Government ◽

Central Taiwan

The invasive alien species (IAS) Vespa bicolor is the first reported hornet to have established in Taiwan and is of concern because it preys on the honeybee Apis mellifera, which leads to colony losses and public concerns. Thus, the aim of this study was to assess the current status of V. bicolor abundance, dispersal, and impact and to trace the origins of Taiwan’s V. bicolor population. Our studies took place in five areas in northern to central Taiwan. We used mtDNA in the phylogenetic analyses. Field survey and ecological niche modeling (ENM) were used to understand the origins and current range of the invasive species. Two main subgroups of V. bicolor in the phylogenetic tree were found, and a clade with short branch lengths in Southeastern China and Taiwan formed a subgroup, which shows that the Taiwan population may have invaded from a single event. Evidence shows that V. bicolor is not a severe pest to honeybees in the study area; however, using ENM, we predict the rapid dispersion of this species to the cooler and hilly mountain areas of Taiwan. The management of V. bicolor should also involve considering it a local pest to reduce loss by beekeepers and public fear in Taiwan. Our findings highlight how the government, beekeepers, and researchers alike should be aware of the implications of V. bicolor’s rapid range expansion in Taiwan, or in other countries.

Phylogeographic structure and gene flow of Himalayan snowcock (Tetraogallus himalayensis)

Animal Biology ◽

10.1163/157075610x523314 ◽

2010 ◽

Vol 60 (4) ◽

pp. 449-465

Author(s):

Wen Longying ◽

Zhang Lixun ◽

An Bei ◽

Luo Huaxing ◽

Liu Naifa ◽

...

Keyword(s):

Demographic History ◽

Phylogenetic Analyses ◽

Divergence Time ◽

Population Expansion ◽

Population History ◽

Tibet Plateau ◽

Phylogeographic Structure ◽

Mitochondrial Cytochrome B ◽

History Of ◽

Qinghai Tibet Plateau

Abstract We have used phylogeographic methods to investigate the genetic structure and population history of the endangered Himalayan snowcock (Tetraogallus himalayensis) in northwestern China. The mitochondrial cytochrome b gene was sequenced for 102 individuals sampled throughout the distribution range. In total, we found 26 different haplotypes defined by 28 polymorphic sites. Phylogenetic analyses indicated that the samples were divided into two major haplogroups corresponding to one western and one eastern clade. The divergence time between these major clades was estimated to be approximately one million years. An analysis of molecular variance showed that 40% of the total genetic variability was found within local populations, 12% among populations within regional groups and 48% among groups. An analysis of the demographic history of the populations suggested that major expansions have occurred in the Himalayan snowcock populations and these correlate mainly with the first and the second largest glaciations during the Pleistocene. In addition, the data indicate that there was a population expansion of the Tianshan population during the uplift of the Qinghai-Tibet Plateau, approximately 2 million years ago.

Robust and scalable inference of population history from hundreds of unphased whole genomes

Nature Genetics ◽

10.1038/ng.3748 ◽

2016 ◽

Vol 49 (2) ◽

pp. 303-309 ◽

Cited By ~ 206

Author(s):

Jonathan Terhorst ◽

John A Kamm ◽

Yun S Song

Keyword(s):

Population History ◽

Whole Genomes ◽

Scalable Inference

A global phylogenetic analysis of Japanese tonsil-derived Epstein–Barr virus strains using viral whole-genome cloning and long-read sequencing

Journal of General Virology ◽

10.1099/jgv.0.001549 ◽

2021 ◽

Author(s):

Misako Yajima ◽

Risako Kakuta ◽

Yutaro Saito ◽

Shiori Kitaya ◽

Atsushi Toyoda ◽

...

Keyword(s):

Viral Genome ◽

Epstein Barr Virus ◽

High Throughput Sequencing ◽

Phylogenetic Analyses ◽

Regional Distribution ◽

Genome Sequences ◽

Barr Virus ◽

Endemic Areas ◽

Epstein Barr ◽

Genome Heterogeneity

Epstein–Barr virus (EBV) establishes lifelong latent infection in the majority of healthy individuals, while it is a causative agent for various diseases, including some malignancies. Recent high-throughput sequencing results indicate that there are substantial levels of viral genome heterogeneity among different EBV strains. However, the extent of EBV strain variation among asymptomatically infected individuals remains elusive. Here, we present a streamlined experimental strategy to clone and sequence EBV genomes derived from human tonsillar tissues, which are the reservoirs of asymptomatic EBV infection. Complete EBV genome sequences, including those of repetitive regions, were determined for seven tonsil-derived EBV strains. Phylogenetic analyses based on the whole viral genome sequences of worldwide non-tumour-derived EBV strains revealed that Asian EBV strains could be divided into several distinct subgroups. EBV strains derived from nasopharyngeal carcinoma-endemic areas constitute different subgroups from a subgroup of EBV strains from non-endemic areas, including Japan. The results could be consistent with biased regional distribution of EBV-associated diseases depending on the different EBV strains colonizing different regions in Asian countries.

Morphological and molecular phylogenetic analyses of Zepedanulus ishikawai (Arachnida: Opiliones: Laniatores: Epedanidae) in the southern part of the Ryukyu Archipelago

The Canadian Entomologist ◽

10.4039/tce.2021.42 ◽

2021 ◽

pp. 1-28

Author(s):

Yoshimasa Kumekawa ◽

Haruka Fujimoto ◽

Osamu Miura ◽

Ryo Arakawa ◽

Jun Yokoyama ◽

...

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Analyses ◽

Divergence Time ◽

Morphological Characters ◽

Ryukyu Archipelago ◽

Divergence Time Estimates ◽

Time Estimates ◽

Molecular Phylogenetic ◽

History Of ◽

Polymerase Chain

Abstract Harvestmen (Arachnida: Opiliones) are soil animals with extremely low dispersal abilities that experienced allopatric differentiation. To clarify the morphological and phylogenetic differentiation of the endemic harvestman Zepedanulus ishikawai (Suzuki, 1971) (Laniatores: Epedanidae) in the southern part of the Ryukyu Archipelago, we conducted molecular phylogenetic analyses and divergence time estimates based on CO1 and 16S rRNA sequences of mtDNA, the 28S rRNA sequence of nrDNA, and the external morphology. A phylogenetic tree based on mtDNA sequences indicated that individuals of Z. ishikawai were monophyletic and were divided into clade I and clade II. This was supported by the nrDNA phylogenetic tree. Although clades I and II were distributed sympatrically on all three islands examined (Ishigaki, Iriomote, and Yonaguni), heterogeneity could not be detected by polymerase chain reaction–restriction fragment length polymorphism of nrDNA, indicating that clades I and II do not have a history of hybridisation. Also, several morphological characters differed significantly between individuals of clade I and clade II. The longstanding isolation of the southern Ryukyus from the surrounding islands enabled estimation of the original morphological characters of both clades of Z. ishikawai.
