A11 Evaluation of phylogenetic inference methods to determine direction of HIV transmission

2019 ◽  
Vol 5 (Supplement_1) ◽  
Author(s):  
R Rose ◽  
A D Redd ◽  
S Lamers ◽  
S F Porcella ◽  
S E Hudelson ◽  
...  

Abstract It has been postulated that the direction of HIV transmission between two individuals can be determined by phylogenetic analysis of HIV sequences. This approach may be problematic, since HIV sequences from newly infected individuals are often more similar to index sequences from samples collected years before transmission than to those from samples collected at the time of transmission. We evaluated the accuracy of phylogenetic methods for determining the direction of HIV transmission by analyzing next-generation sequencing (NGS) data from index–partner pairs enrolled in the HIV Prevention Trials Network (HPTN) 052 trial. HIV-infected index and HIV-uninfected partner participants were enrolled as serodiscordant couples; samples were analyzed from couples with index-to-partner HIV transmission that was confirmed by genetic linkage studies. NGS for HIV gp41 (HXB2 coordinates: 7691–8374) was performed using plasma samples from thirty-nine index–partner pairs (seventy-eight samples collected within 3 months of partner seroconversion). Maximum likelihood trees were generated from the entire dataset using FastTree v.2. Topological patterns of HIV from each index–partner pair were analyzed. The analysis included 9,368 consensus sequences and 521,145 total sequence reads for the seventy-eight samples analyzed. In 10 per cent (four out of thirty-nine) of couples, the phylogeny was inconsistent with the known direction of transmission. In 26 per cent (ten out of thirty-nine) of couples, the phylogeny results could not discern directionality. In 64 per cent (twenty-five out of thirty-nine) of couples, the results correctly indicated index-to-partner transmission; in two of these twenty-five cases, only one index sequence was closest to the most recent common ancestor. Phylogenetic analysis of NGS data obtained from samples collected within 3 months of transmission correctly determined the direction of transmission in 64 per cent of the cases analyzed. In 36 per cent of the cases, the phylogenetic topology did not support the known direction of infection, and in one-third of these cases the observed topology was opposite to the known direction of transmission. This demonstrates that phylogenetic topology alone may not be sufficient to accurately determine the direction of HIV transmission.
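The percentages reported in this abstract can be re-derived from the stated counts. The following is a minimal sketch in Python; the dictionary keys and the helper name `percent` are illustrative labels, not terms used by the authors. It also shows that the "one-third" figure is an approximation: four reversed topologies out of fourteen unsupported cases is 2/7, or about 29 per cent.

```python
from fractions import Fraction

# Counts reported in the abstract (39 index-partner pairs in total).
TOTAL_PAIRS = 39
outcomes = {
    "correct": 25,        # phylogeny indicated index-to-partner transmission
    "indeterminate": 10,  # phylogeny could not discern directionality
    "reversed": 4,        # phylogeny inconsistent with the known direction
}


def percent(count, total):
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * Fraction(count, total))


# The three outcome classes should account for every pair.
assert sum(outcomes.values()) == TOTAL_PAIRS

shares = {name: percent(n, TOTAL_PAIRS) for name, n in outcomes.items()}

# Cases where the topology did not support the known direction.
not_supported = outcomes["indeterminate"] + outcomes["reversed"]
not_supported_share = percent(not_supported, TOTAL_PAIRS)

# Fraction of the unsupported cases in which the topology was reversed.
reversed_among_unsupported = Fraction(outcomes["reversed"], not_supported)

print(shares, not_supported_share, float(reversed_among_unsupported))
```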

Author(s):  
Ben Bettisworth ◽  
Alexandros Stamatakis

Abstract Summary: In phylogenetic analysis, it is common to infer unrooted trees. Thus, it is unknown which node is the most recent common ancestor of all the taxa in the phylogeny. However, knowing the root location is desirable for downstream analyses and interpretation. There exist several methods to recover a root, such as midpoint rooting or rooting the tree at an outgroup. Non-reversible Markov models can also be used to compute the likelihood of a potential root position. We present a software tool called RootDigger, which uses a non-reversible Markov model to compute the most likely root location on a given tree and to infer a confidence value for each possible root placement. Availability and implementation: RootDigger is available under the MIT licence at https://github.com/computations/root_digger
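Of the rooting methods mentioned here, midpoint rooting is the simplest to illustrate: the root is placed halfway along the longest leaf-to-leaf path. The sketch below is a generic illustration in Python and is not RootDigger's method, which relies on a non-reversible Markov model. The tree, the node names, and the functions `build_adjacency`, `farthest`, and `midpoint_root` are all hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected, weighted adjacency map from (u, v, length) edges."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def farthest(adj, start):
    """Return (node, distance, parent_map) for the node farthest from start."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint_root(adj):
    """Locate the midpoint of the longest path in the tree.

    Returns (u, v, offset): the root lies on edge u-v, `offset`
    branch-length units away from u.
    """
    any_node = next(iter(adj))
    a, _, _ = farthest(adj, any_node)        # one end of the longest path
    b, diameter, parent = farthest(adj, a)   # other end; parents lead back to a
    half = diameter / 2.0
    walked = 0.0
    node = b
    while parent[node] is not None:
        step = adj[node][parent[node]]
        if walked + step >= half:
            return node, parent[node], half - walked
        walked += step
        node = parent[node]
    raise ValueError("tree has no edges")


# Hypothetical unrooted tree: leaves A-D, internal nodes x and y.
edges = [
    ("A", "x", 1.0),
    ("B", "x", 2.0),
    ("x", "y", 1.0),
    ("C", "y", 5.0),
    ("D", "y", 1.0),
]
adj = build_adjacency(edges)
root_edge = midpoint_root(adj)
print(root_edge)
```

In this toy tree the longest path runs from B to C (length 8), so the midpoint falls on the long branch leading to C. Midpoint rooting implicitly assumes roughly clock-like evolution, which is one reason likelihood-based approaches such as the one described in the abstract are of interest.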


Viruses ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 758 ◽  
Author(s):  
Keylie M. Gibson ◽  
Margaret C. Steiner ◽  
Uzma Rentia ◽  
Matthew L. Bendall ◽  
Marcos Pérez-Losada ◽  
...  

Next-generation sequencing (NGS) offers a powerful opportunity to identify low-abundance, intra-host viral sequence variants, yet the focus of many bioinformatic tools on consensus sequence construction has precluded a thorough analysis of intra-host diversity. To take full advantage of the resolution of NGS data, we developed HAplotype PHylodynamics PIPEline (HAPHPIPE), an open-source tool for the de novo and reference-based assembly of viral NGS data, with both consensus sequence assembly and a focus on the quantification of intra-host variation through haplotype reconstruction. We validate and compare the consensus sequence assembly methods of HAPHPIPE to those of two alternative software packages, HyDRA and Geneious, using simulated HIV and empirical HIV, HCV, and SARS-CoV-2 datasets. Our validation methods included read mapping, genetic distance, and genetic diversity metrics. In simulated NGS data, HAPHPIPE generated pol consensus sequences significantly closer to the true consensus sequence than those produced by HyDRA and Geneious and performed comparably to Geneious for HIV gp120 sequences. Furthermore, using empirical data from multiple viruses, we demonstrate that HAPHPIPE can analyze larger sequence datasets due to its greater computational speed. Therefore, we contend that HAPHPIPE provides a more user-friendly platform for users with and without bioinformatics experience to implement current best practices for viral NGS assembly than other currently available options.


Author(s):  
Matthew L Bendall ◽  
Keylie M Gibson ◽  
Margaret C Steiner ◽  
Uzma Rentia ◽  
Marcos Pérez-Losada ◽  
...  

Abstract Deep sequencing of viral populations using next-generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance and are thus not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated from NGS platforms and provide quality output properly formatted for downstream evolutionary analyses.


Author(s):  
Kenneth Siu-Sing Leung ◽  
Timothy Ting-Leung Ng ◽  
Alan Ka-Lun Wu ◽  
Miranda Chong-Yee Yau ◽  
Hiu-Yin Lao ◽  
...  

Abstract Initial cases of COVID-19 reported in Hong Kong were mostly imported from China. However, most cases reported in February 2020 were locally acquired infections, indicating local community transmission. We extracted the demographic, clinical and epidemiological data from 50 COVID-19 patients, who accounted for 53.8% of the cases in Hong Kong by February 2020. Whole-genome sequencing of SARS-CoV-2 was conducted to determine the phylogenetic relatedness and transmission dynamics. Only three (6.0%) patients required ICU admission. Phylogenetic analysis identified six transmission clusters. All locally acquired cases harboured a common mutation, Orf3a G251V, and were clustered in two subclades in the global phylogeny of SARS-CoV-2. The estimated time to the most recent common ancestor of the local COVID-19 outbreak was December 24, 2019, with an evolutionary rate of 3.04 × 10⁻³ substitutions per site per year. The reproduction number value was 1.84. Social distancing and vigilant epidemiological control are crucial to the containment of COVID-19 transmission. Article summary lines: A combined epidemiological and phylogenetic analysis of the early COVID-19 outbreak in Hong Kong revealed that a SARS-CoV-2 variant with the ORF3a G251V mutation accounted for all locally acquired cases, and that asymptomatic carriers could be a huge public health risk for COVID-19 control.


2021 ◽  
Author(s):  
David P Maison ◽  
Sean B. Cleveland ◽  
Vivek R Nerurkar

Abstract Using genomics, bioinformatics and statistics, we demonstrate the effect of statewide and nationwide quarantine on the introduction of SARS-CoV-2 variants of concern (VOC) in Hawai’i. To define the origins of introduced VOC, we analyzed 260 VOC sequences from Hawai’i and 301,646 VOC sequences worldwide, deposited in GenBank and the Global Initiative on Sharing All Influenza Data (GISAID), and constructed phylogenetic trees. The trees define the most recent common ancestor as the origin. Further, the multiple sequence alignment used to generate the phylogenetic trees identified the consensus single nucleotide polymorphisms in the VOC genomes. These consensus sequences allow for VOC comparison and identification of mutations of interest in relation to viral immune evasion and host immune activation. Of note, the P71L substitution within the E protein (the protein sensed by TLR2 to produce cytokines), found in the B.1.351 VOC, may diminish the efficacy of some vaccines. Based on the phylogenetic trees, the B.1.1.7, B.1.351, B.1.427, and B.1.429 VOC have been introduced in Hawai’i multiple times since December 2020 from several definable geographic regions. The time from the first worldwide report of a VOC in GenBank and GISAID to its first arrival in Hawai’i averages 320 days with quarantine and 132 days without quarantine. As such, quarantine is shown to significantly affect the time to arrival of VOC in Hawai’i, both during and following quarantine. Further, the collective 2020 quarantine of 43 states in the United States demonstrates a profound impact in delaying the arrival of VOC in states that did not practice quarantine, such as Utah. Our data demonstrate that at least 76% of all definable SARS-CoV-2 VOC have entered Hawai’i from California, with the B.1.351 variant in Hawai’i originating exclusively from the United Kingdom. These data provide a foundation for policy-makers and public-health officials to apply precision public health genomics to real-world policies such as mandatory screening and quarantine.


Author(s):  
Rajesh Raghunanth Pharande ◽  
Sharmila Badal Majee ◽  
Satish S. Gaikwad ◽  
S. D. Moregoankar ◽  
AnilKumar Bannalikar ◽  
...  

Nearly 1.7 million cases of dog bites are reported every year in India and many cases of animal rabies are left unattended and undiagnosed. Therefore, a mere diagnosis of rabies is not sufficient to understand the epidemiology and the spread of the rabies virus (RV) in animals. There is a paucity of information about the evolutionary dynamics of RV in dogs and its biodiversity patterns in India. In total, 50 dog-brain samples suspected of rabies were screened by nucleoprotein (N) and glycoprotein (G) gene PCR. The N and G genes were subsequently sequenced to understand the molecular evolution in these genes. The phylogenetic analysis of the N gene revealed that six isolates in the Mumbai region belonged to a single Arctic lineage. Time-scaled phylogeny by Bayesian coalescent analysis of the partial N gene revealed that the time to the most recent common ancestor (TMRCA) for the sequences in the cluster assigned to Indian lineage I was 2006.68, with a 95% highest posterior density interval of 2005–2008. Migration patterns revealed strong Bayes factor support for Mumbai to Delhi, Panji to Hyderabad, Delhi to Chennai, and Chennai to Chandigarh. Phylogenetic analysis of the G gene revealed that the RVs circulating in the Mumbai region are divided into three lineages. Time-scaled phylogeny by the Bayesian coalescent analysis method estimated that the TMRCA for the sequences under study was 1993 and that for the Indian clusters was 1962. In conclusion, the phylogenetic analysis of the N gene revealed that the six isolates belonged to a single Arctic lineage along with other Indian isolates; they clustered into a single lineage but were divided into three clades based on the G-gene sequences. The present study adds to the current understanding of the molecular epidemiology and evolution of RV and reveals strong location bias and geographical clustering within Indian isolates on the basis of the N and G genes.
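TMRCA estimates such as 2006.68 are given as decimal years. The sketch below converts such a value to an approximate calendar date, assuming that the fractional part denotes the fraction of the calendar year elapsed; this convention is common but is not stated in the abstract. The function name `decimal_year_to_date` is illustrative.

```python
from datetime import date, timedelta


def decimal_year_to_date(value):
    """Convert a decimal year (e.g. 2006.68) to an approximate calendar date.

    Assumes the fractional part is the elapsed fraction of that calendar year.
    """
    year = int(value)
    start = date(year, 1, 1)
    # Length of this particular year, so leap years are handled.
    days_in_year = (date(year + 1, 1, 1) - start).days
    elapsed_days = round((value - year) * days_in_year)
    return start + timedelta(days=elapsed_days)


tmrca_date = decimal_year_to_date(2006.68)
print(tmrca_date.isoformat())
```

Under this assumption the point estimate falls in early September 2006, which sits inside the reported 2005–2008 interval.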


Author(s):  
Y. Vakulenko ◽  
A. Lukashev ◽  
A. Deviatkin

Molecular phylogenetics, and in particular statistical phylogenetics, is widely used to solve fundamental and applied problems of virology. Bayesian, or statistical, phylogenetic methods, which came into practice 10-15 years ago, significantly expanded the range of questions that can be answered based on the analysis of nucleotide and amino acid sequences. The ability to use different evolutionary models allows inferring the chronology, geography and dynamics of the spread of an infection. For example, analysis of the globally distributed HIV group M by Bayesian methods demonstrated with a probability of 99% that the most recent common ancestor of these viruses existed in the surroundings of the city of Kinshasa (Democratic Republic of the Congo) in the early 1920s. Another study showed that H9N2 influenza virus most likely passed on to humans from wild ducks in Hong Kong in the late 1960s. In addition, Bayesian analysis allows evaluating the effect of the measures taken on the development of the epidemic process. For example, it was retrospectively shown that the number of hepatitis C virus infections in Egypt increased by several orders of magnitude in the middle of the twentieth century. This sharp increase is associated with the treatment of schistosomiasis using non-sterile, repeatedly used syringes. Bayesian analysis methods have been used in tens of thousands of publications describing various aspects of the occurrence and spread of infectious diseases in humans and animals. This was facilitated by the development and accessibility of software that implements these methods. The complexity of Bayesian phylogenetic methods imposes strict requirements on the data being analyzed. The correctness of the phylogenetic analysis results depends on various factors. For example, it is necessary to choose an evolutionary model that most adequately describes the studied objects. A mandatory step in formulating the results is the justification of the selected model. Viruses typically acquire genetic elements from other organisms; therefore, the genomes of even closely related viruses may have non-homologous regions unsuitable for phylogenetic analysis. Another aspect is the creation of a representative dataset. Publications sometimes do not report all stages of the analysis, which is why the results obtained can be interpreted ambiguously. The correct use of statistical phylogenetics methods in virology is possible only with an understanding of their principles, proper methods of data preparation, and the criteria for selecting evolutionary models.


2020 ◽  
Vol 6 (1) ◽  
pp. 55-64
Author(s):  
A.R. McTaggart ◽  
C.J. Prychid ◽  
J.J. Bruhl ◽  
R.G. Shivas

The PhyloCode is used to classify taxa based on their relation to a most recent common ancestor as recovered from a phylogenetic analysis. We examined the first specimen of Cintractiella (Ustilaginomycotina) collected from Australia and determined its systematic relationship to other Fungi. Three ribosomal DNA loci were analysed both with and without constraint to a phylogenomic hypothesis of the Ustilaginomycotina. Cintractiella did not share a most recent common ancestor with other orders of smut fungi. We used the PhyloCode to define the Cintractiellales, a monogeneric order with four species of Cintractiella, including C. scirpodendri sp. nov. on Scirpodendron ghaeri. The Cintractiellales may have shared a most recent common ancestor with the Malasseziomycetes, but are otherwise unresolved at the rank of class.


2021 ◽  
Author(s):  
Juan Pedro M Camacho ◽  
Josefa Cabrero ◽  
Maria Dolores Lopez-Leon ◽  
Maria Martin-Pecina ◽  
Francisco Perfectti ◽  
...  

Background: The full catalogue of satellite DNA (satDNA) within a genome constitutes the satellitome. The Library Hypothesis predicts that satDNA in related species reflects that in their common ancestor, but the evolutionary mechanisms and pathways of satDNA evolution have never been analyzed for full satellitomes. We compare here the satellitomes of two Oedipodine grasshoppers (Locusta migratoria and Oedaleus decorus) which shared their most recent common ancestor about 22.8 Ma ago. Results: We found that about one-third of their satDNA families (nearly 60 in each species) showed sequence homology, and were grouped into 12 orthologous superfamilies. The turnover rate of consensus sequences was extremely variable among the 20 orthologous family pairs analyzed in both species. The satDNAs shared by both species showed poor association with sequence signatures and motifs frequently argued to be functional, except for short inverted repeats allowing short dyad symmetries and non-B DNA conformations. Orthologous satDNAs frequently showed different FISH patterns at both intra- and interspecific levels. We defined indices of homogenization and degeneration, and quantified the level of incomplete library sorting between species. Conclusions: Our analyses revealed that satDNA degenerates through point mutation and rejuvenates through partial turnovers caused by massive tandem duplications (the so-called satDNA amplification). Remarkably, satDNA amplification increases homogenization at the intragenomic level and diversification between species, thus constituting the basis for concerted evolution. We suggest a model of satDNA evolution by means of recursive cycles of amplification, degeneration, and rejuvenation, leading to mostly contingent evolutionary pathways where concerted evolution emerges promptly after lineages split.



