Phylogenetic measures of indel rate variation among the HIV-1 group M subtypes

Mapping Intimacies ◽

10.1101/494344 ◽

2018 ◽

Cited By ~ 1

Author(s):

John Palmer ◽

Art Poon

Keyword(s):

Sequence Data ◽

Poisson Model ◽

Nucleotide Composition ◽

Rate Variation ◽

Flanking Sequence ◽

Nucleotide Substitutions ◽

Virus Surface ◽

Variable Regions ◽

Subtype B ◽

Hiv 1

The transmission and pathogenesis of human immunodeficiency virus type 1 (HIV-1) are disproportionately influenced by evolution in the five variable regions of the virus surface envelope glycoprotein (gp120). Insertions and deletions (indels) are a significant source of evolutionary change in these regions. However, the influx of indels relative to nucleotide substitutions has not yet been quantified through a comparative analysis of HIV-1 sequence data. Here we develop and report results from a phylogenetic method to estimate indel rates for the gp120 variable regions across five major subtypes and two circulating recombinant forms (CRFs) of HIV-1 group M. We processed over 26,000 published HIV-1 gp120 sequences, from which we extracted 6,605 sequences for phylogenetic analysis. In brief, our method employs maximum likelihood to reconstruct phylogenies scaled in time and fits a Poisson model to the observed distribution of indels between closely related pairs of sequences in the tree (cherries). The rate estimates ranged from 3.0e-5 to 1.5e-3 indels/nt/year and varied significantly among variable regions and subtypes. Indel rates were significantly lower in the region encoding variable loop V3, and also lower for HIV-1 subtype B relative to other subtypes. We also found that variable loops V1, V2 and V4 tended to accumulate significantly longer indels. Further, we observed that the nucleotide composition of indel sequences was significantly distinct from that of the flanking sequence in HIV-1 gp120. Indels affected potential N-linked glycosylation sites substantially more often in V1 and V2 than expected by chance, which is consistent with positive selection on glycosylation patterns within these regions of gp120. These results represent the first comprehensive measures of indel rates in HIV-1 gp120 across multiple subtypes and CRFs, and identify novel and unexpected patterns for further research in the molecular evolution of HIV-1.
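The Poisson step in this abstract can be illustrated with a minimal sketch. It assumes each cherry contributes an indel count over a region of known length (nt) and a total divergence time (years, both tips combined). Under a constant-rate Poisson model, the maximum-likelihood rate is then the total count divided by the total exposure. The function name, data layout, and numbers are hypothetical and are not the authors' pipeline or data.

```python
from typing import Iterable, Tuple


def poisson_indel_rate(cherries: Iterable[Tuple[int, int, float]]) -> float:
    """Maximum-likelihood constant indel rate in indels/nt/year.

    Each cherry is (indel_count, region_length_nt, total_branch_time_years).
    With count_i ~ Poisson(rate * length_i * time_i), the log-likelihood is
    maximized at rate = sum(count_i) / sum(length_i * time_i).
    """
    total_indels = 0
    total_exposure = 0.0  # nucleotide-years summed over cherries
    for count, length_nt, years in cherries:
        if length_nt <= 0 or years <= 0:
            raise ValueError("region length and time must be positive")
        total_indels += count
        total_exposure += length_nt * years
    if total_exposure == 0.0:
        raise ValueError("no cherries supplied")
    return total_indels / total_exposure


# Toy values for illustration only (not taken from the study).
toy_cherries = [(1, 100, 2.0), (0, 100, 3.0), (2, 100, 5.0)]
rate = poisson_indel_rate(toy_cherries)  # 3 indels over 1000 nt-years
```

Pooling counts and exposure this way treats all cherries as sharing one rate; estimating separate rates per variable region or subtype would simply mean calling the function on each subset.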

Phylogenetic measures of indel rate variation among the HIV-1 group M subtypes

Virus Evolution ◽

10.1093/ve/vez022 ◽

2019 ◽

Vol 5 (2) ◽

Cited By ~ 1

Author(s):

John Palmer ◽

Art F Y Poon

Keyword(s):

Large Scale ◽

Poisson Model ◽

Purifying Selection ◽

Nucleotide Composition ◽

Rate Variation ◽

Flanking Sequence ◽

Variable Regions ◽

Subtype B ◽

Glycoprotein Gp120 ◽

Hiv 1

Abstract The transmission fitness and pathogenesis of HIV-1 are disproportionately influenced by evolution in the five variable regions (V1–V5) of the surface envelope glycoprotein (gp120). Insertions and deletions (indels) are a significant source of evolutionary change in these regions. However, the rate and composition of indels have not yet been quantified through a large-scale comparative analysis of HIV-1 sequences. Here, we develop and report results from a phylogenetic method to estimate indel rates for the gp120 variable regions across five major subtypes and two circulating recombinant forms (CRFs) of HIV-1 group M. We processed over 26,000 published HIV-1 gp120 sequences, from which we extracted 6,605 sequences for phylogenetic analysis. We reconstructed time-scaled phylogenies by maximum likelihood and fit a binomial-Poisson model to the observed distribution of indels between closely related pairs of sequences in each tree (cherries). By focusing on cherries in each tree, we obtained phylogenetically independent indel reconstructions, and the shorter time scales in cherries reduced the bias due to purifying selection. Rate estimates ranged from 3.0×10⁻⁵ to 1.5×10⁻³ indels/nt/year and varied significantly among variable regions and subtypes. Indel rates were significantly lower in V3 relative to V1, and were also lower in HIV-1 subtype B relative to the 01_AE reference. We also found that V1, V2, and V4 tended to accumulate significantly longer indels. Furthermore, we observed that the nucleotide composition of indels was distinct from the flanking sequence, with higher frequencies of G and lower frequencies of T. Indels affected N-linked glycosylation sites more often in V1 and V2 than expected by chance, consistent with positive selection on glycosylation patterns within these regions. These results represent the first comprehensive measures of indel rates in HIV-1 gp120 across multiple subtypes and CRFs, and identify novel and unexpected patterns for further research in the molecular evolution of HIV-1.
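One plausible reading of a "binomial-Poisson" model is sketched below. It assumes each cherry is scored only for the presence or absence of an indel in a region, and that the probability of at least one indel after t years is 1 - exp(-rate * t), which follows from Poisson-distributed events. The abstract does not give the exact formulation, so the likelihood, function names, and toy data here are assumptions rather than the published method.

```python
import math
from typing import Sequence, Tuple

Cherry = Tuple[bool, float]  # (indel observed?, total branch time in years)


def score(rate: float, cherries: Sequence[Cherry]) -> float:
    """Derivative of the log-likelihood with respect to the rate.

    For a cherry with an indel the term is t * exp(-rate*t) / (1 - exp(-rate*t)),
    written here as t / expm1(rate*t); without an indel the term is -t.
    """
    total = 0.0
    for has_indel, years in cherries:
        if has_indel:
            total += years / math.expm1(rate * years)
        else:
            total -= years
    return total


def fit_binomial_poisson(cherries: Sequence[Cherry], iterations: int = 200) -> float:
    """Find the rate (events per region per year) where the score is zero."""
    outcomes = [bool(has_indel) for has_indel, _ in cherries]
    if all(outcomes) or not any(outcomes):
        raise ValueError("a finite estimate needs cherries with and without indels")
    lo, hi = 1e-12, 1.0
    while score(hi, cherries) > 0:  # widen the bracket until the score is negative
        hi *= 2.0
    for _ in range(iterations):  # bisection; the score decreases monotonically
        mid = 0.5 * (lo + hi)
        if score(mid, cherries) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy data for illustration only: four cherries of 2 years each, one with an indel.
toy = [(True, 2.0), (False, 2.0), (False, 2.0), (False, 2.0)]
estimate = fit_binomial_poisson(toy)
```

When every cherry has the same divergence time, the estimate reduces to -ln(1 - p) / t, where p is the fraction of cherries with an indel; this gives a quick sanity check on the numerical fit. Dividing by region length would convert the result to a per-nucleotide rate.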

Detection of novel HIV-1 drug resistance mutations by support vector analysis of deep sequence data and experimental validation

10.1101/804781 ◽

2019 ◽

Author(s):

Mariano Avino ◽

Emmanuel Ndashimye ◽

Daniel J. Lizotte ◽

Abayomi S. Olabode ◽

Richard M. Gibson ◽

...

Keyword(s):

Drug Resistance ◽

Drug Treatment ◽

Treatment Outcomes ◽

Sequence Data ◽

Wild Type Virus ◽

Support Vector ◽

Wild Type ◽

Subtype B ◽

The World ◽

Hiv 1

Abstract The global HIV-1 pandemic comprises many genetically divergent subtypes. Most of our understanding of drug resistance in HIV-1 derives from subtype B, which predominates in North America and western Europe. However, about 90% of the pandemic represents non-subtype B infections. Here, we use deep sequencing to analyze HIV-1 from infected individuals in Uganda who were either treatment-naïve or who experienced virologic failure on ART without the expected patterns of drug resistance. Our objective was to detect potentially novel associations between mutations in HIV-1 integrase and treatment outcomes in Uganda, where most infections are subtypes A or D. We retrieved a total of 380 archived plasma samples from patients at the Joint Clinical Research Centre (Kampala), of which 328 were integrase inhibitor-naïve and 52 were raltegravir (RAL)-based treatment failures. Next, we developed a bioinformatic pipeline for alignment and variant calling of the deep sequence data obtained from these samples from a MiSeq platform (Illumina). To detect associations between within-patient polymorphisms and treatment outcomes, we used a support vector machine (SVM) for feature selection with multiple imputation to account for partial reads and low quality base calls. Candidate point mutations of interest were experimentally introduced into the HIV-1 subtype B NL4-3 backbone to determine susceptibility to RAL in U87.CD4.CXCR4 cells. Finally, we carried out replication capacity experiments with wild-type and mutant viruses in TZM-bl cells in the presence and absence of RAL. Our analyses not only identified the known major mutation N155H and accessory mutations G163R and V151I, but also novel mutations I203M and I208L as most highly associated with RAL failure. The I203M and I208L mutations resulted in significantly decreased susceptibility to RAL (44.0-fold and 54.9-fold, respectively) compared to wild-type virus (EC50=0.32 nM), and may represent novel pathways of HIV-1 resistance to modern treatments.

Author summary There are many different types of HIV-1 around the world. Most of the research on how HIV-1 can become resistant to drug treatment has focused on the type (B) that is the most common in high-income countries. However, about 90% of infections around the world are caused by a type other than B. We used next-generation sequencing to analyze samples of HIV-1 from patients in Uganda (mostly infected by types A and D) for whom drug treatment failed to work, and whose infections did not fit the classic pattern of adaptation based on B. Next, we used machine learning to detect mutations in these virus populations that could explain the treatment outcomes. Finally, we experimentally added two candidate mutations identified by our analysis to a laboratory strain of HIV-1 and confirmed that they conferred drug resistance to the virus. Our study reveals new pathways that other types of HIV-1 may use to evolve resistance to drugs that make up the current recommended treatment for newly diagnosed individuals.
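The fold-change figures in this abstract can be turned into implied absolute EC50 values with simple arithmetic. The sketch assumes that fold change is defined in the usual way, as the mutant EC50 divided by the wild-type EC50. The constant and function names are hypothetical, and the derived values are not stated in the abstract.

```python
WILD_TYPE_EC50_NM = 0.32  # wild-type EC50 reported in the abstract (nM)
FOLD_CHANGE = {"I203M": 44.0, "I208L": 54.9}  # reported fold decreases in susceptibility


def implied_ec50(fold_change: float, wild_type_ec50: float = WILD_TYPE_EC50_NM) -> float:
    """Mutant EC50 implied by a fold change, assuming fold = mutant / wild type."""
    return fold_change * wild_type_ec50


# Roughly 14.1 nM for I203M and 17.6 nM for I208L under this assumption.
implied = {mutation: implied_ec50(fold) for mutation, fold in FOLD_CHANGE.items()}
```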

Update on diversity and distribution of HIV-1 subtypes in Yunnan province

Epidemiology and Infection ◽

10.1017/s0950268812002713 ◽

2013 ◽

Vol 141 (11) ◽

pp. 2418-2427 ◽

Cited By ~ 10

Author(s):

Y.-Z. SU ◽

Y.-L. MA ◽

M.-H. JIA ◽

X. HE ◽

L. YANG ◽

...

Keyword(s):

Yunnan Province ◽

Sequence Data ◽

Distribution Data ◽

Subtype C ◽

Subtype B ◽

Tree Construction ◽

Complex Population ◽

And Control ◽

Hiv 1 ◽

Hiv Subtypes

SUMMARY The aim of this study was to characterize updated HIV subtypes in Yunnan to determine their origins and distribution within the population. RT–PCR products of both the gag and env genes were sequenced from Yunnan province inhabitants newly diagnosed with HIV-1. Sequence data from 290 samples were used for statistical analysis of subtype distribution and phylogenetic tree construction. Distribution data were adjusted to account for different geographical distributions of HIV-1 subtypes in the population. Phylogenetic analysis revealed six HIV-1 subtypes in Yunnan, including eight types of unique recombination forms (URFs). The most prevalent subtypes in this province, CRF07_BC (18·9%), CRF08_BC (39·1%), CRF01_AE (22·4%), and URFs (subtype C, 5·9% and subtype B, 4·5%), were all recombinants. We found significant differences in the distribution of these HIV-1 subtypes not only geographically, but also between various ethnic groups and with respect to transmission routes. Our findings indicate a complex population of HIV-1 subtypes, URFs, and recombinant subtypes in Yunnan province. This diversity could make the prevention and control of HIV infection in Yunnan more difficult due to the possibility of virus recombination or infection by multiple subtypes.

An Evolutionary Model-Based Approach To Quantify the Genetic Barrier to Drug Resistance in Fast-Evolving Viruses and Its Application to HIV-1 Subtypes and Integrase Inhibitors

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.00539-19 ◽

2019 ◽

Vol 63 (8) ◽

Cited By ~ 2

Author(s):

Kristof Theys ◽

Pieter J. K. Libin ◽

Kristel Van Laethem ◽

Ana B. Abecasis

Keyword(s):

Sequence Data ◽

Evolutionary Model ◽

Integrase Inhibitors ◽

Genetic Barrier ◽

Viral Pathogens ◽

Model Based ◽

Subtype B ◽

Genetic Potential ◽

The Impact ◽

Hiv 1

ABSTRACT Viral pathogens causing global disease burdens are often characterized by high rates of evolutionary changes. The extensive viral diversity at baseline can shorten the time to escape from therapeutic or immune selective pressure and alter mutational pathways. The impact of genotypic background on the barrier to resistance can be difficult to capture, particularly for agents in experimental stages or that are recently approved or expanded into new patient populations. We developed an evolutionary model-based counting method to quickly quantify the population genetic potential to resistance and assess population differences. We demonstrate its applicability to HIV-1 integrase inhibitors, as their increasing use globally contrasts with limited availability of non-B subtype resistant sequence data and corresponding knowledge gap. A large sequence data set encompassing most prevailing HIV-1 subtypes and resistance-associated mutations of currently approved integrase inhibitors was investigated. A complex interplay between codon predominance, polymorphisms, and associated evolutionary costs resulted in a subtype-dependent varied genetic potential for 15 resistance mutations against integrase inhibitors. While we confirm the lower genetic barrier of subtype B for G140S, we convincingly discard a similar effect previously suggested for G140C. A supplementary analysis for HIV-1 reverse transcriptase inhibitors identified a lower genetic barrier for K65R in subtype C through differential codon usage not reported before. To aid evolutionary interpretations of genomic differences for antiviral strategies, we advanced existing counting methods with increased sensitivity to identify subtype dependencies of resistance emergence. Future applications include novel HIV-1 drug classes or vaccines, as well as other viral pathogens.
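The idea of a codon-level genetic barrier can be illustrated with a simplified counting sketch. It scores the cheapest direct route from a wild-type codon to any codon encoding the resistant amino acid, using the standard genetic code. The transition and transversion weights (1.0 and 2.5) are assumed for illustration, and this is a basic counting approach, not the evolutionary model developed in the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

PURINES = {"A", "G"}
TRANSITION_COST = 1.0    # assumed weight, purine<->purine or pyrimidine<->pyrimidine
TRANSVERSION_COST = 2.5  # assumed weight, purine<->pyrimidine


def substitution_cost(a: str, b: str) -> float:
    """Cost of changing nucleotide a into nucleotide b."""
    if a == b:
        return 0.0
    same_class = (a in PURINES) == (b in PURINES)
    return TRANSITION_COST if same_class else TRANSVERSION_COST


def genetic_barrier(codon: str, target_aa: str) -> float:
    """Minimum summed substitution cost from codon to any codon for target_aa."""
    targets = [c for c, aa in CODON_TABLE.items() if aa == target_aa]
    if not targets:
        raise ValueError(f"unknown amino acid: {target_aa}")
    return min(
        sum(substitution_cost(x, y) for x, y in zip(codon, target))
        for target in targets
    )


# Glycine codons differ in how cheaply they reach serine (as in G140S):
barrier_ggc = genetic_barrier("GGC", "S")  # GGC -> AGC, one transition
barrier_gga = genetic_barrier("GGA", "S")  # needs a transition and a transversion
```

Under these assumed weights, GGC reaches serine at a cost of 1.0 while GGA costs 3.5, which shows how codon usage at one position can shift the barrier between viral populations. The sketch ignores intermediate amino acids and fitness costs along the path.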

Genome-scale rates of evolutionary change in bacteria

10.1101/069492 ◽

2016 ◽

Cited By ~ 2

Author(s):

Sebastian Duchêne ◽

Kathryn E. Holt ◽

François-Xavier Weill ◽

Simon Le Hello ◽

Jane Hawkey ◽

...

Keyword(s):

Evolutionary Dynamics ◽

Sequence Data ◽

Temporal Structure ◽

Evolutionary Rates ◽

Whole Genome Sequence ◽

Rate Variation ◽

Data Sets ◽

Inverse Association ◽

Nucleotide Substitutions ◽

Ecological Processes

ABSTRACT Estimating the rates at which bacterial genomes evolve is critical to understanding major evolutionary and ecological processes such as disease emergence, long-term host-pathogen associations, and short-term transmission patterns. The surge in bacterial genomic data sets provides a new opportunity to estimate these rates and reveal the factors that shape bacterial evolutionary dynamics. For many organisms, estimates of evolutionary rate display an inverse association with the time-scale over which the data are sampled. However, this relationship remains unexplored in bacteria due to the difficulty in estimating genome-wide evolutionary rates, which are impacted by the extent of temporal structure in the data and the prevalence of recombination. We collected 36 whole genome sequence data sets from 16 species of bacterial pathogens to systematically estimate and compare their evolutionary rates and assess the extent of temporal structure in the absence of recombination. The majority (28/36) of data sets possessed sufficient clock-like structure to robustly estimate evolutionary rates. However, in some species reliable estimates were not possible even with “ancient DNA” data sampled over many centuries, suggesting that they evolve very slowly or that they display extensive rate variation among lineages. The robustly estimated evolutionary rates spanned several orders of magnitude, from 10⁻⁶ to 10⁻⁸ nucleotide substitutions site⁻¹ year⁻¹. This variation was largely attributable to sampling time, which was strongly negatively associated with estimated evolutionary rates, with this relationship best described by an exponential decay curve. To avoid potential estimation biases such time-dependency should be considered when inferring evolutionary time-scales in bacteria.
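An exponential decay relationship between rate and sampling time can be fitted by ordinary least squares on the log scale, as in the sketch below. The model form rate = a * exp(-b * time), the function name, and the toy data are assumptions for illustration. The abstract does not specify how the authors fitted their curve, and the numbers are not from the study.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_decay(times: Sequence[float], rates: Sequence[float]) -> Tuple[float, float]:
    """Fit rate = a * exp(-b * time) by linear regression of ln(rate) on time.

    Returns (a, b). Rates must be strictly positive.
    """
    if len(times) != len(rates) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    log_rates = [math.log(r) for r in rates]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_rates) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("sampling times must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


# Synthetic, noise-free toy data generated from a = 1e-6 and b = 0.01.
toy_times = [1.0, 10.0, 50.0, 100.0]
toy_rates = [1e-6 * math.exp(-0.01 * t) for t in toy_times]
a, b = fit_exponential_decay(toy_times, toy_rates)
```

Fitting on the log scale weights relative rather than absolute errors, which suits rates spanning several orders of magnitude; a nonlinear fit on the raw scale would weight the fastest rates more heavily.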

NGlyAlign: an automated library building tool to align highly divergent HIV envelope sequences

BMC Bioinformatics ◽

10.1186/s12859-020-03901-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Elma H. Akand ◽

John M. Murray

Keyword(s):

Mathematical Biology ◽

Immune Surveillance ◽

Variable Region ◽

Virus Envelope ◽

Multiple Sequence ◽

Variable Regions ◽

Subtype B ◽

Glycosylation Sites ◽

Hiv Envelope ◽

Hiv 1

Abstract Background The high variability in envelope regions of some viruses such as HIV allows the virus to establish infection and to escape subsequent immune surveillance. This variability, as well as increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates difficulties for multiple sequence alignment (MSA) methods that provide the first step in their analysis. Existing MSA tools often fail to properly align highly variable HIV envelope sequences, requiring extensive manual editing that is impractical with even a moderate number of these variable sequences. Results We developed an automated library building tool, NGlyAlign, that organizes similar N-linked glycosylation sites as block constraints and statistically conserved global sites as single site constraints to automatically enforce partial columns in consistency-based MSA methods such as Dialign. This combined method accurately aligns variable HIV-1 envelope sequences. We tested the method on two datasets: a set of 156 founder and chronic gp160 HIV-1 subtype B sequences as well as a set of reference sequences of gp120 in the highly variable region 1. On measures such as entropy scores, sum of pair scores, column score, and similarity heat maps, NGlyAlign+Dialign proved superior against methods such as T-Coffee, ClustalOmega, ClustalW, Praline, HIValign and Muscle. The method is scalable to large sequence sets, producing accurate alignments without requiring manual editing. As well as this application to HIV, our method can be used for other highly variable glycoproteins such as hepatitis C virus envelope. Conclusions NGlyAlign is an automated tool for mapping and building glycosylation motif libraries to accurately align highly variable regions in HIV sequences. It can provide the basis for many studies reliant on single robust alignments. NGlyAlign has been developed as an open-source tool and is freely available at https://github.com/UNSW-Mathematical-Biology/NGlyAlign_v1.0 .
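Potential N-linked glycosylation sites are conventionally identified by the sequon N-X-S/T, where X is any residue except proline. The sketch below locates such sequons in an ungapped protein sequence; it is a generic illustration of the motif, not code from NGlyAlign, and the function name and example string are hypothetical.

```python
import re
from typing import List

# Lookahead so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_pngs(protein: str) -> List[int]:
    """Return 0-based positions of the asparagine in each N-X-[S/T] sequon (X != P).

    Assumes an ungapped amino acid sequence; alignment gap characters
    should be removed first, since they would otherwise match X.
    """
    return [match.start() for match in SEQUON.finditer(protein.upper())]


# Toy peptide for illustration: sequons start at N1 (NNT), N2 (NTS) and N8 (NGT);
# N5 is followed by proline and is skipped.
sites = find_pngs("ANNTSNPSNGT")
```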

Impact of Mutations Outside the V3 Region on Coreceptor Tropism Phenotypically Assessed in Patients Infected with HIV-1 Subtype B

Antimicrobial Agents and Chemotherapy ◽

10.1128/aac.00743-11 ◽

2011 ◽

Vol 55 (11) ◽

pp. 5078-5084 ◽

Cited By ~ 20

Author(s):

Laura Monno ◽

Annalisa Saracino ◽

Luigia Scudeller ◽

Grazia Punzi ◽

Gaetano Brindicci ◽

...

Keyword(s):

Amino Acid ◽

Statistical Significance ◽

Net Charge ◽

Variable Regions ◽

Coreceptor Tropism ◽

Enhanced Sensitivity ◽

Subtype B ◽

End Stage ◽

The Impact ◽

Hiv 1

ABSTRACT HIV coreceptor tropism (CTR) testing is a prerequisite for prescribing a coreceptor antagonist. CTR is increasingly deduced by analyzing the V3 loop sequence of gp120. We investigated the impact of mutations outside V3 on CTR as determined by the enhanced-sensitivity Trofile assay (ESTA). Paired ESTA and gp120 sequencing (population sequencing; from codon 32 of the conserved C1 to the variable V5 domains) were obtained from 60 antiretroviral treatment (ART)-naïve patients (15 with AIDS) infected with subtype B HIV-1. For gp120 sequence analysis, nucleotide mixtures were considered when the second highest electropherogram peak was >25%; sequences were translated into all possible permutations and classified as X4, dual/mixed (DM), and R5 based on coincident ESTA results. ESTA identified R5 and DM viruses in 72 and 28% of patients, respectively; no pure X4 was labeled. Forty percent of AIDS patients had R5 strains. Thirty-two positions, mostly outside V3, were significantly (P < 0.05) different between R5 and DM sequences. According to multivariate analysis, amino acid changes at 9 and 7 positions within the C1 to C4 and V1 to V5 regions, respectively, maintained a statistical significance, as did the net charge of V3 and C4. When analyzing only R5 sequences, 6 positions in the variable regions were found which, along with the V4 net charge, were significantly different for sequences from early- and end-stage disease patients. This study identifies specific amino acid changes outside V3 which contribute to CTR. Extending the analysis to include pure X4 and increasing the sample size would be desirable to define gp120 variables/changes which should be included in predictive algorithms.
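Expanding nucleotide mixtures into all possible permutations can be sketched with the standard IUPAC ambiguity codes. The example below enumerates every unambiguous sequence consistent with a population-sequencing read; the function name and inputs are hypothetical, and the abstract does not describe the authors' actual implementation.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_ambiguous(sequence: str) -> List[str]:
    """List every unambiguous sequence consistent with the IUPAC codes given."""
    try:
        choices = [IUPAC[base] for base in sequence.upper()]
    except KeyError as exc:
        raise ValueError(f"not an IUPAC nucleotide code: {exc.args[0]}") from exc
    return ["".join(combo) for combo in product(*choices)]


# A codon with one mixture (R = A or G) expands to two codons.
variants = expand_ambiguous("ARG")
```

The number of permutations is the product of the number of bases at each mixed position, so it grows quickly; sequences with many mixtures may need a cap or per-codon expansion before translation.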

Sequence Length of HIV-1 Subtype B Increases over Time: Analysis of a Cohort of Patients with Hemophilia over 30 Years

Viruses ◽

10.3390/v13050806 ◽

2021 ◽

Vol 13 (5) ◽

pp. 806

Author(s):

Young-Keol Cho ◽

Jung-Eun Kim ◽

Brian T. Foley

Keyword(s):

Direct Sequencing ◽

Sequence Length ◽

Coding Region ◽

Rt Pcr ◽

Time Analysis ◽

Variable Regions ◽

Subtype B ◽

Signature Pattern ◽

Hiv 1 ◽

Over Time

We aimed to investigate whether the sequence length of HIV-1 increases over time. We performed a longitudinal analysis of full-length coding region sequences (FLs) during an HIV-1 outbreak among patients with hemophilia and local controls infected with the Korean subclade B of HIV-1 (KSB). Genes were amplified by overlapping RT-PCR or nested PCR and subjected to direct sequencing. Overall, 141 FLs were sequentially determined over 30 years in 62 KSB-infected patients. Phylogenetic analysis indicated that within KSB, two FLs from plasma donors O and P comprised two clusters, together with 8 and 12 patients with hemophilia, respectively. Signature pattern analysis of the KSB of HIV-1 revealed 91 signature nucleotide residues (1.1%). In total, 48 and 43 signature nucleotides originated from clusters O and P, respectively. Six positions contained 100% specific nucleotide(s) in clusters O and P. In-depth FL analysis for over 30 years indicated that the KSB FL significantly increased over time before combination antiretroviral therapy (cART) and decreased with cART. This increase occurred due to the significant increase in env and nef genes, originating in the variable regions of both genes. The increase in sequence length of HIV-1 over time suggests an evolutionary direction.

Molecular Epidemiology of HIV-1 virus in Egypt: A major change in the circulating subtypes

Current HIV Research ◽

10.2174/1570162x19666210805091742 ◽

2021 ◽

Vol 19 ◽

Author(s):

Ahmed Noby Amer ◽

Ahmed Gaballah ◽

Rasha Emad ◽

Abeer Ghazal ◽

Nancy Attia

Keyword(s):

Sequence Data ◽

Major Change ◽

Current Status ◽

Recombination Rates ◽

Mena Region ◽

High Genetic Diversity ◽

Subtype B ◽

Molecular Phylogenetic ◽

Low Prevalence ◽

Hiv 1

Background: Human immunodeficiency virus type 1 (HIV-1) is characterized by high genetic diversity due to its high mutation and recombination rates. Although there is an increasing prevalence of circulating recombinant forms (CRFs) worldwide, subtype B is still recognized as the predominant subtype in the Middle East and North Africa (MENA) region. There is a limited sampling of HIV in this region due to its low prevalence. The main purpose of this study is to provide a summary of the current status of the resident HIV subtypes and their distribution among Egyptian patients. Methodology: Forty-five HIV-1 patients were included in this study. Partial pol gene covering the protease (PR) and reverse transcriptase (RT) was successfully amplified in 21 HIV patients using nested PCR of cDNA of the viral genomic RNA, then sequenced. The sequence data were used for viral HIV-1 subtyping by 5 online subtyping tools: NCBI viral genotyping tool, Stanford University HIV database (HIVDB) subtyping program, REGA tool, Context-based modeling for expeditious typing (COMET) tool, and Recombinant identification program (RIP) tool. The final subtype assignment was based on molecular phylogenetic analysis. Results: Unexpectedly, non-B subtypes dominated; the most common circulating form was CRF02_AG (57.1%), followed by subtype B (14.3%), subtype BG recombinant (9.5%), CRF35_AD (9.5%), and subtype A1 and CRF06_cpx (4.8% each). Conclusion: To the best of our knowledge, this is the first study to tackle HIV-1 subtyping among the group of HIV-1 patients in Egypt. CRF02_AG is the most prevalent subtype in Egypt.

Sequence Length of HIV-1 Subtype B Increases Over Time: Analysis of a Cohort of Patients With Hemophilia Over 30 Years

10.20944/preprints202104.0217.v1 ◽

2021 ◽

Author(s):

Young-Keol Cho ◽

Jung-Eun Kim ◽

Brian Foley

Keyword(s):

Direct Sequencing ◽

Sequence Length ◽

Coding Region ◽

Rt Pcr ◽

Variable Regions ◽

Subtype B ◽

Combined Antiretroviral Therapy ◽

Signature Pattern ◽

Hiv 1 ◽

Over Time

We aimed to investigate whether the sequence length of HIV-1 increases over time. A longitudinal analysis of full-length coding region sequences (FLs) during an HIV-1 outbreak among patients with hemophilia and local controls infected with the Korean subclade B of HIV-1 (KSB) was performed. Genes were amplified by overlapping RT-PCR or nested PCR and subjected to direct sequencing. Overall, 141 FLs were sequentially determined over 30 years in 62 KSB-infected patients. Phylogenetic analysis indicated that within KSB, two FLs from plasma donors O and P comprised two clusters together with 8 and 12 patients with hemophilia, respectively. Signature pattern analysis for the KSB of HIV-1 revealed 91 signature nucleotide residues (1.05%). In total, 48 and 43 signature nucleotides originated from clusters O and P, respectively. Only six positions contained 100% specific nucleotide(s) in clusters O and P. Additionally, in-depth FL analysis over 30 years indicates that the KSB FL significantly increased over time before combined antiretroviral therapy (cART) and decreased with cART. The increase occurred due to a significant increase in env and nef genes, originating in the variable regions of both genes. The increase in the sequence length of HIV-1 over time suggests that it has an evolutionary direction.