A Study on Host Tropism Determinants of Influenza Virus Using Machine Learning

2020 ◽  
Vol 15 (2) ◽  
pp. 121-134 ◽  
Author(s):  
Eunmi Kwon ◽  
Myeongji Cho ◽  
Hayeon Kim ◽  
Hyeon S. Son

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species. Methods: Six types of protein sequences of influenza viral strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes classification, and k-nearest neighbor algorithms were used for host classification. The Java language was used for sequence analysis programming and identifying host-specific position markers. Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. HA protein was found to play the most important role in determining host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA and post-hoc testing revealed that the physicochemical properties of amino acids, comprising protein sequences combined with position markers, differed significantly among hosts. Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts of influenza viruses that are currently susceptible or likely to be infected in the future.
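The idea of a host-specific position marker in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the function name `host_specific_markers`, the 0.9 conservation threshold, and the toy alignment are all assumptions made for illustration. A position counts as a marker here when one residue dominates within every host and the dominant residue differs between at least two hosts.

```python
from collections import Counter


def host_specific_markers(aligned_by_host, min_conservation=0.9):
    """Return 1-based alignment positions whose conserved residue differs by host.

    aligned_by_host maps a host label to a list of equal-length aligned sequences.
    min_conservation is a hypothetical threshold, not a value from the study.
    """
    hosts = list(aligned_by_host)
    length = len(aligned_by_host[hosts[0]][0])
    markers = {}
    for pos in range(length):
        consensus = {}
        for host in hosts:
            seqs = aligned_by_host[host]
            residue, count = Counter(seq[pos] for seq in seqs).most_common(1)[0]
            if count / len(seqs) < min_conservation:
                break  # position is not conserved within this host
            consensus[host] = residue
        else:
            # Reached only if every host passed the conservation check.
            if len(set(consensus.values())) > 1:
                markers[pos + 1] = consensus
    return markers


# Toy alignment (made-up sequences, not real influenza data).
toy_alignment = {
    "avian": ["MKTAE", "MKTAE", "MKTAE"],
    "human": ["MRTAD", "MRTAD", "MRTAD"],
    "swine": ["MKTAD", "MKTAD", "MKTAD"],
}
markers = host_specific_markers(toy_alignment)
print(markers)
```

In this toy data, positions 2 and 5 are reported because the conserved residue is not the same in all three hosts; positions 1, 3, and 4 are identical everywhere and are skipped.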


2001 ◽  
Vol 75 (17) ◽  
pp. 8127-8136 ◽  
Author(s):  
Daniel R. Perez ◽  
Ruben O. Donis

ABSTRACT Influenza A virus expresses three viral polymerase (P) subunits—PB1, PB2, and PA—all of which are essential for RNA and viral replication. The functions of P proteins in transcription and replication have been partially elucidated, yet some of these functions seem to be dependent on the formation of a heterotrimer for optimal viral RNA transcription and replication. Although it is conceivable that heterotrimer subunit interactions may allow a more efficient catalysis, direct evidence of their essentiality for viral replication is lacking. Biochemical studies addressing the molecular anatomy of the P complexes have revealed direct interactions between PB1 and PB2 as well as between PB1 and PA. Previous studies have shown that the N-terminal 48 amino acids of PB1, termed domain α, contain the residues required for binding PA. We report here the refined mapping of the amino acid sequences within this small region of PB1 that are indispensable for binding PA by deletion mutagenesis of PB1 in a two-hybrid assay. Subsequently, we used site-directed mutagenesis to identify the critical amino acid residues of PB1 for interaction with PA in vivo. The first 12 amino acids of PB1 were found to constitute the core of the interaction interface, thus narrowing the previous boundaries of domain α. The role of the minimal PB1 domain α in influenza virus gene expression and genome replication was subsequently analyzed by evaluating the activity of a set of PB1 mutants in a model reporter minigenome system. A strong correlation was observed between a functional PA binding site on PB1 and P activity. Influenza viruses bearing mutant PB1 genes were recovered using a plasmid-based influenza virus reverse genetics system. Interestingly, mutations that rendered PB1 unable to bind PA were either nonviable or severely growth impaired. These data are consistent with an essential role for the N terminus of PB1 in binding PA, P activity, and virus growth.



mBio ◽  
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Huihui Kong ◽  
David F. Burke ◽  
Tiago Jose da Silva Lopes ◽  
Kosuke Takada ◽  
Masaki Imai ◽  
...  

ABSTRACT Since the emergence of highly pathogenic avian influenza viruses of the H5 subtype, the major viral antigen, hemagglutinin (HA), has undergone constant evolution, resulting in numerous genetic and antigenic (sub)clades. To explore the consequences of amino acid changes at sites that may affect the antigenicity of H5 viruses, we simultaneously mutated 17 amino acid positions of an H5 HA by using a synthetic gene library that, theoretically, encodes all combinations of the 20 amino acids at the 17 positions. All 251 mutant viruses sequenced possessed ≥13 amino acid substitutions in HA, demonstrating that the targeted sites can accommodate a substantial number of mutations. Selection with ferret sera raised against H5 viruses of different clades resulted in the isolation of 39 genotypes. Further analysis of seven variants demonstrated that they were antigenically different from the parental virus and replicated efficiently in mammalian cells. Our data demonstrate the substantial plasticity of the influenza virus H5 HA protein, which may lead to novel antigenic variants. IMPORTANCE The HA protein of influenza A viruses is the major viral antigen. In this study, we simultaneously introduced mutations at 17 amino acid positions of an H5 HA expected to affect antigenicity. Viruses with ≥13 amino acid changes in HA were viable, and some had altered antigenic properties. H5 HA can therefore accommodate many mutations in regions that affect antigenicity. The substantial plasticity of H5 HA may facilitate the emergence of novel antigenic variants.
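To give a sense of scale for "all combinations of the 20 amino acids at the 17 positions" in the abstract above, the theoretical library size is 20^17. The short Python sketch below does only this arithmetic; the variable names are illustrative, and the sampled fraction simply divides the 251 sequenced mutants reported in the abstract by that theoretical size.

```python
AMINO_ACIDS = 20      # standard amino acids allowed at each targeted site
POSITIONS = 17        # number of simultaneously mutated HA positions
SEQUENCED = 251       # mutant viruses sequenced, as stated in the abstract

# Each position varies independently, so the counts multiply.
library_size = AMINO_ACIDS ** POSITIONS
sampled_fraction = SEQUENCED / library_size

print(f"theoretical library size: {library_size:.3e}")
print(f"fraction sequenced: {sampled_fraction:.1e}")
```

The theoretical library is on the order of 10^22 variants, so the sequenced viruses are a vanishingly small sample of the possible sequence space.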



Author(s):  
Israa Elbashir ◽  
Heba Al Khatib ◽  
Hadi Yassine

Background: Influenza virus is a major cause of respiratory infections worldwide. Besides the common respiratory symptoms, numerous cases with gastrointestinal symptoms have been reported. Moreover, influenza virus has been detected in the feces of up to 20.6% of influenza-infected patients. Therefore, direct infection of intestinal cells with influenza virus is suspected; however, the mechanism of this infection has not been explored. Aim: To investigate influenza virus replication, cellular responses to infection, and virus evolution following serial infection in human Caucasian colon adenocarcinoma cells (Caco-2 cells). Method: Two influenza A subtypes (A/H3N2 and A/H1N1pdm09) and one influenza B virus (B/Yamagata) were serially passaged in Caco-2 cells. Quantitative PCR was used to study hormone and cytokine expression following infection. Deep sequencing analysis of the viral genome was used to assess virus evolution. Results: The replication capacity of the three viruses was maintained throughout 12 passages, with the H3N2 virus adapting fastest. The expression of hormones and cytokines in Caco-2 cells differed considerably between the viruses and among the passages; however, a pattern of induction was observed at the late phase of infection. Deep sequencing analysis revealed a few amino acid substitutions in the HA protein of the H3N2 and H1N1 viruses, mostly in the antigenic site. Moreover, analysis of virus evolution at the quasispecies level based on the HA protein revealed that H3N2 and H1N1 harbored more diverse virus populations than influenza B virus (IBV), indicating their greater evolution within Caco-2 cells. Conclusion: The findings of this study indicate the possibility of influenza virus replication in intestinal cells. To further explain the gastrointestinal complications of influenza infections, in vivo experiments with different influenza viruses are needed.



2014 ◽  
Vol 2 (3) ◽  
pp. 224-228
Author(s):  
Jennifer Tram

Every year the FDA issues a recommendation for the composition of that year’s common influenza vaccine for influenza A and B. The FDA can consistently predict the dominance of a particular strain of influenza virus by taking into account previous years’ antigenic characterization percentages. However, the sudden disappearance of dominant antigens and the sudden emergence of drift variants can disrupt this pattern, calling into question the effectiveness of that year’s vaccine. The Basic Local Alignment Search Tool (BLAST) was used to compare the protein sequences for hemagglutinin and neuraminidase between the strains in the vaccine and the dominant viral strains. This study examined the effectiveness of vaccines from 2000 to 2012, focusing on the transitions between the B/Yamagata and B/Victoria lineages and the A/New Caledonia and A/California lineages (H1N1). Between 2005 and 2006, dominance of the B/Yamagata lineage, represented by B/Shanghai/361/2002, disappeared almost entirely. For the 2005-2006 flu season, the CDC recommended a B/Shanghai/361/2002 vaccine, which showed 98% identity to the dominant influenza B hemagglutinin sequence and 97% identity to the dominant neuraminidase sequence. From 2007 to 2008, the A/New Caledonia virus declined to 34% of cases while the A/Solomon Islands/3/2006 virus increased to 66%. The A/New Caledonia/20/99 vaccine showed 97% identity to the hemagglutinin sequence of the A/Solomon Islands/3/2006 strain and 98% identity to its neuraminidase sequence. This study demonstrates that from 2000 to 2012, despite drift variants in influenza viruses, the CDC-recommended vaccine effectively matched the hemagglutinin and neuraminidase protein sequences of the dominant viruses. DOI: http://dx.doi.org/10.3126/ijasbt.v2i3.10952 Int J Appl Sci Biotechnol, Vol. 2(3): 224-228
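The percent-identity figures quoted in the abstract above come from BLAST alignments. As a rough illustration of what such a figure measures, the Python sketch below computes identity over two sequences that are assumed to be already aligned to equal length, with `-` marking gaps. It does not perform the alignment itself and is not BLAST; the function name and the two short sequences are made up for the example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 20-residue fragments differing at a single position.
vaccine = "MKAILVVLLYTFATANADTL"
dominant = "MKAILVVLLYTFTTANADTL"
identity = percent_identity(vaccine, dominant)
print(f"{identity:.1f}% identity")
```

With one mismatch in 20 columns, the sketch reports 95.0% identity; a 97% or 98% figure over a full-length protein corresponds to a correspondingly small number of differing columns.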



Author(s):  
Walaa Alkady ◽  
Muhammad Zanaty ◽  
Heba M. Afify

Abstract The coronavirus infection is rapidly evolving into an international epidemic, with serious respiratory disease reported in 27 countries. Computational analysis of this virus in relation to the human population is therefore urgently needed. In this paper, machine learning algorithms are applied to classify human COVID-19 protein sequences by country. The proposed model distinguishes 9238 sequences in three stages: data preprocessing, data labeling, and classification. In the first stage, data preprocessing converts the amino acids of the COVID-19 protein sequences into eight groups of numbers based on the volume and dipole of the amino acids. In the second stage, two methods are used to label the 27 countries from 0 to 26. The first method assigns one number to each country according to its country code, while the second method represents each country using binary elements only. Classification algorithms are then run to assign the COVID-19 protein sequences to their countries. The results show that 100% accuracy was achieved by the country-based binary labeling method with Linear Regression (LR), K-Nearest Neighbor (KNN), or Support Vector Machine (SVM) classifiers. Further, sequences from the USA, which has the largest number of data records, were more likely to be classified correctly than those from countries with few records. The imbalance in the COVID-19 protein sequence data is considered a major issue; in particular, the data available from the USA represented 76% of the total of 9238 sequences. The proposed model may therefore serve as a diagnostic bioinformatics tool for COVID-19 protein sequences from different countries.
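The two labeling schemes described in the abstract above can be sketched in Python. This is one reading of the abstract rather than the authors' code: "binary elements only" is interpreted here as one-hot encoding, codes are assigned by alphabetical order rather than by any official country code, and the function names and the three-country list are hypothetical.

```python
def integer_labels(countries):
    """Map each distinct country to a single integer code (0, 1, 2, ...)."""
    ordered = sorted(set(countries))
    return {country: idx for idx, country in enumerate(ordered)}


def binary_labels(countries):
    """Map each distinct country to a one-hot vector of 0/1 elements."""
    codes = integer_labels(countries)
    n = len(codes)
    return {
        country: [1 if i == idx else 0 for i in range(n)]
        for country, idx in codes.items()
    }


# Toy sample of sequence origins (the study used 27 countries).
origins = ["USA", "China", "Italy", "USA"]
int_codes = integer_labels(origins)
bin_codes = binary_labels(origins)
print(int_codes)
print(bin_codes)
```

A single integer code imposes an arbitrary ordering on countries, whereas one-hot vectors treat every country as equally distant from every other, which is one plausible reason the two schemes could behave differently in a classifier.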



2009 ◽  
Vol 84 (3) ◽  
pp. 1527-1535 ◽  
Author(s):  
Mark L. Reed ◽  
Olga A. Bridges ◽  
Patrick Seiler ◽  
Jeong-Ki Kim ◽  
Hui-Ling Yen ◽  
...  

ABSTRACT While the molecular mechanism of membrane fusion by the influenza virus hemagglutinin (HA) protein has been studied extensively in vitro, the role of acid-dependent HA protein activation in virus replication, pathogenesis, and transmission in vivo has not been characterized. To investigate the biological significance of the pH of activation of the HA protein, we compared the properties of four recombinant viruses with altered HA protein acid stability to those of wild-type influenza virus A/chicken/Vietnam/C58/04 (H5N1) in vitro and in mallards. Membrane fusion by wild-type virus was activated at pH 5.9. Wild-type virus had a calculated environmental persistence of 62 days and caused extensive morbidity, mortality, shedding, and transmission in mallards. An N114K mutation that increased the pH of HA activation by 0.5 unit resulted in decreased replication, genetic stability, and environmental stability. Changes of +0.4 and −0.5 unit in the pH of activation by Y23H and K58I mutations, respectively, reduced weight loss, mortality, shedding, and transmission in mallards. An H24Q mutation that decreased the pH of activation by 0.3 unit resulted in weight loss, mortality, clinical symptoms, and shedding similar to those of the wild type. However, the HA-H24Q virus was shed more extensively into drinking water and persisted longer in the environment. The pH of activation of the H5 HA protein plays a key role in the propagation of H5N1 influenza viruses in ducks and may be a novel molecular factor in the ecology of influenza viruses. The data also demonstrate that H5N1 neuraminidase activity increases the pH of activation of the HA protein in vitro.



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zhixia Teng ◽  
Zitong Zhang ◽  
Zhen Tian ◽  
Yanjuan Li ◽  
Guohua Wang

Abstract Background Amyloids are insoluble fibrillar aggregates that are highly associated with complex human diseases, such as Alzheimer’s disease, Parkinson’s disease, and type II diabetes. Recently, many studies have reported that specific regions of amino acid sequences may be responsible for the amyloidosis of proteins. Identifying these amyloidogenic regions has therefore become very important for elucidating the mechanism of amyloid formation. Accordingly, several computational methods have been put forward to discover amyloidogenic regions. The majority of these methods predict amyloidogenic regions based on the physicochemical properties of amino acids. However, the position, order, and correlation of amino acids may also influence the amyloidosis of proteins and should also be considered when detecting amyloidogenic regions. Results To address this problem, we proposed a novel machine-learning approach for predicting amyloidogenic regions, called ReRF-Pred. Firstly, the pseudo amino acid composition (PseAAC) was exploited to characterize the physicochemical properties and correlation of amino acids. Secondly, tripeptide composition (TPC) was employed to represent the order and position of amino acids. To improve the discriminative power of TPC, all possible tripeptides were analyzed by the binomial distribution method, and only those with significantly different distributions between positive and negative samples were retained. Finally, all samples were characterized by the PseAAC and TPC of their amino acid sequences, and a random forest-based predictor of amyloidogenic regions was trained on these samples. Validation experiments showed that the feature set consisting of PseAAC and TPC is the most discriminative for detecting amyloidosis. Meanwhile, random forest was superior to the other classifiers considered on almost all metrics. To validate the effectiveness of our model, ReRF-Pred was compared with a series of gold-standard methods on two datasets: Pep-251 and Reg33. The results suggested that our method has the best overall performance and makes significant improvements in discovering amyloidogenic regions. Conclusions The advantages of our method are mainly attributable to the ability of PseAAC and TPC to describe the differences between amyloids and other proteins. The ReRF-Pred server can be accessed at http://106.12.83.135:8080/ReRF-Pred/.
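Tripeptide composition (TPC), one of the two feature types named in the abstract above, is the relative frequency of each overlapping three-residue window in a sequence. The Python sketch below shows that calculation only; it is not the ReRF-Pred implementation, it omits the binomial-distribution feature selection step, and it returns a sparse dictionary rather than the full 8000-dimensional vector. The function name and example peptide are illustrative.

```python
from collections import Counter


def tripeptide_composition(sequence):
    """Relative frequency of each overlapping tripeptide in the sequence.

    Tripeptides that never occur are omitted instead of being stored as 0.
    """
    total = len(sequence) - 2  # number of overlapping 3-residue windows
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + 3] for i in range(total))
    return {tri: n / total for tri, n in counts.items()}


# Toy peptide: windows are AAA, AAA, AAG.
tpc = tripeptide_composition("AAAAG")
print(tpc)
```

To feed such features to a classifier, each sequence's dictionary would be expanded over a fixed ordering of the 20^3 = 8000 possible tripeptides (or over the retained subset after feature selection) so that all samples share the same columns.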



Biopolymers ◽  
2019 ◽  
Vol 110 (8) ◽  
Author(s):  
Jia‐Feng Yu ◽  
Ang Qu ◽  
Hu‐Cheng Tang ◽  
Fang‐Hua Wang ◽  
Chun‐Ling Wang ◽  
...  


2019 ◽  
Vol 94 (6) ◽  
Author(s):  
Ying Huang ◽  
Simon O. Owino ◽  
Corey J. Crevar ◽  
Donald M. Carter ◽  
Ted M. Ross

ABSTRACT Vaccination is the most effective way to prevent influenza virus infections. However, the diversity of antigenically distinct isolates is a challenge for vaccine development. In order to overcome the antigenic variability and improve the protective efficacy of influenza vaccines, our research group has pioneered the development of computationally optimized broadly reactive antigens (COBRA) for hemagglutinin (HA). Two candidate COBRA HA vaccines, P1 and X6, elicited antibodies with differential patterns of hemagglutination inhibition (HAI) activity against a panel of H1N1 influenza viruses. In order to better understand how these HA antigens elicit broadly reactive immune responses, epitopes in the Cb, Sa, or Sb antigenic sites of seasonal-like and pandemic-like wild-type or COBRA HA antigens were exchanged with homologous regions in the COBRA HA proteins to determine which regions and residues were responsible for the elicited antibody profile. Mice were vaccinated with virus-like particles (VLPs) expressing one of the 12 modified HA antigens (designated V1 to V12), COBRA HA antigens, or wild-type HA antigens. The elicited antisera were assessed for hemagglutination inhibition activity against a panel of historical seasonal-like and pandemic-like H1N1 influenza viruses. Primarily, the pattern of glycosylation sites and residues in the Sa antigenic region, around the receptor binding site (RBS), served as signatures for the elicitation of broadly reactive antibodies by these HA immunogens. VLPs expressing HA antigens that lacked a glycosylation site at residue 144 and had a deleted lysine at position 147 were more effective at protecting vaccinated mice against morbidity and mortality following infection with pandemic-like and seasonal-like H1N1 influenza viruses. IMPORTANCE There is a great need to develop broadly reactive or universal vaccines against influenza viruses. Advanced, next-generation hemagglutinin (HA) head-based vaccines that elicit protective antibodies against H1N1 influenza viruses have been developed. This study focused on understanding the specific amino acids around the receptor binding site (RBS) that were important in the elicitation of these broadly reactive antibodies. Specific glycan sites and amino acids located at the tip of the HA molecule enhanced the elicitation of these broadly reactive antibodies. A better understanding of the HA structures around the RBS will lead to more effective HA immunogens.


