Meta-analysis on the lethality of influenza A viruses using machine learning approaches

2020 ◽  
Author(s):  
◽  
Rui Yin
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Balamurugan Sadaiappan ◽  
Chinnamani PrasannaKumar ◽  
V. Uthara Nambiar ◽  
Mahendran Subramanian ◽  
Manguesh U. Gauns

Abstract
Copepods are the dominant members of the zooplankton community and the most abundant form of life. It is imperative to obtain insights into the copepod-associated bacteriobiomes (CAB) in order to identify specific bacterial taxa associated with a copepod, and to understand how they vary between different copepods. Analysing the potential genes within the CAB may reveal their intrinsic role in biogeochemical cycles. For this, machine-learning models and PICRUSt2 analysis were deployed to analyse 16S rDNA gene sequences (approximately 16 million reads) of CAB belonging to five different copepod genera, viz., Acartia spp., Calanus spp., Centropages sp., Pleuromamma spp., and Temora spp. Overall, gradient boosting classifiers predicted 50 sub-OTUs (s-OTUs) to be important in the five copepod genera. Among these, 15 s-OTUs were predicted to be important in Calanus spp. and 20 s-OTUs in Pleuromamma spp. Four bacterial s-OTUs (Acinetobacter johnsonii, Phaeobacter, Vibrio shilonii and Piscirickettsiaceae) were identified as important in Calanus spp., and the s-OTUs Marinobacter, Alteromonas, Desulfovibrio, Limnobacter, Sphingomonas, Methyloversatilis, Enhydrobacter and Coriobacteriaceae were predicted as important in Pleuromamma spp., for the first time. Our meta-analysis revealed that the CAB of Pleuromamma spp. had a high proportion of potential genes responsible for methanogenesis and nitrogen fixation, whereas the CAB of Temora spp. had a high proportion of potential genes involved in assimilatory sulphate reduction and cyanocobalamin synthesis. The CAB of both Pleuromamma spp. and Temora spp. have potential genes accountable for iron transport.


2019 ◽  
Author(s):  
Fransiskus Xaverius Ivan ◽  
Chee Keong Kwoh

Abstract
Background: Influenza A virus (IAV) poses threats to human health and life. Many individual studies have been carried out in mice to uncover the viral factors responsible for the virulence of IAV infections. Virus adaptation through serial lung-to-lung passaging, reverse genetic engineering and mutagenesis approaches have been widely used in these studies. Nonetheless, a single study may not provide enough confidence about virulence factors, so combining several studies in a meta-analysis is desirable to provide a better view.
Methods: Virulence information on IAV infections and the corresponding virus and mouse strains were documented from the literature. Using the mouse lethal dose 50, time series of weight loss or percentage of survival, the virulence of each infection was classified as avirulent or virulent for two-class problems, and as low, intermediate or high for three-class problems. Protein sequences were decoded from the corresponding IAV genomes or reconstructed manually from other proteins according to mutations mentioned in the related literature. IAV virulence models were then learned from various datasets containing the amino acids at each aligned position of the IAV proteins and the corresponding two-class or three-class virulence labels. Three proven rule-based learning approaches, i.e., OneR, JRip and PART, and additionally random forest were used for modelling, and top protein sites and synergy between protein sites were identified from the models.
Results: More than 500 records of IAV infections in mice whose viral proteins could be retrieved were documented. The BALB/c and C57BL/6 mouse strains and the H1N1, H3N2 and H5N1 viruses dominated the infection records. PART models learned from full datasets or subsets of them achieved the best performance, with moderate average model accuracies ranging from 65.0% to 84.4% and from 54.0% to 66.6% for two-class and three-class datasets that utilized all records of aligned IAV proteins, respectively. Their average accuracies were comparable to or even better than those of random forest models, so PART models should be preferred based on the Occam’s razor principle. Interestingly, models based on a dataset that included all IAV strains achieved a better average accuracy when host information was taken into account. For model interpretation, we observed that although many sites in HA were highly correlated with virulence, PART models based on sites in PB2 could compete against and were often better than PART models based on sites in HA. Moreover, PART had a strong preference for including sites in PB2 when models were learned from datasets containing concatenated alignments of all IAV proteins. Several sites with a known contribution to virulence were found among the top protein sites, and site pairs that may synergistically influence virulence were also uncovered.
Conclusion: Modelling the virulence of IAV infections is a challenging problem. Rule-based models generated using only viral proteins are useful for their interpretability, but achieve only moderate performance. More advanced machine learning approaches that learn models from features extracted from both viral and host proteins must be considered in future work.
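
The abstract above names OneR as one of the rule-based learners applied to aligned protein sites. As a rough illustration of the idea only, and not the authors' pipeline, the sketch below implements a minimal OneR in plain Python: for each alignment column it builds a rule that maps each residue to its majority class, then keeps the column whose rule makes the fewest training errors. The function name `one_r`, the three-column toy alignment and the labels are all made up for this example.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (site, rule, errors) for the single-column rule with the fewest training errors."""
    best = None
    for site in range(len(rows[0])):
        # Count class labels for each residue seen at this alignment column.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[site]][label] += 1
        # The rule predicts the majority class for each residue.
        rule = {res: c.most_common(1)[0][0] for res, c in counts.items()}
        errors = sum(rule[row[site]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (site, rule, errors)
    return best


# Hypothetical toy data: six aligned 3-residue "proteins" with two-class labels.
rows = ["EKD", "EKN", "KKD", "KRN", "ERD", "KKN"]
labels = ["avirulent", "avirulent", "virulent", "virulent", "avirulent", "virulent"]

site, rule, errors = one_r(rows, labels)
print(f"best column: {site}, rule: {rule}, training errors: {errors}")
```

In this toy alignment the first column separates the two classes perfectly, so OneR selects it. A single-site rule of this kind is easy to read, which is the interpretability advantage the abstract attributes to rule-based models.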


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Fransiskus Xaverius Ivan ◽  
Chee Keong Kwoh

Abstract
Background: Influenza A virus (IAV) poses threats to human health and life. Many individual studies have been carried out in mice to uncover the viral factors responsible for the virulence of IAV infections. Nonetheless, a single study may not provide enough confidence about virulence factors, so combining several studies in a meta-analysis is desirable to provide a better view. For this, we documented more than 500 records of IAV infections in mice for which the viral proteins could be retrieved and for which the mouse lethal dose 50 or, alternatively, weight loss and/or survival data were available for virulence classification.
Results: IAV virulence models were learned from various datasets containing aligned IAV proteins and the corresponding two virulence classes (avirulent and virulent) or three virulence classes (low, intermediate and high virulence). Three proven rule-based learning approaches, i.e., OneR, JRip and PART, and additionally random forest were used for modelling. PART models achieved the best performance, with moderate average model accuracies ranging from 65.0 to 84.4% and from 54.0 to 66.6% for the two-class and three-class problems, respectively. PART models were comparable to or even better than random forest models and should be preferred based on the Occam’s razor principle. Interestingly, the average accuracy of the models improved when host information was taken into account. For model interpretation, we observed that although many sites in HA were highly correlated with virulence, PART models based on sites in PB2 could compete against and were often better than PART models based on sites in HA. Moreover, PART had a strong preference for including sites in PB2 when models were learned from datasets containing the concatenated alignments of all IAV proteins. Several sites with a known contribution to virulence were found among the top protein sites, and site pairs that may synergistically influence virulence were also uncovered.
Conclusion: Modelling IAV virulence is a challenging problem. Rule-based models generated using viral proteins are useful for their interpretability, but achieve only moderate performance. More advanced approaches that learn models from features extracted from both viral and host proteins should be considered in future work.
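
The abstract describes datasets built from aligned IAV proteins, virulence classes and, optionally, host information, without giving the exact layout. One plausible layout, sketched here under that assumption, is one categorical feature per alignment column plus an optional mouse-strain feature. The record fields (`aligned_protein`, `mouse_strain`, `virulence`), the helper `build_rows` and the two example records are hypothetical.

```python
def build_rows(records, use_host=True):
    """Turn infection records into (features, label) pairs, one feature per alignment column."""
    dataset = []
    for rec in records:
        # Each aligned position becomes a categorical feature; '-' marks an alignment gap.
        features = {f"site_{i + 1}": aa for i, aa in enumerate(rec["aligned_protein"])}
        if use_host:
            # Host information enters as one extra categorical feature.
            features["mouse_strain"] = rec["mouse_strain"]
        dataset.append((features, rec["virulence"]))
    return dataset


# Hypothetical records; the sequences are placeholders, not real IAV proteins.
records = [
    {"aligned_protein": "MK-T", "mouse_strain": "BALB/c", "virulence": "virulent"},
    {"aligned_protein": "MR-T", "mouse_strain": "C57BL/6", "virulence": "avirulent"},
]

with_host = build_rows(records)
without_host = build_rows(records, use_host=False)
print(with_host[0])
```

Building the table both ways makes it straightforward to train the same learner with and without the host column and compare the average accuracies, which is the kind of comparison the abstract reports.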


2020 ◽  
Author(s):  
Balamurugan Sadaiappan ◽  
Prasannakumar Chinnamani ◽  
Uthara V Nambiar ◽  
Mahendran Subramanian ◽  
Manguesh U Gauns

2021 ◽  
Author(s):  
Taryn M. Lucas ◽  
Chitrak Gupta ◽  
Meghan O. Altman ◽  
Emi Sanchez ◽  
Matthew R. Naticchia ◽  
...  

2019 ◽  

Abstract
Consistent codon usage patterns across species might be expected, owing to the degeneracy of the genetic code and the conservation of the translation machinery. In fact, however, codon usage varies dramatically among organisms, and these differences in codon choice might also affect downstream protein expression, structure and function. It has been suggested that different codon usage patterns encode distinct characteristics of a given type of organism. Accordingly, a series of machine-learning models was constructed, not only to learn the patterns of particular species but also to predict the species from given patterns. Two gene segments of influenza A virus, hemagglutinin (HA; gene 4) and neuraminidase (NA; gene 6), are so essential to the immune response of their hosts that the serotypes of the viruses are named after their combinations. They were therefore chosen as the objects of this study, and the proposed models work quite well on the designated tasks.
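
The abstract does not say how codon usage patterns were represented for the models. A common choice, assumed here purely for illustration, is a 64-dimensional vector of relative codon frequencies computed from an in-frame coding sequence. The function name `codon_usage` and the short example sequence are made up.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to a vector of the same shape.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_usage(cds):
    """Return relative frequencies of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)  # codons with ambiguous bases are ignored
    return [counts[c] / total if total else 0.0 for c in CODONS]


# Hypothetical six-codon sequence: ATG AAA AAG GCT GCT TAA
vec = codon_usage("ATGAAAAAGGCTGCTTAA")
print(len(vec), vec[CODONS.index("GCT")])
```

Vectors of this form, computed for HA or NA segments, could then be passed to any standard classifier; which classifiers the study actually used is not stated in the abstract.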


