A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records

2016 ◽  
Vol 23 (5) ◽  
pp. 934-941 ◽  
Author(s):  
Tasnia Tahsin ◽  
Davy Weissenbacher ◽  
Robert Rivera ◽  
Rachel Beard ◽  
Mari Firago ◽  
...  

Abstract Objective: The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping it to latitude/longitude coordinates using knowledge derived from external geographical databases. Materials and Methods: We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. Results: We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Discussion: Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Conclusion: Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles.
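The three reported scores can be checked against one another. The sketch below assumes the f-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is introduced here only for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (an assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract above.
reported_precision = 0.832
reported_recall = 0.967

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 3))  # 0.894, matching the reported f-measure
```

Under that assumption the reported precision and recall reproduce the reported f-measure of 0.894.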

2016 ◽  
pp. 477-484
Author(s):  
Li Zhang

As a public service platform for geographic information, the National Geospatial Data Center (NGDC) can provide geospatial metadata services for data producers and data users. This paper first analyzes the problems in the development and maintenance of a geospatial metadata deployment system. It then describes and analyzes the characteristics of a PHP framework and the advantages of developing a content management system (CMS) with it. Finally, it discusses how to design a geospatial metadata deployment system based on the PHP framework.


2019 ◽  
Vol 14 (8) ◽  
pp. 523-536
Author(s):  
Maryam Saleh ◽  
Jamileh Nowroozi ◽  
Fatemeh Fotouhi ◽  
Behrokh Farahmand

Aim: The present study evaluated the structural changes resulting from the interaction between a recombinant influenza A virus M2 protein and aluminum hydroxide adjuvant to investigate the antigen for further immunological studies. Materials & methods: Membrane protein II was produced from the H1N1 subtype of human influenza A virus. The interaction between M2 protein and aluminum hydroxide adjuvant was evaluated by physicochemical techniques including scanning electron microscope, UV-Vis spectra, Fourier-transform infrared spectroscopy and circular dichroism spectroscopy. Results: Physicochemical methods showed high-level protein adsorption and accessibility to the effective parts of the protein. Conclusion: It was concluded that M2 protein secondary structural perturbations, including the α-helix-to-β-sheet transition, enhanced its mechanical properties toward adsorption.


2019 ◽  
Author(s):  
Fransiskus Xaverius Ivan ◽  
Chee Keong Kwoh

Abstract Background: Influenza A virus (IAV) poses threats to human health and life. Many individual studies have been carried out in mice to uncover the viral factors responsible for the virulence of IAV infections. Virus adaptation through serial lung-to-lung passaging, as well as reverse genetic engineering and mutagenesis approaches, has been widely used in these studies. Nonetheless, a single study may not provide enough confidence about virulence factors, hence combining several studies in a meta-analysis is desirable to provide a better view. Methods: Virulence information of IAV infections and the corresponding virus and mouse strains were documented from the literature. Using the mouse lethal dose 50, time series of weight loss, or percentage of survival, the virulence of the infections was classified as avirulent or virulent for two-class problems, and as low, intermediate or high for three-class problems. Protein sequences were decoded from the corresponding IAV genomes or reconstructed manually from other proteins according to mutations mentioned in the related literature. IAV virulence models were then learned from various datasets containing the amino acids at each aligned position of the IAV proteins and the corresponding two-class or three-class virulence labels. Three proven rule-based learning approaches, i.e., OneR, JRip and PART, and additionally random forest were used for modelling, and top protein sites and synergy between protein sites were identified from the models. Results: More than 500 records of IAV infections in mice whose viral proteins could be retrieved were documented. The BALB/C and C57BL/6 mouse strains and the H1N1, H3N2 and H5N1 viruses dominated the infection records. PART models learned from full datasets or subsets of them achieved the best performance, with moderate average model accuracies ranging from 65.0% to 84.4% and from 54.0% to 66.6% for two-class and three-class datasets that utilized all records of aligned IAV proteins, respectively. Their average accuracies were comparable to or even better than those of random forest models, and PART models should be preferred based on the Occam’s razor principle. Interestingly, models based on a dataset that included all IAV strains achieved a better average accuracy when host information was taken into account. For model interpretation, we observed that although many sites in HA were highly correlated with virulence, PART models based on sites in PB2 could compete against and were often better than PART models based on sites in HA. Moreover, PART had a high preference to include sites in PB2 when models were learned from datasets containing concatenated alignments of all IAV proteins. Several sites with a known contribution to virulence were found as the top protein sites, and site pairs that may synergistically influence virulence were also uncovered. Conclusion: Modelling the virulence of IAV infections is a challenging problem. Rule-based models generated using only viral proteins are useful for their advantage in interpretation, but achieve only moderate performance. Development of more advanced machine learning approaches that learn models from features extracted from both viral and host proteins must be considered in future work.
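To make the simplest of the rule learners named above concrete, here is a minimal sketch of OneR on categorical features such as the residues at aligned protein positions. It is not the authors' implementation (the abstract does not describe one), and the column names `site_1`, `site_2`, the residues, and the labels are made-up toy data, not values from the study.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (best_feature, rule, errors) for a OneR classifier.

    rows   : list of dicts mapping feature name -> categorical value
    labels : list of class labels, one per row
    """
    best = None
    for feature in rows[0]:
        # Tally class labels for each value this feature takes.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[feature]][label] += 1
        # One rule per feature: predict the majority class for each value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[feature]] != label
                     for row, label in zip(rows, labels))
        # Keep the feature whose rule makes the fewest training errors.
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


# Hypothetical toy alignment columns and virulence labels (illustration only).
rows = [
    {"site_1": "E", "site_2": "D"},
    {"site_1": "K", "site_2": "D"},
    {"site_1": "K", "site_2": "N"},
    {"site_1": "E", "site_2": "N"},
]
labels = ["avirulent", "virulent", "virulent", "avirulent"]

feature, rule, errors = one_r(rows, labels)
print(feature, rule, errors)
```

On this toy input the single selected column separates the two classes perfectly, which is why a one-feature rule is easy to interpret. JRip and PART build multi-condition rule lists and are not sketched here.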


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Fransiskus Xaverius Ivan ◽  
Chee Keong Kwoh

Abstract Background: Influenza A virus (IAV) poses threats to human health and life. Many individual studies have been carried out in mice to uncover the viral factors responsible for the virulence of IAV infections. Nonetheless, a single study may not provide enough confidence about virulence factors, hence combining several studies in a meta-analysis is desirable to provide a better view. For this, we documented more than 500 records of IAV infections in mice for which the viral proteins could be retrieved and the mouse lethal dose 50 or, alternatively, weight loss and/or survival data were available for virulence classification. Results: IAV virulence models were learned from various datasets containing aligned IAV proteins and the corresponding two virulence classes (avirulent and virulent) or three virulence classes (low, intermediate and high virulence). Three proven rule-based learning approaches, i.e., OneR, JRip and PART, and additionally random forest were used for modelling. PART models achieved the best performance, with moderate average model accuracies ranging from 65.0 to 84.4% and from 54.0 to 66.6% for the two-class and three-class problems, respectively. PART models were comparable to or even better than random forest models and should be preferred based on the Occam’s razor principle. Interestingly, the average accuracy of the models improved when host information was taken into account. For model interpretation, we observed that although many sites in HA were highly correlated with virulence, PART models based on sites in PB2 could compete against and were often better than PART models based on sites in HA. Moreover, PART had a high preference to include sites in PB2 when models were learned from datasets containing the concatenated alignments of all IAV proteins. Several sites with a known contribution to virulence were found as the top protein sites, and site pairs that may synergistically influence virulence were also uncovered. Conclusion: Modelling IAV virulence is a challenging problem. Rule-based models generated using viral proteins are useful for their advantage in interpretation, but achieve only moderate performance. Development of more advanced approaches that learn models from features extracted from both viral and host proteins should be considered in future work.


2021 ◽  
Vol 17 ◽  
pp. 117693432110030
Author(s):  
Hoa Thanh Le ◽  
Phuc-Chau Do ◽  
Ly Le

A high level of mutation enables the influenza A virus to resist antibiotics previously effective against the influenza A virus. A portion of the structure of hemagglutinin (HA) is assumed to be well-conserved to maintain its role in cellular fusion, and the structure tends to be more conserved than the sequence. We designed peptide inhibitors to target the conserved residues on the HA surface, which were identified based on structural alignment. Most of the conserved and strongly similar residues are located in the receptor-binding and esterase regions on the HA1 domain. In a later step, fragments of anti-HA antibodies were gathered and screened for their ability to bind to the conserved residues found. As a result, Methionine got the best docking score within the −2.8 Å radius of Van der Waals when interacting with Tyrosine, Arginine, and Glutamic acid. Then, the binding affinity and spectrum of the fragments were enhanced by grafting hotspot amino acids into the fragments to form peptide inhibitors. Our peptide inhibitor was able to form in silico contact with a structurally conserved region across H1, H2, and H3 HA, with the binding site at the boundary between the HA1 and HA2 domains, spreading across different monomers, suggesting a new target for designing broad-spectrum antibodies and vaccines. This research presents an affordable method to design broad-spectrum peptide inhibitors using fragments of an antibody as a scaffold.


1978 ◽  
Vol 147 (2) ◽  
pp. 531-540 ◽  
Author(s):  
J Lindenmann ◽  
E Deuel ◽  
S Fanconi ◽  
O Haller

A strain of avian influenza A virus was adapted to grow in mouse peritoneal macrophages in vitro. The adapted strain, called M-TUR, induced a marked cytopathic effect in macrophages from susceptible mice. Mice homozygous (A2G) or heterozygous (F1 hybrids between A2G and several susceptible strains) for the gene Mx, shown previously to induce a high level of resistance towards lethal challenge by a number of myxoviruses in vivo, yielded peritoneal macrophages which were not affected by M-TUR. Peritoneal macrophages could be classified as resistant or susceptible to M-TUR without sacrificing the cell donor. Backcrosses were arranged between (A2G X A/J)F1 and A/J mice. 64 backcross animals could be tested individually both for resistance of their macrophages in vitro after challenge with M-TUR, and for resistance of the whole animal in vivo after challenge with NWS (a neurotropic variant of human influenza A virus). Macrophages from 36 backcross mice were classified as susceptible, and all of these mice died after challenge. Macrophages from 28 mice were classified as resistant, and 26 mice survived challenge. We conclude that resistance of macrophages and resistance of the whole animal are two facets of the same phenomenon.
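The agreement between the in vitro and in vivo classifications follows directly from the counts reported above. The short calculation below is only a worked restatement of those counts; the variable names are introduced here for illustration.

```python
# Counts reported in the abstract for the 64 backcross mice.
susceptible_total, susceptible_died = 36, 36    # all susceptible mice died
resistant_total, resistant_survived = 28, 26    # 26 of 28 resistant mice survived

total = susceptible_total + resistant_total
concordant = susceptible_died + resistant_survived
agreement = concordant / total

print(f"{concordant}/{total} = {agreement:.1%}")  # 62/64 = 96.9%
```

Only the two resistant-classified mice that did not survive challenge depart from the pattern.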


Retrovirology ◽  
2009 ◽  
Vol 6 (1) ◽  
pp. 38 ◽  
Author(s):  
Eva-K Pauli ◽  
Mirco Schmolke ◽  
Henning Hofmann ◽  
Christina Ehrhardt ◽  
Egbert Flory ◽  
...  

2001 ◽  
Vol 75 (23) ◽  
pp. 11773-11780 ◽  
Author(s):  
Darwyn Kobasa ◽  
Krisna Wells ◽  
Yoshihiro Kawaoka

ABSTRACT The 1957 human pandemic strain of influenza A virus contained an avian virus hemagglutinin (HA) and neuraminidase (NA), both of which acquired specificity for the human receptor, N-acetylneuraminic acid linked to galactose of cellular glycoconjugates via an α2-6 bond (NeuAcα2-6Gal). Although the NA retained considerable specificity for NeuAcα2-3Gal, its original substrate in ducks, it lost the ability to support viral growth in the duck intestine, suggesting a growth-restrictive change other than a shift in substrate specificity. To test this possibility, we generated a panel of reassortant viruses that expressed the NA genes of human H2N2 viruses isolated from 1957 to 1968 with all other genes from the avian virus A/duck/Hong Kong/278/78 (H9N2). Only the NA of A/Singapore/1/57 supported efficient viral growth in the intestines of orally inoculated ducks. The growth-supporting capacity of the NA correlated with a high level of enzymatic activity, comparable to that found to be associated with avian virus NAs. The specific activities of the A/Ann Arbor/6/60 and A/England/12/62 NAs, which showed greatly restricted abilities to support viral growth in ducks, were only 8 and 5%, respectively, of the NA specific activity for A/Singapore/1/57. Using chimeric constructs based on A/Singapore/1/57 and A/England/12/62 NAs, we localized the determinants of high specific NA activity to a region containing six amino acid substitutions in A/England/12/62: Ser331→Arg, Asp339→Asn, Asn367→Ser, Ser370→Leu, Asn400→Ser, and Pro431→Glu. Five of these six residues (excluding Asn400) were required and sufficient for the full specific activity of the A/Singapore/1/57 NA. Thus, in addition to a change in substrate specificity, a reduction in high specific activity may be required for the adaptation of avian virus NAs to growth in humans. This change is likely needed to maintain an optimal balance between NA activity and the lower affinity shown by human virus HAs for their cellular receptor.


Planta Medica ◽  
2012 ◽  
Vol 78 (11) ◽  
Author(s):  
A Derksen ◽  
W Hafezi ◽  
A Hensel ◽  
J Kühn
