The COMPARE Database: A Public Resource for Allergen Identification, Adapted for Continuous Improvement

Motivation: The availability of databases that identify allergenic proteins through a transparent, consensus-based scientific approach is of prime importance for the safety review of genetically modified foods and feeds, and for public safety in general. In recent years, screening for potential new allergen sequences has become more complex because of the exponential increase in genomic sequence information. To address these challenges, an international collaborative scientific group coordinated by the Health and Environmental Sciences Institute (HESI) was tasked with developing a contemporary, adaptable, high-throughput process to build the COMprehensive Protein Allergen REsource (COMPARE) database. COMPARE is a publicly accessible allergen sequence data resource with bioinformatics analytical tools that follows the guidelines of FAO/WHO and the CODEX Alimentarius Commission.

Results: The COMPARE process is novel in several respects. First, candidate sequences are identified each year by an automated keyword-based sorting algorithm, followed by manual curation of the annotated sequence entries retrieved from public protein sequence databases. Second, the process is designed for continuous improvement, with updates transparently documented in each version. Third, as a complementary approach, a yearly keyword-based search of literature databases identifies new allergen sequences that have not (yet) been submitted to protein databases. Fourth, comments from the independent peer-review panel are posted on the website to increase the transparency of decision making. Finally, sequence comparison capabilities associated with the COMPARE database were developed to evaluate the potential allergenicity of proteins, based on the internationally recognized guidelines of FAO/WHO and the CODEX Alimentarius Commission.
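
The abstract does not describe how COMPARE's sequence comparison tools work internally. One kind of check used in allergenicity screening under the FAO/WHO and Codex guidelines is a search for short runs of contiguous identical amino acids shared between a query protein and known allergens. The minimal sketch below illustrates that idea under stated assumptions: an exact-match run length of 8 residues and a toy in-memory database with hypothetical identifiers. It is not the COMPARE implementation. The Codex criterion of more than 35% identity over a segment of 80 or more amino acids requires a proper alignment tool such as FASTA and is not shown.

```python
from typing import Dict, List, Tuple


def kmers(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k substring of seq to the 0-based positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def exact_kmer_hits(
    query: str, allergens: Dict[str, str], k: int = 8
) -> List[Tuple[str, str, int, int]]:
    """Return (allergen_id, shared_peptide, query_pos, allergen_pos) for every run of
    k identical contiguous residues shared by the query and a database entry.

    k = 8 is an assumed default, not a value taken from the COMPARE documentation.
    """
    query_index = kmers(query.upper(), k)
    hits = []
    for acc, seq in allergens.items():
        seq = seq.upper()
        # Slide a k-residue window along the allergen and look it up in the query index.
        for j in range(len(seq) - k + 1):
            peptide = seq[j:j + k]
            for i in query_index.get(peptide, []):
                hits.append((acc, peptide, i, j))
    return hits


# Toy database; the identifiers and sequences are hypothetical.
toy_db = {
    "ALG_0001": "MKTFLILALLAIVATTATA",
    "ALG_0002": "GSSHHHHHHSSGLVPRGSH",
}
print(exact_kmer_hits("PPPLILALLAIPPP", toy_db))
# → [('ALG_0001', 'LILALLAI', 3, 4)]
```

Indexing the query once and scanning each database entry keeps the search linear in total sequence length. A real screen would also report which allergen families are hit and combine this with alignment-based identity scores.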

Fine-mapping of genes determining extrafusal fiber properties in murine soleus muscle

Physiological Genomics ◽

10.1152/physiolgenomics.00092.2016 ◽

2017 ◽

Vol 49 (3) ◽

pp. 141-150 ◽

Cited By ~ 9

Author(s):

A. M. Carroll ◽

R. Cheng ◽

E. S. R. Collie-Duguid ◽

C. Meharg ◽

M. E. Scholz ◽

...

Keyword(s):

Candidate Genes ◽

Muscle Fiber ◽

Soleus Muscle ◽

Genomic Sequence ◽

Sequence Data ◽

Mouse Strains ◽

Sequence Information ◽

Type I ◽

Fiber Types ◽

Genome Wide

Muscle fiber cross-sectional area (CSA) and the proportion of different fiber types are important determinants of muscle function and overall metabolism. Genetic variation plays a substantial role in the phenotypic variation of these traits; however, the underlying genes remain poorly understood. This study aimed to map quantitative trait loci (QTL) affecting differences in soleus muscle fiber traits between the LG/J and SM/J mouse strains. Fiber number, CSA, and the proportion of oxidative type I fibers were assessed in the soleus of 334 genotyped female and male mice of the F34 generation of advanced intercross lines (AIL) derived from the LG/J and SM/J strains. To increase QTL detection power, these data were combined with 94 soleus samples from the F2 intercross of the same strains. The transcriptome of the soleus muscle of LG/J and SM/J females was analyzed by microarray. Genome-wide association analysis mapped four QTL (genome-wide P < 0.05) affecting muscle fiber properties to chromosomes 2, 3, 4, and 11. The 1.5-LOD QTL support intervals ranged in size from 2.36 to 4.67 Mb. On the basis of genomic sequence information and functional and transcriptome data, we identified candidate genes for each of these QTL. Combining analyses of the F2 and F34 AIL populations with transcriptome and genomic sequence data from the parental strains is an effective strategy for refining QTL and nominating candidate genes.
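
A 1.5-LOD support interval is conventionally the region around a QTL peak in which the LOD score stays within 1.5 units of its maximum. The sketch below computes such an interval from a LOD profile along one chromosome. The marker positions and LOD values in the example are hypothetical. The bounds are taken at the outermost markers still inside the interval, without interpolation, which may differ from how the authors derived their intervals.

```python
from typing import List, Tuple


def lod_support_interval(
    positions_mb: List[float], lod: List[float], drop: float = 1.5
) -> Tuple[float, float, float]:
    """Return (peak_position, left_bound, right_bound) of the drop-LOD support interval.

    Walks outward from the highest-LOD marker while LOD >= (max LOD - drop);
    bounds are the outermost markers still within that threshold.
    """
    if not lod or len(positions_mb) != len(lod):
        raise ValueError("positions and LOD scores must be non-empty and equal in length")
    peak = max(range(len(lod)), key=lod.__getitem__)
    threshold = lod[peak] - drop
    left = peak
    while left > 0 and lod[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lod) - 1 and lod[right + 1] >= threshold:
        right += 1
    return positions_mb[peak], positions_mb[left], positions_mb[right]


# Hypothetical marker positions (Mb) and LOD scores on one chromosome.
positions = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]
lods = [1.0, 2.5, 4.2, 5.0, 4.0, 3.2, 1.1]
peak, lo, hi = lod_support_interval(positions, lods)
print(peak, lo, hi, hi - lo)  # → 103.0 102.0 104.0 2.0
```

Taking the bounds at the first markers outside the threshold instead gives a more conservative (wider) interval. This is a common alternative when markers are sparse.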

Faculty Opinions recommendation of A likelihood ratio test of speciation with gene flow using genomic sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.3540959.3240060 ◽

2010 ◽

Author(s):

Nicolas Galtier ◽

Julien Dutheil

Keyword(s):

Gene Flow ◽

Likelihood Ratio ◽

Likelihood Ratio Test ◽

Genomic Sequence ◽

Sequence Data ◽

Ratio Test

PoGB-Pred: Prediction of Antifreeze Proteins Sequences using Amino Acid Composition with Feature Selection followed by a Sequential based Ensemble Approach

Current Bioinformatics ◽

10.2174/1574893615999200707141926 ◽

2020 ◽

Vol 15 ◽

Author(s):

Affan Alim ◽

Abdul Rafay ◽

Imran Naseem

Keyword(s):

Amino Acid ◽

Dimension Reduction ◽

Protein Identification ◽

Cold Water ◽

Genomic Sequence ◽

Sequence Data ◽

Antifreeze Proteins ◽

Building Blocks ◽

Gradient Boosting ◽

Proposed Model

Background: Proteins contribute to virtually every process in cellular life. Their functions include building and repairing tissues in humans and other organisms, and they are the building blocks of bone, muscle, cartilage, skin, and blood. Antifreeze proteins (AFPs) are of particular significance for organisms that live in very cold environments. With their help, cold-water organisms can survive sub-zero temperatures and resist ice crystallization, which can rupture internal cells and tissues. AFPs have attracted interest in the food industry and in cryopreservation. Objective: With the growing availability of genomic and protein sequence data, an automated and sophisticated tool for recognizing and identifying AFPs is urgently needed. Because AFP sequences and structures are highly diverse, most previously proposed methods fail to perform well across different structures. We propose a consolidated method that achieves competitive performance on highly distinct AFP structures. Methods: We propose Principal Component Analysis (PCA) followed by Gradient Boosting (GB) for antifreeze protein identification. To evaluate and validate the proposed model, we use various combinations of two feature groups: amino acid composition and dipeptide composition. PCA reduces dimensionality while retaining the high-variance components of the data. Gradient boosting, an ensemble method, then performs modelling and classification. Results: The proposed method achieved superior performance to the RAFP-Pred method on the PDB, Pfam and UniProt datasets. In Experiment 3, using only 150 PCA components, it achieved an accuracy of 89.63%, higher than the 87.41% reported for RAFP-Pred using 300 significant features. Experiment 2 used two different data sources: non-AFPs from the PISCES server and AFPs from the Protein Data Bank. In Experiment 2, our method attained a sensitivity of 79.16%, which is 12.50 percentage points higher than that of the state-of-the-art RAFP-Pred method. Conclusion: AFPs share a common function but have distinct structures, so a single model developed for different sequences often fails for AFPs. Our proposed model showed robust results across diverse training and testing datasets and outperformed previous AFP prediction methods such as RAFP-Pred. The model consists of PCA for dimension reduction followed by gradient boosting for classification. Owing to its simplicity, scalability and high performance, it can easily be extended to the analysis of proteomic and genomic datasets.
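
The abstract describes combining amino acid composition and dipeptide composition features before PCA and gradient boosting. The sketch below computes those two feature groups for one sequence, giving 20 + 400 = 420 values. It assumes each group is a frequency normalized by the number of counted residues or residue pairs, because the abstract does not state the exact normalization. In a full pipeline these vectors would be stacked into a matrix and reduced with PCA (for example, to the 150 components mentioned for Experiment 3). They would then be passed to a gradient-boosting classifier, for instance scikit-learn's `PCA` and `GradientBoostingClassifier`. That stage is omitted here.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    n = sum(counts)  # non-standard letters (X, B, Z, ...) are not counted
    return [c / n if n else 0.0 for c in counts]


def dpc(seq: str) -> List[float]:
    """Dipeptide composition: fraction of each of the 400 overlapping adjacent pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing a non-standard residue
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def composition_features(seq: str) -> List[float]:
    """Concatenate AAC (20) and DPC (400) into one 420-dimensional feature vector."""
    return aac(seq) + dpc(seq)


features = composition_features("ACDA")
print(len(features), features[:3])  # → 420 [0.5, 0.25, 0.25]
```

The dipeptide counts use an explicit sliding loop rather than `str.count`, because `str.count` skips overlapping matches (for example, it finds only one "AA" in "AAA").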

Special Issue: Genetic Basis of Phenotypic Variation in Drosophila and Other Insects

Genes ◽

10.3390/genes12081212 ◽

2021 ◽

Vol 12 (8) ◽

pp. 1212

Author(s):

J. Spencer Johnston ◽

Carl E. Hjelmen

Keyword(s):

Next Generation Sequencing ◽

Genetic Basis ◽

Genomic Sequence ◽

Sequence Data ◽

Complete Genomic Sequence ◽

Special Issue ◽

Model Species ◽

Road Map ◽

Generation Sequencing ◽

Complete Genomic

Next-generation sequencing provides a nearly complete genomic sequence for model and non-model species alike; however, this wealth of sequence data includes no road map [...]

Enhanced Control of the Fungus Gnat Bradysia odoriphaga (Diptera: Sciaridae) by Co-Application of Clothianidin and Hexaflumuron

Insects ◽

10.3390/insects12070571 ◽

2021 ◽

Vol 12 (7) ◽

pp. 571

Author(s):

Yongqing Wang ◽

Kai Wan ◽

Ruifei Wang ◽

Jiyingzi Wu ◽

Ruiquan Hou ◽

...

Keyword(s):

Maximum Residue Limit ◽

Control Efficacy ◽

Codex Alimentarius ◽

Codex Alimentarius Commission ◽

Application Rates ◽

Fungus Gnat ◽

Major Pest ◽

First Time

The fungus gnat is a major pest of chive in China, and its control has relied heavily on clothianidin. Because of this intensive use, the control efficacy of clothianidin has declined. The present study evaluated how co-drenching clothianidin with hexaflumuron affects the absorption and dissipation of clothianidin in chive plants and soils. It also determined the effect of such application on control efficacy. Chive production fields in Guangdong and Hubei Provinces were drenched with clothianidin alone or with a mixture of clothianidin and hexaflumuron at low application rates. Concentrations of clothianidin in chive plants and soils were analyzed by HPLC. Results showed that co-application gave higher control efficacy against the fungus gnat than clothianidin alone. Co-application enhanced clothianidin absorption and dissipation and extended the half-lives of clothianidin in chive. It is likely that hexaflumuron protected chive roots from larval damage and that the healthier roots absorbed more clothianidin, resulting in longer half-lives. Additionally, the terminal residues of clothianidin in chive 14 days after application were lower than the maximum residue limit for chive set by the Codex Alimentarius Commission. This study is the first to document that co-application of clothianidin and hexaflumuron improves the absorption and dissipation of clothianidin in chive plants and enhances control efficacy against the fungus gnat.
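
The abstract reports clothianidin half-lives but does not state the kinetic model used. Residue dissipation is commonly described by first-order kinetics, C(t) = C0·exp(−kt), with half-life t½ = ln 2 / k. The sketch below assumes that model, fits k by least squares on ln C, and returns the half-life. The sampling days and concentrations in the example are synthetic, not data from the study.

```python
import math
from typing import Sequence, Tuple


def first_order_fit(days: Sequence[float], conc: Sequence[float]) -> Tuple[float, float]:
    """Fit C(t) = C0 * exp(-k t) by ordinary least squares on ln C; return (C0, k).

    Assumes first-order dissipation and strictly positive concentrations.
    """
    if len(days) != len(conc) or len(days) < 2:
        raise ValueError("need at least two paired (day, concentration) observations")
    ys = [math.log(c) for c in conc]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, ys))
    slope = sxy / sxx  # slope of ln C versus t equals -k
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def half_life(k: float) -> float:
    """Half-life of a first-order process: t1/2 = ln 2 / k."""
    return math.log(2) / k


# Synthetic residues (mg/kg) generated with C0 = 2.0 and k = 0.1 per day.
days = [0, 1, 3, 7, 14]
conc = [2.0 * math.exp(-0.1 * t) for t in days]
c0, k = first_order_fit(days, conc)
print(round(k, 3), round(half_life(k), 2))  # → 0.1 6.93
```

Fitting each treatment separately and comparing the resulting half-lives is one way to quantify the kind of difference between clothianidin alone and co-application that the abstract reports.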

DNA sonification for public engagement in bioinformatics

BMC Research Notes ◽

10.1186/s13104-021-05685-7 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Heleen Plaisier ◽

Thomas R. Meagher ◽

Daniel Barker

Keyword(s):

DNA Sequence ◽

Public Engagement ◽

Sequence Data ◽

Sensory Perception ◽

Data Representation ◽

Sequence Information ◽

DNA Sequence Data ◽

Public Events ◽

DNA Base ◽

Alternative Means

Objective: Visualisation methods, primarily color-coded representations of sequence data, have been the predominant means of representing DNA data. Algorithmic conversion of DNA sequence data to sound, known as sonification, is an alternative representation that draws on a different range of human sensory perception. We propose that sonification has value for public engagement with DNA sequence information because it has the potential to be entertaining as well as informative. We conduct preliminary work to explore the potential of DNA sequence sonification in public engagement with bioinformatics. We apply a simple sonification technique for DNA in which each DNA base is represented by a specific note. Additionally, a beat may be added to indicate codon boundaries or for musical effect. We report a brief analysis from the public engagement events we conducted that featured this method of sonification. Results: We report on the use of DNA sequence sonification at two public events. Sonification has potential in public engagement with bioinformatics, both as a means of data representation and as a way to attract an audience to a drop-in stand. We also discuss further directions for research on integrating sonification into bioinformatics public engagement and education.
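
The abstract describes representing each DNA base by a specific note, with an optional beat at codon boundaries. The sketch below implements that idea using only the Python standard library. The base-to-frequency mapping, note length, sample rate and click-style beat are all hypothetical choices, not the authors' settings.

```python
import io
import math
import struct
import wave
from typing import List, Tuple

# Hypothetical mapping of bases to pitches in Hz (A4, C5, G4, E4).
NOTE_HZ = {"A": 440.00, "C": 523.25, "G": 392.00, "T": 329.63}


def sonify_events(dna: str) -> List[Tuple[str, float, bool]]:
    """Return (base, frequency_hz, starts_codon) for each A/C/G/T in the input.

    Other characters (gaps, N, whitespace) are skipped, and codon boundaries are
    counted on the remaining bases only.
    """
    bases = [b for b in dna.upper() if b in NOTE_HZ]
    return [(b, NOTE_HZ[b], i % 3 == 0) for i, b in enumerate(bases)]


def render_wav(events: List[Tuple[str, float, bool]], note_s: float = 0.25, rate: int = 8000) -> bytes:
    """Render events as 16-bit mono PCM WAV bytes: one sine tone per base, plus a
    brief 10 ms offset (heard as a click) at the start of each codon as the 'beat'."""
    frames = bytearray()
    samples_per_note = int(note_s * rate)
    for _, freq, codon_start in events:
        for s in range(samples_per_note):
            amp = 0.5 * math.sin(2 * math.pi * freq * s / rate)
            if codon_start and s < rate // 100:
                amp += 0.3
            amp = max(-1.0, min(1.0, amp))  # clamp to the valid PCM range
            frames += struct.pack("<h", int(amp * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return buf.getvalue()


events = sonify_events("ATGGCC")
print([e[2] for e in events])  # → [True, False, False, True, False, False]
audio = render_wav(events)
```

Writing `audio` to a file with a `.wav` extension produces a playable clip. Dropping the click and keeping only the tones gives the simpler, beat-free variant the abstract also allows.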

Identification and Characterization of Novel Human Endogenous Retrovirus Families by Phylogenetic Screening of the Human Genome Mapping Project Database

Journal of Virology ◽

10.1128/jvi.74.8.3715-3730.2000 ◽

2000 ◽

Vol 74 (8) ◽

pp. 3715-3730 ◽

Cited By ~ 202

Author(s):

Michael Tristem

Keyword(s):

Human Genome ◽

Genome Mapping ◽

Sequence Data ◽

Endogenous Retrovirus ◽

Endogenous Retroviruses ◽

Human Endogenous Retrovirus ◽

Sequence Information ◽

Class III ◽

Genome Mapping Project ◽

Human Genome Mapping Project

Human endogenous retroviruses (HERVs) were first identified almost 20 years ago, and numerous families have been described since then. It has, however, been difficult to obtain a good estimate of the total number of independently derived families and of their relationships to each other and to other members of the family Retroviridae. In this study, I used sequence data from over 150 novel HERVs obtained from the Human Genome Mapping Project database, together with a variety of recently identified nonhuman retroviruses, to classify the HERVs into 22 independently acquired families. Of these, 17 families were loosely assigned to the class I HERVs, 3 to the class II HERVs and 2 to the class III HERVs. Many of these families have been identified previously, but six are described here for the first time. Another four, for which only partial sequence information was previously available, were further characterized. Members of each of these 10 families are defective, and calculation of their integration dates suggested that most of them are likely to have been present within the human lineage since it diverged from the Old World monkeys more than 25 million years ago.
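
The abstract mentions calculating integration dates without giving the method. A widely used approach for proviruses relies on the fact that the 5′ and 3′ long terminal repeats (LTRs) are identical at integration and then diverge independently. The time since integration is therefore roughly T = K / (2r), where K is the divergence between the two LTRs and r is the neutral substitution rate per site per year. The sketch below assumes pre-aligned LTRs and applies an optional Jukes-Cantor correction. The rate is a caller-supplied assumption, and this is not necessarily the calculation used in the paper.

```python
import math


def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two pre-aligned, equal-length LTRs,
    ignoring columns with a gap ('-') in either sequence."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable (gap-free) sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def integration_age_years(
    ltr5: str, ltr3: str, rate_per_site_per_year: float, correct: bool = True
) -> float:
    """Estimate time since integration as T = K / (2r): both LTRs start identical
    and each accumulates substitutions independently at rate r."""
    p = ltr_divergence(ltr5, ltr3)
    k = jukes_cantor(p) if correct else p
    return k / (2 * rate_per_site_per_year)


# Two hypothetical 100-bp LTRs differing at 2 sites, with an assumed rate of 1e-9.
ltr_a = "A" * 100
ltr_b = "A" * 98 + "CC"
print(round(integration_age_years(ltr_a, ltr_b, 1e-9, correct=False)))  # → 10000000
```

The estimate scales inversely with the assumed rate, so published dates depend heavily on that choice. The correction matters little at low divergence but grows for older insertions.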

Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm

BMC Bioinformatics ◽

10.1186/1471-2105-9-235 ◽

2008 ◽

Vol 9 (1) ◽

pp. 235 ◽

Cited By ~ 22

Author(s):

Jeremy D DeBarry ◽

Renyi Liu ◽

Jeffrey L Bennetzen

Keyword(s):

Genomic Sequence ◽

Sequence Data ◽

Repeat Family

Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea

International Journal of Systematic and Evolutionary Microbiology ◽

10.1099/ijs.0.054171-0 ◽

2014 ◽

Vol 64 (Pt_2) ◽

pp. 316-324 ◽

Cited By ~ 258

Author(s):

Jongsik Chun ◽

Fred A. Rainey

Keyword(s):

Genomic Sequence ◽

Sequence Data ◽

Original Research ◽

rRNA Gene ◽

New Taxon ◽

Genome Sequences ◽

Microbial World ◽

Type Strains

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. In many cases it has allowed taxa to be demarcated into distinct species, but its limitations in a number of groups have led to the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective way to obtain whole-genome sequences of microbial strains. Some 12 000 bacterial or archaeal genome sequences are available for comparison, but only 1725 of these are from actual type strains, out of nearly 11 000 type strains. This limits the use of genomic data in comparative taxonomic studies. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. Incorporating genomics into the taxonomy and systematics of the Bacteria and Archaea, coupled with computational advances, will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into workflows ranging from new strain isolation to new taxon description.
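
The abstract notes that 16S rRNA gene sequence data has allowed species demarcation in many cases, though with limitations in some groups. The sketch below computes the pairwise identity of two pre-aligned 16S sequences. The 98.7% default is a commonly cited cutoff, used here as an assumption rather than a value from this editorial, and gap-handling conventions vary between tools. Consistent with the abstract's caveat, identity at or above the cutoff does not by itself establish that two strains belong to the same species. This is why DNA–DNA hybridization, and increasingly genome-based comparisons, remain in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-' or '.')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x in "-." or y in "-.":
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def below_species_threshold(identity: float, threshold: float = 98.7) -> bool:
    """True when 16S identity falls below the (assumed) threshold, suggesting different
    species; values at or above it are inconclusive on their own."""
    return identity < threshold


ident = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(ident, below_species_threshold(ident))  # → 90.0 True
```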

Comparative Species Divergence across Eight Triplets of Spiny Lizards (Sceloporus) Using Genomic Sequence Data

Genome Biology and Evolution ◽

10.1093/gbe/evt186 ◽

2013 ◽

Vol 5 (12) ◽

pp. 2410-2419 ◽

Cited By ~ 24

Author(s):

Adam D. Leaché ◽

Rebecca B. Harris ◽

Max E. Maliska ◽

Charles W. Linkem

Keyword(s):

Genomic Sequence ◽

Sequence Data ◽

Species Divergence
