GENE ONTOLOGY SIMILARITY MEASURES BASED ON LINEAR ORDER STATISTICS

The standard method for comparing gene products (proteins or RNA) is to compare their DNA or amino acid sequences. Additional information about some gene products may come from multiple sources, including the set of Gene Ontology (GO) annotations and the set of journal abstracts related to each gene product. Gene product similarity measures can be based on evaluating sets of descriptor terms found in the GO taxonomy, and/or the index term sets of the related documents (MeSH annotations). While our techniques can be applied to term sets from any taxonomy, we restrict our examples in this article to GO annotations. We investigate the use of linear order statistics (LOS) to build similarity relations on pairs of terms that are used in the GO as linguistic descriptors of genes and gene products. One of our objectives is to investigate the construction and utility of visual assessments of relational data (in this case, dissimilarity matrices) for discovering tendencies of groups of gene products to "cluster together". We use gene product data derived from a group of 194 gene products representing three protein families extracted from ENSEMBL. Our examples suggest that LOS similarity measures are more effective than traditional sequence-based similarity measures at capturing relationships between pairs of gene products in ENSEMBL families when annotation information is available. We show examples of how these similarity measures can assist in knowledge discovery and gene product family validation.

Ontological self-organizing maps for cluster visualization and functional summarization of gene products using Gene Ontology similarity measures

2008 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence) ◽

10.1109/fuzzy.2008.4630351 ◽

2008 ◽

Cited By ~ 1

Author(s):

Timothy C. Havens ◽

James M. Keller ◽

Mihail Popescu ◽

James C. Bezdek

Keyword(s):

Gene Ontology ◽

Similarity Measures ◽

Gene Products ◽

Self Organizing Maps ◽

Cluster Visualization ◽

Self Organizing

Improved Gene Ontology Annotation for Biofilm Formation, Filamentous Growth, and Phenotypic Switching in Candida albicans

Eukaryotic Cell ◽

10.1128/ec.00238-12 ◽

2012 ◽

Vol 12 (1) ◽

pp. 101-108 ◽

Cited By ~ 13

Author(s):

Diane O. Inglis ◽

Marek S. Skrzypek ◽

Martha B. Arnaud ◽

Jonathan Binkley ◽

Prachi Shah ◽

...

Keyword(s):

Gene Ontology ◽

Candida Albicans ◽

Gene Product ◽

Fungal Pathogen ◽

Biofilm Formation ◽

Filamentous Growth ◽

Phenotypic Switching ◽

Gene Products ◽

Content Type ◽

Opportunistic Fungal Pathogen

The opportunistic fungal pathogen Candida albicans is a significant medical threat, especially for immunocompromised patients. Experimental research has focused on specific areas of C. albicans biology, with the goal of understanding the multiple factors that contribute to its pathogenic potential. Some of these factors include cell adhesion, invasive or filamentous growth, and the formation of drug-resistant biofilms. The Gene Ontology (GO) (www.geneontology.org) is a standardized vocabulary that the Candida Genome Database (CGD) (www.candidagenome.org) and other groups use to describe the functions of gene products. To improve the breadth and accuracy of pathogenicity-related gene product descriptions and to facilitate the description of as yet uncharacterized but potentially pathogenicity-related genes in Candida species, CGD undertook a three-part project: first, the addition of terms to the biological process branch of the GO to improve the description of fungus-related processes; second, manual recuration of gene product annotations in CGD to use the improved GO vocabulary; and third, computational ortholog-based transfer of GO annotations from experimentally characterized gene products, using these new terms, to uncharacterized orthologs in other Candida species. Through genome annotation and analysis, we identified candidate pathogenicity genes in seven non-C. albicans Candida species and in one additional C. albicans strain, WO-1. We also defined a set of C. albicans genes at the intersection of biofilm formation, filamentous growth, pathogenesis, and phenotypic switching of this opportunistic fungal pathogen, which provides a compelling list of candidates for further experimentation.
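
The third step of that project, ortholog-based transfer of experimentally supported annotations, can be illustrated with a minimal Python sketch. This is not CGD's pipeline: the function name, the data layout, the gene names, and the term labels are all hypothetical, and the only outside fact assumed is that EXP, IDA, IPI, IMP, IGI, and IEP are GO experimental evidence codes.

```python
# Hypothetical sketch of ortholog-based annotation transfer (not CGD's code).
EXPERIMENTAL_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def transfer_annotations(annotations, orthologs, allowed_codes=EXPERIMENTAL_CODES):
    """Copy experimentally supported terms from each source gene to its orthologs.

    annotations: dict mapping gene -> list of (term, evidence_code) pairs
    orthologs:   dict mapping source gene -> list of ortholog genes
    Returns a dict mapping ortholog gene -> set of transferred terms.
    """
    transferred = {}
    for source, targets in orthologs.items():
        # Keep only the terms backed by an experimental evidence code.
        supported = {
            term
            for term, code in annotations.get(source, [])
            if code in allowed_codes
        }
        if not supported:
            continue
        for target in targets:
            transferred.setdefault(target, set()).update(supported)
    return transferred


# Toy data: placeholder gene names and term labels, not real database entries.
example_annotations = {
    "albicans_gene_1": [("term_biofilm_formation", "IDA"), ("term_other", "IEA")],
}
example_orthologs = {"albicans_gene_1": ["other_species_gene_1"]}
example_result = transfer_annotations(example_annotations, example_orthologs)
```

In this toy run only the experimentally supported term reaches the ortholog. The electronically inferred one (evidence code IEA) is filtered out, which mirrors the abstract's restriction to experimentally characterized gene products.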

Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

Computational and Mathematical Methods in Medicine ◽

10.1155/2014/891842 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Mingxin Gan

Keyword(s):

Gene Ontology ◽

Gene Product ◽

Correlation Coefficient ◽

Semantic Similarity ◽

Biological Process ◽

Jaccard Index ◽

Biological Knowledge ◽

Gene Products ◽

Functional Relationships ◽

Information Contents

Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
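
The vector representation described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, term labels, and information-content values are hypothetical, and the Jaccard variant shown is the weighted (min/max) generalization, which may differ from the form used in the paper.

```python
import math


def ic_vector(annotated_terms, ic, all_terms):
    """Vector with the term's information content where annotated, else 0."""
    return [ic[t] if t in annotated_terms else 0.0 for t in all_terms]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v) if sd_u and sd_v else 0.0


def weighted_jaccard(u, v):
    # Assumed variant: sum of element-wise minima over sum of maxima.
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return num / den if den else 0.0


# Hypothetical terms and information contents (e.g. IC = -log p(term)).
terms = ["term_a", "term_b", "term_c"]
ic = {"term_a": 1.0, "term_b": 2.0, "term_c": 3.0}
gene_1 = ic_vector({"term_a", "term_b"}, ic, terms)  # [1.0, 2.0, 0.0]
gene_2 = ic_vector({"term_b", "term_c"}, ic, terms)  # [0.0, 2.0, 3.0]
scores = {
    "cosine": cosine(gene_1, gene_2),
    "pearson": pearson(gene_1, gene_2),
    "jaccard": weighted_jaccard(gene_1, gene_2),
}
```

Because unannotated terms contribute zeros, two gene products that share only a low-information term score lower than under a set-overlap measure. That is the behavior the abstract argues for.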

Functional Summarization of Gene Product Clusters Using Gene Ontology Similarity Measures

Proceedings of the 2004 Intelligent Sensors, Sensor Networks and Information Processing Conference, 2004. ◽

10.1109/issnip.2004.1417521 ◽

2005 ◽

Author(s):

Mihail Popescu ◽

James M. Keller ◽

Joyce A. Mitchell ◽

James C. Bezdek

Keyword(s):

Gene Ontology ◽

Gene Product ◽

Similarity Measures

A New Family of Similarity Measures for Scoring Confidence of Protein Interactions using Gene Ontology

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3083150 ◽

2021 ◽

pp. 1-1

Author(s):

Madhusudan Paul ◽

Ashish Anand

Keyword(s):

Gene Ontology ◽

Protein Interactions ◽

Similarity Measures ◽

New Family

The Gene Ontology resource: enriching a GOld mine

Nucleic Acids Research ◽

10.1093/nar/gkaa1113 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D325-D334

Author(s):

Seth Carbon ◽

Eric Douglass ◽

Benjamin M Good ◽

Deepak R Unni ◽

...

Keyword(s):

Experimental Data ◽

Gene Ontology ◽

Gold Mine ◽

Gene Products ◽

Gene Ontology Consortium ◽

The Past ◽

Historical Archive ◽

File Structure

The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20,000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.

The immunosuppressant SR 31747 blocks cell proliferation by inhibiting a steroid isomerase in Saccharomyces cerevisiae.

Molecular and Cellular Biology ◽

10.1128/mcb.16.6.2719 ◽

1996 ◽

Vol 16 (6) ◽

pp. 2719-2727 ◽

Cited By ~ 58

Author(s):

S Silve ◽

P Leplatois ◽

A Josse ◽

P H Dupuy ◽

C Lanau ◽

...

Keyword(s):

Saccharomyces Cerevisiae ◽

Cell Proliferation ◽

Gene Product ◽

In Vitro Assays ◽

Gene Products ◽

Yeast Saccharomyces Cerevisiae ◽

Isomerase Activity ◽

Encoding Gene ◽

Resistant Mutants

SR 31747 is a novel immunosuppressant agent that arrests cell proliferation in the yeast Saccharomyces cerevisiae. SR 31747-treated cells accumulate the same aberrant sterols as those found in a mutant impaired in delta 8- delta 7-sterol isomerase. Sterol isomerase activity is also inhibited by SR 31747 in in vitro assays. Overexpression of the sterol isomerase-encoding gene, ERG2, confers enhanced SR resistance. Cells growing anaerobically on ergosterol-containing medium are not sensitive to SR. Disruption of the sterol isomerase-encoding gene is lethal in cells growing in the absence of exogenous ergosterol, except in SR-resistant mutants lacking either the SUR4 or the FEN1 gene product. The results suggest that sterol isomerase is the target of SR 31747 and that both the SUR4 and FEN1 gene products are required to mediate the proliferation arrest induced by ergosterol depletion.

Deletions in the C-terminal coding region of the v-sis gene: dimerization is required for transformation

Molecular and Cellular Biology ◽

10.1128/mcb.6.4.1304-1314.1986 ◽

1986 ◽

Vol 6 (4) ◽

pp. 1304-1314

Author(s):

M Hannink ◽

M K Sauer ◽

D J Donoghue

Keyword(s):

Growth Factor ◽

Gene Product ◽

Platelet Derived Growth Factor ◽

Deletion Mutants ◽

Gene Products ◽

Coding Region ◽

Strong Connection ◽

C Terminus ◽

Its Gene ◽

Transforming Gene

The v-sis gene encodes chain B of platelet-derived growth factor. However, this gene codes for additional amino acids at both the N terminus and the C terminus of its gene product which are not present in the amino acid sequence of platelet-derived growth factor. We constructed a series of deletion mutants with deletions in the v-sis gene in order to define the C-terminal limit of the v-sis gene product which is required for transformation. Deletion mutants of the v-sis gene which encoded truncated gene products up to 57 residues shorter than the wild-type v-sis gene product were still able to transform cells. The minimal transforming region of the v-sis gene product contained six residues fewer than were present in chain B of platelet-derived growth factor. Only 10 residues, including the sequence Cys-Lys-Cys, separated the smallest transforming gene product from the largest nontransforming gene product. These cysteine residues were also important for dimerization of the v-sis gene product, since all of the nontransforming v-sis deletions were unable to form dimers when they were analyzed under nonreducing conditions. Our results suggest that there is a strong connection between transformation and dimerization.

HomoKinase: A Curated Database of Human Protein Kinases

ISRN Computational Biology ◽

10.1155/2013/417634 ◽

2013 ◽

Vol 2013 ◽

pp. 1-5 ◽

Cited By ~ 11

Author(s):

Suresh Subramani ◽

Saranya Jayapalan ◽

Raja Kalpana ◽

Jeyakumar Natarajan

Keyword(s):

Gene Ontology ◽

Protein Kinases ◽

Kinase Activity ◽

Query Protein ◽

Amino Acid Sequences ◽

Human Protein ◽

Biological Information ◽

Functional Domain ◽

Comprehensive Collection ◽

Search Tool

The HomoKinase database is a comprehensive collection of curated human protein kinases and their relevant biological information. The entries in the database are curated by three criteria: HGNC approval, gene ontology-based biological process (protein phosphorylation), and molecular function (ATP binding and kinase activity). For a given query protein kinase name, the database provides its official symbol, full name, other known aliases, amino acid sequences, functional domain, gene ontology, pathway assignments, and drug compounds. In addition, as a search tool, it enables the retrieval of similar protein kinases with specific family, subfamily, group, and domain combinations and tabulates the information. The present version contains 498 curated human protein kinases and links to other popular databases.

Molecular cloning of higher-plant 3-oxoacyl-(acyl carrier protein) reductase. Sequence identities with the nodG-gene product of the nitrogen-fixing soil bacterium Rhizobium meliloti

Biochemical Journal ◽

10.1042/bj2830321 ◽

1992 ◽

Vol 283 (2) ◽

pp. 321-326 ◽

Cited By ~ 30

Author(s):

A R Slabas ◽

D Chase ◽

I Nishida ◽

N Murata ◽

C Sidebottom ◽

...

Keyword(s):

Gene Product ◽

Rhizobium Meliloti ◽

Acyl Carrier Protein ◽

Amino Acid Sequences ◽

Cdna Clones ◽

Small Range ◽

Carrier Protein ◽

Coding Region ◽

Higher Plant ◽

Signal Molecule

cDNA clones encoding the fatty-acid-biosynthetic enzyme NADPH-linked 3-oxoacyl-(acyl carrier protein) (ACP) reductase were isolated from a Brassica napus (rape) developing seed library and from an Arabidopsis thaliana (thale cress) leaf library. The N-terminal end of the coding region shows features typical of a stromal-targeting plastid-transit peptide. The deduced amino acid sequences have 41% and 55% identity respectively with the nodG-gene product of Rhizobium meliloti, one of the host-specific genes that restrict infectivity of this bacterium to a small range of host plants. The probability that the nodG-gene product is an oxoreductase strengthens the hypothesis that some of the host-specific nod-gene products are enzymes which synthesize polyketides that uniquely modify the Rhizobium nodulation signal molecule.
