GENE ONTOLOGY SIMILARITY MEASURES BASED ON LINEAR ORDER STATISTICS

Author(s):  
JAMES M. KELLER ◽  
JAMES C. BEZDEK ◽  
MIHAIL POPESCU ◽  
NIKHIL R. PAL ◽  
JOYCE A. MITCHELL ◽  
...  

The standard method for comparing gene products (proteins or RNA) is to compare their DNA or amino acid sequences. Additional information about some gene products may come from multiple sources, including the set of Gene Ontology (GO) annotations and the set of journal abstracts related to each gene product. Gene product similarity measures can be based on evaluating sets of descriptor terms found in the GO taxonomy, and/or the index term sets of the related documents (MeSH annotations). While our techniques can be applied to term sets from any taxonomy, we restrict our examples in this article to GO annotations. We investigate the use of linear order statistics (LOS) to build similarity relations on pairs of terms that are used in the GO as linguistic descriptors of genes and gene products. One of our objectives is to investigate the construction and utility of visual assessments of relational data (in this case, dissimilarity matrices) for discovering tendencies of groups of gene products to "cluster together". We use gene product data derived from a group of 194 gene products representing three protein families extracted from ENSEMBL. Our examples suggest that LOS similarity measures are more effective than traditional sequence-based similarity measures at capturing relationships between pairs of gene products in ENSEMBL families when annotation information is available. We show examples of how these similarity measures can assist in knowledge discovery and gene product family validation.

2012 ◽  
Vol 12 (1) ◽  
pp. 101-108 ◽  
Author(s):  
Diane O. Inglis ◽  
Marek S. Skrzypek ◽  
Martha B. Arnaud ◽  
Jonathan Binkley ◽  
Prachi Shah ◽  
...  

The opportunistic fungal pathogen Candida albicans is a significant medical threat, especially for immunocompromised patients. Experimental research has focused on specific areas of C. albicans biology, with the goal of understanding the multiple factors that contribute to its pathogenic potential. Some of these factors include cell adhesion, invasive or filamentous growth, and the formation of drug-resistant biofilms. The Gene Ontology (GO) (www.geneontology.org) is a standardized vocabulary that the Candida Genome Database (CGD) (www.candidagenome.org) and other groups use to describe the functions of gene products. To improve the breadth and accuracy of pathogenicity-related gene product descriptions and to facilitate the description of as yet uncharacterized but potentially pathogenicity-related genes in Candida species, CGD undertook a three-part project: first, the addition of terms to the biological process branch of the GO to improve the description of fungus-related processes; second, manual recuration of gene product annotations in CGD to use the improved GO vocabulary; and third, computational ortholog-based transfer of GO annotations from experimentally characterized gene products, using these new terms, to uncharacterized orthologs in other Candida species. Through genome annotation and analysis, we identified candidate pathogenicity genes in seven non-C. albicans Candida species and in one additional C. albicans strain, WO-1. We also defined a set of C. albicans genes at the intersection of biofilm formation, filamentous growth, pathogenesis, and phenotypic switching of this opportunistic fungal pathogen, which provides a compelling list of candidates for further experimentation.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Mingxin Gan

Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods to automatically calculate semantic similarity between gene products based on semantic similarity of gene ontology terms. Nevertheless, existing methods, though having been widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are actually not functionally related, thereby yielding misleading results in applications. To overcome this limitation, we propose to represent a gene product as a vector that is composed of information contents of gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson’s correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a list of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
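The vector representation described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the term identifiers, the information content values, and the function names are all hypothetical, and the abstract does not say which form of the Jaccard index is applied to real-valued vectors, so the weighted (min/max) form is assumed here.

```python
import math

def ic_vector(annotated_terms, ic, vocabulary):
    """Build a vector over a fixed term order: the term's information
    content if the gene product is annotated with it, otherwise 0."""
    return [ic[t] if t in annotated_terms else 0.0 for t in vocabulary]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pearson(u, v):
    """Pearson correlation coefficient (0.0 if either vector is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def weighted_jaccard(u, v):
    """Weighted Jaccard index: sum of minima over sum of maxima.
    Assumed variant; valid for non-negative entries such as IC values."""
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return num / den if den else 0.0

# Hypothetical terms and information contents, for illustration only.
vocabulary = ["GO:A", "GO:B", "GO:C", "GO:D"]
ic = {"GO:A": 1.0, "GO:B": 2.0, "GO:C": 3.0, "GO:D": 4.0}

gene1 = ic_vector({"GO:A", "GO:B"}, ic, vocabulary)
gene2 = ic_vector({"GO:B", "GO:C"}, ic, vocabulary)

print(cosine(gene1, gene2))
print(pearson(gene1, gene2))
print(weighted_jaccard(gene1, gene2))
```

In practice the information content of a term is usually estimated from its annotation frequency in a corpus; the sketch takes those values as given rather than guessing at how the paper computes them.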


2020 ◽  
Vol 49 (D1) ◽  
pp. D325-D334
Author(s):  
Seth Carbon ◽  
Eric Douglass ◽  
Benjamin M Good ◽  
Deepak R Unni ◽  
...  

The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.


1996 ◽  
Vol 16 (6) ◽  
pp. 2719-2727 ◽  
Author(s):  
S Silve ◽  
P Leplatois ◽  
A Josse ◽  
P H Dupuy ◽  
C Lanau ◽  
...  

SR 31747 is a novel immunosuppressant agent that arrests cell proliferation in the yeast Saccharomyces cerevisiae. SR 31747-treated cells accumulate the same aberrant sterols as those found in a mutant impaired in delta 8-delta 7-sterol isomerase. Sterol isomerase activity is also inhibited by SR 31747 in in vitro assays. Overexpression of the sterol isomerase-encoding gene, ERG2, confers enhanced SR resistance. Cells growing anaerobically on ergosterol-containing medium are not sensitive to SR. Disruption of the sterol isomerase-encoding gene is lethal in cells growing in the absence of exogenous ergosterol, except in SR-resistant mutants lacking either the SUR4 or the FEN1 gene product. The results suggest that sterol isomerase is the target of SR 31747 and that both the SUR4 and FEN1 gene products are required to mediate the proliferation arrest induced by ergosterol depletion.


1986 ◽  
Vol 6 (4) ◽  
pp. 1304-1314
Author(s):  
M Hannink ◽  
M K Sauer ◽  
D J Donoghue

The v-sis gene encodes chain B of platelet-derived growth factor. However, this gene codes for additional amino acids at both the N terminus and the C terminus of its gene product which are not present in the amino acid sequence of platelet-derived growth factor. We constructed a series of deletion mutants with deletions in the v-sis gene in order to define the C-terminal limit of the v-sis gene product which is required for transformation. Deletion mutants of the v-sis gene which encoded truncated gene products up to 57 residues shorter than the wild-type v-sis gene product were still able to transform cells. The minimal transforming region of the v-sis gene product contained six residues fewer than were present in chain B of platelet-derived growth factor. Only 10 residues, including the sequence Cys-Lys-Cys, separated the smallest transforming gene product from the largest nontransforming gene product. These cysteine residues were also important for dimerization of the v-sis gene product, since all of the nontransforming v-sis deletions were unable to form dimers when they were analyzed under nonreducing conditions. Our results suggest that there is a strong connection between transformation and dimerization.


2013 ◽  
Vol 2013 ◽  
pp. 1-5 ◽  
Author(s):  
Suresh Subramani ◽  
Saranya Jayapalan ◽  
Raja Kalpana ◽  
Jeyakumar Natarajan

The HomoKinase database is a comprehensive collection of curated human protein kinases and their relevant biological information. The entries in the database are curated by three criteria: HGNC approval, gene ontology-based biological process (protein phosphorylation), and molecular function (ATP binding and kinase activity). For a given query protein kinase name, the database provides its official symbol, full name, other known aliases, amino acid sequences, functional domain, gene ontology, pathway assignments, and drug compounds. In addition, as a search tool, it enables the retrieval of similar protein kinases with specific family, subfamily, group, and domain combinations and tabulates the information. The present version contains 498 curated human protein kinases and links to other popular databases.


1992 ◽  
Vol 283 (2) ◽  
pp. 321-326 ◽  
Author(s):  
A R Slabas ◽  
D Chase ◽  
I Nishida ◽  
N Murata ◽  
C Sidebottom ◽  
...  

cDNA clones encoding the fatty-acid-biosynthetic enzyme NADPH-linked 3-oxoacyl-(acyl carrier protein) (ACP) reductase were isolated from a Brassica napus (rape) developing seed library and from an Arabidopsis thaliana (thale cress) leaf library. The N-terminal end of the coding region shows features typical of a stromal-targeting plastid-transit peptide. The deduced amino acid sequences have 41% and 55% identity respectively with the nodG-gene product of Rhizobium meliloti, one of the host-specific genes that restrict infectivity of this bacterium to a small range of host plants. The probability that the nodG-gene product is an oxoreductase strengthens the hypothesis that some of the host-specific nod-gene products are enzymes which synthesize polyketides that uniquely modify the Rhizobium nodulation signal molecule.

