Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns

Open Biology ◽  
2020 ◽  
Vol 10 (9) ◽  
pp. 200149 ◽  
Author(s):  
Valerie Wood ◽  
Seth Carbon ◽  
Midori A. Harris ◽  
Antonia Lock ◽  
Stacia R. Engel ◽  
...  

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
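The rule-based check described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rule set, the gene names, the annotation data and the function name `find_violations` are all hypothetical, and plain term labels stand in for GO identifiers. The sketch compares terms directly and does not account for the ontology hierarchy.

```python
from itertools import combinations

# Hypothetical rule set: pairs of biological process terms that are assumed
# to be mutually exclusive. The pair below is the example from the abstract.
RULES = {
    frozenset({"amino acid metabolism", "cytokinesis"}),
}


def find_violations(annotations, rules=RULES):
    """Return sorted (gene, term_a, term_b) triples that match a rule.

    `annotations` maps a gene product name to the terms annotated to it.
    """
    violations = []
    for gene, terms in annotations.items():
        # Test every unordered pair of distinct terms on this gene product.
        for term_a, term_b in combinations(sorted(set(terms)), 2):
            if frozenset({term_a, term_b}) in rules:
                violations.append((gene, term_a, term_b))
    return sorted(violations)


# Made-up annotation data, for illustration only.
annotations = {
    "geneA": ["amino acid metabolism", "cytokinesis"],
    "geneB": ["cytokinesis", "mitotic spindle assembly"],
}

for gene, term_a, term_b in find_violations(annotations):
    print(f"{gene}: '{term_a}' co-annotated with '{term_b}'")
```

Each flagged pair is only a candidate for review. As the abstract notes, the underlying error may lie in literature curation, ontology structure or an automated pipeline, so a curator still has to trace it to its source.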

2020 ◽  
Author(s):  
Valerie Wood ◽  
Seth Carbon ◽  
Midori A. Harris ◽  
Antonia Lock ◽  
Stacia R. Engel ◽  
...  

Abstract Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally, and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes likely reflects errors in literature curation, ontology structure, or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g., amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.


2010 ◽  
Vol 74 (4) ◽  
pp. 479-503 ◽  
Author(s):  
Trudy Torto-Alalibo ◽  
Candace W. Collmer ◽  
Michelle Gwinn-Giglio ◽  
Magdalen Lindeberg ◽  
Shaowu Meng ◽  
...  

SUMMARY Microbes form intimate relationships with hosts (symbioses) that range from mutualism to parasitism. Common microbial mechanisms involved in a successful host association include adhesion, entry of the microbe or its effector proteins into the host cell, mitigation of host defenses, and nutrient acquisition. Genes associated with these microbial mechanisms are known for a broad range of symbioses, revealing both divergent and convergent strategies. Effective comparisons among these symbioses, however, are hampered by inconsistent descriptive terms in the literature for functionally similar genes. Bioinformatic approaches that use homology-based tools are limited to identifying functionally similar genes based on similarities in their sequences. An effective solution to these limitations is provided by the Gene Ontology (GO), which provides a standardized language to describe gene products from all organisms. The GO comprises three ontologies that enable one to describe the molecular function(s) of gene products, the biological processes to which they contribute, and their cellular locations. Beginning in 2004, the Plant-Associated Microbe Gene Ontology (PAMGO) interest group collaborated with the GO consortium to extend the GO to accommodate terms for describing gene products associated with microbe-host interactions. Currently, over 900 terms that describe biological processes common to diverse plant- and animal-associated microbes are incorporated into the GO database. Here we review some unifying themes common to diverse host-microbe associations and illustrate how the new GO terms facilitate a standardized description of the gene products involved. We also highlight areas where new terms need to be developed, an ongoing process that should involve the whole community.


2021 ◽  
Vol 12 (2) ◽  
Author(s):  
Xiaoli Liu ◽  
Zuwei Yin ◽  
Linping Xu ◽  
Huaimin Liu ◽  
Lifeng Jiang ◽  
...  

Abstract Long noncoding RNAs (lncRNAs) play crucial roles in regulating a variety of biological processes in lung adenocarcinoma (LUAD). In our study, we mainly explored the functional roles of a novel lncRNA, long intergenic non-protein coding RNA 1426 (LINC01426), in LUAD. Bioinformatics analysis showed that the expression of LINC01426 was upregulated in LUAD tissue. Functionally, silencing of LINC01426 markedly suppressed the proliferation, migration, epithelial–mesenchymal transition (EMT), and stemness of LUAD cells. We then observed that LINC01426 functioned through the hedgehog pathway in LUAD. The effect of LINC01426 knockdown could be fully reversed by adding the hedgehog pathway activator SAG. In addition, we showed that LINC01426 did not affect SHH transcription or its mRNA level. Pull-down silver staining and RIP assays revealed that LINC01426 could interact with USP22. Ubiquitination assays showed that LINC01426 and USP22 modulated SHH ubiquitination levels. Rescue assays verified that SHH overexpression rescued the cell growth, migration, and stemness suppressed by LINC01426 silencing. In conclusion, LINC01426 promotes LUAD progression by recruiting USP22 to stabilize SHH protein and thus activate the hedgehog pathway.


2020 ◽  
Vol 49 (D1) ◽  
pp. D325-D334
Author(s):  
Seth Carbon ◽  
Eric Douglass ◽  
Benjamin M Good ◽  
Deepak R Unni ◽  
...  

Abstract The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.


Cells ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 692
Author(s):  
Sweta Talyan ◽  
Samantha Filipów ◽  
Michael Ignarski ◽  
Magdalena Smieszek ◽  
He Chen ◽  
...  

Diseases of the renal filtration unit, the glomerulus, are the most common cause of chronic kidney disease. Podocytes are the pivotal cell type for the function of this filter, and focal-segmental glomerulosclerosis (FSGS) is a classic example of a podocytopathy leading to proteinuria and glomerular scarring. Currently, no targeted treatment of FSGS is available. This lack of therapeutic strategies is explained by a limited understanding of the defects in podocyte cell biology leading to FSGS. To date, most studies in the field have focused on protein-coding genes and their gene products. However, more than 80% of all transcripts produced by mammalian cells are actually non-coding. Long non-coding RNAs (lncRNAs) are a relatively novel class of transcripts and have not been systematically studied in FSGS to date. Appropriate tools to facilitate lncRNA research for the renal scientific community are urgently required, because lncRNA analysis poses a number of challenges compared with classical pipelines optimized for coding RNA expression analysis. Here, we present the bioinformatic pipeline CALINCA as a solution to this problem. CALINCA automatically analyzes datasets from murine FSGS models and quantifies both annotated and de novo assembled lncRNAs. In addition, the tool provides in-depth information on the podocyte specificity of these lncRNAs, as well as their evolutionary conservation and expression in human datasets, making this pipeline a crucial basis for lncRNA studies in FSGS.

