Alliance of Genome Resources Portal: unified model organism research platform

Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp. (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified “look and feel,” the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient “knowledge commons” for model organisms using shared, modular infrastructure.

Facets and measures of gene ontology annotation quality in model organism databases

Proceedings of the American Society for Information Science and Technology ◽

10.1002/meet.14504301260 ◽

2007 ◽

Vol 43 (1) ◽

pp. 1-7

Author(s):

W. John MacMullen

Keyword(s):

Gene Ontology ◽

Model Organism ◽

Gene Ontology Annotation ◽

Model Organism Databases ◽

Annotation Quality

Hidden in plain sight: What remains to be discovered in the eukaryotic proteome?

10.1101/469569 ◽

2018 ◽

Author(s):

Valerie Wood ◽

Antonia Lock ◽

Midori A. Harris ◽

Kim Rutherford ◽

Jürg Bähler ◽

...

Keyword(s):

Gene Ontology ◽

Fission Yeast ◽

Genome Sequencing ◽

Large Scale ◽

Biological Process ◽

Blind Spot ◽

Model Organisms ◽

Biological Processes ◽

Health And Disease

The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes. To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology (GO) biological process terms to define characterized and uncharacterized proteins for human, budding yeast, and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalog of proteins’ biological roles.

gNOMO: a multi-omics pipeline for integrated host and microbiome analysis of non-model organisms

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa058 ◽

2020 ◽

Vol 2 (3) ◽

Author(s):

Maria Muñoz-Benavent ◽

Felix Hartkopf ◽

Tim Van Den Bossche ◽

Vitor C Piro ◽

Carlos García-Ferris ◽

...

Keyword(s):

Workflow Management ◽

Model Organism ◽

Model Organisms ◽

Omics Data ◽

Sequencing Data ◽

Data Types ◽

Expression Ratio ◽

Bioinformatic Pipeline ◽

Cockroach Blattella Germanica ◽

Microbiome Data

The study of bacterial symbioses has grown exponentially in the recent past. However, existing bioinformatic workflows for microbiome data analysis commonly do not integrate multiple meta-omics levels and are mainly geared toward human microbiomes. Microbiota are better understood when analyzed in their biological context; that is, together with their host or environment. Nevertheless, this is a limitation when studying non-model organisms, mainly due to the lack of well-annotated sequence references. Here, we present gNOMO, a bioinformatic pipeline that is specifically designed to process and analyze non-model organism samples of up to three meta-omics levels: metagenomics, metatranscriptomics and metaproteomics in an integrative manner. The pipeline has been developed using the workflow management framework Snakemake in order to obtain an automated and reproducible pipeline. Using experimental datasets of the German cockroach Blattella germanica, a non-model organism with a very complex gut microbiome, we show the capabilities of gNOMO with regard to meta-omics data integration, expression ratio comparison, taxonomic and functional analysis as well as intuitive output visualization. In conclusion, gNOMO is a bioinformatic pipeline that can easily be configured for integrating and analyzing multiple meta-omics data types and for producing output visualizations, specifically designed for integrating paired-end sequencing data with mass spectrometry from non-model organisms.

The Resource Identification Initiative: A cultural shift in publishing

F1000Research ◽

10.12688/f1000research.6555.2 ◽

2015 ◽

Vol 4 ◽

pp. 134 ◽

Cited By ~ 6

Author(s):

Anita Bandrowski ◽

Matthew Brush ◽

Jeffery S. Grethe ◽

Melissa A. Haendel ◽

David N. Kennedy ◽

...

Keyword(s):

Model Organism ◽

Pilot Project ◽

Model Organisms ◽

Dramatic Improvement ◽

Reporting Practices ◽

Resource Identification ◽

Support Of Research ◽

Machine Readable ◽

Model Organism Databases ◽

Research Resources

A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as “What other studies used resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and supportive of the goals of the project. 
We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources.

Hidden in plain sight: what remains to be discovered in the eukaryotic proteome?

Open Biology ◽

10.1098/rsob.180241 ◽

2019 ◽

Vol 9 (2) ◽

pp. 180241 ◽

Cited By ~ 11

Author(s):

Valerie Wood ◽

Antonia Lock ◽

Midori A. Harris ◽

Kim Rutherford ◽

Jürg Bähler ◽

...

Keyword(s):

Gene Ontology ◽

Fission Yeast ◽

Genome Sequencing ◽

Large Scale ◽

Budding Yeast ◽

Blind Spot ◽

Model Organisms ◽

Biological Processes ◽

Health And Disease

The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes (BP). To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology BP terms to define characterized and uncharacterized proteins for human, budding yeast and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalogue of proteins’ biological roles.

New Data and Collaborations at the Saccharomyces Genome Database: Updated reference genome, alleles, and the Alliance of Genome Resources

10.1101/2021.09.16.460706 ◽

2021 ◽

Author(s):

Stacia R Engel ◽

Edith D Wong ◽

Robert S Nash ◽

Suzi Aleksander ◽

Micheal Alexander ◽

...

Keyword(s):

Reference Genome ◽

Model Organism ◽

Saccharomyces Genome Database ◽

Model Organisms ◽

Data Types ◽

Genome Database ◽

Product Function ◽

Chromosome Maps ◽

Genome Information ◽

Gene Product Function

Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.

InterMOD: integrated data and tools for the unification of model organism research

Scientific Reports ◽

10.1038/srep01802 ◽

2013 ◽

Vol 3 (1) ◽

Cited By ~ 22

Author(s):

Julie Sullivan ◽

Kalpana Karra ◽

Sierra A. T. Moxon ◽

Andrew Vallejos ◽

Howie Motenko ◽

...

Keyword(s):

Comparative Research ◽

Model Organism ◽

Genomic Analysis ◽

Fruit Fly ◽

Model Organisms ◽

Basic Biology ◽

Set Up ◽

And Function ◽

Model Organism Databases ◽

Nematode Worm

Model organisms are widely used for understanding basic biology and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.

New Data and Collaborations at the Saccharomyces Genome Database: Updated reference genome, alleles, and the Alliance of Genome Resources

Genetics ◽

10.1093/genetics/iyab224 ◽

2021 ◽

Author(s):

Stacia R Engel ◽

Edith D Wong ◽

Robert S Nash ◽

Suzi Aleksander ◽

Micheal Alexander ◽

...

Keyword(s):

Reference Genome ◽

Model Organism ◽

Saccharomyces Genome Database ◽

Model Organisms ◽

Data Types ◽

Genome Database ◽

Product Function ◽

Chromosome Maps ◽

Genome Information ◽

Gene Product Function

Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.

The Resource Identification Initiative: A cultural shift in publishing

F1000Research ◽

10.12688/f1000research.6555.1 ◽

2015 ◽

Vol 4 ◽

pp. 134 ◽

Cited By ~ 24

Author(s):

Anita Bandrowski ◽

Matthew Brush ◽

Jeffery S. Grethe ◽

Melissa A. Haendel ◽

David N. Kennedy ◽

...

Keyword(s):

Model Organism ◽

Pilot Project ◽

Model Organisms ◽

Dramatic Improvement ◽

Reporting Practices ◽

Resource Identification ◽

Support Of Research ◽

Machine Readable ◽

Model Organism Databases ◽

Research Resources

A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as “What other studies used resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and supportive of the goals of the project. 
We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources.
