New Data and Collaborations at the Saccharomyces Genome Database: Updated reference genome, alleles, and the Alliance of Genome Resources

Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism-focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.

10.1101/2021.09.16.460706 ◽ 2021

Author(s): Stacia R Engel ◽ Edith D Wong ◽ Robert S Nash ◽ Suzi Aleksander ◽ Micheal Alexander ◽ ...

Keyword(s): Reference Genome ◽ Model Organism ◽ Saccharomyces Genome Database ◽ Model Organisms ◽ Data Types ◽ Genome Database ◽ Product Function ◽ Chromosome Maps ◽ Genome Information ◽ Gene Product Function

In Search of Species-Specific SNPs in a Non-Model Animal (European Bison (Bison bonasus))—Comparison of De Novo and Reference-Based Integrated Pipeline of STACKS Using Genotyping-by-Sequencing (GBS) Data

Animals ◽ 10.3390/ani11082226 ◽ 2021 ◽ Vol 11 (8) ◽ pp. 2226

Author(s): Sazia Kunvar ◽ Sylwia Czarnomska ◽ Cino Pertoldi ◽ Małgorzata Tokarska

Keyword(s): Reference Genome ◽ De Novo ◽ Bos Taurus ◽ Model Organism ◽ Genotyping By Sequencing ◽ Model Organisms ◽ European Bison ◽ Model Animal ◽ Pcr Duplicates ◽ Species Specific

The European bison is a non-model organism; thus, most of its genetic and genomic analyses have been performed using cattle-specific resources, such as the BovineSNP50 BeadChip or the Illumina Bovine 800 K HD BeadChip. The problem with non-specific tools is the potential loss of evolutionarily diversified information (ascertainment bias) and of species-specific markers. Here, we used a genotyping-by-sequencing (GBS) approach to genotype 256 samples from the European bison population in Bialowieza Forest (Poland) and analyzed them using two integrated pipelines of the STACKS software: a de novo pipeline (without a reference genome) and a reference pipeline (with a reference genome). Moreover, we ran the reference pipeline with two different genomes, i.e., Bos taurus and European bison. GBS is a useful tool for SNP genotyping in non-model organisms due to its cost effectiveness. Our results support GBS with a reference pipeline without PCR duplicates as a powerful approach for studying the population structure and genotyping data of non-model organisms. We found more polymorphic markers in the reference pipeline than in the de novo pipeline. The lower number of SNPs from the de novo pipeline could be due to the extremely low level of heterozygosity in European bison. All SNPs obtained from the de novo/Bos taurus and Bos taurus reference pipelines were confirmed to be unique and not included in the 800 K BovineHD BeadChip.

Alliance of Genome Resources Portal: unified model organism research platform

Nucleic Acids Research ◽ 10.1093/nar/gkz813 ◽ 2019 ◽ Vol 48 (D1) ◽ pp. D650-D658 ◽ Cited By ~ 36

Author(s): Julie Agapite ◽ Laurent-Philippe Albou ◽ Suzi Aleksander ◽ Joanna Argasinska ◽ ...

Keyword(s): Gene Ontology ◽ Model Organism ◽ Model Organisms ◽ Data Types ◽ Primary Model ◽ Genomic Studies ◽ Health And Disease ◽ Extensive Body ◽ Access To Data ◽ Model Organism Databases

The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology that is guided by the vision of facilitating exploration of related genes in human and well-studied model organisms by providing a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with gene ontology data and human data. All data types represented in the Alliance portal (e.g. genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include a focus on coverage of additional model organisms including those without dedicated curation communities, and the inclusion of new data types with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.

Fungal BLAST and Model Organism BLASTP Best Hits: new comparison resources at the Saccharomyces Genome Database (SGD)

Nucleic Acids Research ◽ 10.1093/nar/gki023 ◽ 2004 ◽ Vol 33 (Database issue) ◽ pp. D374-D377 ◽ Cited By ~ 25

Author(s): R. Balakrishnan

Keyword(s): Model Organism ◽ Saccharomyces Genome Database ◽ Genome Database

gNOMO: a multi-omics pipeline for integrated host and microbiome analysis of non-model organisms

NAR Genomics and Bioinformatics ◽ 10.1093/nargab/lqaa058 ◽ 2020 ◽ Vol 2 (3)

Author(s): Maria Muñoz-Benavent ◽ Felix Hartkopf ◽ Tim Van Den Bossche ◽ Vitor C Piro ◽ Carlos García-Ferris ◽ ...

Keyword(s): Workflow Management ◽ Model Organism ◽ Model Organisms ◽ Omics Data ◽ Sequencing Data ◽ Data Types ◽ Expression Ratio ◽ Bioinformatic Pipeline ◽ Cockroach Blattella Germanica ◽ Microbiome Data

The study of bacterial symbioses has grown exponentially in the recent past. However, existing bioinformatic workflows for microbiome data analysis commonly do not integrate multiple meta-omics levels and are mainly geared toward human microbiomes. Microbiota are better understood when analyzed in their biological context; that is, together with their host or environment. Nevertheless, this is a limitation when studying non-model organisms, mainly due to the lack of well-annotated sequence references. Here, we present gNOMO, a bioinformatic pipeline that is specifically designed to process and analyze non-model organism samples of up to three meta-omics levels (metagenomics, metatranscriptomics and metaproteomics) in an integrative manner. The pipeline has been developed using the workflow management framework Snakemake in order to obtain an automated and reproducible pipeline. Using experimental datasets of the German cockroach Blattella germanica, a non-model organism with a very complex gut microbiome, we show the capabilities of gNOMO with regard to meta-omics data integration, expression ratio comparison, taxonomic and functional analysis as well as intuitive output visualization. In conclusion, gNOMO is a bioinformatic pipeline that can easily be configured for integrating and analyzing multiple meta-omics data types and for producing output visualizations, and that is specifically designed for integrating paired-end sequencing data with mass spectrometry from non-model organisms.

The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases

Genetics ◽ 10.1534/genetics.119.302523 ◽ 2019 ◽ Vol 213 (4) ◽ pp. 1189-1196 ◽ Cited By ~ 6

Author(s):

Keyword(s): Laboratory Mouse ◽ Model Organism ◽ Biological Information ◽ Model Organisms ◽ Data Types ◽ Access Methods ◽ Data Archives ◽ Public Data ◽ Data Ecosystem ◽ Model Organism Databases

Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp. (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified “look and feel,” the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient “knowledge commons” for model organisms using shared, modular infrastructure.

The rat genome database (RGD) facilitates genomic and phenotypic data integration across multiple species for biomedical research

Mammalian Genome ◽ 10.1007/s00335-021-09932-x ◽ 2021

Author(s): M. L. Kaldunski ◽ J. R. Smith ◽ G. T. Hayman ◽ K. Brodie ◽ J. L. De Pons ◽ ...

Keyword(s): Sequence Similarity ◽ Single Gene ◽ Model Organism ◽ Human Diseases ◽ Model Organisms ◽ Phenotypic Data ◽ Genome Database ◽ Bioinformatic Tools ◽ Multiple Species ◽ Rat Genome

Model organism research is essential for discovering the mechanisms of human diseases by defining biologically meaningful gene-to-disease relationships. The Rat Genome Database (RGD, https://rgd.mcw.edu) is a cross-species knowledgebase and the premier online resource for rat genetic and physiologic data. This rich resource is enhanced by the inclusion and integration of comparative data for human and mouse, as well as other human disease models including chinchilla, dog, bonobo, pig, 13-lined ground squirrel, green monkey, and naked mole-rat. Functional information has been added to records via the assignment of annotations based on sequence similarity to human, rat, and mouse genes. RGD has also imported well-supported cross-species data from external resources. To enable use of these data, RGD has developed a robust infrastructure of standardized ontologies, data formats, and disease- and species-centric portals, complemented with a suite of innovative tools for discovery and analysis. Using examples of single-gene and polygenic human diseases, we illustrate how data from multiple species can help to identify or confirm a gene as involved in a disease and to identify model organisms that can be studied to understand the pathophysiology of a gene or pathway. The ultimate aim of this report is to demonstrate the utility of RGD not only as the core resource for the rat research community but also as a source of bioinformatic tools to support a wider audience, empowering the search for appropriate models for human afflictions.

Test for temporal or spatial restrictions in gene product function during the cell division cycle

Molecular and Cellular Biology ◽ 10.1128/mcb.3.7.1255-1265.1983 ◽ 1983 ◽ Vol 3 (7) ◽ pp. 1255-1265

Author(s): S K Dutcher ◽ L H Hartwell

Keyword(s): Cell Division ◽ Gene Product ◽ Gene Mutations ◽ Cell Division Cycle ◽ Temperature Sensitive ◽ Restrictive Temperature ◽ Product Function ◽ Complete Cell ◽ Relationship Of ◽ Gene Product Function

The ability of a functional gene to complement a nonfunctional gene may depend upon the intracellular relationship of the two genes. If so, the function of the gene product in question must be limited in time or in space. CDC (cell division cycle) gene products of Saccharomyces cerevisiae control discrete steps in cell division; therefore, they constitute reasonable candidates for genes that function with temporal or spatial restrictions. In an attempt to reveal such restrictions, we compared the ability of a CDC gene to complement a temperature-sensitive cdc gene in diploids, where the genes are located within the same nucleus, with its ability to do so in heterokaryons, where the genes are located in different nuclei. In CDC × cdc matings, complementation was monitored in rare heterokaryons by assaying the production of cdc haploid progeny (cytoductants) at the restrictive temperature. The production of cdc cytoductants indicates that the cdc nucleus was able to complete cell division at the restrictive temperature and implies that the CDC gene product was provided by the other nucleus or by cytoplasm in the heterokaryon. Cytoductants from cdc28 or cdc37 crosses were not efficiently produced, suggesting that these two genes are restricted spatially or temporally in their function. We found that, of the cdc mutants tested, 33 were complemented; cdc cytoductants were recovered at least as frequently as CDC cytoductants. A particularly interesting example was provided by the CDC4 gene. Mutations in CDC4 were found previously to produce a defect in both cell division and karyogamy. Surprisingly, the cell division defect of cdc4 nuclei is complemented by CDC4 nuclei in a heterokaryon, whereas the karyogamy defect is not.

Transgenic Epigenetics: Using Transgenic Organisms to Examine Epigenetic Phenomena

Genetics Research International ◽ 10.1155/2012/689819 ◽ 2012 ◽ Vol 2012 ◽ pp. 1-14 ◽ Cited By ~ 2

Author(s): Lori A. McEachern

Keyword(s): Molecular Mechanisms ◽ Model Organism ◽ Evolutionary Conservation ◽ Model Organisms ◽ Valuable Insight ◽ Epigenetic Control ◽ Transgenic Organisms ◽ Epigenetic Analysis ◽ Insight Into ◽ Epigenetic Processes

Non-model organisms are generally more difficult and/or time consuming to work with than model organisms. In addition, epigenetic analysis of model organisms is facilitated by well-established protocols, and commercially-available reagents and kits that may not be available for, or previously tested on, non-model organisms. Given the evolutionary conservation and widespread nature of many epigenetic mechanisms, a powerful method to analyze epigenetic phenomena from non-model organisms would be to use transgenic model organisms containing an epigenetic region of interest from the non-model. Interestingly, while transgenic Drosophila and mice have provided significant insight into the molecular mechanisms and evolutionary conservation of the epigenetic processes that target epigenetic control regions in other model organisms, this method has so far been under-exploited for non-model organism epigenetic analysis. This paper details several experiments that have examined the epigenetic processes of genomic imprinting and paramutation, by transferring an epigenetic control region from one model organism to another. These cross-species experiments demonstrate that valuable insight into both the molecular mechanisms and evolutionary conservation of epigenetic processes may be obtained via transgenic experiments, which can then be used to guide further investigations and experiments in the species of interest.

AStrap: identification of alternative splicing from transcript sequences without a reference genome

Bioinformatics ◽ 10.1093/bioinformatics/bty1008 ◽ 2018 ◽ Vol 35 (15) ◽ pp. 2654-2656 ◽ Cited By ~ 5

Author(s): Guoli Ji ◽ Wenbin Ye ◽ Yaru Su ◽ Moliang Chen ◽ Guangzao Huang ◽ ...

Keyword(s): Machine Learning ◽ Alternative Splicing ◽ Single Molecule ◽ Reference Genome ◽ De Novo ◽ Supplementary Information ◽ Model Organisms ◽ Sequencing Data ◽ Extensive Evaluation ◽ Reference Genomes

Summary: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation: AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information: Supplementary data are available at Bioinformatics online.
