The rat genome database (RGD) facilitates genomic and phenotypic data integration across multiple species for biomedical research

2021 ◽  
Author(s):  
M. L. Kaldunski ◽  
J. R. Smith ◽  
G. T. Hayman ◽  
K. Brodie ◽  
J. L. De Pons ◽  
...  

Abstract Model organism research is essential for discovering the mechanisms of human diseases by defining biologically meaningful gene to disease relationships. The Rat Genome Database (RGD, https://rgd.mcw.edu) is a cross-species knowledgebase and the premier online resource for rat genetic and physiologic data. This rich resource is enhanced by the inclusion and integration of comparative data for human and mouse, as well as other human disease models including chinchilla, dog, bonobo, pig, 13-lined ground squirrel, green monkey, and naked mole-rat. Functional information has been added to records via the assignment of annotations based on sequence similarity to human, rat, and mouse genes. RGD has also imported well-supported cross-species data from external resources. To enable use of these data, RGD has developed a robust infrastructure of standardized ontologies, data formats, and disease- and species-centric portals, complemented with a suite of innovative tools for discovery and analysis. Using examples of single-gene and polygenic human diseases, we illustrate how data from multiple species can help to identify or confirm a gene as involved in a disease and to identify model organisms that can be studied to understand the pathophysiology of a gene or pathway. The ultimate aim of this report is to demonstrate the utility of RGD not only as the core resource for the rat research community but also as a source of bioinformatic tools to support a wider audience, empowering the search for appropriate models for human afflictions.

2021 ◽  
Vol 22 (S11) ◽  
Author(s):  
Jooseong Oh ◽  
Sung-Gwon Lee ◽  
Chungoo Park

Abstract Background: Paralogs formed through gene duplication and isoforms formed through alternative splicing have been important processes for increasing protein diversity and maintaining cellular homeostasis. Despite their recognized importance and the advent of large-scale genomic and transcriptomic analyses, paradoxically, accurate annotations of all gene loci to allow the identification of paralogs and isoforms remain surprisingly incomplete. In particular, the global analysis of the transcriptome of a non-model organism for which there is no reference genome is especially challenging. Results: To reliably discriminate between the paralogs and isoforms in RNA-seq data, we redefined the pre-existing sequence features (sequence similarity, inverse count of consecutive identical or non-identical blocks, and match-mismatch fraction) previously derived from full-length cDNAs and EST sequences and described newly discovered genomic and transcriptomic features (twilight zone of protein sequence alignment and expression level difference). In addition, the effectiveness and relevance of the proposed features were verified with two widely used support vector machine (SVM) and random forest (RF) models. From nine RNA-seq datasets, all AUC (area under the curve) scores of ROC (receiver operating characteristic) curves were over 0.9 in the RF model and significantly higher than those in the SVM model. Conclusions: In this study, using an RF model with five proposed RNA-seq features, we implemented our method called Paralogs and Isoforms Classifier based on Machine-learning approaches (PIC-Me) and showed that it outperformed an existing method. Finally, we envision that our tool will be a valuable computational resource for the genomics community to help with gene annotation and will aid in comparative transcriptomics and evolutionary genomics studies, especially those on non-model organisms.
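
The AUC figures quoted above summarize how well a classifier's scores rank one class above the other. As a rough illustration only, and not the PIC-Me code, the sketch below computes ROC AUC from its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for the example.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tied pair counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three of the four positive/negative pairs are ordered correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so scores above 0.9 indicate that the two classes are ordered correctly in more than 90% of pairs.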


2021 ◽  
Author(s):  
Casia Nursyifa ◽  
Anna Bruniche-Olsen ◽  
Genis Garcia-Erill ◽  
Rasmus Heller ◽  
Anders Albrechtsen

Being able to assign sex to individuals and identify autosomal and sex-linked scaffolds is essential in most population genomic analyses. Non-model organisms often have genome assemblies at scaffold level and lack characterization of sex-linked scaffolds. Previous methods to identify sex and sex-linked scaffolds have relied on, for example, sequence similarity between the non-model organism and a closely related species, or on prior knowledge about the sex of the samples to identify sex-linked scaffolds. In the latter case, the difference in depth of coverage between the autosomes and the sex chromosomes is used. Here we present "Sex Assignment Through Coverage" (SATC), a method to identify sample sex and sex-linked scaffolds from NGS data. The method only requires a scaffold-level reference assembly and sampling of both sexes with whole genome sequencing (WGS) data. We use the sequencing depth distribution across scaffolds to jointly identify: i) male and female individuals and ii) sex-linked scaffolds. This is achieved by projecting the scaffold depths into a low-dimensional space using principal component analysis (PCA), followed by Gaussian mixture clustering. We demonstrate the applicability of our method using data from five mammal species and a bird species complex. The method is open source and freely available at https://github.com/popgenDK/SATC
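
The idea of clustering samples by scaffold depth can be sketched as follows. This is a minimal illustration under stated assumptions, not the SATC implementation: the function names `cluster_samples_by_depth` and `flag_sex_linked`, the median-based depth normalization, the use of only the first principal component, the hand-written two-component EM fit, and the `min_diff` threshold are all choices made for this example.

```python
import numpy as np


def cluster_samples_by_depth(depth, n_iter=50):
    """Split samples into two groups from a (samples x scaffolds) depth matrix."""
    depth = np.asarray(depth, dtype=float)
    # Assumed normalization: divide by each sample's median scaffold depth
    # so that differences in overall sequencing effort cancel out.
    norm = depth / np.median(depth, axis=1, keepdims=True)
    # PCA via SVD of the column-centered matrix; keep the first component.
    centered = norm - norm.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    # Two-component 1D Gaussian mixture fitted with EM (log-space E-step).
    mu = np.array([pc1.min(), pc1.max()])
    var = np.full(2, pc1.var() + 1e-9)
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        log_dens = (np.log(weight) - 0.5 * np.log(2 * np.pi * var)
                    - 0.5 * (pc1[:, None] - mu) ** 2 / var)
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
        nk = resp.sum(axis=0) + 1e-12
        mu = (resp * pc1[:, None]).sum(axis=0) / nk
        var = (resp * (pc1[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
        weight = nk / nk.sum()
    return norm, pc1, resp.argmax(axis=1)


def flag_sex_linked(norm, labels, min_diff=0.3):
    """Indices of scaffolds whose mean normalized depth differs between groups.

    min_diff is a hypothetical threshold chosen for this sketch.
    """
    mean0 = norm[labels == 0].mean(axis=0)
    mean1 = norm[labels == 1].mean(axis=0)
    return np.flatnonzero(np.abs(mean0 - mean1) > min_diff)
```

On simulated data in which one scaffold has half the depth in one group of samples, the two clusters would be expected to recover the groups and the flagged index to point at that scaffold. Deciding which cluster is male or female (for example, treating the group with roughly half the normalized depth on an X-linked scaffold as the heterogametic sex) is left to the caller in this sketch.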


2020 ◽  
Author(s):  
Elisa Pischedda ◽  
Cristina Crava ◽  
Martina Carlassara ◽  
Leila Gasmi ◽  
Mariangela Bonizzoni

Abstract Lateral gene transfer (LT) from viruses to eukaryotic cells is a well-recognized phenomenon. Somatic integrations of viruses have been linked to persistent viral infection and genotoxic effects, including various types of cancer. As a consequence, several bioinformatic tools have been developed to identify viral sequences integrated into the human genome. Viral sequences that integrate into germline cells can be transmitted vertically, be maintained in host genomes and be co-opted for host functions. Endogenous viral elements (EVEs) have long been known, but the extent of their occurrence has only recently been appreciated. Modern genomic sequencing analyses showed that eukaryotic genomes may harbor hundreds of EVEs, which derive not only from DNA viruses and retroviruses, but also from nonretroviral RNA viruses, and are mostly enriched in repetitive regions of the genome. Although EVEs are increasingly recognized as important players in different biological processes such as regulation of expression and immunity, their study in non-model organisms has rarely gone beyond their characterization from annotated reference genomes because of the lack of computational methods suited to resolving signals for EVEs in repetitive DNA. To fill this gap, we developed ViR, a pipeline that improves the detection of integration sites by resolving the dispersion of reads in genome assemblies that are rich in repetitive DNA. Using paired-end whole genome sequencing (WGS) data and a user-built database of viral genomes, ViR selects the best candidate pairs of reads supporting an integration site by resolving the dispersion of reads resulting from intrasample variability. We benchmarked ViR to work with sequencing data from both single and pooled DNA samples and show its applicability using WGS data of a non-model organism, the arboviral vector Aedes albopictus. Viral integrations predicted by ViR were molecularly validated, supporting the accuracy of ViR results. Additionally, ViR can be readily adapted to detect any LT event, provided that ad hoc non-host sequences are supplied for interrogation.


2021 ◽  
Author(s):  
Stacia R Engel ◽  
Edith D Wong ◽  
Robert S Nash ◽  
Suzi Aleksander ◽  
Micheal Alexander ◽  
...  

Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.


2006 ◽  
Vol 14 (1) ◽  
pp. 1.14.1-1.14.27 ◽  
Author(s):  
Simon N. Twigger ◽  
Jennifer S. Smith ◽  
Angela Zuniga-Meyer ◽  
Susan K. Bromberg

2021 ◽  
Author(s):  
Malcolm E Fisher ◽  
Erik J Segerdell ◽  
Nicolas Matentzoglu ◽  
Mardi J Nenni ◽  
Joshua D Fortriede ◽  
...  

Background: Ontologies of precisely defined, controlled vocabularies are essential to curate the results of biological experiments such that the data are machine searchable, can be computationally analyzed, and are interoperable across the biomedical research continuum. There is also an increasing need for methods to interrelate phenotypic data easily and accurately from experiments in animal models with human development and disease. Results: Here we present the Xenopus Phenotype Ontology (XPO) to annotate phenotypic data from experiments in Xenopus, one of the major vertebrate model organisms used to study gene function in development and disease. The XPO implements design patterns from the Unified Phenotype Ontology (uPheno), and the principles outlined by the Open Biological and Biomedical Ontologies (OBO Foundry) to maximize interoperability with other species and facilitate ongoing ontology management. Constructed in Web Ontology Language (OWL), the XPO combines the existing uPheno library of ontology design patterns with additional terms from the Xenopus Anatomy Ontology (XAO), the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO). The integration of these different ontologies into the XPO enables rich phenotypic curation, whilst the uPheno bridging axioms allow phenotypic data from Xenopus experiments to be related to phenotype data from other model organisms and human disease. Moreover, the simple post-composed uPheno design patterns facilitate ongoing XPO development as the generation of new terms and classes of terms can be substantially automated. Conclusions: The XPO serves as an example of current best practices to help overcome many of the inherent challenges in harmonizing phenotype data between different species. The XPO currently consists of approximately 22,000 terms and is being used to curate phenotypes by Xenbase, the Xenopus Model Organism Knowledgebase, forming a standardized corpus of genotype-phenotype data that can be directly related to other uPheno compliant resources.


Author(s):  
Stanley J. F. Laulederkind ◽  
G. Thomas Hayman ◽  
Shur‐Jen Wang ◽  
Timothy F. Lowry ◽  
Rajni Nigam ◽  
...  

2020 ◽  
Author(s):  
Takuto Kaji ◽  
Yusuke Oizumi ◽  
Sanki Tashiro ◽  
Yumiko Takeshita ◽  
Junko Kanoh

Abstract Genome sequences have been determined for many model organisms; however, repetitive regions such as centromeres, telomeres, and subtelomeres have not yet been sequenced completely. Here, we report the complete sequences of subtelomeric homologous (SH) regions of the fission yeast Schizosaccharomyces pombe. We overcame technical difficulties to obtain subtelomeric repetitive sequences by constructing strains that possess single SH regions. Whole sequences of SH regions revealed that each SH region consists of two distinct parts: the telomere-proximal part with mosaics of multiple common segments showing high variation among subtelomeres and strains, and the telomere-distal part showing high sequence similarity among subtelomeres with some insertions and deletions. The newly sequenced SH regions showed differences in nucleotide sequences and common segment composition compared to those in the S. pombe genome database (PomBase), which is in striking contrast to the regions outside of SH, where mutations are rarely detected. Furthermore, we identified new subtelomeric RecQ-type helicase genes, tlh3 and tlh4, which add to the already known tlh1 and tlh2, and found that the tlh1–4 genes show high sequence variation. Our results indicate that SH sequences are highly polymorphic and hot spots for genome variation. These features of subtelomeres may have contributed to genome diversity and, conversely, various diseases.


Genetics ◽  
2021 ◽  
Author(s):  
Stacia R Engel ◽  
Edith D Wong ◽  
Robert S Nash ◽  
Suzi Aleksander ◽  
Micheal Alexander ◽  
...  

Abstract Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.


Animals ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 2226
Author(s):  
Sazia Kunvar ◽  
Sylwia Czarnomska ◽  
Cino Pertoldi ◽  
Małgorzata Tokarska

The European bison is a non-model organism; thus, most of its genetic and genomic analyses have been performed using cattle-specific resources, such as BovineSNP50 BeadChip or Illumina Bovine 800 K HD Bead Chip. The problem with non-specific tools is the potential loss of evolutionarily diversified information (ascertainment bias) and species-specific markers. Here, we have used a genotyping-by-sequencing (GBS) approach for genotyping 256 samples from the European bison population in Bialowieza Forest (Poland) and performed an analysis using two integrated pipelines of the STACKS software: a de novo pipeline (without a reference genome) and a reference pipeline (with a reference genome). Moreover, we used the reference pipeline with two different genomes, i.e., Bos taurus and European bison. GBS is a useful tool for SNP genotyping in non-model organisms due to its cost effectiveness. Our results support GBS with a reference pipeline without PCR duplicates as a powerful approach for studying the population structure and genotyping data of non-model organisms. We found more polymorphic markers in the reference pipeline than in the de novo pipeline. The decreased number of SNPs from the de novo pipeline could be due to the extremely low level of heterozygosity in European bison. All SNPs obtained from the de novo/Bos taurus and Bos taurus reference pipelines were confirmed to be unique and not included in the 800 K BovineHD BeadChip.

