Xyntagma: a graphical query interface for the ACeDB genome databases

Abstract

Summary: We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching term, identifier and metadata units from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers), and helping data users search and retrieve the right query terms.

Availability and implementation: The UBDs are available through npm, and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license.

Supplementary information: Supplementary data are available at Bioinformatics online.
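The single-query-interface idea can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the real UBDs are npm (JavaScript) packages that call remote provider APIs, whereas `Match`, `Dictionary`, `InMemoryDictionary`, `get_matches` and `unified_search` are hypothetical names, the dictionaries are in-memory stand-ins with placeholder identifiers, and the ranking rule is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Match:
    """One term/identifier/metadata unit (hypothetical schema)."""
    id: str        # unambiguous identifier within its source
    term: str      # the matched term string
    dict_id: str   # which dictionary produced the match
    meta: Dict[str, str] = field(default_factory=dict)  # descriptive metadata


class Dictionary(Protocol):
    """Common interface every backend exposes (assumed, not the UBD API)."""
    dict_id: str

    def get_matches(self, query: str) -> List[Match]: ...


class InMemoryDictionary:
    """Hypothetical stand-in for a remote vocabulary service; no network calls."""

    def __init__(self, dict_id: str, entries: Dict[str, str]) -> None:
        self.dict_id = dict_id
        self.entries = entries  # identifier -> term

    def get_matches(self, query: str) -> List[Match]:
        q = query.lower()
        # Case-insensitive substring match against each term.
        return [Match(i, t, self.dict_id) for i, t in self.entries.items() if q in t.lower()]


def unified_search(dicts: List[Dictionary], query: str, limit: int = 10) -> List[Match]:
    """Send one query to every dictionary and merge the results into one list."""
    results: List[Match] = []
    for d in dicts:
        results.extend(d.get_matches(query))
    q = query.lower()
    # Assumed ranking: prefix matches first, then shorter terms, then by term string.
    results.sort(key=lambda m: (not m.term.lower().startswith(q), len(m.term), m.term))
    return results[:limit]


if __name__ == "__main__":
    onto = InMemoryDictionary("demo-ontology", {"demoO:1": "apoptotic process", "demoO:2": "apoptotic signaling"})
    db = InMemoryDictionary("demo-database", {"demoD:7": "Apoptosis regulator"})
    for m in unified_search([onto, db], "apopt"):
        print(m.dict_id, m.id, m.term)
```

Because every backend returns the same `Match` shape, an autocomplete field only needs to call `unified_search` on each keystroke; replacing an in-memory stand-in with a client for a real provider would not change the caller.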

Examination of the predicted prevalence of Gitelman syndrome by ethnicity based on genome databases

Scientific Reports · 10.1038/s41598-021-95521-6 · 2021 · Vol 11 (1)

Author(s): Atsushi Kondo, China Nagano, Shinya Ishiko, Takashi Omori, Yuya Aoto, ...

Keyword(s): Allele Frequency, Carrier Frequency, Japanese Population, Autosomal Recessive, Disease Prevalence, Gitelman Syndrome, Variant Allele, Variant Allele Frequency, Genome Databases, Heterozygous Carriers

Abstract

Gitelman syndrome is an autosomal recessive inherited salt-losing tubulopathy. Its prevalence is estimated at around 1 in 40,000 people, and that of heterozygous carriers at approximately 1%, although the exact prevalence is unknown. We estimated the predicted prevalence of Gitelman syndrome from multiple genome databases (HGVD and jMorp for the Japanese population, gnomAD for other ethnicities), including all 274 pathogenic missense or nonsense variants registered in HGMD Professional. The frequencies of all these alleles were summed to obtain the total variant allele frequency in SLC12A3. Following the Hardy–Weinberg principle, the carrier frequency was taken as twice the total allele frequency and the disease prevalence as its square. In the Japanese population, the total carrier frequencies were 0.0948 (9.5%) and 0.0868 (8.7%), and the calculated prevalences were 0.00225 (2.3 in 1000 people) and 0.00188 (1.9 in 1000 people), in HGVD and jMorp, respectively. Other ethnicities showed prevalences ranging from 0.000012 to 0.00083. These findings indicate that the prevalence of Gitelman syndrome in the Japanese population is higher than expected and that some other ethnicities also have a higher prevalence than previously considered.
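The Hardy–Weinberg arithmetic described in this abstract can be checked with a short Python sketch. The function name `gitelman_estimates` is hypothetical, and because the per-variant frequencies are not listed here, the usage back-calculates the total allele frequency q from the reported HGVD carrier frequency (0.0948 = 2q) instead of summing real variant data.

```python
from typing import Iterable, Tuple


def gitelman_estimates(variant_allele_freqs: Iterable[float]) -> Tuple[float, float, float]:
    """Return (total allele frequency q, carrier frequency, prevalence).

    Mirrors the approach described in the abstract: q is the sum of per-variant
    allele frequencies, carriers ~ 2q, prevalence = q**2 (Hardy-Weinberg,
    assuming random mating and independently segregating variants).
    """
    q = sum(variant_allele_freqs)
    if not 0.0 <= q <= 1.0:
        raise ValueError("total allele frequency must lie in [0, 1]")
    carrier = 2 * q       # small-q approximation of the heterozygote frequency 2q(1 - q)
    prevalence = q ** 2   # homozygotes plus compound heterozygotes
    return q, carrier, prevalence


if __name__ == "__main__":
    # Back-calculate q from the reported HGVD carrier frequency (0.0948 = 2q).
    q, carrier, prev = gitelman_estimates([0.0948 / 2])
    print(f"q={q:.4f} carrier={carrier:.4f} prevalence={prev:.5f}")
```

Squaring q = 0.0474 reproduces the reported HGVD prevalence of 0.00225. The 2q carrier estimate is the small-q approximation of 2q(1 - q); for q = 0.0474 the exact form gives about 0.090 rather than 0.0948.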

Lessons from a "living in a database" graphical query interface

ACM SIGMOD Record · 10.1145/971697.602273 · 1984 · Vol 14 (2) · pp. 100-106

Cited by ~9

Author(s): Dennis Fogg

Keyword(s): Query Interface

Distinguishing pathogenic mutations from background genetic noise in cardiology: The use of large genome databases for genetic interpretation

Clinical Genetics · 10.1111/cge.13066 · 2017 · Vol 93 (3) · pp. 459-466

Cited by ~11

Author(s): J. Ghouse, M.W. Skov, R.S. Bigseth, G. Ahlberg, J.K. Kanters, ...

Keyword(s): Large Genome, Genome Databases, Pathogenic Mutations, Genetic Interpretation

Accessing Distributed WFS Data Through A RDF Query Interface

International Conference on GIScience Short Paper Proceedings · 10.21433/b3119fs8s68v · 2016 · Vol 1

Author(s): Tian Zhao, Chuanrong Zhang, Weidong Li

Keyword(s): Query Interface

Research problems in genome databases

Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '95 · 10.1145/212433.220202 · 1995

Cited by ~3

Author(s): Nathan Goodman

Keyword(s): Genome Databases, Research Problems

Enabling Discovery Through Online Cancer Genome Databases and Analytic Tools

Cancer Chemoprevention · 10.1007/978-1-59259-768-0_7 · 2008 · pp. 109-114

Author(s): Robert L. Strausberg, Gregory J. Riggins

Keyword(s): Cancer Genome, Genome Databases

Leveraging Curation Among Escherichia coli Pathway/Genome Databases Using Ortholog-Based Annotation Propagation

Frontiers in Microbiology · 10.3389/fmicb.2021.614355 · 2021 · Vol 12

Author(s): Suzanne Paley, Ingrid M. Keseler, Markus Krummenacker, Peter D. Karp

Keyword(s): Escherichia Coli, Protein Complexes, Limited Resources, Genome Database, Single Strain, Manual Curation, Genome Databases, New Knowledge, K 12, New Protein

Updating genome databases to reflect newly published molecular findings for an organism was hard enough when only a single strain of a given organism had been sequenced. With multiple sequenced strains now available for many organisms, the challenge has grown significantly because of the still-limited resources available for the manual curation that corrects errors and captures new knowledge. We have developed a method to automatically propagate multiple types of curated knowledge from genes and proteins in one genome database to their orthologs in uncurated databases for related strains, imposing several quality-control filters to reduce the chances of introducing errors. We have applied this method to propagate information from the highly curated EcoCyc database for Escherichia coli K–12 to databases for 480 other Escherichia coli strains in the BioCyc database collection. The increase in value and utility of the target databases after propagation is considerable. Target databases received updates for an average of 2,535 proteins each. In addition to widespread addition and regularization of gene and protein names, 97% of the target databases were improved by the addition of at least 200 new protein complexes, at least 800 new or updated reaction assignments, and at least 2,400 sets of GO annotations.
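Ortholog-based propagation with quality-control filters can be sketched in Python as follows. This is not the EcoCyc/BioCyc implementation: the `Gene` record, the `propagate` function, the three filters (one-to-one orthologs only, a protein-length ratio threshold, never overwriting an existing target name) and the placeholder identifiers are all hypothetical stand-ins for the unspecified filters mentioned in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Gene:
    """Minimal hypothetical gene/protein record."""
    gene_id: str
    name: str = ""
    protein_length: int = 0
    go_terms: Set[str] = field(default_factory=set)


def propagate(source: Dict[str, Gene], target: Dict[str, Gene],
              orthologs: Dict[str, List[str]], max_len_ratio: float = 1.2) -> int:
    """Copy curated names and GO terms from source genes to their target orthologs.

    Hypothetical quality-control filters (not the published ones):
      1. only one-to-one orthologs are used;
      2. protein lengths must agree within max_len_ratio;
      3. an existing target name is never overwritten.
    Returns the number of target genes that received any update.
    """
    # Count how many source genes map to each target gene, to detect many-to-one.
    hits: Dict[str, int] = {}
    for tgts in orthologs.values():
        for t in tgts:
            hits[t] = hits.get(t, 0) + 1

    updated = 0
    for src_id, tgts in orthologs.items():
        if len(tgts) != 1 or hits[tgts[0]] != 1:
            continue  # filter 1: skip one-to-many and many-to-one mappings
        src, tgt = source.get(src_id), target.get(tgts[0])
        if src is None or tgt is None:
            continue
        lo, hi = sorted((src.protein_length, tgt.protein_length))
        if lo == 0 or hi / lo > max_len_ratio:
            continue  # filter 2: lengths unknown or too different
        changed = False
        if src.name and not tgt.name:  # filter 3: keep any existing target name
            tgt.name, changed = src.name, True
        new_terms = src.go_terms - tgt.go_terms
        if new_terms:
            tgt.go_terms |= new_terms
            changed = True
        updated += changed
    return updated


if __name__ == "__main__":
    source = {"s1": Gene("s1", "geneA", 820, {"GO:demo1"}), "s2": Gene("s2", "geneB", 310, {"GO:demo2"})}
    target = {"t1": Gene("t1", "", 830), "t2": Gene("t2", "", 600)}
    print(propagate(source, target, {"s1": ["t1"], "s2": ["t2"]}), target["t1"])
```

Counting hits per target gene lets the first filter reject many-to-one as well as one-to-many mappings, and because GO terms are merged as sets, rerunning the propagation on the same data makes no further changes.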
