Providing gene-to-variant and variant-to-gene database identifier mappings to use with BridgeDb mapping services.

Database identifier mapping services are important for making database information interoperable. BridgeDb offers such a service. The available BridgeDb mappings link 1) gene and gene product identifiers, 2) metabolite identifiers and InChI structure descriptions, and 3) identifiers for biochemical reactions and interactions across multiple resources that use such IDs, with the mappings obtained from multiple sources. In this study we created BridgeDb mapping databases for selections of gene-to-variant (and variant-to-gene) mappings based on the variants described in Ensembl. Moreover, we demonstrated the use of these mappings in different software tools such as R, PathVisio, Cytoscape, and a local installation using Docker. The variant mapping databases are now described on the BridgeDb website, are available from the BridgeDb mapping database repository, and are updated according to the regular BridgeDb mapping update schedule.
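The bidirectional gene-to-variant idea can be illustrated with a minimal sketch. It assumes the mappings are available as (gene, variant) pairs and simply builds forward and reverse indexes; the identifiers are hypothetical placeholders rather than real Ensembl records, and this is not the BridgeDb API or its database format.

```python
from collections import defaultdict

# Hypothetical (gene, variant) pairs; placeholders, not real Ensembl data.
PAIRS = [
    ("GENE_A", "VAR_1"),
    ("GENE_A", "VAR_2"),
    ("GENE_B", "VAR_2"),  # one variant may be linked to more than one gene
    ("GENE_B", "VAR_3"),
]


def build_mappings(pairs):
    """Build forward (gene -> variants) and reverse (variant -> genes) indexes."""
    gene_to_var = defaultdict(set)
    var_to_gene = defaultdict(set)
    for gene, variant in pairs:
        gene_to_var[gene].add(variant)
        var_to_gene[variant].add(gene)
    return dict(gene_to_var), dict(var_to_gene)


def map_ids(ids, mapping):
    """Map each input identifier to a sorted list of targets (empty if unmapped)."""
    return {i: sorted(mapping.get(i, ())) for i in ids}


g2v, v2g = build_mappings(PAIRS)
print(map_ids(["GENE_A", "GENE_X"], g2v))
print(map_ids(["VAR_2"], v2g))
```

Storing both directions explicitly keeps either lookup a single dictionary access, at the cost of holding each link twice.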

GENE ONTOLOGY SIMILARITY MEASURES BASED ON LINEAR ORDER STATISTICS

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488506004254 ◽

2006 ◽

Vol 14 (06) ◽

pp. 639-661 ◽

Cited By ~ 7

Author(s):

JAMES M. KELLER ◽

JAMES C. BEZDEK ◽

MIHAIL POPESCU ◽

NIKHIL R. PAL ◽

JOYCE A. MITCHELL ◽

...

Keyword(s):

Gene Ontology ◽

Order Statistics ◽

Gene Product ◽

Linear Order ◽

Similarity Measures ◽

Product Family ◽

Amino Acid Sequences ◽

Gene Products ◽

Multiple Sources ◽

Similarity Relations

The standard method for comparing gene products (proteins or RNA) is to compare their DNA or amino acid sequences. Additional information about some gene products may come from multiple sources, including the set of Gene Ontology (GO) annotations and the set of journal abstracts related to each gene product. Gene product similarity measures can be based on evaluating sets of descriptor terms found in the GO taxonomy, and/or the index term sets of the related documents (MeSH annotations). While our techniques can be applied to term sets from any taxonomy, we restrict our examples in this article to GO annotations. We investigate the use of linear order statistics (LOS) to build similarity relations on pairs of terms that are used in the GO as linguistic descriptors of genes and gene products. One of our objectives is to investigate the construction and utility of visual assessments of relational data (in this case, dissimilarity matrices) for discovering tendencies of groups of gene products to "cluster together". We use gene product data derived from a group of 194 gene products representing three protein families extracted from ENSEMBL. Our examples suggest that LOS similarity measures are more effective than traditional sequence-based similarity measures at capturing relationships between pairs of gene products in ENSEMBL families when annotation information is available. We show examples of how these similarity measures can assist in knowledge discovery and gene product family validation.
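A linear order statistic aggregates values by sorting them and taking a weighted sum with non-negative weights that sum to 1 (all weight on the first position gives the maximum, uniform weights give the mean). The sketch below applies such an LOS to all term-pair similarities between two annotation sets. The term similarity is a hypothetical Jaccard overlap of ancestor sets in a toy three-term ontology, and the two weight vectors are illustrative assumptions; the article's specific term measures and weights are not reproduced here.

```python
from typing import Callable, List, Sequence


def los(values: Sequence[float], weights: Sequence[float]) -> float:
    """Linear order statistic: weighted sum of the values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("need one weight per value")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def set_similarity(terms_a, terms_b, term_sim: Callable[[str, str], float],
                   weight_fn: Callable[[int], List[float]]) -> float:
    """Aggregate every term-pair similarity between two annotation sets with an LOS."""
    sims = [term_sim(a, b) for a in terms_a for b in terms_b]
    return los(sims, weight_fn(len(sims)))


def max_weights(n: int) -> List[float]:
    return [1.0] + [0.0] * (n - 1)   # LOS reduces to the maximum


def mean_weights(n: int) -> List[float]:
    return [1.0 / n] * n             # LOS reduces to the arithmetic mean


# Hypothetical toy ontology: each term mapped to itself plus its ancestors.
ANCESTORS = {
    "t1": {"root", "t1"},
    "t2": {"root", "t1", "t2"},
    "t3": {"root", "t3"},
}


def jaccard_term_sim(a: str, b: str) -> float:
    """Illustrative term similarity: Jaccard overlap of ancestor sets."""
    sa, sb = ANCESTORS[a], ANCESTORS[b]
    return len(sa & sb) / len(sa | sb)


gene_x, gene_y = ["t1", "t2"], ["t2", "t3"]
print(set_similarity(gene_x, gene_y, jaccard_term_sim, max_weights))
print(set_similarity(gene_x, gene_y, jaccard_term_sim, mean_weights))
```

Changing only the weight vector moves the measure between optimistic (max) and averaged behaviour without touching the term-level similarity.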

Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB

Applied and Environmental Microbiology ◽

10.1128/aem.03006-05 ◽

2006 ◽

Vol 72 (7) ◽

pp. 5069-5072 ◽

Cited By ~ 6346

Author(s):

T. Z. DeSantis ◽

P. Hugenholtz ◽

N. Larsen ◽

M. Rojas ◽

E. L. Brodie ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

Taxonomic Classification ◽

rRNA Gene ◽

Link Type ◽

Gene Database ◽

Public Repositories ◽

Environmental Sequences ◽

Standard Alignment

A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

MicrobeTrace: Retooling Molecular Epidemiology for Rapid Public Health Response

10.1101/2020.07.22.216275 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ellsworth M. Campbell ◽

Anthony Boyles ◽

Anupama Shankar ◽

Jay Kim ◽

Sergey Knyazev ◽

...

Keyword(s):

Public Health ◽

Molecular Epidemiology ◽

Spanning Trees ◽

Healthcare Providers ◽

Contact Tracing ◽

Surveillance Systems ◽

Multiple Sources ◽

Public Health Response ◽

Link Type ◽

Novel Approach

Motivation: Outbreak investigations use data from interviews, healthcare providers, laboratories and surveillance systems. However, integrated use of data from multiple sources requires a patchwork of software that presents challenges in usability, interoperability, confidentiality, and cost. Rapid integration, visualization and analysis of data from multiple sources can guide effective public health interventions.

Results: We developed MicrobeTrace to facilitate rapid public health responses by overcoming barriers to data integration and exploration in molecular epidemiology. Using publicly available HIV sequences and other data, we demonstrate the analysis of viral genetic distance networks and introduce a novel approach to minimum spanning trees that simplifies results. We also illustrate the potential utility of MicrobeTrace in support of contact tracing by analyzing and displaying data from an outbreak of SARS-CoV-2 in South Korea in early 2020.

Availability and Implementation: MicrobeTrace is a web-based, client-side, JavaScript application (https://microbetrace.cdc.gov) that runs in Chromium-based browsers and remains fully operational without an internet connection. MicrobeTrace is developed and actively maintained by the Centers for Disease Control and Prevention. The source code is available at https://github.com/cdcgov/microbetrace. Contact: [email protected]
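A genetic distance network links two sequences when their pairwise distance falls at or below a threshold. The sketch below uses a simple proportion (p-)distance over aligned, non-gap positions as a stand-in; this is an assumption for illustration, since real molecular-epidemiology pipelines usually use evolutionary distance models, and neither the distance nor the threshold value is taken from MicrobeTrace.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites among positions where neither sequence has a gap."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        diffs += a != b
    return diffs / compared if compared else float("nan")


def distance_network(seqs: Dict[str, str], threshold: float) -> List[Tuple[str, str, float]]:
    """Return edges (id1, id2, distance) for every pair at or below the threshold."""
    edges = []
    for (i, s), (j, t) in combinations(seqs.items(), 2):
        d = p_distance(s, t)
        if d <= threshold:  # NaN (no comparable sites) never passes
            edges.append((i, j, d))
    return edges


# Hypothetical aligned sequences.
seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "TTGTACCTAA"}
print(distance_network(seqs, threshold=0.15))
```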

Apollo: Democratizing genome annotation

10.1101/512376 ◽

2019 ◽

Author(s):

Nathan Dunn ◽

Deepak Unni ◽

Colin Diesh ◽

Monica Munoz-Torres ◽

Nomi L. Harris ◽

...

Keyword(s):

Genome Annotation ◽

Genomic Analysis ◽

List Type ◽

Sources Of Information ◽

Multiple Sources ◽

Link Type ◽

Open Source Software Package ◽

Analytical Review ◽

Graphical Browser ◽

And Function

Genome annotation is the process of identifying the location and function of a genome’s encoded features. Improving the biological accuracy of annotation is a complex and iterative process requiring researchers to review and incorporate multiple sources of information such as transcriptome alignments, predictive models based on sequence profiles, and comparisons to features found in related organisms. Because rapidly decreasing costs are enabling an ever-growing number of scientists to incorporate sequencing as a routine laboratory technique, there is widespread demand for tools that can assist in the deliberative analytical review of genomic information. To this end, Apollo is an open source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform.

In this paper we first outline some of Apollo’s newer user interface features, which were driven by the needs of this expanding genomics community. These include support for real-time collaboration, allowing distributed users to simultaneously edit the same encoded features while also instantly seeing the updates made by other researchers on the same region in a manner similar to Google Docs. Its technical architecture enables Apollo to be integrated into multiple existing genomic analysis pipelines and heterogeneous laboratory workflow platforms. Finally, we consider the implications that Apollo and related applications may have on how the results of genome research are published and made accessible.

Source: https://github.com/GMOD/Apollo
License (BSD-3): https://github.com/GMOD/Apollo/blob/master/LICENSE.md
Docker: https://hub.docker.com/r/gmod/apollo/tags/, https://github.com/GMOD/docker-apollo
Requirements: JDK 1.8, Node v6.0+
User guide: http://genomearchitect.org; technical guide: http://genomearchitect.readthedocs.io/en/latest/
Mailing list: [email protected]

Overcoming the impacts of two-step batch effect correction on gene expression estimation and inference

10.1101/2021.01.24.428009 ◽

2021 ◽

Author(s):

Tenglong Li ◽

Yuqing Zhang ◽

Prasad Patil ◽

W. Evan Johnson

Keyword(s):

Differential Expression Analysis ◽

Final Analysis ◽

Generalized Least Squares ◽

R Package ◽

Batch Effect ◽

Batch Effects ◽

Multiple Sources ◽

Link Type ◽

Corrected Data ◽

One Step

Non-ignorable technical variation is commonly observed across data from multiple experimental runs, platforms, or studies. These so-called batch effects can make it difficult to merge data from multiple sources, as they can severely bias the outcome of the analysis. Many groups have developed approaches for removing batch effects from data, usually by accommodating batch variables in the analysis (one-step correction) or by preprocessing the data prior to the formal or final analysis (two-step correction). One-step correction is often desirable due to its simplicity, but its flexibility is limited and it can be difficult to include batch variables uniformly when an analysis has multiple stages. Two-step correction allows for richer models of batch mean and variance. However, prior investigation has indicated that two-step correction can lead to incorrect statistical inference in downstream analysis. Generally speaking, two-step approaches introduce a correlation structure in the corrected data, which, if ignored, may lead to either exaggerated or diminished significance in downstream applications such as differential expression analysis. Here, we provide more intuitive and more formal evaluations of the impacts of two-step batch correction than the existing literature. We demonstrate that the undesired impacts of two-step correction (exaggerated or diminished significance) depend on both the nature of the study design and the batch effects. We also provide strategies for overcoming these negative impacts in downstream analyses using the estimated correlation matrix of the corrected data. We compare the results of our proposed workflow with the results from other published one-step and two-step methods and show that our methods lead to more consistent false discovery control and power of detection across a variety of batch effect scenarios.
Software for our method is available through GitHub (https://github.com/jtleek/sva-devel) and will be available in future versions of the sva R package in the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/sva.html).

Keywords: Batch effect; Two-step batch adjustment; ComBat; Sample correlation adjustment; Generalized least squares
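The correlation argument can be made concrete with a small sketch. Step 1 below is a plain least-squares location adjustment (fit y ~ 1 + group + batch and subtract the batch term), not ComBat's empirical Bayes procedure and not the sva code. Because the corrected data are a linear map A of the raw data, under independent noise their covariance is proportional to A Aᵀ, and step 2 feeds that matrix into generalized least squares. The data, design, and function names are hypothetical.

```python
import numpy as np


def two_step_correct(y, group, batch):
    """Step 1: remove an estimated additive batch effect.

    Fits y ~ 1 + group + batch by least squares (keeping the group term so the
    biological signal is protected) and subtracts batch * gamma_hat. Returns the
    corrected data and the linear map A such that y_corrected = A @ y.
    """
    n = len(y)
    D = np.column_stack([np.ones(n), group, batch])
    D_pinv = np.linalg.pinv(D)                  # row 2 yields the batch coefficient
    A = np.eye(n) - np.outer(batch, D_pinv[2])
    return A @ y, A


def gls(X, y, sigma):
    """Step 2: generalized least squares with a (pseudo-)inverse covariance.

    Returns the coefficients and the unscaled covariance (X' W X)^-1.
    """
    W = np.linalg.pinv(sigma)                   # sigma is singular after step 1
    XtWX = X.T @ W @ X
    return np.linalg.solve(XtWX, X.T @ W @ y), np.linalg.inv(XtWX)


# Hypothetical unbalanced design: group membership differs between batches.
group = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
rng = np.random.default_rng(0)
y = 1.0 + 2.0 * group + 3.0 * batch + rng.normal(scale=0.5, size=group.size)

y_corr, A = two_step_correct(y, group, batch)
sigma = A @ A.T                                 # correlation induced by step 1 (iid noise assumed)
X = np.column_stack([np.ones(group.size), group])

beta_naive, cov_naive = gls(X, y_corr, np.eye(group.size))  # treats samples as independent
beta_adj, cov_adj = gls(X, y_corr, sigma)                   # accounts for the induced correlation
print("naive group effect:", beta_naive[1], "variance factor:", cov_naive[1, 1])
print("adjusted group effect:", beta_adj[1], "variance factor:", cov_adj[1, 1])
```

The variance factors are multiplied by the noise variance to obtain standard errors. Changing the group/batch layout changes how far the naive and adjusted variance terms diverge, which echoes the abstract's point that the impact depends on study design.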

MicrobeTrace: Retooling molecular epidemiology for rapid public health response

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009300 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009300 ◽

Cited By ~ 1

Author(s):

Ellsworth M. Campbell ◽

Anthony Boyles ◽

Anupama Shankar ◽

Jay Kim ◽

Sergey Knyazev ◽

...

Keyword(s):

Public Health ◽

Molecular Epidemiology ◽

Spanning Trees ◽

Healthcare Providers ◽

Contact Tracing ◽

Surveillance Systems ◽

Multiple Sources ◽

Public Health Response ◽

Link Type ◽

Novel Approach

Outbreak investigations use data from interviews, healthcare providers, laboratories and surveillance systems. However, integrated use of data from multiple sources requires a patchwork of software that presents challenges in usability, interoperability, confidentiality, and cost. Rapid integration, visualization and analysis of data from multiple sources can guide effective public health interventions. We developed MicrobeTrace to facilitate rapid public health responses by overcoming barriers to data integration and exploration in molecular epidemiology. MicrobeTrace is a web-based, client-side, JavaScript application (https://microbetrace.cdc.gov) that runs in Chromium-based browsers and remains fully operational without an internet connection. Using publicly available data, we demonstrate the analysis of viral genetic distance networks and introduce a novel approach to minimum spanning trees that simplifies results. We also illustrate the potential utility of MicrobeTrace in support of contact tracing by analyzing and displaying data from an outbreak of SARS-CoV-2 in South Korea in early 2020. MicrobeTrace is developed and actively maintained by the Centers for Disease Control and Prevention. Users can email [email protected] for support. The source code is available at https://github.com/cdcgov/microbetrace.
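For reference, a standard minimum spanning tree over a weighted distance network can be computed with Kruskal's algorithm and a union-find structure, as sketched below. This is the textbook baseline only; the abstract does not describe MicrobeTrace's novel MST approach, so nothing here should be read as that method. Node names and edge weights are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]


def minimum_spanning_tree(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> List[Edge]:
    """Kruskal's algorithm; returns MST edges (a spanning forest if the graph is disconnected)."""
    parent: Dict[Hashable, Hashable] = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                                   # edge joins two components: no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical pairwise genetic distances.
edges = [("A", "B", 0.01), ("B", "C", 0.02), ("A", "C", 0.03), ("C", "D", 0.05), ("B", "D", 0.04)]
print(minimum_spanning_tree("ABCD", edges))
```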

JIB.tools 2.0 – A Bioinformatics Registry for Journal Published Tools with Interoperability to bio.tools

Journal of Integrative Bioinformatics ◽

10.1515/jib-2019-0059 ◽

2020 ◽

Vol 16 (4) ◽

Author(s):

Marcel Friedrichs ◽

Alban Shoshi ◽

Piotr Jaroslaw Chmura ◽

Jon Ison ◽

Veit Schwämmle ◽

...

Keyword(s):

Life Sciences ◽

Software Tools ◽

New Approach ◽

Workflow Systems ◽

Publication Process ◽

Software Applications ◽

Link Type ◽

Journal Publications ◽

Status Information ◽

Information Repository

JIB.tools 2.0 is a new approach to embedding the curation process more closely in the publication process. This website hosts the tools, software applications, databases and workflow systems published in the Journal of Integrative Bioinformatics (JIB). As soon as a new tool-related article is published in JIB, the tool is posted to JIB.tools and can afterwards be easily transferred to bio.tools, a large information repository of software tools, databases and services for bioinformatics and the life sciences. In this way, an easily accessible list of tools published in JIB is provided, as well as status information regarding the underlying services. With newer registries like bio.tools providing this information on a larger scale, JIB.tools 2.0 closes the gap between journal publications and registry publication. (Reference: https://jib.tools).
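The transfer step from a journal registry to bio.tools amounts to converting a local entry into a registry record. The sketch below assumes a hypothetical local entry format and emits only a minimal subset of bio.tools-style fields (name, description, homepage, and a publication DOI); these field names reflect our understanding of the bio.tools schema and should be checked against it, and the actual JIB.tools export format is not described in the abstract.

```python
import json


def to_biotools_record(tool: dict) -> dict:
    """Convert a hypothetical JIB.tools-style entry into a minimal bio.tools-style record."""
    required = ("name", "description", "homepage")
    missing = [k for k in required if not tool.get(k)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    record = {k: tool[k] for k in required}
    if tool.get("doi"):
        # Link the registry entry back to the journal article.
        record["publication"] = [{"doi": tool["doi"]}]
    return record


entry = {
    "name": "JIB.tools",
    "description": "Registry of tools published in the Journal of Integrative Bioinformatics.",
    "homepage": "https://jib.tools",
    "doi": "10.1515/jib-2019-0059",
}
print(json.dumps(to_biotools_record(entry), indent=2))
```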

SREQP: A Solar Radiation Extraction and Query Platform for the Production and Consumption of Linked Data from Weather Stations Sensors

Journal of Sensors ◽

10.1155/2016/2825653 ◽

2016 ◽

Vol 2016 ◽

pp. 1-18 ◽

Cited By ~ 1

Author(s):

José Luis Sánchez-Cervantes ◽

Mateusz Radzimski ◽

Cristian Aaron Rodriguez-Enriquez ◽

Giner Alor-Hernández ◽

Lisbeth Rodríguez-Mazahua ◽

...

Keyword(s):

Solar Radiation ◽

Linked Data ◽

Sensor Data ◽

Multiple Sources ◽

Analytic Hierarchy ◽

Multiple Resources ◽

Sensor Web ◽

Production And Consumption ◽

External Sources ◽

Web Systems

Nowadays, solar radiation information is provided by sensors installed at different geographic locations and on the platforms of meteorological agencies. However, the common formats used to publish solar radiation information, such as PDF files and HTML documents, carry no semantics in their content, which can make it difficult to integrate and fuse data from multiple resources. One of the challenges of the sensor Web is unifying data from multiple sources; achieving this facilitates interoperability with other sensor Web systems. This research proposes SREQP (Solar Radiation Extraction and Query Platform), an architecture that extracts solar radiation data from multiple external sources and merges them into a single platform. SREQP uses Linked Data to generate a set of triples describing the extracted data, which allows end users to query the data through a SPARQL endpoint. The conceptual model was developed using well-known vocabularies such as SSN and WGS84. Moreover, an Analytic Hierarchy Process was carried out to evaluate SREQP by identifying and assessing the main features of Linked Sensor Data and sensor Web systems. The evaluation indicated that SREQP contains most of the features considered essential in Linked Sensor Data and sensor Web systems.
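To show what "a set of triples" for one reading might look like, the sketch below serializes a single solar-radiation observation as N-Triples using the W3C WGS84 position vocabulary and the SOSA namespace of the W3C SSN recommendation, plus a tiny triple-pattern filter that mimics one SPARQL basic graph pattern. The base namespace, station ID, and values are hypothetical, the station is treated as the sensor for brevity, and SREQP's actual vocabulary terms may differ.

```python
from typing import List, Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
SOSA = "http://www.w3.org/ns/sosa/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/sreqp/"  # hypothetical namespace for generated resources

Triple = Tuple[str, str, str]


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str, datatype: str) -> str:
    return f'"{value}"^^<{XSD}{datatype}>'


def observation_triples(station_id: str, lat: str, lon: str,
                        timestamp: str, irradiance: str) -> List[Triple]:
    """Describe one reading as (subject, predicate, object) N-Triples terms."""
    station = iri(f"{BASE}station/{station_id}")
    obs = iri(f"{BASE}observation/{station_id}/{timestamp}")
    return [
        (station, iri(GEO + "lat"), literal(lat, "decimal")),
        (station, iri(GEO + "long"), literal(lon, "decimal")),
        (obs, iri(RDF_TYPE), iri(SOSA + "Observation")),
        (obs, iri(SOSA + "madeBySensor"), station),  # station stands in for the sensor
        (obs, iri(SOSA + "resultTime"), literal(timestamp, "dateTime")),
        (obs, iri(SOSA + "hasSimpleResult"), literal(irradiance, "decimal")),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)]


triples = observation_triples("ST01", "19.5", "-96.9", "2016-01-01T12:00:00Z", "850.2")
print(to_ntriples(triples))
```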

CyMIRA: The Cytonuclear Molecular Interactions Reference for Arabidopsis

10.1101/614487 ◽

2019 ◽

Author(s):

Evan S. Forsythe ◽

Joel Sharbrough ◽

Justin C. Havird ◽

Jessica M. Warren ◽

Daniel B. Sloan

Keyword(s):

Molecular Interactions ◽

Detailed Knowledge ◽

Gene Products ◽

Subcellular Targeting ◽

Central Importance ◽

Isolated Populations ◽

Link Type ◽

Evolutionary Genomic ◽

Level Information ◽

Evolutionary Consequences

The function and evolution of eukaryotic cells depend upon direct molecular interactions between gene products encoded in nuclear and cytoplasmic genomes. Understanding how these cytonuclear interactions drive molecular evolution and generate genetic incompatibilities between isolated populations and species is of central importance to eukaryotic biology. Plants are an outstanding system to investigate such effects because of their two different genomic compartments present in the cytoplasm (mitochondria and plastids) and the extensive resources detailing subcellular targeting of nuclear-encoded proteins. However, the field lacks a consistent classification scheme for mitochondrial- and plastid-targeted proteins based on their molecular interactions with cytoplasmic genomes and gene products, which hinders efforts to standardize and compare results across studies. Here, we take advantage of detailed knowledge about the model angiosperm Arabidopsis thaliana to provide a curated database of plant cytonuclear interactions at the molecular level. CyMIRA (Cytonuclear Molecular Interactions Reference for Arabidopsis) is available at http://cymira.colostate.edu/ and https://github.com/dbsloan/cymira and will serve as a resource to aid researchers in partitioning evolutionary genomic data into functional gene classes based on organelle targeting and direct molecular interaction with cytoplasmic genomes and gene products. It includes 11 categories (and 27 subcategories) of different cytonuclear complexes and types of molecular interactions, and it reports residue-level information for cytonuclear contact sites. We hope that this framework will make it easier to standardize, interpret and compare studies testing the functional and evolutionary consequences of cytonuclear interactions.

Biobtree: A tool to search, map and visualize bioinformatics identifiers and special keywords

F1000Research ◽

10.12688/f1000research.17927.1 ◽

2019 ◽

Vol 8 ◽

pp. 145

Author(s):

Tamer Gur

Keyword(s):

Data Structure ◽

Web Services ◽

Software Tools ◽

Web Interface ◽

Bioinformatics Tool ◽

Link Type ◽

Executable File ◽

Tree Data ◽

Tree Data Structure ◽

Gene Symbols

Due to their nature, bioinformatics datasets are often closely related to each other. For this reason, search, mapping and visualization of these relations are often performed manually or programmatically via identifiers or special keywords such as gene symbols. Although various tools exist for these tasks, the growing volume of bioinformatics datasets and the emergence of new software tools and approaches motivate new solutions. To address these needs, I present the Biobtree bioinformatics tool. Biobtree efficiently fetches and indexes identifiers and special keywords, together with their related identifiers, from supported datasets and, optionally, from user-defined datasets. It provides a web interface, web services, and a single uniform database output based directly on a B+ tree data structure. Biobtree can handle billions of identifiers and runs as a single executable file with no installation or dependencies required. It also aims to keep a relatively small codebase for easy maintenance, addition of new features and extension to larger datasets. Biobtree is available to download from GitHub.
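The benefit of a sorted, B+ tree-style index is that exact lookups take logarithmic time and prefix searches become contiguous range scans. The sketch below is a simplified stand-in that keeps keys in a sorted Python list (roughly the leaf level of a B+ tree) and maps each keyword to related identifiers; it is not Biobtree's implementation, the case-insensitive normalization is an assumption, and a real B+ tree would also make inserts logarithmic rather than linear. The gene-symbol links are illustrative examples.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Set


class KeywordIndex:
    """Sorted key index supporting exact lookup and prefix range scans."""

    def __init__(self) -> None:
        self._keys: List[str] = []             # sorted, upper-cased keys
        self._links: Dict[str, Set[str]] = {}  # key -> related identifiers

    def add(self, key: str, *related: str) -> None:
        k = key.upper()
        if k not in self._links:
            insort(self._keys, k)              # keep keys sorted on insert
            self._links[k] = set()
        self._links[k].update(related)

    def get(self, key: str) -> List[str]:
        """Exact lookup; returns related identifiers in sorted order."""
        return sorted(self._links.get(key.upper(), ()))

    def prefix(self, prefix: str) -> List[str]:
        """All keys starting with prefix, found by a binary search plus a range scan."""
        p = prefix.upper()
        i = bisect_left(self._keys, p)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(p):
            out.append(self._keys[i])
            i += 1
        return out


idx = KeywordIndex()
idx.add("TP53", "ENSG00000141510", "P04637")
idx.add("BRCA1", "ENSG00000012048", "P38398")
idx.add("BRCA2", "ENSG00000139618", "P51587")
print(idx.prefix("brca"))
print(idx.get("tp53"))
```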