Bioinformatics Web Portals

Author(s):  
Mario Cannataro

Bioinformatics involves the design and development of advanced algorithms and computational platforms to solve problems in biomedicine (Jones & Pevzner, 2004). It also deals with methods for acquiring, storing, retrieving and analysing biological data obtained by querying biological databases or produced by experiments. Bioinformatics applications involve different datasets as well as different software tools and algorithms. Such applications need semantic models for basic software components, and advanced scientific portal services able to aggregate such different components and to hide their details and complexity from the final user. For instance, proteomics applications involve datasets, either produced by experiments or available as public databases, as well as a huge number of different software tools and algorithms. Using such applications requires knowledge both of the biological issues related to data generation and result interpretation, and of the informatics requirements related to data analysis. Bioinformatics applications also demand computational platforms beyond the standard, since they (1) are naturally distributed, owing to the high number of datasets involved; (2) require high computing power, owing to the large size of datasets and the complexity of basic computations; (3) access data that are heterogeneous in both format and structure; and (4) require reliability and security. For instance, identification of proteins from spectra data (de Hoffmann & Stroobant, 2002), querying of protein databases (Swiss-Prot), prediction of protein structures (Guerra & Istrail, 2003), and string-based pattern extraction from large biological sequences are all computationally expensive applications. Moreover, expertise is required to choose the most appropriate tools. For instance, protein structure prediction depends on the protein family, so choosing the right tool may strongly influence the experimental results.
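As a concrete illustration of the kind of database access such portals wrap for the user, the sketch below queries Swiss-Prot programmatically. It is a minimal sketch, assuming the public UniProt REST API (rest.uniprot.org) and the Python requests library; the query fields and result keys are assumptions based on that API, not part of the original article.

```python
# Minimal sketch: search Swiss-Prot (reviewed UniProtKB) for human proteins
# matching a keyword. Assumes the public UniProt REST API; field names
# and result keys are assumptions, not guaranteed by the article.
import requests

URL = "https://rest.uniprot.org/uniprotkb/search"

def search_swissprot(keyword, limit=5):
    params = {
        "query": f"{keyword} AND reviewed:true AND organism_id:9606",
        "format": "json",
        "size": limit,
    }
    response = requests.get(URL, params=params, timeout=30)
    response.raise_for_status()
    for entry in response.json().get("results", []):
        print(entry["primaryAccession"], entry.get("uniProtkbId", "?"))

if __name__ == "__main__":
    search_swissprot("kinase")
```

Even this toy query shows why portals hide such details: the user must know the endpoint, the query syntax, and the result schema before any biological interpretation can begin.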


2021 ◽  
Vol 15 ◽  
Author(s):  
Omer Irshad ◽  
Muhammad Usman Ghani Khan

Integrating heterogeneous biological databases to unveil new intra-molecular and inter-molecular attributes, behaviors, and relationships in the human cellular system has long been a focal research area of computational biology. In this context, many biological data integration systems have been deployed over the last couple of decades. One of the prime and common objectives of all these systems is to better facilitate end-users in exploring, exploiting, and analyzing the integrated biological data for knowledge extraction. With the advent of high-throughput data generation technologies in particular, biological data are growing and dispersing continuously, exponentially, heterogeneously, and geographically. As a result, biological data integration systems also face current and future challenges related to data integration and data organization. The objective of this review is to quantitatively evaluate and compare some of the recent warehouse-based multi-omics data integration systems and to check their compliance with current and future data integration needs. To this end, we identified some of the major design characteristics that a multi-omics data integration model should have in order to comprehensively address current and future data integration challenges. Based on these design characteristics and the evaluation criteria, we evaluated some of the recent data warehouse systems and present categorical and comparative analysis results. The results show that most of the systems exhibit no or only partial compliance with the required design characteristics, so these systems need design improvements to adequately address current and future data integration challenges while keeping their service-level commitments in place.
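At its simplest, the warehouse-style integration these systems perform amounts to joining omics tables from different sources on a shared identifier. The toy Python/pandas sketch below (all table and column names are invented for illustration) shows an outer join of a transcriptomics table and a proteomics table on a common gene identifier:

```python
# Toy sketch of warehouse-style multi-omics integration: joining two
# heterogeneous tables on a shared gene identifier. All names are invented.
import pandas as pd

transcriptomics = pd.DataFrame({
    "gene_id": ["TP53", "BRCA1", "EGFR"],
    "mrna_expression": [12.4, 3.1, 8.7],
})
proteomics = pd.DataFrame({
    "gene_id": ["TP53", "EGFR", "MYC"],
    "protein_abundance": [0.82, 0.45, 0.67],
})

# An outer join keeps genes observed in either source; NaN values mark
# where one omics layer has no measurement for a given gene.
integrated = pd.merge(transcriptomics, proteomics, on="gene_id", how="outer")
print(integrated)
```

Real systems must additionally reconcile identifier schemes, units, and provenance across sources, which is exactly where the design characteristics the review evaluates come into play.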


Author(s):  
Taylor Reiter ◽  
Phillip T. Brooks ◽  
Luiz Irber ◽  
Shannon E.K. Joslin ◽  
Charles M. Reid ◽  
...  

Abstract
As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis, and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of practices and strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these strategies in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.

Author Summary
We present a guide for workflow-enabled biological sequence data analysis, developed through our own teaching, training and analysis projects. We recognize that this is based on our own use cases and experiences, but we hope that our guide will contribute to a larger discussion within the open source and open science communities and lead to more comprehensive resources. Our main goal is to accelerate the research of scientists conducting sequence analyses by introducing them to organized workflow practices that not only benefit their own research but also facilitate open and reproducible science.
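Workflow systems such as Snakemake or Nextflow implement this data-centric model at scale. As a toy illustration of the underlying conditional-execution idea only, the Python sketch below re-runs a step just when its outputs are missing or older than its inputs; the tool commands and file names are invented for the example and are not the authors' pipeline:

```python
# Toy sketch of the data-centric execution model behind workflow systems:
# a step runs only if its outputs are missing or older than its inputs.
# Commands and file names below are invented for illustration.
import os
import subprocess

def run_step(command, inputs, outputs):
    """Run `command` unless every output is newer than every input."""
    if all(os.path.exists(o) for o in outputs):
        newest_input = max(os.path.getmtime(i) for i in inputs)
        oldest_output = min(os.path.getmtime(o) for o in outputs)
        if oldest_output >= newest_input:
            print(f"skip (up to date): {command}")
            return
    subprocess.run(command, shell=True, check=True)

# A two-step pipeline: quality-trim reads, then count k-mers.
run_step("fastp -i reads.fq -o trimmed.fq", ["reads.fq"], ["trimmed.fq"])
run_step("kmer_count trimmed.fq > counts.txt", ["trimmed.fq"], ["counts.txt"])
```

Full workflow systems add what this sketch lacks: software environment management, cluster and cloud resource handling, and automatic tracking of the hundreds to thousands of intermediate files the abstract describes.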


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taylor Reiter ◽  
Phillip T Brooks† ◽  
Luiz Irber† ◽  
Shannon E K Joslin† ◽  
Charles M Reid† ◽  
...  

Abstract
As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.


2020 ◽  
Vol 27 (38) ◽  
pp. 6523-6535 ◽  
Author(s):  
Antreas Afantitis ◽  
Andreas Tsoumanis ◽  
Georgia Melagraki

Drug discovery and (nano)material design projects demand the in silico analysis of large datasets of compounds together with their corresponding properties/activities, as well as the retrieval and virtual screening of further structures in an effort to identify new potent hits. This is a demanding procedure in which various tools must be combined across different input and output formats. To automate the required data analysis, we have developed tools that facilitate a variety of important tasks for constructing workflows, simplifying the handling, processing and modeling of cheminformatics data and providing solutions that are time- and cost-efficient, reproducible and easier to maintain. We therefore develop and present a toolbox of more than 25 processing modules, the Enalos+ nodes, that provide very useful operations within the KNIME platform for users interested in nanoinformatics and cheminformatics analysis of chemical and biological data. With a user-friendly interface, the Enalos+ nodes provide a broad range of important functionalities, including data mining and retrieval from large available databases and tools for robust and predictive model development and validation. Enalos+ nodes are available through KNIME as add-ins and offer valuable tools for extracting useful information and analyzing experimental and virtual screening results in a chem- or nano-informatics framework. On top of that, in an effort to (i) allow big data analysis through Enalos+ KNIME nodes, (ii) accelerate time-demanding computations performed within Enalos+ KNIME nodes and (iii) propose new time- and cost-efficient nodes integrated within the Enalos+ toolbox, we have investigated and verified the advantage of GPU calculations within the Enalos+ nodes. Demonstration datasets, tutorials and educational videos allow the user to easily grasp the functions of the nodes, which can be applied for in silico analysis of data.
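Enalos+ itself ships as graphical nodes inside KNIME rather than as a code library, so the following is purely an illustration of the kind of operation such nodes automate (parsing structures and computing descriptors for model building), sketched in Python with the open-source RDKit toolkit; RDKit is not part of Enalos+, and the molecules chosen are arbitrary:

```python
# Illustrative sketch (not the Enalos+ API): compute simple molecular
# descriptors for a few SMILES strings using the open-source RDKit toolkit.
from rdkit import Chem
from rdkit.Chem import Descriptors

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, benzene, aspirin

for smi in smiles:
    mol = Chem.MolFromSmiles(smi)
    if mol is None:  # skip structures that fail to parse
        continue
    mw = Descriptors.MolWt(mol)     # molecular weight
    logp = Descriptors.MolLogP(mol) # calculated octanol-water logP
    print(f"{smi}: MW={mw:.2f}, logP={logp:.2f}")
```

A workflow platform like KNIME packages exactly this kind of step as a drag-and-drop node, which is what makes such pipelines maintainable for users who do not program.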


Author(s):  
Mária Ždímalová ◽  
Tomáš Bohumel ◽  
Katarína Plachá-Gregorovská ◽  
Peter Weismann ◽  
Hisham El Falougy

2005 ◽  
Vol 20 (2) ◽  
pp. 117-125 ◽  
Author(s):  
Michael Luck ◽  
Emanuela Merelli

The scope of the Technical Forum Group (TFG) on Agents in Bioinformatics (BIOAGENTS) was to inspire collaboration between the agent and bioinformatics communities, with the aim of creating an opportunity to propose a different (agent-based) approach to the development of computational frameworks, both for data analysis in bioinformatics and for system modelling in computational biology. During the day, the participants examined the future of research on agents in bioinformatics, primarily through 12 invited talks selected to cover the most relevant topics. From the discussions, it became clear that there are many perspectives on the field, ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages for use by information agents, to the use of Grid agents, each of which requires further exploration. The interactions between participants encouraged the development of applications that describe a way of creating agent-based simulation models of biological systems, starting from a hypothesis and inferring new knowledge (or relations) by mining and analysing the huge amount of public biological data. In this report we summarize and reflect on the presentations and discussions.
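To make the agent-based simulation idea concrete, here is a toy Python sketch (entirely illustrative; the report proposes no specific code) in which cells are simple agents that stochastically die, divide or persist at each step, the kind of hypothesis-driven population model the participants discussed:

```python
# Toy agent-based simulation sketch (illustrative only): each "cell" agent
# dies, divides, or survives with fixed probabilities at every time step.
import random

class Cell:
    DIE_P, DIVIDE_P = 0.2, 0.3  # hypothetical per-step probabilities

    def step(self):
        """Return this cell's contribution to the next generation."""
        r = random.random()
        if r < self.DIE_P:
            return []                # cell dies
        if r < self.DIE_P + self.DIVIDE_P:
            return [Cell(), Cell()]  # cell divides into two daughters
        return [self]                # cell survives unchanged

population = [Cell() for _ in range(10)]
for generation in range(5):
    population = [child for cell in population for child in cell.step()]
    print(f"generation {generation + 1}: {len(population)} cells")
```

Running such a model under a hypothesis (here, fixed division and death rates) and comparing the simulated dynamics against public biological data is the inference loop the report describes.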


2016 ◽  
Vol 49 (1) ◽  
pp. 302-310 ◽  
Author(s):  
Michael Kachala ◽  
John Westbrook ◽  
Dmitri Svergun

Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, ways of exchanging information between the various SAS databases are not established, leading to possible duplication and incompatibility of entries, and limiting the opportunities for data-driven research for SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to describe comprehensively the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files (sasCIFtools) have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules that save the relevant information directly from beamline data-processing pipelines in sasCIF format have also been developed. This update of sasCIF, together with the relevant tools, is an important step in the standardization of the way SAS data are presented and exchanged, making the results easily accessible to users and further promoting the application of SAS in the structural biology community.
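A CIF file is essentially a self-describing collection of tag-value pairs (plus loop tables). As a toy illustration of that structure only, not of the actual sasCIFtools API, the Python sketch below collects simple tag-value lines from a CIF-like block; the tag names and values shown are invented for the example:

```python
# Toy sketch of CIF-style tag-value parsing (not the sasCIFtools API).
# Tag names and values below are invented for illustration.
def parse_cif_pairs(text):
    """Collect simple `_tag value` pairs from a CIF-like block."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_"):  # data items in CIF start with an underscore
            tag, _, value = line.partition(" ")
            pairs[tag] = value.strip().strip("'\"")
    return pairs

example = """\
data_example
_sas_sample.name 'lysozyme'
_sas_result.radius_of_gyration 15.2
"""
print(parse_cif_pairs(example))
```

Because every value travels with an explicit dictionary-defined tag, two databases exchanging such files can validate and synchronize entries without agreeing on column order or ad hoc headers, which is the interoperability problem the sasCIF extension addresses.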

