Extension of the sasCIF format and its applications for data processing and deposition

Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, ways of exchanging information between the various SAS databases are not established, leading to possible duplication and incompatibility of entries, and limiting the opportunities for data-driven research for SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the use of the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to describe comprehensively the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files (sasCIFtools) have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules to save the relevant information directly from beamline data-processing pipelines in sasCIF format are also developed. This update of sasCIF and the relevant tools are an important step in the standardization of the way SAS data are presented and exchanged, to make the results easily accessible to users and to promote further the application of SAS in the structural biology community.

Download Full-text

Enalos Suite of Tools: Enhancing Cheminformatics and Nanoinfor - matics through KNIME

Current Medicinal Chemistry ◽

10.2174/0929867327666200727114410 ◽

2020 ◽

Vol 27 (38) ◽

pp. 6523-6535 ◽

Cited By ~ 3

Author(s):

Antreas Afantitis ◽

Andreas Tsoumanis ◽

Georgia Melagraki

Keyword(s):

Data Analysis ◽

Virtual Screening ◽

In Silico ◽

Model Development ◽

In Silico Analysis ◽

Material Design ◽

Efficient Solutions ◽

Biological Data ◽

Cost Efficient ◽

Silico Analysis

Drug discovery as well as (nano)material design projects demand the in silico analysis of large datasets of compounds with their corresponding properties/activities, as well as the retrieval and virtual screening of more structures in an effort to identify new potent hits. This is a demanding procedure for which various tools must be combined with different input and output formats. To automate the data analysis required we have developed the necessary tools to facilitate a variety of important tasks to construct workflows that will simplify the handling, processing and modeling of cheminformatics data and will provide time and cost efficient solutions, reproducible and easier to maintain. We therefore develop and present a toolbox of >25 processing modules, Enalos+ nodes, that provide very useful operations within KNIME platform for users interested in the nanoinformatics and cheminformatics analysis of chemical and biological data. With a user-friendly interface, Enalos+ Nodes provide a broad range of important functionalities including data mining and retrieval from large available databases and tools for robust and predictive model development and validation. Enalos+ Nodes are available through KNIME as add-ins and offer valuable tools for extracting useful information and analyzing experimental and virtual screening results in a chem- or nano- informatics framework. On top of that, in an effort to: (i) allow big data analysis through Enalos+ KNIME nodes, (ii) accelerate time demanding computations performed within Enalos+ KNIME nodes and (iii) propose new time and cost efficient nodes integrated within Enalos+ toolbox we have investigated and verified the advantage of GPU calculations within the Enalos+ nodes. Demonstration data sets, tutorial and educational videos allow the user to easily apprehend the functions of the nodes that can be applied for in silico analysis of data.

Download Full-text

Integrating Biological Data Sources and Data Analysis Tools through Mediators (available online only)

Proceedings of the 2004 ACM symposium on Applied computing - SAC '04 ◽

10.1145/967900.980091 ◽

2004 ◽

Cited By ~ 4

Author(s):

J. F. Aldana ◽

M. Roldán ◽

I. Navas ◽

A. J. Pérez ◽

O. Trelles

Keyword(s):

Data Analysis ◽

Biological Data ◽

Data Sources ◽

Analysis Tools

Download Full-text

Infrastructural approach to processing of spatial data in the problems of management of territorial development

Вычислительные технологии ◽

10.25743/ict.2018.23.16488 ◽

2018 ◽

Author(s):

И.В. Бычков ◽

Г.М. Ружников ◽

В.В. Парамонов ◽

А.С. Шумилов ◽

Р.К. Фёдоров

Keyword(s):

Data Processing ◽

Spatial Data ◽

Data Exchange ◽

Complex Analysis ◽

Data Entry ◽

Content Management ◽

Volume Data ◽

Territorial Development ◽

Distributed Services ◽

Spatial Data Processing

Рассмотрен инфраструктурный подход обработки пространственных данных для решения задач управления территориальным развитием, который основан на сервис-ориентированной парадигме, стандартах OGC, web-технологиях, WPS-сервисах и геопортале. The development of territories is a multi-dimensional and multi-aspect process, which can be characterized by large volumes of financial, natural resources, social, ecological and economic data. The data is highly localized and non-coordinated, which limits its complex analysis and usage. One of the methods of large volume data processing is information-analytical environments. The architecture and implementation of the information-analytical environment of the territorial development in the form of Geoportal is presented. Geoportal provides software instruments for spatial and thematic data exchange for its users, as well as OGC-based distributed services that deal with the data processing. Implementation of the processing and storing of the data in the form of services located on distributed servers allows simplifying their updating and maintenance. In addition, it allows publishing and makes processing to be more open and controlled process. Geoportal consists of following modules: content management system Calipso (presentation of user interface, user management, data visualization), RDBMS PostgreSQL with spatial data processing extension, services of relational data entry and editing, subsystem of launching and execution of WPS-services, as well as services of spatial data processing, deployed at the local cloud environment. The presented article states the necessity of using the infrastructural approach when creating the information-analytical environment for the territory management, which is characterized by large volumes of spatial and thematical data that needs to be processed. The data is stored in various formats and applications of service-oriented paradigm, OGC standards, web-technologies, Geoportal and distributed WPS-services. The developed software system was tested on a number of tasks that arise during the territory development.

Download Full-text

Metabolomics Data Processing and Data Analysis—Current Best Practices

10.3390/books978-3-0365-1195-5 ◽

2021 ◽

Keyword(s):

Data Analysis ◽

Best Practices ◽

Data Processing ◽

Metabolomics Data

Download Full-text

Graph Cutting in Image Processing Handling with Biological Data Analysis

Advances in Intelligent Systems and Computing - Information Technology, Systems Research, and Computational Physics ◽

10.1007/978-3-030-18058-4_16 ◽

2019 ◽

pp. 203-216

Author(s):

Mária Ždímalová ◽

Tomáš Bohumel ◽

Katarína Plachá-Gregorovská ◽

Peter Weismann ◽

Hisham El Falougy

Keyword(s):

Image Processing ◽

Data Analysis ◽

Biological Data ◽

Biological Data Analysis

Download Full-text

Agents in bioinformatics

The Knowledge Engineering Review ◽

10.1017/s0269888905000433 ◽

2005 ◽

Vol 20 (2) ◽

pp. 117-125 ◽

Cited By ~ 5

Author(s):

MICHAEL LUCK ◽

EMANUELA MERELLI

Keyword(s):

Data Analysis ◽

Simulation Models ◽

Biological Data ◽

System Modelling ◽

Agent Based Simulation ◽

Huge Amount ◽

Agent Based ◽

Information Agents ◽

New Knowledge ◽

Definition Of

The scope of the Technical Forum Group (TFG) on Agents in Bioinformatics (BIOAGENTS) was to inspire collaboration between the agent and bioinformatics communities with the aim of creating an opportunity to propose a different (agent-based) approach to the development of computational frameworks both for data analysis in bioinformatics and for system modelling in computational biology. During the day, the participants examined the future of research on agents in bioinformatics primarily through 12 invited talks selected to cover the most relevant topics. From the discussions, it became clear that there are many perspectives to the field, ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages for use by information agents, and to the use of Grid agents, each of which requires further exploration. The interactions between participants encouraged the development of applications that describe a way of creating agent-based simulation models of biological systems, starting from an hypothesis and inferring new knowledge (or relations) by mining and analysing the huge amount of public biological data. In this report we summarize and reflect on the presentations and discussions.

Download Full-text

Novel IT Technologies on the Digital Battlefield: The Application of Big Data and Data Mining Technologies

Hadmérnök ◽

10.32567/hm.2020.4.10 ◽

2020 ◽

Vol 15 (4) ◽

pp. 141-158

Author(s):

Eszter Katalin Bognár

Keyword(s):

Data Mining ◽

Big Data ◽

Data Processing ◽

Relevant Information ◽

Military Operations ◽

Textual Data ◽

Modern Warfare ◽

Tools And Techniques

In modern warfare, the most important innovation to date has been the utilisation of information as a weapon. The basis of successful military operations is the ability to correctly assess a situation based on credible collected information. In today’s military, the primary challenge is not the actual collection of data. It has become more important to extract relevant information from that data. This requirement cannot be successfully completed without necessary improvements in tools and techniques to support the acquisition and analysis of data. This study defines Big Data and its concept as applied to military reconnaissance, focusing on the processing of imagery and textual data, bringing to light modern data processing and analytics methods that enable effective processing.

Download Full-text

Data-Intensive Computing Infrastructure Systems for Unmodified Biological Data Analysis Pipelines

Computational Intelligence Methods for Bioinformatics and Biostatistics - Lecture Notes in Computer Science ◽

10.1007/978-3-319-24462-4_22 ◽

2015 ◽

pp. 259-272 ◽

Cited By ~ 1

Author(s):

Lars Ailo Bongo ◽

Edvard Pedersen ◽

Martin Ernstsen

Keyword(s):

Data Analysis ◽

Biological Data ◽

Data Intensive Computing ◽

Infrastructure Systems ◽

Data Intensive ◽

Biological Data Analysis ◽

Computing Infrastructure

Download Full-text

ReactomeFIViz: the Reactome FI Cytoscape app for pathway and network-based data analysis

F1000Research ◽

10.12688/f1000research.4431.1 ◽

2014 ◽

Vol 3 ◽

pp. 146 ◽

Cited By ~ 2

Author(s):

Guanming Wu ◽

Eric Dawson ◽

Adrian Duong ◽

Robin Haw ◽

Lincoln Stein

Keyword(s):

Experimental Data ◽

Data Analysis ◽

Graphical Models ◽

High Throughput ◽

Interaction Network ◽

Large Data ◽

Relevant Information ◽

Data Sets ◽

Data Types ◽

Biological Studies

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.

Download Full-text

DEXOM: Diversity-based enumeration of optimal context-specific metabolic networks

10.1101/2020.07.17.208918 ◽

2020 ◽

Author(s):

Pablo Rodríguez-Mier ◽

Nathalie Poupin ◽

Carlo de Blasio ◽

Laurent Le Cam ◽

Fabien Jourdan

Keyword(s):

Metabolic Network ◽

Metabolic Networks ◽

Relevant Information ◽

Experimental Information ◽

Correct Identification ◽

Good Representation ◽

Metabolic State ◽

Whole Space ◽

Context Specific ◽

Optimal Networks

AbstractThe correct identification of metabolic activity in tissues or cells under different environmental or genetic conditions can be extremely elusive due to mechanisms such as post-transcriptional modification of enzymes or different rates in protein degradation, making difficult to perform predictions on the basis of gene expression alone. Context-specific metabolic network reconstruction can overcome these limitations by leveraging the integration of multi-omics data into genome-scale metabolic networks (GSMN). Using the experimental information, context-specific models are reconstructed by extracting from the GSMN the sub-network most consistent with the data, subject to biochemical constraints. One advantage is that these context-specific models have more predictive power since they are tailored to the specific organism and condition, containing only the reactions predicted to be active in such context. A major limitation of this approach is that the available information does not generally allow for an unambiguous characterization of the corresponding optimal metabolic sub-network, i.e., there are usually many different sub-network that optimally fit the experimental data. This set of optimal networks represent alternative explanations of the possible metabolic state. Ignoring the set of possible solutions reduces the ability to obtain relevant information about the metabolism and may bias the interpretation of the true metabolic state. In this work, we formalize the problem of enumeration of optimal metabolic networks, we implement a set of techniques that can be used to enumerate optimal networks, and we introduce DEXOM, a novel strategy for diversity-based extraction of optimal metabolic networks. Instead of enumerating the whole space of optimal metabolic networks, which can be computationally intractable, DEXOM samples solutions from the set of optimal metabolic sub-networks maximizing diversity in order to obtain a good representation of the possible metabolic state. We evaluate the solution diversity of the different techniques using simulated and real datasets, and we show how this method can be used to improve in-silico gene essentiality predictions in Saccharomyces Cerevisiae using diversity-based metabolic network ensembles. Both the code and the data used for this research are publicly available on GitHub1.

Download Full-text