Bridging Data Management and Knowledge Discovery in the Life Sciences

2008 ◽  
Vol 2 (1) ◽  
pp. 28-36 ◽  
Author(s):  
Karl Kugler ◽  
Maria Mercedes Tejada ◽  
Christian Baumgartner ◽  
Bernhard Tilg ◽  
Armin Graber ◽  
...  

In this work we present an application for integrating and analyzing life science data, built on a biomedical data warehouse system and in-house tools that enable knowledge discovery tasks. Knowledge discovery is a process in which different steps must be coupled in order to answer a specified question. To create such a combination of steps, a data miner using our in-house knowledge discovery tool KD3 can assemble functional objects into a data mining workflow. The generated workflows can easily be reused for later purposes simply by adding new data and re-parameterizing the functional objects in the process. Workflows guide the execution of data integration and aggregation tasks, which were defined and implemented using a publicly available open source tool. To prove the concept of our application, intelligent query models were designed and tested for the identification of genotype-phenotype correlations in Marfan syndrome. We could show that, using our application, a data miner can easily develop new knowledge discovery algorithms that clinical researchers may later use to retrieve medically relevant information.
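KD3 itself is an in-house tool, so its actual interface is not shown here. As a hedged sketch of the general idea described in the abstract (all class and step names below are invented for illustration), a workflow of parameterizable functional objects can be assembled and reused on new data like this:

```python
from typing import Any, Callable, List

class FunctionalObject:
    """One parameterizable step in a knowledge discovery workflow."""
    def __init__(self, name: str, func: Callable[..., Any], **params: Any):
        self.name = name
        self.func = func
        self.params = params

    def run(self, data: Any) -> Any:
        return self.func(data, **self.params)

class Workflow:
    """Chains functional objects; reused by swapping the input data
    or re-parameterizing individual steps."""
    def __init__(self, steps: List[FunctionalObject]):
        self.steps = steps

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step.run(data)
        return data

# Example: filter a list of measurements, then aggregate the survivors.
wf = Workflow([
    FunctionalObject("filter",
                     lambda xs, threshold: [x for x in xs if x >= threshold],
                     threshold=10),
    FunctionalObject("mean", lambda xs: sum(xs) / len(xs)),
])
print(wf.run([4, 12, 20, 8, 16]))  # mean of [12, 20, 16] -> 16.0
```

Reuse then amounts to calling `wf.run` with a new dataset, or rebuilding the workflow with different `threshold` parameters, which mirrors the reuse-by-reparameterization the abstract describes.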

Author(s):  
William Claster ◽  
Nader Ghotbi ◽  
Subana Shanmuganathan

Some common methodologies in our everyday life are based not on modern scientific knowledge but on a body of experience established through years of practice. Many forms of alternative medicine are a good example: they are quite popular, yet difficult to explain in terms of conventional Western medicine. Their diagnostic and therapeutic methodologies are very different from, and sometimes unique compared to, those of Western medicine. How can we verify and analyze such methodologies with modern scientific methods? We present a case study in which data mining was able to fill this gap and provide us with many tools for investigation. Osteopathy is a popular alternative medicine methodology for treating musculoskeletal complaints in Japan. Using data mining methodologies, we could overcome some of the analytical problems in such an investigation. We studied diagnostic records from a very popular osteopathy clinic in Osaka, Japan, covering over 30,000 patient visits across 6 years of practice. The data consist of careful measurements of tissue electro-conductivity differences at 5 anatomical positions. Data mining and knowledge discovery algorithms were applied to search for meaningful associations among the recorded patient data elements. This study helped us scientifically investigate the diagnostic methodology adopted by the osteopath.


2008 ◽  
pp. 1759-1783
Author(s):  
Christian Baumgartner ◽  
Armin Graber

This chapter provides an overview of the knowledge discovery process in metabolomics, a young discipline in the life sciences arena. It introduces two emerging bioanalytical concepts for generating biomolecular information, followed by various data mining and information retrieval procedures such as feature selection, classification, clustering and biochemical interpretation of mined data, illustrated by real examples from preclinical and clinical studies. The authors trust that this chapter will provide an acceptable balance between bioanalytics background information, essential to understanding the complexity of data generation, and information on data mining principles, specific methods and processes, and biomedical applications. Thus, this chapter is anticipated to appeal to those with a metabolomics background as well as to basic researchers within the data mining community who are interested in novel life science applications.


Author(s):  
Manuel Bernal-Llinares ◽  
Javier Ferrer-Gómez ◽  
Nick Juty ◽  
Carole Goble ◽  
Sarala M Wimalaratne ◽  
...  

Abstract
Motivation: Since its launch in 2010, Identifiers.org has become an important tool for the annotation and cross-referencing of Life Science data. In 2016, we established the Compact Identifier (CID) scheme (prefix:accession) to generate globally unique identifiers for data resources using their locally assigned accession identifiers. Since then, we have developed and improved services to support the growing need to create, reference and resolve CIDs, in systems ranging from human-readable text to cloud-based e-infrastructures, by providing high-availability and low-latency cloud-based services, backed by a high-quality, manually curated resource.
Results: We describe a set of services that can be used to construct and resolve CIDs in Life Sciences and beyond. We have developed a new front end for accessing the Identifiers.org registry data and APIs to simplify integration of Identifiers.org CID services with third-party applications. We have also deployed the new Identifiers.org infrastructure in a commercial cloud environment, bringing our services closer to the data.
Availability and implementation: https://identifiers.org.
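The prefix:accession scheme lends itself to a very small client-side sketch. The following assumes only what the abstract states: a CID is a registered prefix plus a local accession, resolvable at the Identifiers.org base URL. The validation pattern here is a simplifying assumption, not the registry's actual prefix rules.

```python
import re

# Compact Identifier as described in the abstract: "prefix:accession".
# A CID resolves at https://identifiers.org/<prefix>:<accession>.
# This pattern is a deliberately loose approximation for illustration.
CID = re.compile(r"^[a-z0-9._]+:\S+$")

def resolve_url(compact_id: str) -> str:
    """Return the Identifiers.org resolver URL for a compact identifier."""
    if not CID.match(compact_id):
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return f"https://identifiers.org/{compact_id}"

print(resolve_url("chebi:36927"))  # https://identifiers.org/chebi:36927
```

In practice a resolver would also check the prefix against the curated registry; this sketch only shows the URL construction step.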


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Zhihui Wang ◽  
Jinyu Wang

Data mining and big data technologies can be of great value for investigating case datasets held in police records: data preprocessing and multidimensional modeling can surface new findings and useful information. Public security data is a kind of "big data," with characteristics such as large volume, rapid growth, varied structure, large-scale storage, low information density, and time sensitivity. In this paper, a police data warehouse is constructed and a public security information analysis system is proposed. The proposed system comprises two modules: (i) case management and (ii) public security information mining. The former is responsible for the collection and processing of case information. The latter preprocesses the data of major cases from the past ten years and loads them into a data warehouse built on a multidimensional model according to analytical needs. By organizing the data into measures and dimensions, the system analyzes and predicts offender characteristics and case circumstances and reveals the relationships between them. In mining and processing crime data, data mining algorithms can quickly extract the relevant information, and the system can uncover trends and patterns that help detect criminal cases faster than other methods. This can curb the emergence of new crimes and provide a practically significant basis for decision-making in the public security department.
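The measure-and-dimension organization the abstract describes can be illustrated with a toy fact table. Everything below (the dimension names, the sample rows, the `rollup` helper) is invented for illustration; it shows the multidimensional aggregation idea, not the paper's actual warehouse schema.

```python
from collections import defaultdict

# Hypothetical minimal star schema: case facts keyed by dimension values
# (year, crime type, district), with a single "count" measure.
fact_cases = [
    {"year": 2019, "type": "burglary", "district": "north", "count": 3},
    {"year": 2019, "type": "fraud",    "district": "north", "count": 2},
    {"year": 2020, "type": "burglary", "district": "south", "count": 5},
]

def rollup(facts, *dims):
    """Aggregate the 'count' measure over the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row["count"]
    return dict(totals)

print(rollup(fact_cases, "type"))
# {('burglary',): 8, ('fraud',): 2}
print(rollup(fact_cases, "year", "district"))
```

Choosing which dimensions to roll up over is exactly the "dividing measures and dimensions" step: the same fact table answers questions by crime type, by year, by district, or any combination.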


2019 ◽  
Author(s):  
Rachel Drysdale ◽  
Charles E. Cook ◽  
Robert Petryszak ◽  
Vivienne Baillie-Gerritsen ◽  
Mary Barlow ◽  
...  

Abstract
Motivation: Life science research in academia, industry, agriculture, and the health sector depends critically on free and open data resources. ELIXIR (www.elixir-europe.org), the European Research Infrastructure for life sciences data, has identified a set of Core Data Resources within Europe that are of most fundamental importance for the long-term preservation of biological data. We explore characteristics of their usage, impact and assured funding horizon to assess their value and importance as an infrastructure, to understand sustainability of the infrastructure, and to demonstrate a model for assessing Core Data Resources worldwide.
Results: The nineteen resources currently designated ELIXIR Core Data Resources form a data infrastructure in Europe which is a subset of the worldwide open life science data infrastructure. We show that, from 2014 to 2018, data managed by the Core Data Resources more than tripled while staff numbers increased by less than a tenth. Additionally, support for the Core Data Resources is precarious: together they have assured funding for less than a third of current staff after four years. Our findings demonstrate the importance of the ELIXIR Core Data Resources as repositories for research data and knowledge, while also demonstrating the uncertain nature of the funding environment for this infrastructure. ELIXIR is working towards longer-term support for the Core Data Resources and, through the Global Biodata Coalition, aims to ensure support for the worldwide life science data resource infrastructure of which the ELIXIR Core Data Resources are a part.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.


2007 ◽  
Vol 4 (3) ◽  
pp. 41-51
Author(s):  
Sridhar Hariharaputran ◽  
Thoralf Töpel ◽  
Björn Brockschmidt ◽  
Ralf Hofestädt

Abstract
Control of cell proliferation, differentiation, activation and cell removal is crucial for the development and existence of multi-cellular organisms. Apoptosis, or programmed cell death, is a major control mechanism by which cells die, and is also important in controlling cell number and proliferation as part of normal development. Molecular networks that regulate these processes are critical targets for drug development, gene therapy, and metabolic engineering. The molecular interactions involved in these and other processes are analyzed and annotated by experts and stored in different databases. The key task is to integrate, manage and visualize the data available from these different sources and present them in a user-comprehensible manner. Here we present VINEdb, a data warehouse developed to interact with and to explore integrated life science data. An extendable open source data warehouse architecture enables platform-independent use of the web application and the underlying infrastructure. A high degree of transparency and currency is ensured by a monitor component that controls and updates the data from the sources. Furthermore, the system includes a visualization component that allows interactive graphical exploration of the integrated data. We use the apoptotic pathway and caspase-3 as a case study to show the capability and usability of our approach. VINEdb is available at http://tunicata.techfak.unibielefeld.de/VINEdb/.


10.1142/6268 ◽  
2006 ◽  
Author(s):  
Stephen Wong ◽  
Chung-Sheng Li

2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Thoralf Töpel ◽  
Benjamin Kormeier ◽  
Andreas Klassen ◽  
Ralf Hofestädt

Summary
This paper presents a novel bioinformatics data warehouse software kit that integrates biological information from multiple public life science data sources into a local database management system. It stands out from other approaches by providing up-to-date integrated knowledge, platform and database independence, and high usability and customizability. This open source software can be used as a general infrastructure for integrative bioinformatics research and development. The advantages of the approach are realized through a Java-based system architecture and object-relational mapping (ORM) technology. Finally, a practical application of the system is presented within the emerging area of medical bioinformatics to show the usefulness of the approach. The BioDWH data warehouse software is available to the scientific community at http://sourceforge.net/projects/biodwh/.
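BioDWH itself is Java-based and uses a full ORM layer; as a hedged, language-neutral illustration of the object-relational mapping idea it relies on (mapping between in-memory objects and relational rows), here is a minimal sketch using Python's stdlib `sqlite3`. The `Protein` entity and its columns are invented for illustration and are not BioDWH's actual schema.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Protein:
    """An in-memory entity mapped to one row of the 'protein' table."""
    accession: str
    name: str

def save(conn: sqlite3.Connection, p: Protein) -> None:
    # Object -> row: each field maps to one column.
    conn.execute("INSERT INTO protein VALUES (?, ?)", (p.accession, p.name))

def load(conn: sqlite3.Connection, accession: str) -> Protein:
    # Row -> object: reconstruct the entity from its columns.
    row = conn.execute(
        "SELECT accession, name FROM protein WHERE accession = ?",
        (accession,),
    ).fetchone()
    return Protein(*row)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT)")
save(conn, Protein("P12345", "Aspartate aminotransferase"))
print(load(conn, "P12345"))
```

A real ORM (as used by BioDWH) generates this mapping automatically from class metadata, which is what makes the warehouse platform- and database-independent.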


2020 ◽  
Author(s):  
David Johnson ◽  
Keeva Cochrane ◽  
Robert P. Davey ◽  
Anthony Etuk ◽  
Alejandra Gonzalez-Beltran ◽  
...  

Abstract
Background: The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open-source community specifications and software tools for enabling discovery, exchange and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and to enable programmatic manipulation of experiment metadata, a JSON serialization, ISA-JSON, was developed.
Results: In this work, we present the ISA API, a Python library for creating, editing, parsing, and validating the ISA-Tab and ISA-JSON formats through a common data model engineered as Python class objects. We describe the ISA API feature set, early adopters and its growing user community.
Conclusions: The ISA API provides users with rich programmatic metadata handling functionality to support automation, a common interface and an interoperable medium between the two ISA formats, as well as with other life science data formats required for depositing data in public databases.
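The core design point, one in-memory model with two serializations, can be sketched in a few lines. This is not the real ISA API (whose model classes are richer); the class and field names below are a deliberately miniature, hypothetical version used only to show how a single model can round-trip to both a tabular and a JSON form.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class Study:
    identifier: str
    title: str

@dataclass
class Investigation:
    """One common model; two serializations (tab-like and JSON)."""
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def to_json(self) -> str:
        # JSON serialization, analogous in spirit to ISA-JSON.
        return json.dumps(asdict(self))

    def to_tab(self) -> str:
        # Tab-delimited serialization, analogous in spirit to ISA-Tab.
        lines = [f"Investigation Identifier\t{self.identifier}"]
        for s in self.studies:
            lines.append(f"Study Identifier\t{s.identifier}")
            lines.append(f"Study Title\t{s.title}")
        return "\n".join(lines)

inv = Investigation("i1", [Study("s1", "Metabolite profiling")])
print(inv.to_tab())
print(inv.to_json())
```

Because both writers read from the same objects, converting between formats is just parse-then-serialize, which is the interoperability argument the abstract makes.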

