Bridging Data Management and Knowledge Discovery in the Life Sciences

2008 ◽  
Vol 2 (1) ◽  
pp. 28-36 ◽  
Author(s):  
Karl Kugler ◽  
Maria Mercedes Tejada ◽  
Christian Baumgartner ◽  
Bernhard Tilg ◽  
Armin Graber ◽  
...  

In this work we present an application for integrating and analyzing life science data, built on a biomedical data warehouse system and in-house tools that enable knowledge discovery tasks. Knowledge discovery is a process in which different steps must be coupled in order to answer a specified question. To create such a combination of steps, a data miner using our in-house knowledge discovery tool KD3 can assemble functional objects into a data mining workflow. The generated workflows can easily be reused for later purposes simply by adding new data and re-parameterizing the functional objects in the process. Workflows guide the execution of data integration and aggregation tasks, which were defined and implemented using a publicly available open source tool. To prove the concept of our application, intelligent query models were designed and tested for the identification of genotype-phenotype correlations in Marfan syndrome. We could show that, using our application, a data miner can easily develop new knowledge discovery algorithms that clinical researchers may later use to retrieve medically relevant information.
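KD3 itself is an in-house tool, so its actual interface is not shown here. As a hedged sketch of the general idea described in the abstract (all class and step names below are invented for illustration), a workflow of parameterizable functional objects can be assembled and reused on new data like this:

```python
from typing import Any, Callable, List

class FunctionalObject:
    """One parameterizable step in a knowledge discovery workflow."""
    def __init__(self, name: str, func: Callable[..., Any], **params: Any):
        self.name = name
        self.func = func
        self.params = params

    def run(self, data: Any) -> Any:
        return self.func(data, **self.params)

class Workflow:
    """Chains functional objects; reused by swapping the input data
    or re-parameterizing individual steps."""
    def __init__(self, steps: List[FunctionalObject]):
        self.steps = steps

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step.run(data)
        return data

# Example: filter a list of measurements, then aggregate the survivors.
wf = Workflow([
    FunctionalObject("filter",
                     lambda xs, threshold: [x for x in xs if x >= threshold],
                     threshold=10),
    FunctionalObject("mean", lambda xs: sum(xs) / len(xs)),
])
print(wf.run([4, 12, 20, 8, 16]))  # mean of [12, 20, 16] -> 16.0
```

Reuse then amounts to calling `wf.run` with a new dataset, or rebuilding the workflow with different `threshold` parameters, which mirrors the reuse-by-reparameterization the abstract describes.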

Author(s):  
William Claster ◽  
Nader Ghotbi ◽  
Subana Shanmuganathan

Some common methodologies in our everyday life are based not on modern scientific knowledge but on a body of experience established through years of practice. Many forms of alternative medicine are a good example: they are quite popular, yet difficult to explain in terms of conventional Western medicine. Their diagnostic and therapeutic methodologies are very different from, and sometimes unique compared to, those of Western medicine. How can we verify and analyze such methodologies with modern scientific methods? We present a case study in which data mining was able to fill this gap and provide us with many tools for investigation. Osteopathy is a popular alternative medicine methodology for treating musculoskeletal complaints in Japan. Using data mining methodologies, we could overcome some of the analytical problems in such an investigation. We studied diagnostic records from a very popular osteopathy clinic in Osaka, Japan, covering over 30,000 patient visits across 6 years of practice. The data consist of careful measurements of tissue electro-conductivity differences at 5 anatomical positions. Data mining and knowledge discovery algorithms were applied to search for meaningful associations among the recorded patient data elements. This study helped us scientifically investigate the diagnostic methodology adopted by the osteopath.


2008 ◽  
pp. 1759-1783
Author(s):  
Christian Baumgartner ◽  
Armin Graber

This chapter provides an overview of the knowledge discovery process in metabolomics, a young discipline in the life sciences arena. It introduces two emerging bioanalytical concepts for generating biomolecular information, followed by various data mining and information retrieval procedures such as feature selection, classification, clustering and biochemical interpretation of mined data, illustrated by real examples from preclinical and clinical studies. The authors trust that this chapter will provide an acceptable balance between bioanalytics background information, essential to understanding the complexity of data generation, and information on data mining principles, specific methods and processes, and biomedical applications. Thus, this chapter is anticipated to appeal to those with a metabolomics background as well as to basic researchers within the data mining community who are interested in novel life science applications.


Author(s):  
Manuel Bernal-Llinares ◽  
Javier Ferrer-Gómez ◽  
Nick Juty ◽  
Carole Goble ◽  
Sarala M Wimalaratne ◽  
...  

Abstract
Motivation: Since its launch in 2010, Identifiers.org has become an important tool for the annotation and cross-referencing of Life Science data. In 2016, we established the Compact Identifier (CID) scheme (prefix:accession) to generate globally unique identifiers for data resources using their locally assigned accession identifiers. Since then, we have developed and improved services to support the growing need to create, reference and resolve CIDs, in systems ranging from human-readable text to cloud-based e-infrastructures, by providing high-availability and low-latency cloud-based services, backed by a high-quality, manually curated resource.
Results: We describe a set of services that can be used to construct and resolve CIDs in Life Sciences and beyond. We have developed a new front end for accessing the Identifiers.org registry data and APIs to simplify integration of Identifiers.org CID services with third-party applications. We have also deployed the new Identifiers.org infrastructure in a commercial cloud environment, bringing our services closer to the data.
Availability and implementation: https://identifiers.org.
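The prefix:accession scheme lends itself to a very small client-side sketch. The following assumes only what the abstract states: a CID is a registered prefix plus a local accession, resolvable at the Identifiers.org base URL. The validation pattern here is a simplifying assumption, not the registry's actual prefix rules.

```python
import re

# Compact Identifier as described in the abstract: "prefix:accession".
# A CID resolves at https://identifiers.org/<prefix>:<accession>.
# This pattern is a deliberately loose approximation for illustration.
CID = re.compile(r"^[a-z0-9._]+:\S+$")

def resolve_url(compact_id: str) -> str:
    """Return the Identifiers.org resolver URL for a compact identifier."""
    if not CID.match(compact_id):
        raise ValueError(f"not a compact identifier: {compact_id!r}")
    return f"https://identifiers.org/{compact_id}"

print(resolve_url("chebi:36927"))  # https://identifiers.org/chebi:36927
```

In practice a resolver would also check the prefix against the curated registry; this sketch only shows the URL construction step.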


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Zhihui Wang ◽  
Jinyu Wang

Data mining and big data technologies can be of great value for investigating case datasets held in police records: data preprocessing and multidimensional modeling can surface new findings and useful information. Public security data is a kind of "big data," with characteristics such as large volume, rapid growth, varied structure, large-scale storage, low information density, and time sensitivity. In this paper, a police data warehouse is constructed and a public security information analysis system is proposed. The proposed system comprises two modules: (i) case management and (ii) public security information mining. The former is responsible for the collection and processing of case information. The latter preprocesses the data of major cases from the past ten years and loads them into a data warehouse built on a multidimensional model according to analytical needs. By organizing the data into measures and dimensions, the system analyzes and predicts offender characteristics and case circumstances and reveals the relationships between them. In mining and processing crime data, data mining algorithms can quickly extract the relevant information, and the system can uncover trends and patterns that help detect criminal cases faster than other methods. This can curb the emergence of new crimes and provide a practically significant basis for decision-making in the public security department.
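The measure-and-dimension organization the abstract describes can be illustrated with a toy fact table. Everything below (the dimension names, the sample rows, the `rollup` helper) is invented for illustration; it shows the multidimensional aggregation idea, not the paper's actual warehouse schema.

```python
from collections import defaultdict

# Hypothetical minimal star schema: case facts keyed by dimension values
# (year, crime type, district), with a single "count" measure.
fact_cases = [
    {"year": 2019, "type": "burglary", "district": "north", "count": 3},
    {"year": 2019, "type": "fraud",    "district": "north", "count": 2},
    {"year": 2020, "type": "burglary", "district": "south", "count": 5},
]

def rollup(facts, *dims):
    """Aggregate the 'count' measure over the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row["count"]
    return dict(totals)

print(rollup(fact_cases, "type"))
# {('burglary',): 8, ('fraud',): 2}
print(rollup(fact_cases, "year", "district"))
```

Choosing which dimensions to roll up over is exactly the "dividing measures and dimensions" step: the same fact table answers questions by crime type, by year, by district, or any combination.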


2019 ◽  
Author(s):  
Rachel Drysdale ◽  
Charles E. Cook ◽  
Robert Petryszak ◽  
Vivienne Baillie-Gerritsen ◽  
Mary Barlow ◽  
...  

Abstract
Motivation: Life science research in academia, industry, agriculture, and the health sector depends critically on free and open data resources. ELIXIR (www.elixir-europe.org), the European Research Infrastructure for life sciences data, has identified a set of Core Data Resources within Europe that are of most fundamental importance for the long-term preservation of biological data. We explore characteristics of their usage, impact and assured funding horizon to assess their value and importance as an infrastructure, to understand sustainability of the infrastructure, and to demonstrate a model for assessing Core Data Resources worldwide.
Results: The nineteen resources currently designated ELIXIR Core Data Resources form a data infrastructure in Europe which is a subset of the worldwide open life science data infrastructure. We show that, from 2014 to 2018, data managed by the Core Data Resources more than tripled while staff numbers increased by less than a tenth. Additionally, support for the Core Data Resources is precarious: together they have assured funding for less than a third of current staff after four years. Our findings demonstrate the importance of the ELIXIR Core Data Resources as repositories for research data and knowledge, while also demonstrating the uncertain nature of the funding environment for this infrastructure. ELIXIR is working towards longer-term support for the Core Data Resources and, through the Global Biodata Coalition, aims to ensure support for the worldwide life science data resource infrastructure of which the ELIXIR Core Data Resources are a part.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.


2007 ◽  
Vol 4 (3) ◽  
pp. 41-51
Author(s):  
Sridhar Hariharaputran ◽  
Thoralf Töpel ◽  
Björn Brockschmidt ◽  
Ralf Hofestädt

Abstract
Control of cell proliferation, differentiation, activation and cell removal is crucial for the development and existence of multi-cellular organisms. Apoptosis, or programmed cell death, is a major control mechanism by which cells die, and is also important in controlling cell number and proliferation as part of normal development. Molecular networks that regulate these processes are critical targets for drug development, gene therapy, and metabolic engineering. The molecular interactions involved in these and other processes are analyzed and annotated by experts and stored in different databases. The key task is to integrate, manage and visualize the data available from these different sources and present them in a user-comprehensible manner. Here we present VINEdb, a data warehouse developed to interact with and to explore integrated life science data. An extendable open source data warehouse architecture enables platform-independent use of the web application and the underlying infrastructure. A high degree of transparency and currency is ensured by a monitor component that controls and updates the data from the sources. Furthermore, the system includes a visualization component that allows interactive graphical exploration of the integrated data. We use the apoptotic pathway and caspase-3 as a case study to show the capability and usability of our approach. VINEdb is available at http://tunicata.techfak.unibielefeld.de/VINEdb/.


10.1142/6268 ◽  
2006 ◽  
Author(s):  
Stephen Wong ◽  
Chung-Sheng Li

2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Thoralf Töpel ◽  
Benjamin Kormeier ◽  
Andreas Klassen ◽  
Ralf Hofestädt

Summary
This paper presents a novel bioinformatics data warehouse software kit that integrates biological information from multiple public life science data sources into a local database management system. It stands out from other approaches by providing up-to-date integrated knowledge, platform and database independence, and high usability and customizability. This open source software can be used as a general infrastructure for integrative bioinformatics research and development. The advantages of the approach are realized through a Java-based system architecture and object-relational mapping (ORM) technology. Finally, a practical application of the system is presented within the emerging area of medical bioinformatics to show the usefulness of the approach. The BioDWH data warehouse software is available to the scientific community at http://sourceforge.net/projects/biodwh/.
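BioDWH itself is Java-based and uses a full ORM layer; as a hedged, language-neutral illustration of the object-relational mapping idea it relies on (mapping between in-memory objects and relational rows), here is a minimal sketch using Python's stdlib `sqlite3`. The `Protein` entity and its columns are invented for illustration and are not BioDWH's actual schema.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Protein:
    """An in-memory entity mapped to one row of the 'protein' table."""
    accession: str
    name: str

def save(conn: sqlite3.Connection, p: Protein) -> None:
    # Object -> row: each field maps to one column.
    conn.execute("INSERT INTO protein VALUES (?, ?)", (p.accession, p.name))

def load(conn: sqlite3.Connection, accession: str) -> Protein:
    # Row -> object: reconstruct the entity from its columns.
    row = conn.execute(
        "SELECT accession, name FROM protein WHERE accession = ?",
        (accession,),
    ).fetchone()
    return Protein(*row)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (accession TEXT PRIMARY KEY, name TEXT)")
save(conn, Protein("P12345", "Aspartate aminotransferase"))
print(load(conn, "P12345"))
```

A real ORM (as used by BioDWH) generates this mapping automatically from class metadata, which is what makes the warehouse platform- and database-independent.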


2020 ◽  
Author(s):  
David Johnson ◽  
Keeva Cochrane ◽  
Robert P. Davey ◽  
Anthony Etuk ◽  
Alejandra Gonzalez-Beltran ◽  
...  

Abstract
Background: The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open-source community specifications and software tools for enabling discovery, exchange and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating information structured in ISA-Tab, a now widely used tabular format. To make the ISA framework more accessible to machines and to enable programmatic manipulation of experiment metadata, a JSON serialization, ISA-JSON, was developed.
Results: In this work, we present the ISA API, a Python library for creating, editing, parsing, and validating the ISA-Tab and ISA-JSON formats through a common data model engineered as Python class objects. We describe the ISA API feature set, early adopters and its growing user community.
Conclusions: The ISA API provides users with rich programmatic metadata handling functionality to support automation, a common interface and an interoperable medium between the two ISA formats, as well as with other life science data formats required for depositing data in public databases.
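The core design point, one in-memory model with two serializations, can be sketched in a few lines. This is not the real ISA API (whose model classes are richer); the class and field names below are a deliberately miniature, hypothetical version used only to show how a single model can round-trip to both a tabular and a JSON form.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class Study:
    identifier: str
    title: str

@dataclass
class Investigation:
    """One common model; two serializations (tab-like and JSON)."""
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def to_json(self) -> str:
        # JSON serialization, analogous in spirit to ISA-JSON.
        return json.dumps(asdict(self))

    def to_tab(self) -> str:
        # Tab-delimited serialization, analogous in spirit to ISA-Tab.
        lines = [f"Investigation Identifier\t{self.identifier}"]
        for s in self.studies:
            lines.append(f"Study Identifier\t{s.identifier}")
            lines.append(f"Study Title\t{s.title}")
        return "\n".join(lines)

inv = Investigation("i1", [Study("s1", "Metabolite profiling")])
print(inv.to_tab())
print(inv.to_json())
```

Because both writers read from the same objects, converting between formats is just parse-then-serialize, which is the interoperability argument the abstract makes.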

