Research on Multi-Source Data Integration Based on Ontology and Karma Modeling

2019 ◽  
Vol 15 (2) ◽  
pp. 69-87 ◽  
Author(s):  
Hongyan Yun ◽  
Ying He ◽  
Li Lin ◽  
Xiaohong Wang

The purpose of data integration is to integrate multi-source heterogeneous data; ontology provides the semantic description of such data. The authors propose a practical approach to fast data integration based on ontology modeling and Karma, an information integration toolkit, and demonstrate it with a detailed application example. The Armed Conflict Location & Event Data Project (ACLED) is a publicly available conflict event dataset designed for disaggregated conflict analysis and crisis mapping. The authors analyzed the ACLED dataset and domain knowledge to build an Armed Conflict Event ontology, then constructed Karma models to integrate ACLED datasets and publish RDF data, verifying the correctness of the published RDF data through SPARQL queries. They also designed and developed an ACLED Query System based on the Jena API, CanvasJS, Baidu API, and related technologies, which helps governments and researchers analyze regional conflict events and provide early crisis warning, and which verifies the validity of the constructed ontology and the correctness of the Karma modeling.
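As a concrete illustration of checking published RDF data with a SPARQL query, here is a minimal sketch using Python's rdflib rather than the Jena API the authors used. The file name "acled.ttl" and the ontology IRIs and property names are assumptions invented for illustration; the actual Armed Conflict Event ontology may define different terms.

```python
# Minimal sketch: validate published RDF with a SPARQL query (rdflib,
# not the Jena API used in the paper). File name and IRIs are assumed.
from rdflib import Graph

g = Graph()
g.parse("acled.ttl", format="turtle")  # RDF data published by the Karma models

# Check that every conflict event carries a date and a location.
query = """
PREFIX ace: <http://example.org/acled-ontology#>
SELECT ?event ?date ?location
WHERE {
    ?event a ace:ConflictEvent ;
           ace:eventDate ?date ;
           ace:occurredIn ?location .
}
LIMIT 10
"""
for event, date, location in g.query(query):
    print(event, date, location)
```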

2013 ◽  
Vol 321-324 ◽  
pp. 2532-2538
Author(s):  
Xiao Guo Wang ◽  
Jian Shen ◽  
Chuan Sun

Considering the difficulty of collecting and integrating information amid its rapid growth, an efficient tool is needed for these tasks. A proposal is put forward to build a data integration system that collects source data, preprocesses the heterogeneous data, and then converts and extracts the data into a data warehouse. Through experiment and analysis, this paper designs an information processing flow and implements the data integration system, based on a B/S (browser/server) framework and database technology, to handle college-related information.
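The collect, preprocess, and load flow described above can be sketched in a few lines. This is a minimal illustration only: the source file, column names, and warehouse schema below are assumptions, not the system's actual implementation.

```python
# Minimal sketch of the collect -> preprocess -> load flow.
# Source path, field names, and warehouse schema are assumed for illustration.
import csv
import sqlite3

def collect(path):
    """Collect raw records from a heterogeneous CSV source."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def preprocess(records):
    """Normalize field names and drop incomplete rows before loading."""
    cleaned = []
    for r in records:
        name = (r.get("name") or r.get("student_name") or "").strip()
        if name:
            cleaned.append({"name": name, "dept": r.get("dept", "").strip()})
    return cleaned

def load(records, db="warehouse.db"):
    """Convert the cleaned records into the data-warehouse table."""
    con = sqlite3.connect(db)
    con.execute("CREATE TABLE IF NOT EXISTS students (name TEXT, dept TEXT)")
    con.executemany("INSERT INTO students VALUES (:name, :dept)", records)
    con.commit()
    con.close()

load(preprocess(collect("college_source.csv")))
```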


2010 ◽  
Vol 11 (3) ◽  
pp. 292-298
Author(s):  
Hongjun SU ◽  
Yehua SHENG ◽  
Yongning WEN ◽  
Min CHEN

2016 ◽  
Vol 10 (02) ◽  
pp. 167-191 ◽  
Author(s):  
Lavdim Halilaj ◽  
Irlán Grangel-González ◽  
Gökhan Coskun ◽  
Steffen Lohmann ◽  
Sören Auer

Collaborative vocabulary development in the context of data integration is the process of finding consensus among experts with different backgrounds, system understanding, and domain knowledge. The complexity of this process increases with the number of people involved, the variety of the systems to be integrated, and the dynamics of their domain. In this paper, we argue that a powerful version control system is one of the keys to addressing this problem. Driven by this idea and by the success of the version control system Git in software development, we investigate the applicability of Git to collaborative vocabulary development. Even though vocabulary development and software development have many more similarities than differences, there are still important challenges that must be considered in the development of a successful versioning and collaboration system for vocabulary development. This paper therefore begins by presenting the challenges we faced during the collaborative creation of vocabularies and discusses how vocabulary development differs from software development. Drawing on these findings, we present Git4Voc, which comprises guidelines on how Git can be adopted for vocabulary development. Finally, we demonstrate how Git hooks can be implemented to go beyond Git's plain functionality by realizing vocabulary-specific features such as syntactic validation and semantic diffs.
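To make the hook idea concrete, here is a sketch of a pre-commit hook that syntactically validates staged Turtle files. This is an illustrative analogue in the spirit of the vocabulary-specific hooks described above, not the actual Git4Voc implementation; the use of rdflib for parsing is an assumption.

```python
#!/usr/bin/env python3
# Illustrative pre-commit hook (saved as .git/hooks/pre-commit and made
# executable): reject a commit if any staged Turtle file fails to parse.
# A sketch only, not the Git4Voc implementation itself.
import subprocess
import sys
from rdflib import Graph

# List files staged for this commit (added, copied, or modified).
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.split()

for path in staged:
    if path.endswith(".ttl"):
        try:
            Graph().parse(path, format="turtle")
        except Exception as exc:
            print(f"Syntax error in {path}: {exc}")
            sys.exit(1)  # non-zero exit aborts the commit

sys.exit(0)
```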


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mario Zanfardino ◽  
Rossana Castaldo ◽  
Katia Pane ◽  
Ornella Affinito ◽  
Marco Aiello ◽  
...  

Analysis of large-scale omics data along with biomedical images has gained huge interest for predicting phenotypic conditions in the move toward personalized medicine. Multiple layers of investigation, such as genomics, transcriptomics, and proteomics, have led to high dimensionality and heterogeneity of data. Multi-omics data integration can contribute meaningfully to early diagnosis and to accurate estimates of prognosis and treatment in cancer. Some multi-layer data structures have been developed to integrate multi-omics biological information, but none of these has been developed and evaluated to include radiomic data. We proposed to use MultiAssayExperiment (MAE) as an integrated data structure to combine multi-omics data, facilitating the exploration of heterogeneous data. We improved the usability of the MAE by developing the Multi-omics Statistical Approaches (MuSA) tool, which uses a Shiny graphical user interface to simplify the management and analysis of radiogenomic datasets. The capabilities of MuSA were shown using public breast cancer datasets from the TCGA and TCIA databases. The MuSA architecture is modular and can be divided into pre-processing and downstream analysis. The pre-processing section allows data filtering and normalization. The downstream analysis section contains modules for data science methods such as correlation, clustering (e.g., heatmaps), and feature selection. The results are shown dynamically in MuSA. The MuSA tool provides an easy-to-use way to create, manage, and analyze radiogenomic data. The application is specifically designed to guide non-programmer researchers through the different computational steps. Integration analysis is implemented in a modular structure, making MuSA easily extensible open-source software.
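The MAE idea, several assays over a shared set of samples with pre-processing before downstream analysis, can be sketched conceptually. Note that MuSA itself is built on R/Shiny and Bioconductor's MultiAssayExperiment; the Python analogue below, with its assay names, dimensions, and normalization choice, is an assumption made purely for illustration.

```python
# Conceptual sketch of the MAE structure (MuSA itself is R/Shiny-based):
# several assays share sample IDs, and pre-processing precedes analysis.
# Assay names, sizes, and normalization choice are assumed for illustration.
import numpy as np
import pandas as pd

# One table per omics/radiomic layer; columns are shared sample IDs.
samples = [f"s{i}" for i in range(5)]
assays = {
    "expression": pd.DataFrame(np.random.rand(100, 5), columns=samples),
    "radiomics": pd.DataFrame(np.random.rand(40, 5), columns=samples),
}

def preprocess(assay, min_variance=1e-3):
    """Filter near-constant features, then z-score each feature across samples."""
    filtered = assay[assay.var(axis=1) > min_variance]
    return filtered.sub(filtered.mean(axis=1), axis=0) \
                   .div(filtered.std(axis=1), axis=0)

processed = {name: preprocess(a) for name, a in assays.items()}

# Downstream-module example: correlate every expression feature
# with the first radiomic feature across the shared samples.
print(processed["expression"].T.corrwith(processed["radiomics"].iloc[0]))
```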


2021 ◽  
Vol 13 (5) ◽  
pp. 124
Author(s):  
Jiseong Son ◽  
Chul-Su Lim ◽  
Hyoung-Seop Shim ◽  
Ji-Sun Kang

Despite the development of various technologies and systems that use artificial intelligence (AI) to solve disaster-related problems, difficult challenges are still being encountered. Data are the foundation for solving diverse disaster problems using AI, big data analysis, and similar approaches; therefore, we must focus on these various data. Disaster data are domain-dependent by disaster type, heterogeneous, and lacking in interoperability. In particular, open data related to disasters raise several issues: the sources and formats of the data differ because various data are collected by different organizations, and the vocabularies used in each domain are inconsistent. This study proposes a knowledge graph to resolve the heterogeneity among various disaster data and provide interoperability among domains. Among disaster domains, we describe a knowledge graph for flooding disasters built from Korean open datasets and cross-domain knowledge graphs. Furthermore, the proposed knowledge graph is used to assist in solving and managing disaster problems.
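The core move, mapping differently structured records onto one shared vocabulary so they become interoperable in a knowledge graph, can be illustrated with a small sketch. The IRIs, record fields, and vocabulary terms below are assumptions for illustration; the study's actual Korean datasets and terms will differ.

```python
# Minimal sketch: two sources describe floods with different field names;
# mapping both onto one shared vocabulary resolves the heterogeneity.
# IRIs, fields, and terms are assumed for illustration.
from rdflib import Graph, Literal, Namespace, RDF

DIS = Namespace("http://example.org/disaster#")
g = Graph()

source_a = {"evt_id": "flood-001", "area": "Seoul", "rain_mm": 120}
source_b = {"id": "F-17", "region": "Busan", "precipitation": 95}

for rec, id_key, place_key, rain_key in [
    (source_a, "evt_id", "area", "rain_mm"),
    (source_b, "id", "region", "precipitation"),
]:
    event = DIS[rec[id_key]]
    g.add((event, RDF.type, DIS.FloodEvent))
    g.add((event, DIS.affectedRegion, Literal(rec[place_key])))
    g.add((event, DIS.rainfallMm, Literal(rec[rain_key])))

print(g.serialize(format="turtle"))
```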


2018 ◽  
Author(s):  
Larysse Silva ◽  
José Alex Lima ◽  
Nélio Cacho ◽  
Eiji Adachi ◽  
Frederico Lopes ◽  
...  

A notable characteristic of smart cities is the increase in the amount of data generated by numerous devices and computational systems, which augments the challenges involved in developing software that must integrate large volumes of data. In this context, this paper presents a literature review aimed at identifying the main strategies used in developing solutions for data integration, relationship, and representation in smart cities. This study systematically selected and analyzed eleven studies published from 2015 to 2017. The results reveal gaps in solutions for the continuous integration of heterogeneous data sources to support application development and decision-making.

