Research on Multi-Source Data Integration Based on Ontology and Karma Modeling

2019 ◽  
Vol 15 (2) ◽  
pp. 69-87 ◽  
Author(s):  
Hongyan Yun ◽  
Ying He ◽  
Li Lin ◽  
Xiaohong Wang

The purpose of data integration is to integrate multi-source heterogeneous data; ontology provides the semantic description of such data. The authors propose a practical approach to fast data integration based on ontology modeling and Karma, an information integration toolkit, and demonstrate it with a detailed application example. The Armed Conflict Location & Event Data Project (ACLED) is a publicly available conflict event dataset designed for disaggregated conflict analysis and crisis mapping. The authors analyzed the ACLED dataset and domain knowledge to build an Armed Conflict Event ontology, then constructed Karma models to integrate ACLED datasets and publish RDF data, verifying the correctness of the published RDF data through SPARQL queries. They also designed and developed an ACLED Query System based on the Jena API, CanvasJS, Baidu API, and related technologies, which helps governments and researchers analyze regional conflict events and provide early crisis warning, and which verifies the validity of the constructed ontology and the correctness of the Karma modeling.
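As a concrete illustration of checking published RDF data with a SPARQL query, here is a minimal sketch using Python's rdflib rather than the Jena API the authors used. The file name "acled.ttl" and the ontology IRIs and property names are assumptions invented for illustration; the actual Armed Conflict Event ontology may define different terms.

```python
# Minimal sketch: validate published RDF with a SPARQL query (rdflib,
# not the Jena API used in the paper). File name and IRIs are assumed.
from rdflib import Graph

g = Graph()
g.parse("acled.ttl", format="turtle")  # RDF data published by the Karma models

# Check that every conflict event carries a date and a location.
query = """
PREFIX ace: <http://example.org/acled-ontology#>
SELECT ?event ?date ?location
WHERE {
    ?event a ace:ConflictEvent ;
           ace:eventDate ?date ;
           ace:occurredIn ?location .
}
LIMIT 10
"""
for event, date, location in g.query(query):
    print(event, date, location)
```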

2013 ◽  
Vol 321-324 ◽  
pp. 2532-2538
Author(s):  
Xiao Guo Wang ◽  
Jian Shen ◽  
Chuan Sun

Considering the difficulty of collecting and integrating information amid its rapid growth, an efficient tool is needed for these tasks. A proposal is put forward to build a data integration system that collects source data, preprocesses the heterogeneous data, and then converts and extracts the data into a data warehouse. Through experiment and analysis, this paper designs an information processing flow and implements the data integration system, based on a B/S (browser/server) framework and database technology, to handle college-related information.
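The collect, preprocess, and load flow described above can be sketched in a few lines. This is a minimal illustration only: the source file, column names, and warehouse schema below are assumptions, not the system's actual implementation.

```python
# Minimal sketch of the collect -> preprocess -> load flow.
# Source path, field names, and warehouse schema are assumed for illustration.
import csv
import sqlite3

def collect(path):
    """Collect raw records from a heterogeneous CSV source."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def preprocess(records):
    """Normalize field names and drop incomplete rows before loading."""
    cleaned = []
    for r in records:
        name = (r.get("name") or r.get("student_name") or "").strip()
        if name:
            cleaned.append({"name": name, "dept": r.get("dept", "").strip()})
    return cleaned

def load(records, db="warehouse.db"):
    """Convert the cleaned records into the data-warehouse table."""
    con = sqlite3.connect(db)
    con.execute("CREATE TABLE IF NOT EXISTS students (name TEXT, dept TEXT)")
    con.executemany("INSERT INTO students VALUES (:name, :dept)", records)
    con.commit()
    con.close()

load(preprocess(collect("college_source.csv")))
```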


2010 ◽  
Vol 11 (3) ◽  
pp. 292-298
Author(s):  
Hongjun SU ◽  
Yehua SHENG ◽  
Yongning WEN ◽  
Min CHEN

2016 ◽  
Vol 10 (02) ◽  
pp. 167-191 ◽  
Author(s):  
Lavdim Halilaj ◽  
Irlán Grangel-González ◽  
Gökhan Coskun ◽  
Steffen Lohmann ◽  
Sören Auer

Collaborative vocabulary development in the context of data integration is the process of finding consensus among experts with different backgrounds, system understanding, and domain knowledge. The complexity of this process increases with the number of people involved, the variety of the systems to be integrated, and the dynamics of their domain. In this paper, we argue that a powerful version control system is one of the keys to addressing this problem. Driven by this idea and by the success of the version control system Git in software development, we investigate the applicability of Git to collaborative vocabulary development. Even though vocabulary development and software development have many more similarities than differences, there are still important challenges that must be considered in the development of a successful versioning and collaboration system for vocabulary development. This paper therefore begins by presenting the challenges we faced during the collaborative creation of vocabularies and discusses how vocabulary development differs from software development. Drawing on these findings, we present Git4Voc, which comprises guidelines on how Git can be adopted for vocabulary development. Finally, we demonstrate how Git hooks can be implemented to go beyond Git's plain functionality by realizing vocabulary-specific features such as syntactic validation and semantic diffs.
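To make the hook idea concrete, here is a sketch of a pre-commit hook that syntactically validates staged Turtle files. This is an illustrative analogue in the spirit of the vocabulary-specific hooks described above, not the actual Git4Voc implementation; the use of rdflib for parsing is an assumption.

```python
#!/usr/bin/env python3
# Illustrative pre-commit hook (saved as .git/hooks/pre-commit and made
# executable): reject a commit if any staged Turtle file fails to parse.
# A sketch only, not the Git4Voc implementation itself.
import subprocess
import sys
from rdflib import Graph

# List files staged for this commit (added, copied, or modified).
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.split()

for path in staged:
    if path.endswith(".ttl"):
        try:
            Graph().parse(path, format="turtle")
        except Exception as exc:
            print(f"Syntax error in {path}: {exc}")
            sys.exit(1)  # non-zero exit aborts the commit

sys.exit(0)
```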


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mario Zanfardino ◽  
Rossana Castaldo ◽  
Katia Pane ◽  
Ornella Affinito ◽  
Marco Aiello ◽  
...  

Analysis of large-scale omics data along with biomedical images has gained huge interest for predicting phenotypic conditions in the move toward personalized medicine. Multiple layers of investigation, such as genomics, transcriptomics, and proteomics, have led to high dimensionality and heterogeneity of data. Multi-omics data integration can contribute meaningfully to early diagnosis and to accurate estimates of prognosis and treatment in cancer. Some multi-layer data structures have been developed to integrate multi-omics biological information, but none of these has been developed and evaluated to include radiomic data. We proposed to use MultiAssayExperiment (MAE) as an integrated data structure to combine multi-omics data, facilitating the exploration of heterogeneous data. We improved the usability of the MAE by developing the Multi-omics Statistical Approaches (MuSA) tool, which uses a Shiny graphical user interface to simplify the management and analysis of radiogenomic datasets. The capabilities of MuSA were shown using public breast cancer datasets from the TCGA and TCIA databases. The MuSA architecture is modular and can be divided into pre-processing and downstream analysis. The pre-processing section allows data filtering and normalization. The downstream analysis section contains modules for data science methods such as correlation, clustering (e.g., heatmaps), and feature selection. The results are shown dynamically in MuSA. The MuSA tool provides an easy-to-use way to create, manage, and analyze radiogenomic data. The application is specifically designed to guide non-programmer researchers through the different computational steps. Integration analysis is implemented in a modular structure, making MuSA easily extensible open-source software.
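The MAE idea, several assays over a shared set of samples with pre-processing before downstream analysis, can be sketched conceptually. Note that MuSA itself is built on R/Shiny and Bioconductor's MultiAssayExperiment; the Python analogue below, with its assay names, dimensions, and normalization choice, is an assumption made purely for illustration.

```python
# Conceptual sketch of the MAE structure (MuSA itself is R/Shiny-based):
# several assays share sample IDs, and pre-processing precedes analysis.
# Assay names, sizes, and normalization choice are assumed for illustration.
import numpy as np
import pandas as pd

# One table per omics/radiomic layer; columns are shared sample IDs.
samples = [f"s{i}" for i in range(5)]
assays = {
    "expression": pd.DataFrame(np.random.rand(100, 5), columns=samples),
    "radiomics": pd.DataFrame(np.random.rand(40, 5), columns=samples),
}

def preprocess(assay, min_variance=1e-3):
    """Filter near-constant features, then z-score each feature across samples."""
    filtered = assay[assay.var(axis=1) > min_variance]
    return filtered.sub(filtered.mean(axis=1), axis=0) \
                   .div(filtered.std(axis=1), axis=0)

processed = {name: preprocess(a) for name, a in assays.items()}

# Downstream-module example: correlate every expression feature
# with the first radiomic feature across the shared samples.
print(processed["expression"].T.corrwith(processed["radiomics"].iloc[0]))
```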


2021 ◽  
Vol 13 (5) ◽  
pp. 124
Author(s):  
Jiseong Son ◽  
Chul-Su Lim ◽  
Hyoung-Seop Shim ◽  
Ji-Sun Kang

Despite the development of various technologies and systems that use artificial intelligence (AI) to solve disaster-related problems, difficult challenges are still being encountered. Data are the foundation for solving diverse disaster problems using AI, big data analysis, and similar approaches; therefore, we must focus on these various data. Disaster data are domain-dependent by disaster type, heterogeneous, and lacking in interoperability. In particular, open data related to disasters raise several issues: the sources and formats of the data differ because various data are collected by different organizations, and the vocabularies used in each domain are inconsistent. This study proposes a knowledge graph to resolve the heterogeneity among various disaster data and provide interoperability among domains. Among disaster domains, we describe a knowledge graph for flooding disasters built from Korean open datasets and cross-domain knowledge graphs. Furthermore, the proposed knowledge graph is used to assist in solving and managing disaster problems.
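The core move, mapping differently structured records onto one shared vocabulary so they become interoperable in a knowledge graph, can be illustrated with a small sketch. The IRIs, record fields, and vocabulary terms below are assumptions for illustration; the study's actual Korean datasets and terms will differ.

```python
# Minimal sketch: two sources describe floods with different field names;
# mapping both onto one shared vocabulary resolves the heterogeneity.
# IRIs, fields, and terms are assumed for illustration.
from rdflib import Graph, Literal, Namespace, RDF

DIS = Namespace("http://example.org/disaster#")
g = Graph()

source_a = {"evt_id": "flood-001", "area": "Seoul", "rain_mm": 120}
source_b = {"id": "F-17", "region": "Busan", "precipitation": 95}

for rec, id_key, place_key, rain_key in [
    (source_a, "evt_id", "area", "rain_mm"),
    (source_b, "id", "region", "precipitation"),
]:
    event = DIS[rec[id_key]]
    g.add((event, RDF.type, DIS.FloodEvent))
    g.add((event, DIS.affectedRegion, Literal(rec[place_key])))
    g.add((event, DIS.rainfallMm, Literal(rec[rain_key])))

print(g.serialize(format="turtle"))
```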


2018 ◽  
Author(s):  
Larysse Silva ◽  
José Alex Lima ◽  
Nélio Cacho ◽  
Eiji Adachi ◽  
Frederico Lopes ◽  
...  

A notable characteristic of smart cities is the increase in the amount of data generated by numerous devices and computational systems, which augments the challenges involved in developing software that must integrate large volumes of data. In this context, this paper presents a literature review aimed at identifying the main strategies used in developing solutions for data integration, relationship, and representation in smart cities. This study systematically selected and analyzed eleven studies published from 2015 to 2017. The results reveal gaps in solutions for the continuous integration of heterogeneous data sources to support application development and decision-making.

