Query Optimization Based on Data Provenance

2011 ◽  
Vol 186 ◽  
pp. 586-590 ◽  
Author(s):  
Li Huang ◽  
Hong Bing Cheng

Data provenance is key to evaluating authority and uncertainty in data queries. Query processing technology based on data provenance overcomes the shortcomings of traditional data integration in query quality and efficiency. This paper constructs a provenance data model for heterogeneous data sources, i.e., semiring provenance, based on tracing the origin and evolution of data. The model proves effective for creating mappings between heterogeneous schemas and for optimizing query quality and authority evaluation. Experiments on a real data set show that our approach provides an effective and scalable solution for query optimization.
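The semiring provenance model named in this abstract (introduced by Green, Karvounarakis and Tannen) annotates each source tuple with a variable and propagates annotations through relational operators: union adds provenance polynomials, join multiplies them. A minimal sketch in Python; the relations R and S and the tuple identifiers r1, r2, s1 are illustrative, not from the paper:

```python
from itertools import product

# A provenance polynomial is a dict {monomial: coefficient}, where a
# monomial is a sorted tuple of source-tuple identifiers.
def prov_add(p, q):
    """Union of results: add the two polynomials."""
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return r

def prov_mul(p, q):
    """Join of results: multiply the two polynomials."""
    r = {}
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        m = tuple(sorted(m1 + m2))
        r[m] = r.get(m, 0) + c1 * c2
    return r

# Two annotated source relations R(x, y) and S(y, z)
R = [(("a", "b"), {("r1",): 1}), (("a", "c"), {("r2",): 1})]
S = [(("b", "d"), {("s1",): 1})]

def join(R, S):
    """Natural join on y: multiply annotations of matching tuples."""
    out = {}
    for (x, y), p in R:
        for (y2, z), q in S:
            if y == y2:
                out[(x, y, z)] = prov_add(out.get((x, y, z), {}),
                                          prov_mul(p, q))
    return out

result = join(R, S)  # ("a", "b", "d") carries provenance r1·s1
```

Because the annotations form a commutative semiring, the same query plan can be re-evaluated under other semirings (e.g., counting, trust, or cost) without changing the query logic.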

2020 ◽  
Vol 10 (1) ◽  
pp. 7
Author(s):  
Miguel R. Luaces ◽  
Jesús A. Fisteus ◽  
Luis Sánchez-Fernández ◽  
Mario Munoz-Organero ◽  
Jesús Balado ◽  
...  

Providing citizens with the ability to move around in an accessible way is a requirement for all cities today. However, modeling city infrastructures so that accessible routes can be computed is a challenge because it involves collecting information from multiple, large-scale and heterogeneous data sources. In this paper, we propose and validate the architecture of an information system that creates an accessibility data model for cities by ingesting data from different types of sources and provides an application that people with different abilities can use to compute accessible routes. The article describes the processes that build a network of pedestrian infrastructures from OpenStreetMap information (i.e., sidewalks and pedestrian crossings), improve the network with information extracted from mobile-sensed LiDAR data (i.e., ramps, steps, and pedestrian crossings), detect obstacles using volunteered information collected from the hardware sensors of citizens' mobile devices (i.e., ramps and steps), and detect accessibility problems with software sensors in social networks (i.e., Twitter). The information system is validated through its application in a case study in the city of Vigo (Spain).
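Accessible route computation over a pedestrian network of this kind can be sketched as a shortest-path search that filters edges against a user profile. A minimal illustration; the network, attribute names and thresholds below are hypothetical, not the paper's actual data model:

```python
import heapq

# Hypothetical pedestrian network: edges carry length (m) and
# accessibility attributes gathered from the heterogeneous sources.
edges = {
    "A": [("B", 50, {"steps": False, "slope_pct": 3}),
          ("C", 30, {"steps": True,  "slope_pct": 8})],
    "B": [("D", 40, {"steps": False, "slope_pct": 2})],
    "C": [("D", 10, {"steps": False, "slope_pct": 2})],
    "D": [],
}

def accessible_route(start, goal, profile):
    """Dijkstra restricted to edges the user's profile allows."""
    dist, heap = {start: 0}, [(0, start, [start])]
    while heap:
        d, node, path = heapq.heappop(heap)
        if node == goal:
            return d, path
        for nxt, length, attrs in edges[node]:
            if profile["avoid_steps"] and attrs["steps"]:
                continue  # skip stairs for this user
            if attrs["slope_pct"] > profile["max_slope_pct"]:
                continue  # skip ramps steeper than the user tolerates
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt, path + [nxt]))
    return None  # no accessible route exists

wheelchair = {"avoid_steps": True, "max_slope_pct": 6}
# A→C→D (40 m) is shorter but uses steps; the accessible route is A→B→D (90 m)
```

The same graph serves all users: only the filtering profile changes per ability, which is why a single unified data model over the heterogeneous sources is enough.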


2011 ◽  
Vol 268-270 ◽  
pp. 1868-1873
Author(s):  
Li Jun Yang

The existence of heterogeneous data sources makes it very inconvenient to exchange data between different information systems. Enabling convenient and flexible data exchange is therefore a meaningful research topic. This paper combines XML, the data representation format generally used on today's networks, with the WebService interaction technique to construct a UDM data model that can represent structured relational data as well as unstructured data and self-describing semi-structured data. The UDM data model can therefore serve as the common data model into which heterogeneous data are integrated.
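As a rough illustration of a common XML-based representation, a relational row can be serialized into a self-describing XML record whose structure also accommodates semi-structured data. The element and attribute names here are hypothetical; the abstract does not specify the actual UDM schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical unified representation: a relational row becomes a
# self-describing XML record, so relational and semi-structured data
# share one common format for exchange via Web Services.
def row_to_xml(table, row):
    rec = ET.Element("record", {"source": table})
    for col, val in row.items():
        ET.SubElement(rec, col).text = str(val)
    return rec

rec = row_to_xml("customer", {"id": 7, "name": "Chen"})
print(ET.tostring(rec, encoding="unicode"))
# <record source="customer"><id>7</id><name>Chen</name></record>
```

Because the column names travel with the values, a receiving system can interpret the record without prior knowledge of the source schema, which is the core benefit the abstract claims for the UDM approach.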


Author(s):  
Ivan Bojicic ◽  
Zoran Marjanovic ◽  
Nina Turajlic ◽  
Marko Petrovic ◽  
Milica Vuckovic ◽  
...  

In order for a data warehouse (DW) to adequately fulfill its integrative and historical purpose, its data model must enable the appropriate and consistent representation of the different states of a system. In effect, a DW data model, representing the physical structure of the DW, must be general enough to consume data from heterogeneous data sources and reconcile the semantic differences of the data source models, while at the same time remaining resilient to constant changes in the structure of those sources. One of the main problems in DW development is the absence of a standardized DW data model. This paper gives a comparative analysis of the four most prominent DW data models (namely the relational/normalized model, the data vault model, the anchor model and the dimensional model). On the basis of the results of [1], a new DW data model (the Domain/Mapping model, DMM) which more adequately fulfills the posed requirements is presented.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
A. S. Al-Moisheer

Finite mixture models provide a flexible tool for handling heterogeneous data. This paper introduces a new mixture model, the mixture of Lindley and lognormal distributions (MLLND). First, the model is formulated and some of its statistical properties are studied. Next, maximum likelihood estimation of the model parameters is considered, and the performance of the estimators is evaluated via simulation. The flexibility of the proposed mixture distribution is also demonstrated by its superior fit, compared to several mixture and non-mixture distributions, to a well-known real data set of 128 bladder cancer patients. The Kolmogorov-Smirnov test and several information criteria are used to compare the fitted models on the real data set. Finally, the results are verified using several graphical methods.
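The MLLND density is the two-component mixture p·f_Lindley(x; θ) + (1 − p)·f_LN(x; μ, σ), where the Lindley density is θ²/(θ+1)·(1+x)·e^(−θx) for x > 0. A minimal sketch of the density and the negative log-likelihood that maximum likelihood estimation would minimize; the parameter values shown are illustrative, not the paper's estimates:

```python
import math

def lindley_pdf(x, theta):
    """Lindley density: theta^2/(theta+1) * (1+x) * exp(-theta*x), x > 0."""
    return theta ** 2 / (theta + 1) * (1 + x) * math.exp(-theta * x)

def lognormal_pdf(x, mu, sigma):
    """Lognormal density with log-scale mean mu and log-scale sd sigma."""
    return (math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2))
            / (x * sigma * math.sqrt(2 * math.pi)))

def mllnd_pdf(x, p, theta, mu, sigma):
    """Mixture of Lindley and lognormal with mixing weight p."""
    return p * lindley_pdf(x, theta) + (1 - p) * lognormal_pdf(x, mu, sigma)

def neg_log_likelihood(params, data):
    """Objective for MLE: minimize over (p, theta, mu, sigma)."""
    p, theta, mu, sigma = params
    return -sum(math.log(mllnd_pdf(x, p, theta, mu, sigma)) for x in data)
```

In practice the four parameters would be estimated numerically (e.g., with a constrained optimizer keeping 0 ≤ p ≤ 1, θ > 0, σ > 0), since the mixture likelihood has no closed-form maximum.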


2011 ◽  
pp. 277-297 ◽  
Author(s):  
Carlo Combi ◽  
Barbara Oliboni

This chapter describes a graph-based approach to representing information stored in a data warehouse by means of a temporal semistructured data model. We consider issues related to the representation of semistructured data warehouses and discuss the set of constraints needed to correctly manage warehouse time, i.e., the time dimension considered when storing data in the data warehouse itself. We use a temporal semistructured data model because a data warehouse can contain data coming from different and heterogeneous data sources. This means that the data stored in a data warehouse are semistructured in nature: in different documents the same information can be represented in different ways, and the document schemata may or may not be available. Moreover, the information stored in a data warehouse is often time-varying; thus, as for semistructured data, it can be useful to consider time in the data warehouse context as well.
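The idea of attaching warehouse time to a semistructured graph can be sketched by labelling edges with valid-time intervals and reconstructing the graph's state at a given instant. The node names, properties and interval encoding below are illustrative, not the chapter's actual model:

```python
from dataclasses import dataclass

# Hypothetical sketch: each graph edge carries a valid-time interval,
# so a query can reconstruct the warehouse state at any instant.
@dataclass
class Edge:
    src: str
    dst: str
    label: str
    valid_from: int
    valid_to: int  # exclusive; a large sentinel means "still valid"

edges = [
    Edge("order#1", "price", "42.0", valid_from=2010, valid_to=2012),
    Edge("order#1", "price", "45.5", valid_from=2012, valid_to=9999),
]

def snapshot(edges, t):
    """Edges valid at instant t: the state of the graph at time t."""
    return [e for e in edges if e.valid_from <= t < e.valid_to]

# snapshot(edges, 2011) returns only the 42.0 price edge;
# snapshot(edges, 2013) returns only the 45.5 one.
```

The consistency constraints the chapter discusses would forbid, for example, two intervals on the same property overlapping, so that each snapshot is unambiguous.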


2020 ◽  
Vol 40 (1-2) ◽  
pp. 87-113
Author(s):  
Otmane Azeroual ◽  
Nico Herbig

The provision, processing and distribution of research information are increasingly supported by research information systems (RIS) at higher education institutions. National and international exchange formats or standards can support the validation and use of research information and increase its informative value and comparability through consistent semantics. The formats overlap considerably and represent different modeling approaches. This paper presents the data model of the Research Core Dataset (RCD) and discusses its impact on data quality in RIS. It then compares the RCD with the Europe-wide accepted Common European Research Information Format (CERIF) standard, in order to support a CERIF-compatible implementation of the RCD in RIS, so that institutions can integrate their research information from internal and external heterogeneous data sources and ultimately provide valuable information with a high level of data quality. Such information is fundamental to decision-making and knowledge generation, as well as to the presentation of research.

