Query Optimization Based on Data Provenance

2011 ◽  
Vol 186 ◽  
pp. 586-590 ◽  
Author(s):  
Li Huang ◽  
Hong Bing Cheng

Data provenance is key to evaluating authority and uncertainty in data queries. Query processing technology based on data provenance overcomes the shortcomings of traditional data integration in query quality and efficiency. This paper constructs a provenance data model for heterogeneous data sources, i.e., semiring provenance, based on tracing the origin and evolution of data. The model proves effective for creating mappings between heterogeneous schemas and for optimizing query quality and authority evaluation. Experiments on a real data set show that our approach provides an effective and scalable solution for query optimization.
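The semiring provenance model named in this abstract (introduced by Green, Karvounarakis and Tannen) annotates each source tuple with a variable and propagates annotations through relational operators: union adds provenance polynomials, join multiplies them. A minimal sketch in Python; the relations R and S and the tuple identifiers r1, r2, s1 are illustrative, not from the paper:

```python
from itertools import product

# A provenance polynomial is a dict {monomial: coefficient}, where a
# monomial is a sorted tuple of source-tuple identifiers.
def prov_add(p, q):
    """Union of results: add the two polynomials."""
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return r

def prov_mul(p, q):
    """Join of results: multiply the two polynomials."""
    r = {}
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        m = tuple(sorted(m1 + m2))
        r[m] = r.get(m, 0) + c1 * c2
    return r

# Two annotated source relations R(x, y) and S(y, z)
R = [(("a", "b"), {("r1",): 1}), (("a", "c"), {("r2",): 1})]
S = [(("b", "d"), {("s1",): 1})]

def join(R, S):
    """Natural join on y: multiply annotations of matching tuples."""
    out = {}
    for (x, y), p in R:
        for (y2, z), q in S:
            if y == y2:
                out[(x, y, z)] = prov_add(out.get((x, y, z), {}),
                                          prov_mul(p, q))
    return out

result = join(R, S)  # ("a", "b", "d") carries provenance r1·s1
```

Because the annotations form a commutative semiring, the same query plan can be re-evaluated under other semirings (e.g., counting, trust, or cost) without changing the query logic.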

2020 ◽  
Vol 10 (1) ◽  
pp. 7
Author(s):  
Miguel R. Luaces ◽  
Jesús A. Fisteus ◽  
Luis Sánchez-Fernández ◽  
Mario Munoz-Organero ◽  
Jesús Balado ◽  
...  

Providing citizens with the ability to move around in an accessible way is a requirement for all cities today. However, modeling city infrastructures so that accessible routes can be computed is a challenge because it involves collecting information from multiple, large-scale and heterogeneous data sources. In this paper, we propose and validate the architecture of an information system that creates an accessibility data model for cities by ingesting data from different types of sources and provides an application that people with different abilities can use to compute accessible routes. The article describes the processes that build a network of pedestrian infrastructures from OpenStreetMap information (i.e., sidewalks and pedestrian crossings), improve the network with information extracted from mobile-sensed LiDAR data (i.e., ramps, steps, and pedestrian crossings), detect obstacles using volunteered information collected from the hardware sensors of citizens' mobile devices (i.e., ramps and steps), and detect accessibility problems with software sensors in social networks (i.e., Twitter). The information system is validated through its application in a case study in the city of Vigo (Spain).
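Accessible route computation over a pedestrian network of this kind can be sketched as a shortest-path search that filters edges against a user profile. A minimal illustration; the network, attribute names and thresholds below are hypothetical, not the paper's actual data model:

```python
import heapq

# Hypothetical pedestrian network: edges carry length (m) and
# accessibility attributes gathered from the heterogeneous sources.
edges = {
    "A": [("B", 50, {"steps": False, "slope_pct": 3}),
          ("C", 30, {"steps": True,  "slope_pct": 8})],
    "B": [("D", 40, {"steps": False, "slope_pct": 2})],
    "C": [("D", 10, {"steps": False, "slope_pct": 2})],
    "D": [],
}

def accessible_route(start, goal, profile):
    """Dijkstra restricted to edges the user's profile allows."""
    dist, heap = {start: 0}, [(0, start, [start])]
    while heap:
        d, node, path = heapq.heappop(heap)
        if node == goal:
            return d, path
        for nxt, length, attrs in edges[node]:
            if profile["avoid_steps"] and attrs["steps"]:
                continue  # skip stairs for this user
            if attrs["slope_pct"] > profile["max_slope_pct"]:
                continue  # skip ramps steeper than the user tolerates
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt, path + [nxt]))
    return None  # no accessible route exists

wheelchair = {"avoid_steps": True, "max_slope_pct": 6}
# A→C→D (40 m) is shorter but uses steps; the accessible route is A→B→D (90 m)
```

The same graph serves all users: only the filtering profile changes per ability, which is why a single unified data model over the heterogeneous sources is enough.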


2011 ◽  
Vol 268-270 ◽  
pp. 1868-1873
Author(s):  
Li Jun Yang

The existence of heterogeneous data sources makes it very inconvenient to exchange data between different information systems. Enabling convenient and flexible data exchange is therefore a meaningful research topic. This paper combines XML, the data representation format generally used on today's networks, with the WebService interaction technique to construct a UDM data model that can represent structured relational data as well as unstructured data and self-describing semi-structured data. The UDM data model can therefore serve as the common data model into which heterogeneous data are integrated.
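As a rough illustration of a common XML-based representation, a relational row can be serialized into a self-describing XML record whose structure also accommodates semi-structured data. The element and attribute names here are hypothetical; the abstract does not specify the actual UDM schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical unified representation: a relational row becomes a
# self-describing XML record, so relational and semi-structured data
# share one common format for exchange via Web Services.
def row_to_xml(table, row):
    rec = ET.Element("record", {"source": table})
    for col, val in row.items():
        ET.SubElement(rec, col).text = str(val)
    return rec

rec = row_to_xml("customer", {"id": 7, "name": "Chen"})
print(ET.tostring(rec, encoding="unicode"))
# <record source="customer"><id>7</id><name>Chen</name></record>
```

Because the column names travel with the values, a receiving system can interpret the record without prior knowledge of the source schema, which is the core benefit the abstract claims for the UDM approach.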


Author(s):  
Ivan Bojicic ◽  
Zoran Marjanovic ◽  
Nina Turajlic ◽  
Marko Petrovic ◽  
Milica Vuckovic ◽  
...  

In order for a data warehouse (DW) to adequately fulfill its integrative and historical purpose, its data model must enable the appropriate and consistent representation of the different states of a system. In effect, a DW data model, representing the physical structure of the DW, must be general enough to consume data from heterogeneous data sources and reconcile the semantic differences of the data source models, while at the same time remaining resilient to constant changes in the structure of those sources. One of the main problems in DW development is the absence of a standardized DW data model. This paper gives a comparative analysis of the four most prominent DW data models (namely the relational/normalized model, the data vault model, the anchor model and the dimensional model). On the basis of the results of [1], a new DW data model (the Domain/Mapping model, DMM) which more adequately fulfills the posed requirements is presented.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
A. S. Al-Moisheer

Finite mixture models provide a flexible tool for handling heterogeneous data. This paper introduces a new mixture model, the mixture of Lindley and lognormal distributions (MLLND). First, the model is formulated and some of its statistical properties are studied. Next, maximum likelihood estimation of the model parameters is considered, and the performance of the estimators is evaluated via simulation. The flexibility of the proposed mixture distribution is also demonstrated by its superior fit, compared to several mixture and non-mixture distributions, to a well-known real data set of 128 bladder cancer patients. The Kolmogorov-Smirnov test and several information criteria are used to compare the fitted models on the real data set. Finally, the results are verified using several graphical methods.
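The MLLND density is the two-component mixture p·f_Lindley(x; θ) + (1 − p)·f_LN(x; μ, σ), where the Lindley density is θ²/(θ+1)·(1+x)·e^(−θx) for x > 0. A minimal sketch of the density and the negative log-likelihood that maximum likelihood estimation would minimize; the parameter values shown are illustrative, not the paper's estimates:

```python
import math

def lindley_pdf(x, theta):
    """Lindley density: theta^2/(theta+1) * (1+x) * exp(-theta*x), x > 0."""
    return theta ** 2 / (theta + 1) * (1 + x) * math.exp(-theta * x)

def lognormal_pdf(x, mu, sigma):
    """Lognormal density with log-scale mean mu and log-scale sd sigma."""
    return (math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2))
            / (x * sigma * math.sqrt(2 * math.pi)))

def mllnd_pdf(x, p, theta, mu, sigma):
    """Mixture of Lindley and lognormal with mixing weight p."""
    return p * lindley_pdf(x, theta) + (1 - p) * lognormal_pdf(x, mu, sigma)

def neg_log_likelihood(params, data):
    """Objective for MLE: minimize over (p, theta, mu, sigma)."""
    p, theta, mu, sigma = params
    return -sum(math.log(mllnd_pdf(x, p, theta, mu, sigma)) for x in data)
```

In practice the four parameters would be estimated numerically (e.g., with a constrained optimizer keeping 0 ≤ p ≤ 1, θ > 0, σ > 0), since the mixture likelihood has no closed-form maximum.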


2011 ◽  
pp. 277-297 ◽  
Author(s):  
Carlo Combi ◽  
Barbara Oliboni

This chapter describes a graph-based approach to representing information stored in a data warehouse by means of a temporal semistructured data model. We consider issues related to the representation of semistructured data warehouses and discuss the set of constraints needed to correctly manage warehouse time, i.e., the time dimension considered when storing data in the data warehouse itself. We use a temporal semistructured data model because a data warehouse can contain data coming from different and heterogeneous data sources. This means that the data stored in a data warehouse are semistructured in nature: in different documents the same information can be represented in different ways, and the document schemata may or may not be available. Moreover, the information stored in a data warehouse is often time-varying; thus, as for semistructured data, it can be useful to consider time in the data warehouse context as well.
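The idea of attaching warehouse time to a semistructured graph can be sketched by labelling edges with valid-time intervals and reconstructing the graph's state at a given instant. The node names, properties and interval encoding below are illustrative, not the chapter's actual model:

```python
from dataclasses import dataclass

# Hypothetical sketch: each graph edge carries a valid-time interval,
# so a query can reconstruct the warehouse state at any instant.
@dataclass
class Edge:
    src: str
    dst: str
    label: str
    valid_from: int
    valid_to: int  # exclusive; a large sentinel means "still valid"

edges = [
    Edge("order#1", "price", "42.0", valid_from=2010, valid_to=2012),
    Edge("order#1", "price", "45.5", valid_from=2012, valid_to=9999),
]

def snapshot(edges, t):
    """Edges valid at instant t: the state of the graph at time t."""
    return [e for e in edges if e.valid_from <= t < e.valid_to]

# snapshot(edges, 2011) returns only the 42.0 price edge;
# snapshot(edges, 2013) returns only the 45.5 one.
```

The consistency constraints the chapter discusses would forbid, for example, two intervals on the same property overlapping, so that each snapshot is unambiguous.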


2020 ◽  
Vol 40 (1-2) ◽  
pp. 87-113
Author(s):  
Otmane Azeroual ◽  
Nico Herbig

The provision, processing and distribution of research information are increasingly supported by research information systems (RIS) at higher education institutions. National and international exchange formats or standards can support the validation and use of research information and increase its informative value and comparability through consistent semantics. The formats overlap considerably and represent different modeling approaches. This paper presents the data model of the Research Core Dataset (RCD) and discusses its impact on data quality in RIS. It then compares the RCD with the Europe-wide accepted Common European Research Information Format (CERIF) standard, in order to support a CERIF-compatible implementation of the RCD in RIS, so that institutions can integrate their research information from internal and external heterogeneous data sources and ultimately provide valuable information with a high level of data quality. Such information is fundamental to decision-making and knowledge generation, as well as to the presentation of research.

