Measuring conflation success

2017 ◽  
pp. 41-64
Author(s):  
Marta Padilla-Ruiz ◽  
Carlos López-Vázquez

We are immersed in the Big Data era, in which large amounts of heterogeneous data are available at both temporal and spatial scales. These data are increasingly streamed in real time from different devices and sensors, as illustrated by the emerging concept of Smart Cities. Conflation processes, defined as procedures for combining and integrating different data sources so that the result carries more information than either input, play an important role in this scenario. Conflation also allows geographical databases (GDB) to be updated by combining different kinds of sources when one of them is more accurate or more up to date than the other. In geometric conflation, features from one data source are transformed towards the other, minimizing the geometric discrepancies between them. Accuracy has to be taken into account in these processes, and the results need to be measured and evaluated in order to better understand product quality. This paper describes the conflation evaluation process along with the different metrics and approaches used to assess its accuracy.
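
One common way to quantify geometric conflation accuracy is the root-mean-square error (RMSE) of the residual displacements between conflated features and reference check points. The Python sketch below is purely illustrative of that metric; the function name and the point coordinates are hypothetical, not taken from the paper.

```python
import math

def rmse_displacement(conflated_pts, reference_pts):
    """Root-mean-square error of the residual displacements between
    conflated points and their reference (check-point) counterparts.
    Both arguments are equally long lists of (x, y) tuples."""
    assert len(conflated_pts) == len(reference_pts)
    squared = [
        (cx - rx) ** 2 + (cy - ry) ** 2
        for (cx, cy), (rx, ry) in zip(conflated_pts, reference_pts)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points in metres: conflated result vs. surveyed reference
conflated = [(100.2, 200.1), (150.4, 249.7), (300.0, 400.5)]
reference = [(100.0, 200.0), (150.0, 250.0), (300.3, 400.0)]
print(f"RMSE of residual displacement: {rmse_displacement(conflated, reference):.3f} m")
```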


2021 ◽  
Vol 37 (1) ◽  
pp. 161-169
Author(s):  
Dominik Rozkrut ◽  
Olga Świerkot-Strużewska ◽  
Gemma Van Halderen

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda: to improve how data are produced and used, to close data gaps so as to prevent discrimination, to build capacity and data literacy, to modernize data collection systems, and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles, and argues that these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizens' entitlement to public information.


2020 ◽  
Vol 12 (14) ◽  
pp. 5595 ◽  
Author(s):  
Ana Lavalle ◽  
Miguel A. Teruel ◽  
Alejandro Maté ◽  
Juan Trujillo

Fostering sustainability is paramount for Smart City development. Lately, Smart Cities have been benefiting from the rise of Big Data coming from IoT devices, leading to improvements in monitoring and prevention. However, monitoring and prevention processes require visualization techniques as a key component. Indeed, in order to prevent possible hazards (such as fires or leaks) and optimize their resources, Smart Cities require adequate visualizations that provide insights to decision makers. Nevertheless, visualization of Big Data has always been a challenging issue, especially when such data originate in real time. This problem becomes even bigger in Smart City environments, since we have to deal with many different groups of users and multiple heterogeneous data sources. Without a proper visualization methodology, complex dashboards combining data of different natures are difficult to understand. In order to tackle this issue, we propose a methodology based on visualization techniques for Big Data, aimed at improving the evidence-gathering process by assisting users in decision making in the context of Smart Cities. Moreover, in order to assess the impact of our proposal, a case study based on service calls for a fire department is presented. In this sense, our findings will be applied to data coming from citizen calls. Thus, the results of this work will contribute to the optimization of resources, namely fire-extinguishing battalions, helping to improve their effectiveness and, as a result, the sustainability of the Smart City, which can operate better with fewer resources. Finally, in order to evaluate the impact of our proposal, we performed an experiment with users who are not experts in data visualization.
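
To make the kind of evidence-gathering view concrete, the sketch below aggregates hypothetical fire-department service calls by hour of day, the sort of summary a dashboard chart would present to decision makers. It is a minimal illustration under invented data, not the proposed methodology itself.

```python
from collections import Counter
from datetime import datetime

# Hypothetical fire-department service calls: (ISO timestamp, district)
calls = [
    ("2020-03-01T08:15", "Centro"), ("2020-03-01T08:40", "Centro"),
    ("2020-03-01T09:05", "Norte"),  ("2020-03-01T21:30", "Centro"),
    ("2020-03-02T08:55", "Sur"),    ("2020-03-02T22:10", "Norte"),
]

# Aggregate calls by hour of day, the summary a dashboard bar chart would show
calls_per_hour = Counter(datetime.fromisoformat(ts).hour for ts, _ in calls)
for hour in sorted(calls_per_hour):
    count = calls_per_hour[hour]
    print(f"{hour:02d}:00  {'#' * count}  ({count} calls)")
```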


2018 ◽  
Author(s):  
Larysse Silva ◽  
José Alex Lima ◽  
Nélio Cacho ◽  
Eiji Adachi ◽  
Frederico Lopes ◽  
...  

A notable characteristic of smart cities is the increase in the amount of data generated by several devices and computational systems, which augments the challenges related to developing software that must integrate large volumes of data. In this context, this paper presents a literature review aimed at identifying the main strategies used in the development of solutions for data integration, relationship, and representation in smart cities. The study systematically selected and analyzed eleven studies published from 2015 to 2017. The results reveal gaps regarding solutions for the continuous integration of heterogeneous data sources to support application development and decision-making.


2019 ◽  
pp. 254-277 ◽  
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources, geospatial data integration is difficult because of the shortage of semantics. Although standardised data formats and data access protocols, such as Web Feature Service (WFS), can give end-users access to heterogeneous data stored in different formats from various sources, the process is still time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieval, modeling, linking and integration. We mainly adopt four kinds of geospatial data sources to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
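
For readers unfamiliar with these record-linkage measures, the sketch below computes them as they are commonly defined in the linkage literature; the chapter reports the measures themselves, but the formulas shown here and all the counts are illustrative assumptions.

```python
def linkage_metrics(n_records_a, n_records_b, n_candidates,
                    n_true_matches, n_true_matches_in_candidates):
    """Common record-linkage evaluation measures over two sources A and B."""
    total_pairs = n_records_a * n_records_b
    rr = 1.0 - n_candidates / total_pairs                 # Reduction Ratio
    pc = n_true_matches_in_candidates / n_true_matches    # Pairs Completeness
    pq = n_true_matches_in_candidates / n_candidates      # Pairs Quality
    f = 2 * pc * pq / (pc + pq) if (pc + pq) else 0.0     # F-score
    return rr, pc, pq, f

# Hypothetical counts for two geospatial sources
rr, pc, pq, f = linkage_metrics(
    n_records_a=1000, n_records_b=1200,
    n_candidates=5000, n_true_matches=900,
    n_true_matches_in_candidates=855,
)
print(f"RR={rr:.4f}  PC={pc:.3f}  PQ={pq:.3f}  F-score={f:.3f}")
```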


2019 ◽  
pp. 230-253
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources and stored in incompatible formats, geospatial data integration is difficult because of the shortage of semantics. Although standardised data formats and data access protocols, such as Web Feature Service (WFS), can give end-users access to heterogeneous data stored in different formats from various sources, the process is still time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieval, modeling, linking and integration. First, we provide a uniform integration paradigm for users to retrieve geospatial data. Then, we align the retrieved geospatial data in the modeling process to eliminate heterogeneity with the help of Karma. Our main contribution focuses on addressing the third problem. Previous work has defined sets of semantic rules for performing the linking process. However, geospatial data exhibit specific spatial relationships that are significant for linking but cannot be handled directly by Semantic Web techniques. We take advantage of these unique features of geospatial data to implement the linking process. In addition, previous work runs into a complicated problem when the geospatial data sources are in different languages. In contrast, our proposed linking algorithms are endowed with a translation function, which saves the cost of translating among geospatial sources in different languages. Finally, the geospatial data are integrated by eliminating data redundancy and combining the complementary properties of the linked records. We mainly adopt four kinds of geospatial data sources, namely OpenStreetMap (OSM), Wikimapia, USGS and EPA, to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
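
As a rough illustration of linking records across sources by combining spatial proximity with attribute similarity (the general idea described above, not the authors' actual algorithm), the following sketch pairs records when they lie within a distance threshold and their names are sufficiently similar. All thresholds, helper names and records are hypothetical.

```python
import math
from difflib import SequenceMatcher

def approx_dist_m(p, q):
    """Approximate planar distance in metres between two (lat, lon) points,
    adequate for the short distances used in this illustration."""
    dlat = (p[0] - q[0]) * 111_320
    dlon = (p[1] - q[1]) * 111_320 * math.cos(math.radians(p[0]))
    return math.hypot(dlat, dlon)

def link_records(source_a, source_b, max_dist_m=50.0, min_name_sim=0.6):
    """Pair records that are both spatially close and similarly named."""
    links = []
    for a in source_a:
        for b in source_b:
            close = approx_dist_m(a["coord"], b["coord"]) <= max_dist_m
            sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if close and sim >= min_name_sim:
                links.append((a["name"], b["name"], round(sim, 2)))
    return links

# Hypothetical records mimicking two of the evaluated sources
source_a = [{"name": "Central Fire Station", "coord": (40.4168, -3.7038)}]
source_b = [{"name": "Central Fire Station No. 1", "coord": (40.4169, -3.7037)},
            {"name": "City Hall", "coord": (40.4200, -3.7100)}]
print(link_records(source_a, source_b))
```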


Author(s):  
Lihua Lu ◽  
Hengzhen Zhang ◽  
Xiao-Zhi Gao

Purpose – Data integration combines data residing at different sources and provides users with a unified interface to these data. An important issue in data integration is the existence of conflicts among the different data sources. Data sources may conflict with each other at the data level, which is defined as data inconsistency. The purpose of this paper is to address this problem and propose a solution for data inconsistency in data integration. Design/methodology/approach – A relational data model extended with data source quality criteria is first defined. Then, based on the proposed data model, a data inconsistency solution strategy is provided. To accomplish the strategy, a fuzzy multi-attribute decision-making (MADM) approach based on data source quality criteria is applied to obtain the results. Finally, user feedback strategies are proposed to optimize the result of the fuzzy MADM approach into the final solution for the inconsistent data. Findings – To evaluate the proposed method, data obtained from sensors are extracted. Experiments are designed and performed to demonstrate the effectiveness of the proposed strategy. The results substantiate that the solution performs better than the other methods on correctness, time cost and stability indicators. Practical implications – Since inconsistent data collected from sensors are pervasive, the proposed method can mitigate this problem and correct wrong choices to some extent. Originality/value – In this paper, for the first time, the authors study the effect of user feedback on integration results for inconsistent data.
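
A much simplified sketch of the underlying idea follows: each conflicting value is scored by a weighted sum of its source's quality criteria, and the value from the best-scoring source is kept. The plain weighted sum stands in for the paper's fuzzy MADM ranking, and all criteria, weights and readings are invented for illustration.

```python
def resolve_inconsistency(candidates, weights):
    """Keep the value whose source scores highest on weighted quality criteria.
    A plain weighted-sum stand-in for the fuzzy MADM ranking in the paper."""
    def score(quality):
        return sum(weights[c] * quality[c] for c in weights)
    best = max(candidates, key=lambda cand: score(cand["quality"]))
    return best["value"], best["source"]

# Hypothetical conflicting temperature readings for the same sensor location
candidates = [
    {"source": "sensor_net_A", "value": 21.4,
     "quality": {"accuracy": 0.9, "timeliness": 0.6, "completeness": 0.8}},
    {"source": "sensor_net_B", "value": 24.9,
     "quality": {"accuracy": 0.7, "timeliness": 0.9, "completeness": 0.7}},
]
weights = {"accuracy": 0.5, "timeliness": 0.3, "completeness": 0.2}

value, source = resolve_inconsistency(candidates, weights)
print(f"Resolved value {value} taken from {source}")
```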


Author(s):  
Ivan Bojicic ◽  
Zoran Marjanovic ◽  
Nina Turajlic ◽  
Marko Petrovic ◽  
Milica Vuckovic ◽  
...  

In order for a data warehouse to adequately fulfill its integrative and historical purpose, its data model must enable the appropriate and consistent representation of the different states of a system. In effect, a DW data model, representing the physical structure of the DW, must be general enough to consume data from heterogeneous data sources and reconcile the semantic differences of the data source models, and, at the same time, be resilient to the constant changes in the structure of the data sources. One of the main problems related to DW development is the absence of a standardized DW data model. In this paper a comparative analysis of the four most prominent DW data models (namely the relational/normalized model, the data vault model, the anchor model and the dimensional model) is given. On the basis of the results of [1], a new DW data model (the Domain/Mapping model, DMM), which would more adequately fulfill the posed requirements, is presented.
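
To make one of the compared models concrete, the sketch below shows the hub/link/satellite decomposition characteristic of the data vault style, expressed as Python dataclasses. It is purely illustrative (all names and fields are hypothetical) and is not a description of the proposed DMM model.

```python
from dataclasses import dataclass
from datetime import datetime

# Data-vault-style decomposition: hubs hold stable business keys, links relate
# hubs, and satellites hold descriptive attributes plus load metadata, so
# changes in the sources become new satellite rows rather than destructive updates.

@dataclass(frozen=True)
class HubCustomer:
    customer_key: str          # business key only, never overwritten

@dataclass(frozen=True)
class HubOrder:
    order_key: str

@dataclass(frozen=True)
class LinkCustomerOrder:
    customer_key: str
    order_key: str
    load_ts: datetime
    record_source: str         # which heterogeneous source supplied the link

@dataclass(frozen=True)
class SatCustomerDetails:
    customer_key: str
    name: str
    city: str
    load_ts: datetime          # a new row per change preserves full history
    record_source: str

row = SatCustomerDetails("C-42", "Acme Ltd", "Belgrade",
                         datetime(2018, 5, 1), "crm_export")
print(row)
```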

