Measuring conflation success

2017 ◽  
pp. 41-64
Author(s):  
Marta Padilla-Ruiz ◽  
Carlos López-Vázquez

We are immersed in the Big Data era, in which large amounts of heterogeneous data are available at both temporal and spatial scales. These data are increasingly streamed in real time from different devices and sensors, as illustrated by the emerging concept of Smart Cities. Conflation processes, defined as procedures for combining and integrating different data sources so that the result carries more information than either input, play an important role in this scenario. Conflation also allows geographical databases (GDB) to be updated by combining different kinds of sources when one of them is more accurate or more up to date than the other. In geometric conflation, features from one data source are transformed towards the other, minimizing the geometric discrepancies between them. Accuracy has to be taken into account in these processes, and the results need to be measured and evaluated in order to better understand product quality. This paper describes the conflation evaluation process along with the different metrics and approaches used to assess its accuracy.
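
One common way to quantify geometric conflation accuracy is the root-mean-square error (RMSE) of the residual displacements between conflated features and reference check points. The Python sketch below is purely illustrative of that metric; the function name and the point coordinates are hypothetical, not taken from the paper.

```python
import math

def rmse_displacement(conflated_pts, reference_pts):
    """Root-mean-square error of the residual displacements between
    conflated points and their reference (check-point) counterparts.
    Both arguments are equally long lists of (x, y) tuples."""
    assert len(conflated_pts) == len(reference_pts)
    squared = [
        (cx - rx) ** 2 + (cy - ry) ** 2
        for (cx, cy), (rx, ry) in zip(conflated_pts, reference_pts)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical check points in metres: conflated result vs. surveyed reference
conflated = [(100.2, 200.1), (150.4, 249.7), (300.0, 400.5)]
reference = [(100.0, 200.0), (150.0, 250.0), (300.3, 400.0)]
print(f"RMSE of residual displacement: {rmse_displacement(conflated, reference):.3f} m")
```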


2021 ◽  
Vol 37 (1) ◽  
pp. 161-169
Author(s):  
Dominik Rozkrut ◽  
Olga Świerkot-Strużewska ◽  
Gemma Van Halderen

Never has there been a more exciting time to be an official statistician. The data revolution is responding to the demands of the COVID-19 pandemic and a complex sustainable development agenda: to improve how data are produced and used, to close data gaps so as to prevent discrimination, to build capacity and data literacy, to modernize data collection systems, and to liberate data to promote transparency and accountability. But can all data be liberated in the production and communication of official statistics? This paper explores the UN Fundamental Principles of Official Statistics in the context of eight new and big data sources. The paper concludes that each data source can be used for the production of official statistics in adherence with the Fundamental Principles, and argues that these data sources should be used if National Statistical Systems are to adhere to the first Fundamental Principle of compiling and making available official statistics that honor citizens' entitlement to public information.


2020 ◽  
Vol 12 (14) ◽  
pp. 5595 ◽  
Author(s):  
Ana Lavalle ◽  
Miguel A. Teruel ◽  
Alejandro Maté ◽  
Juan Trujillo

Fostering sustainability is paramount for Smart City development. Lately, Smart Cities have been benefiting from the rise of Big Data coming from IoT devices, leading to improvements in monitoring and prevention. However, monitoring and prevention processes require visualization techniques as a key component. Indeed, in order to prevent possible hazards (such as fires or leaks) and optimize their resources, Smart Cities require adequate visualizations that provide insights to decision makers. Nevertheless, visualization of Big Data has always been a challenging issue, especially when such data originate in real time. This problem becomes even bigger in Smart City environments, since we have to deal with many different groups of users and multiple heterogeneous data sources. Without a proper visualization methodology, complex dashboards combining data of different natures are difficult to understand. In order to tackle this issue, we propose a methodology based on visualization techniques for Big Data, aimed at improving the evidence-gathering process by assisting users in decision making in the context of Smart Cities. Moreover, in order to assess the impact of our proposal, a case study based on service calls for a fire department is presented. In this sense, our findings will be applied to data coming from citizen calls. Thus, the results of this work will contribute to the optimization of resources, namely fire-extinguishing battalions, helping to improve their effectiveness and, as a result, the sustainability of the Smart City, which can operate better with fewer resources. Finally, in order to evaluate the impact of our proposal, we performed an experiment with users who are not experts in data visualization.
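
To make the kind of evidence-gathering view concrete, the sketch below aggregates hypothetical fire-department service calls by hour of day, the sort of summary a dashboard chart would present to decision makers. It is a minimal illustration under invented data, not the proposed methodology itself.

```python
from collections import Counter
from datetime import datetime

# Hypothetical fire-department service calls: (ISO timestamp, district)
calls = [
    ("2020-03-01T08:15", "Centro"), ("2020-03-01T08:40", "Centro"),
    ("2020-03-01T09:05", "Norte"),  ("2020-03-01T21:30", "Centro"),
    ("2020-03-02T08:55", "Sur"),    ("2020-03-02T22:10", "Norte"),
]

# Aggregate calls by hour of day, the summary a dashboard bar chart would show
calls_per_hour = Counter(datetime.fromisoformat(ts).hour for ts, _ in calls)
for hour in sorted(calls_per_hour):
    count = calls_per_hour[hour]
    print(f"{hour:02d}:00  {'#' * count}  ({count} calls)")
```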


2018 ◽  
Author(s):  
Larysse Silva ◽  
José Alex Lima ◽  
Nélio Cacho ◽  
Eiji Adachi ◽  
Frederico Lopes ◽  
...  

A notable characteristic of smart cities is the increase in the amount of data generated by several devices and computational systems, which augments the challenges related to developing software that must integrate large volumes of data. In this context, this paper presents a literature review aimed at identifying the main strategies used in the development of solutions for data integration, relationship, and representation in smart cities. The study systematically selected and analyzed eleven studies published from 2015 to 2017. The results reveal gaps regarding solutions for the continuous integration of heterogeneous data sources to support application development and decision-making.


2019 ◽  
pp. 254-277 ◽  
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources, geospatial data integration is difficult because of the shortage of semantics. Although standardised data formats and data access protocols, such as Web Feature Service (WFS), can give end-users access to heterogeneous data stored in different formats from various sources, the process is still time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieval, modeling, linking and integration. We mainly adopt four kinds of geospatial data sources to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
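
For readers unfamiliar with these record-linkage measures, the sketch below computes them as they are commonly defined in the linkage literature; the chapter reports the measures themselves, but the formulas shown here and all the counts are illustrative assumptions.

```python
def linkage_metrics(n_records_a, n_records_b, n_candidates,
                    n_true_matches, n_true_matches_in_candidates):
    """Common record-linkage evaluation measures over two sources A and B."""
    total_pairs = n_records_a * n_records_b
    rr = 1.0 - n_candidates / total_pairs                 # Reduction Ratio
    pc = n_true_matches_in_candidates / n_true_matches    # Pairs Completeness
    pq = n_true_matches_in_candidates / n_candidates      # Pairs Quality
    f = 2 * pc * pq / (pc + pq) if (pc + pq) else 0.0     # F-score
    return rr, pc, pq, f

# Hypothetical counts for two geospatial sources
rr, pc, pq, f = linkage_metrics(
    n_records_a=1000, n_records_b=1200,
    n_candidates=5000, n_true_matches=900,
    n_true_matches_in_candidates=855,
)
print(f"RR={rr:.4f}  PC={pc:.3f}  PQ={pq:.3f}  F-score={f:.3f}")
```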


2019 ◽  
pp. 230-253
Author(s):  
Ying Zhang ◽  
Chaopeng Li ◽  
Na Chen ◽  
Shaowen Liu ◽  
Liming Du ◽  
...  

Since large amounts of geospatial data are produced by various sources and stored in incompatible formats, geospatial data integration is difficult because of the shortage of semantics. Although standardised data formats and data access protocols, such as Web Feature Service (WFS), can give end-users access to heterogeneous data stored in different formats from various sources, the process is still time-consuming and ineffective due to the lack of semantics. To solve this problem, a prototype for geospatial data integration is proposed that addresses four problems: geospatial data retrieval, modeling, linking and integration. First, we provide a uniform integration paradigm for users to retrieve geospatial data. Then, we align the retrieved geospatial data in the modeling process to eliminate heterogeneity with the help of Karma. Our main contribution focuses on addressing the third problem. Previous work has defined sets of semantic rules for performing the linking process. However, geospatial data exhibit specific spatial relationships that are significant for linking but cannot be handled directly by Semantic Web techniques. We take advantage of these unique features of geospatial data to implement the linking process. In addition, previous work runs into a complicated problem when the geospatial data sources are in different languages. In contrast, our proposed linking algorithms are endowed with a translation function, which saves the cost of translating among geospatial sources in different languages. Finally, the geospatial data are integrated by eliminating data redundancy and combining the complementary properties of the linked records. We mainly adopt four kinds of geospatial data sources, namely OpenStreetMap (OSM), Wikimapia, USGS and EPA, to evaluate the performance of the proposed approach. The experimental results illustrate that the proposed linking method achieves high performance in generating matched candidate record pairs in terms of Reduction Ratio (RR), Pairs Completeness (PC), Pairs Quality (PQ) and F-score. The integration results show that each data source gains substantial Complementary Completeness (CC) and Increased Completeness (IC).
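
As a rough illustration of linking records across sources by combining spatial proximity with attribute similarity (the general idea described above, not the authors' actual algorithm), the following sketch pairs records when they lie within a distance threshold and their names are sufficiently similar. All thresholds, helper names and records are hypothetical.

```python
import math
from difflib import SequenceMatcher

def approx_dist_m(p, q):
    """Approximate planar distance in metres between two (lat, lon) points,
    adequate for the short distances used in this illustration."""
    dlat = (p[0] - q[0]) * 111_320
    dlon = (p[1] - q[1]) * 111_320 * math.cos(math.radians(p[0]))
    return math.hypot(dlat, dlon)

def link_records(source_a, source_b, max_dist_m=50.0, min_name_sim=0.6):
    """Pair records that are both spatially close and similarly named."""
    links = []
    for a in source_a:
        for b in source_b:
            close = approx_dist_m(a["coord"], b["coord"]) <= max_dist_m
            sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if close and sim >= min_name_sim:
                links.append((a["name"], b["name"], round(sim, 2)))
    return links

# Hypothetical records mimicking two of the evaluated sources
source_a = [{"name": "Central Fire Station", "coord": (40.4168, -3.7038)}]
source_b = [{"name": "Central Fire Station No. 1", "coord": (40.4169, -3.7037)},
            {"name": "City Hall", "coord": (40.4200, -3.7100)}]
print(link_records(source_a, source_b))
```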


Author(s):  
Lihua Lu ◽  
Hengzhen Zhang ◽  
Xiao-Zhi Gao

Purpose – Data integration combines data residing at different sources and provides users with a unified interface to these data. An important issue in data integration is the existence of conflicts among the different data sources. Data sources may conflict with each other at the data level, which is defined as data inconsistency. The purpose of this paper is to address this problem and propose a solution for data inconsistency in data integration. Design/methodology/approach – A relational data model extended with data source quality criteria is first defined. Then, based on the proposed data model, a data inconsistency solution strategy is provided. To accomplish the strategy, a fuzzy multi-attribute decision-making (MADM) approach based on data source quality criteria is applied to obtain the results. Finally, user feedback strategies are proposed to optimize the result of the fuzzy MADM approach into the final solution for the inconsistent data. Findings – To evaluate the proposed method, data obtained from sensors are extracted. Experiments are designed and performed to demonstrate the effectiveness of the proposed strategy. The results substantiate that the solution performs better than the other methods on correctness, time cost and stability indicators. Practical implications – Since inconsistent data collected from sensors are pervasive, the proposed method can mitigate this problem and correct wrong choices to some extent. Originality/value – In this paper, for the first time, the authors study the effect of user feedback on integration results for inconsistent data.
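
A much simplified sketch of the underlying idea follows: each conflicting value is scored by a weighted sum of its source's quality criteria, and the value from the best-scoring source is kept. The plain weighted sum stands in for the paper's fuzzy MADM ranking, and all criteria, weights and readings are invented for illustration.

```python
def resolve_inconsistency(candidates, weights):
    """Keep the value whose source scores highest on weighted quality criteria.
    A plain weighted-sum stand-in for the fuzzy MADM ranking in the paper."""
    def score(quality):
        return sum(weights[c] * quality[c] for c in weights)
    best = max(candidates, key=lambda cand: score(cand["quality"]))
    return best["value"], best["source"]

# Hypothetical conflicting temperature readings for the same sensor location
candidates = [
    {"source": "sensor_net_A", "value": 21.4,
     "quality": {"accuracy": 0.9, "timeliness": 0.6, "completeness": 0.8}},
    {"source": "sensor_net_B", "value": 24.9,
     "quality": {"accuracy": 0.7, "timeliness": 0.9, "completeness": 0.7}},
]
weights = {"accuracy": 0.5, "timeliness": 0.3, "completeness": 0.2}

value, source = resolve_inconsistency(candidates, weights)
print(f"Resolved value {value} taken from {source}")
```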


Author(s):  
Ivan Bojicic ◽  
Zoran Marjanovic ◽  
Nina Turajlic ◽  
Marko Petrovic ◽  
Milica Vuckovic ◽  
...  

In order for a data warehouse to adequately fulfill its integrative and historical purpose, its data model must enable the appropriate and consistent representation of the different states of a system. In effect, a DW data model, representing the physical structure of the DW, must be general enough to consume data from heterogeneous data sources and reconcile the semantic differences of the data source models, and, at the same time, be resilient to the constant changes in the structure of the data sources. One of the main problems related to DW development is the absence of a standardized DW data model. In this paper a comparative analysis of the four most prominent DW data models (namely the relational/normalized model, the data vault model, the anchor model and the dimensional model) is given. On the basis of the results of [1], a new DW data model (the Domain/Mapping model, DMM), which would more adequately fulfill the posed requirements, is presented.
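
To make one of the compared models concrete, the sketch below shows the hub/link/satellite decomposition characteristic of the data vault style, expressed as Python dataclasses. It is purely illustrative (all names and fields are hypothetical) and is not a description of the proposed DMM model.

```python
from dataclasses import dataclass
from datetime import datetime

# Data-vault-style decomposition: hubs hold stable business keys, links relate
# hubs, and satellites hold descriptive attributes plus load metadata, so
# changes in the sources become new satellite rows rather than destructive updates.

@dataclass(frozen=True)
class HubCustomer:
    customer_key: str          # business key only, never overwritten

@dataclass(frozen=True)
class HubOrder:
    order_key: str

@dataclass(frozen=True)
class LinkCustomerOrder:
    customer_key: str
    order_key: str
    load_ts: datetime
    record_source: str         # which heterogeneous source supplied the link

@dataclass(frozen=True)
class SatCustomerDetails:
    customer_key: str
    name: str
    city: str
    load_ts: datetime          # a new row per change preserves full history
    record_source: str

row = SatCustomerDetails("C-42", "Acme Ltd", "Belgrade",
                         datetime(2018, 5, 1), "crm_export")
print(row)
```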

