Analysing the Impact of Large Data Imports in OpenStreetMap

2021 ◽  
Vol 10 (8) ◽  
pp. 528
Author(s):  
Raphael Witt ◽  
Lukas Loos ◽  
Alexander Zipf

OpenStreetMap (OSM) is a global mapping project that generates free geographical information through a community of volunteers. OSM is used in a variety of applications and for research purposes. It is also possible to import external data sets into OSM. Opinions about these data imports diverge among researchers and contributors, and the subject is constantly discussed. Whether importing data, especially in large quantities, adds value to OSM or compromises the progress of the project needs to be investigated more deeply. For this study, OSM’s historical data were used to compute metrics about the development of the contributors and the OSM data during large data imports in the Netherlands and India. Additionally, one time period per study area without a large data import was investigated for comparison. To assess the impacts of large data imports on OSM, the metrics were analysed using different techniques (cross-correlation and changepoint detection). Contributor activity increased during large data imports. Moreover, contributors who were already active before a large import were more likely to contribute to OSM after the import than contributors who made their first contributions during it. The results show the difficulty of interpreting a heterogeneous data source such as OSM and the complexity of the project. Limitations and challenges encountered are explained, and future directions for continuing this line of research are given.
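The analysis pipeline described above lends itself to a short sketch. The following is a minimal illustration, not the authors' code: it cross-correlates a simulated daily import volume against simulated contributor activity and runs changepoint detection with the ruptures library (our choice; the abstract does not name an implementation). All series and parameters are invented.

```python
# Minimal sketch (not the authors' code): cross-correlate two activity
# metrics and locate changepoints around an import period.
import numpy as np
import ruptures as rpt  # common changepoint-detection library (assumption)

def normalized_xcorr(a, b, max_lag=30):
    """Normalized cross-correlation of two equal-length daily series."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return {lag: np.corrcoef(a[max(lag, 0):len(a) + min(lag, 0)],
                             b[max(-lag, 0):len(b) + min(-lag, 0)])[0, 1]
            for lag in range(-max_lag, max_lag + 1)}

# Hypothetical daily metrics: imported objects vs. active contributors.
rng = np.random.default_rng(0)
imports = rng.poisson(5, 365).astype(float)
imports[150:180] += 200          # simulated large-import window
contributors = rng.poisson(20, 365).astype(float)
contributors[150:185] += 40      # activity rises alongside the import

xcorr = normalized_xcorr(imports, contributors)
best_lag = max(xcorr, key=xcorr.get)
print(f"strongest correlation at lag {best_lag} days: {xcorr[best_lag]:.2f}")

# Changepoint detection on the contributor series (PELT with an RBF cost).
breaks = rpt.Pelt(model="rbf").fit(contributors).predict(pen=10)
print("detected changepoints (day indices):", breaks)
```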

2021 ◽  
Author(s):  
Annie-Claude Parent ◽  
Frédéric Fournier ◽  
François Anctil ◽  
Brian Morse ◽  
Jean-Philippe Baril-Boyer ◽  
...  

Spring floods caused colossal damage to residential areas in the Province of Quebec, Canada, in 2017 and 2019. Government authorities need accurate modelling of the impact of theoretical floods in order to prioritize pre-disaster mitigation projects that reduce vulnerability. They also need accurate modelling of forecasted floods in order to direct emergency responses.

We present a governmental-academic collaboration that aims at modelling flood impact for both theoretical and forecasted flooding events over all populated river reaches of southern Quebec. The project, funded by the ministère de la Sécurité publique du Québec (the Quebec ministry responsible for public security), consists of developing a diagnostic tool and methods to assess the risk and impacts of flooding. The tools under development are intended to be used primarily by policy makers.

The project relies on water level data based on the hydrological regimes of nearly 25,000 km of rivers, on high-precision digital terrain models, and on a detailed database of building footprints and characterizations. It also relies on 24 h and 48 h forecasts of maximum flow for the subject rivers. The developed tools integrate large data sets and heterogeneous data sources and produce insightful metrics on the physical extent and costs of floods and on their impact on the population. The software also provides precise information about each building affected by rising water, including an estimated cost of the damages and the impact on inhabitants.
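As a rough illustration of the kind of per-building metric such a tool can produce, the sketch below differences a forecast water level against ground elevations taken from a terrain model and applies a placeholder depth-damage rule. It is not the project's software; every class name, threshold, and cost figure is an assumption.

```python
# Simplified illustration (not the project's tool): per-building flood depth
# from a modelled water level minus DTM ground elevation, plus a stepwise
# depth-damage rule with placeholder values.
from dataclasses import dataclass

@dataclass
class Building:
    building_id: str
    ground_elev_m: float        # from the digital terrain model
    first_floor_offset_m: float
    occupants: int

def flood_depth(b: Building, water_level_m: float) -> float:
    """Water depth above the first floor, clipped at zero."""
    return max(0.0, water_level_m - (b.ground_elev_m + b.first_floor_offset_m))

def damage_estimate(depth_m: float) -> float:
    """Hypothetical stepwise depth-damage curve (CAD); placeholder values."""
    if depth_m <= 0:
        return 0.0
    if depth_m < 0.3:
        return 15_000.0
    if depth_m < 1.0:
        return 45_000.0
    return 90_000.0

buildings = [
    Building("B-001", 12.4, 0.6, 3),
    Building("B-002", 11.8, 0.3, 5),
]
forecast_level_m = 12.7   # e.g. a 48 h maximum-flow scenario mapped to stage
for b in buildings:
    d = flood_depth(b, forecast_level_m)
    print(b.building_id, f"depth={d:.2f} m",
          f"damage estimate={damage_estimate(d):,.0f} CAD",
          f"occupants affected={b.occupants if d > 0 else 0}")
```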


Author(s):  
David Japikse ◽  
Oleg Dubitsky ◽  
Kerry N. Oliphant ◽  
Robert J. Pelton ◽  
Daniel Maynes ◽  
...  

In the course of developing advanced data processing and advanced performance models, as presented in companion papers, a number of basic scientific and mathematical questions arose. This paper deals with questions such as uniqueness, convergence, statistical accuracy, training, and evaluation methodologies. The process of bringing together large data sets and utilizing them, with outside data supplementation, is considered in detail. Once these questions have been brought into focus, emphasis is placed on how the new models, based on highly refined data processing, can best be used in the design world. The impact of this work on future designs is discussed. It is expected that this methodology will assist designers in moving beyond contemporary design practices.


2018 ◽  
Vol 75 (12) ◽  
pp. 2114-2122 ◽  
Author(s):  
Lindsay Aylesworth ◽  
Ting-Chun Kuo

Catch rates reported by fishers are commonly used to understand the status of a fishery, but the reliability of fisher-reported data depends on how fishers recall such information, and recall may be influenced by the choice of reporting time period. Using interview data from fishers in Thailand, we investigated (1) how the time period for which fishers report their catch rates (e.g., per day or per month) correlates with annual catch estimates and (2) the potential for recall bias when fishers report multiple catch rates. We found that the annual catch estimates of fishers who reported over a shorter time period (haul, day) were significantly higher than those of fishers who reported over a longer time period (month, year). This trend held when individual fishers reported over multiple time periods, suggesting recall bias. By comparing fisher reports with external data sets, we found that the mean across all reporting periods was more similar to those external sources than the estimate from any single time period. Our research has strong implications for the use of fishers’ knowledge in fisheries management.
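To make the recall-period effect concrete, the sketch below annualizes hypothetical catch reports given per haul, day, month, and year and compares the group means. The effort assumptions (fishing days per year, hauls per day) and all catch figures are invented for illustration, not the study's data.

```python
# Illustrative sketch (invented numbers): annualize catch reports given at
# different recall periods and compare their means.
from statistics import mean

# Assumed effort levels used to scale each reporting period to a year.
DAYS_PER_YEAR = 200
HAULS_PER_DAY = 4
SCALE_TO_YEAR = {
    "haul": DAYS_PER_YEAR * HAULS_PER_DAY,
    "day": DAYS_PER_YEAR,
    "month": 10,          # ~10 fishing months per year (assumption)
    "year": 1,
}

# Each tuple is one fisher's reported catch (kg) over some recall period.
reports = [
    ("haul", 3.0), ("haul", 2.5), ("day", 10.0), ("day", 14.0),
    ("month", 180.0), ("month", 150.0), ("year", 1500.0),
]

annualized = [catch * SCALE_TO_YEAR[period] for period, catch in reports]
by_period = {}
for (period, _), annual in zip(reports, annualized):
    by_period.setdefault(period, []).append(annual)

for period, values in by_period.items():
    print(f"{period:>5}: mean annual estimate = {mean(values):,.0f} kg")
print(f"overall mean across all reports = {mean(annualized):,.0f} kg")
```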


2014 ◽  
Vol 543-547 ◽  
pp. 2937-2940
Author(s):  
Xiao Xiao Liang ◽  
Shun Min Wang ◽  
Chong Gang Wei ◽  
Chuang Shen

Given the distributed, autonomous, and heterogeneous nature of university databases, we designed the architecture, core algorithms, query distributor, result processor, and wrappers of a heterogeneous data integration middleware for universities, using Java, XML, and middleware technology. The design of the query distributor, result processor, and wrapper is described in particular detail.
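The mediator/wrapper architecture described above can be sketched schematically as follows. The original system was implemented with Java and XML; Python is used here only to keep the illustration compact, and all class names (Wrapper, QueryDistributor, ResultProcessor) are assumptions rather than the paper's identifiers.

```python
# Schematic sketch of the mediator/wrapper pattern (not the paper's code).

class Wrapper:
    """Hides one source's schema and query dialect behind a common call."""
    def __init__(self, name, rows):
        self.name, self._rows = name, rows
    def query(self, predicate):
        return [r for r in self._rows if predicate(r)]

class QueryDistributor:
    """Fans a global query out to every registered wrapper."""
    def __init__(self, wrappers):
        self.wrappers = wrappers
    def run(self, predicate):
        return {w.name: w.query(predicate) for w in self.wrappers}

class ResultProcessor:
    """Merges per-source results into one integrated answer."""
    @staticmethod
    def merge(partials):
        return [row for rows in partials.values() for row in rows]

students_db = Wrapper("students", [{"id": 1, "dept": "CS"}, {"id": 2, "dept": "EE"}])
library_db = Wrapper("library", [{"id": 3, "dept": "CS"}])
distributor = QueryDistributor([students_db, library_db])
merged = ResultProcessor.merge(distributor.run(lambda r: r["dept"] == "CS"))
print(merged)   # integrated view over both autonomous sources
```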


2009 ◽  
Vol 4 (8) ◽  
Author(s):  
Jinpeng Wang ◽  
Jianjiang Lu ◽  
Yafei Zhang ◽  
Zhuang Miao ◽  
Bo Zhou

Leonardo ◽  
2012 ◽  
Vol 45 (2) ◽  
pp. 113-118 ◽  
Author(s):  
Rama C. Hoetzlein

This paper follows the development of visual communication through information visualization in the wake of the Fukushima nuclear accident in Japan. While information aesthetics are often applied to large data sets retrospectively, the author developed new works concurrently with an ongoing crisis to examine the impact and social aspects of visual communication while events continued to unfold. The resulting work, Fukushima Nuclear Accident—Radiation Comparison Map, is a reflection of rapidly acquired data, collaborative on-line analysis and reflective criticism of contemporary news media, resolved into a coherent picture through the participation of an on-line community.


Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2673
Author(s):  
Chonghuan Xu ◽  
Dongsheng Liu ◽  
Xinyao Mei

The advent of mobile, scenario-based consumption has popularized and gradually matured location-based point of interest (POI) recommendation services. However, insufficient fusion of heterogeneous data in current POI recommendation services leads to poor recommendation quality. In this paper, we propose a novel hybrid POI recommendation model (NHRM) based on user characteristics and spatial-temporal factors to enhance the recommendation effect. The proposed model contains three sub-models. The first considers user preferences, forgetting characteristics, user influence, and trajectories. The second studies the impact of the correlation between POI locations and calculates the check-in probability of a POI with a two-dimensional kernel density estimation method. The third analyzes the influence of POI categories. The results of the three sub-models are combined, and the top-K POIs are recommended to target users. Experimental results on the Yelp and Meituan data sets showed that the recommendation performance of our method is superior to that of some other methods and that the cold-start and data-sparsity problems are alleviated to a certain extent.
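As a rough sketch of the second sub-model's idea, the code below fits a two-dimensional kernel density estimate to hypothetical check-in coordinates and scores candidate POIs by the estimated density. SciPy's gaussian_kde stands in for the paper's estimator, and all coordinates are invented.

```python
# Minimal sketch: 2-D kernel density estimate over check-in locations,
# used to score candidate POIs (scipy's gaussian_kde as a stand-in).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Hypothetical historical check-ins: row 0 = longitudes, row 1 = latitudes.
checkins = np.vstack([
    rng.normal(120.15, 0.01, 500),   # clustered around one busy district
    rng.normal(30.25, 0.01, 500),
])

kde = gaussian_kde(checkins)          # fit a 2-D KDE to the check-ins

# Score candidate POIs: higher density implies higher check-in likelihood.
candidates = np.array([[120.150, 120.300],   # longitudes
                       [30.250, 30.400]])    # latitudes
scores = kde(candidates)
for (lon, lat), s in zip(candidates.T, scores):
    print(f"POI at ({lon:.3f}, {lat:.3f}): relative check-in density {s:.2f}")
```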

