Parallel Spatial-Data Conversion Engine: Enabling Fast Sharing of Massive Geospatial Data

Large-scale geospatial data have accumulated worldwide in the past decades. However, various data formats often result in a geospatial data sharing problem in the geographical information system community. Despite the various methodologies proposed in the past, geospatial data conversion has always served as a fundamental and efficient way of sharing geospatial data. However, these methodologies are beginning to fail as data increase. This study proposes a parallel spatial data conversion engine (PSCE) with a symmetric mechanism to achieve the efficient sharing of massive geodata by utilizing high-performance computing technology. This engine is designed in an extendable and flexible framework and can customize methods of reading and writing particular spatial data formats. A dynamic task scheduling strategy based on the feature computing index is introduced in the framework to improve load balancing and performance. An experiment is performed to validate the engine framework and performance. In this experiment, geospatial data are stored in the vector spatial data defined in the Chinese Geospatial Data Transfer Format Standard in a parallel file system (Lustre Cluster). Results show that the PSCE has a reliable architecture that can quickly cope with massive spatial datasets.

Download Full-text

HiBuffer: Buffer Analysis of 10-Million-Scale Spatial Data in Real Time

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7120467 ◽

2018 ◽

Vol 7 (12) ◽

pp. 467 ◽

Cited By ~ 3

Author(s):

Mengyu Ma ◽

Ye Wu ◽

Wenze Luo ◽

Luo Chen ◽

Jun Li ◽

...

Keyword(s):

Real Time ◽

Spatial Data ◽

High Performance ◽

Large Scale ◽

Computation Time ◽

Buffer Analysis ◽

Data Volume ◽

Time Buffer ◽

Real World Datasets ◽

Spatial Indexes

Buffer analysis, a fundamental function in a geographic information system (GIS), identifies areas by the surrounding geographic features within a given distance. Real-time buffer analysis for large-scale spatial data remains a challenging problem since the computational scales of conventional data-oriented methods expand rapidly with increasing data volume. In this paper, we introduce HiBuffer, a visualization-oriented model for real-time buffer analysis. An efficient buffer generation method is proposed which introduces spatial indexes and a corresponding query strategy. Buffer results are organized into a tile-pyramid structure to enable stepless zooming. Moreover, a fully optimized hybrid parallel processing architecture is proposed for the real-time buffer analysis of large-scale spatial data. Experiments using real-world datasets show that our approach can reduce computation time by up to several orders of magnitude while preserving superior visualization effects. Additional experiments were conducted to analyze the influence of spatial data density, buffer radius, and request rate on HiBuffer performance, and the results demonstrate the adaptability and stability of HiBuffer. The parallel scalability of HiBuffer was also tested, showing that HiBuffer achieves high performance of parallel acceleration. Experimental results verify that HiBuffer is capable of handling 10-million-scale data.

Download Full-text

Emotional and Subjective Volunteered Geographical Information

Advances in Geospatial Technologies - Volunteered Geographic Information and the Future of Geospatial Data ◽

10.4018/978-1-5225-2446-5.ch006 ◽

2017 ◽

pp. 97-112 ◽

Cited By ~ 2

Author(s):

Jiri Panek

Keyword(s):

Data Mining ◽

Social Networks ◽

Spatial Data ◽

Large Scale ◽

Geographical Information ◽

Emotional Information ◽

Spatial Features ◽

Spatial Locations ◽

The City

Crowdsroucing of emotional information can take many forms, from social networks data mining to large-scale surveys. The author presents the case-study of emotional mapping in Ostrava´s district Ostrava-Poruba, Czech Republic. Together with the local administration, the author crowdsourced the emotional perceptions of the location from almost 400 citizens, who created 4,051 spatial features. Additional to the spatial data there were 1,244 comments and suggestions for improvements in the district. Furthermore, the author is looking for patterns and hot-spots within the city and if there are any relevant linkages between certain emotions and spatial locations within the city.

Download Full-text

AstroCatR: a mechanism and tool for efficient time series reconstruction of large-scale astronomical catalogues

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa1413 ◽

2020 ◽

Vol 496 (1) ◽

pp. 629-637

Author(s):

Ce Yu ◽

Kun Li ◽

Shanjiang Tang ◽

Chao Sun ◽

Bin Ma ◽

...

Keyword(s):

Time Series ◽

High Performance ◽

Large Scale ◽

Extrasolar Planets ◽

Time Series Data ◽

Series Data ◽

Data Sets ◽

Observation Data ◽

Data Volume ◽

And Performance

ABSTRACT Time series data of celestial objects are commonly used to study valuable and unexpected objects such as extrasolar planets and supernova in time domain astronomy. Due to the rapid growth of data volume, traditional manual methods are becoming extremely hard and infeasible for continuously analysing accumulated observation data. To meet such demands, we designed and implemented a special tool named AstroCatR that can efficiently and flexibly reconstruct time series data from large-scale astronomical catalogues. AstroCatR can load original catalogue data from Flexible Image Transport System (FITS) files or data bases, match each item to determine which object it belongs to, and finally produce time series data sets. To support the high-performance parallel processing of large-scale data sets, AstroCatR uses the extract-transform-load (ETL) pre-processing module to create sky zone files and balance the workload. The matching module uses the overlapped indexing method and an in-memory reference table to improve accuracy and performance. The output of AstroCatR can be stored in CSV files or be transformed other into formats as needed. Simultaneously, the module-based software architecture ensures the flexibility and scalability of AstroCatR. We evaluated AstroCatR with actual observation data from The three Antarctic Survey Telescopes (AST3). The experiments demonstrate that AstroCatR can efficiently and flexibly reconstruct all time series data by setting relevant parameters and configuration files. Furthermore, the tool is approximately 3× faster than methods using relational data base management systems at matching massive catalogues.

Download Full-text

Massively multi-user online platform for large-scale applications

10.32920/ryerson.14645280 ◽

2021 ◽

Author(s):

Allen Yen-Cheng Yu

Keyword(s):

Grid Computing ◽

High Performance ◽

Large Scale ◽

Distributed Applications ◽

Service Architecture ◽

Online Application ◽

Online Platform ◽

Massively Multiplayer Online Game ◽

And Performance

Many large-scale online applications enable thousands of users to access their services simultaneously. However, the overall service quality of an online application usually degrades when the number of users increases because, traditionally, centralized server architecture does not scale well. In order to provide better Quality of Service (QoS), service architecture such as Grid computing can be used. This type of architecture offers service scalability by utilizing heterogeneous hardware resources. In this thesis, a novel design of Grid computing middleware, Massively Multi-user Online Platform (MMOP), which integrates the Peer-to-Peer (P2P) structured overlays, is proposed. The objectives of this proposed design are to offer scalability and system design flexibility, simplify development processes of distributed applications, and improve QoS by following specified policy rules. A Massively Multiplayer Online Game (MMOG) has been created to validate the functionality and performance of MMOP. The simulation results have demonstrated that MMOP is a high performance and scalable servicing and computing middleware.

Download Full-text

Massively multi-user online platform for large-scale applications

10.32920/ryerson.14645280.v1 ◽

2021 ◽

Author(s):

Allen Yen-Cheng Yu

Keyword(s):

Grid Computing ◽

High Performance ◽

Large Scale ◽

Distributed Applications ◽

Service Architecture ◽

Online Application ◽

Online Platform ◽

Massively Multiplayer Online Game ◽

And Performance

Download Full-text

Peranan Faktor Struktur dalam Model Nilai Tanah: Kajian Kes Bandar Surabaya, Indonesia

Jurnal Teknologi ◽

10.11113/jt.v37.526 ◽

2012 ◽

Author(s):

Hening Widi Oetomo ◽

Ruslan Rainis

Keyword(s):

Information System ◽

Spatial Data ◽

Geographical Information ◽

Structural Factors ◽

Land Value ◽

Landscape Index ◽

The Past ◽

Land Transaction ◽

Value Model ◽

Structural Neighborhood

Model nilai tanah merupakan model untuk menaksir nilai tanah berdasarkan faktor-faktor yang dikenal pasti dapat mempengaruhi nilai tanah. Secara umum ada empat (4) kumpulan faktor yang mempengaruhi nilai tanah iaitu struktur, kejiranan, lokasi dan masa. Sebahagian besar daripada faktor ini memerlukan data ruangan yang dahulunya sukar dijana kerana kekurangan alatan yang sesuai. Perkembangan dalam sistem maklumat geografi (GIS) membolehkan berbagai-bagai jenis analisis ruangan yang diperlukan seperti pengkelasan semula, penindanan, pengukuran jarak dan kehampiran, kejiranan, rangkaian dan permukaan dapat dilakukan dengan lebih mudah. Kajian ini cuba membentuk satu model nilai tanah dengan memfokuskan kepada faktor struktur. Sebanyak empat pemboleh ubah struktur lot tanah diambil kira iaitu keluasan, lebar hadapan, indeks landskap (bentuk lot) dan arah orientasi. Kaedah statistik yang digunakan adalah analisis regresi terhadap sampel 148 transaksi lot tanah. Daripada analisis hanya dua pemboleh ubah iaitu luas dan bentuk lot signifikan pada tahap 0.05 dan berjaya menerangkan sebanyak 74% daripada variasi nilai tanah. Kata kunci: nilai tanah; GIS; faktor struktur A model of land value is a model to estimate land value based on factors identified to influence land value. Generally, there are four groups of factors that influence land value i.e. structural, neighborhood, location and time. Most of these factors need spatial data that were difficult to generate in the past due to the lack of appropriate tools. The recent development in Geographical Information System (GIS) enables the various spatial analysis such as reclassification, overlay, distance measurement and proximity, neighborhood, network and surface analysis to be a carried out with ease. This study attempts to develop a land value model with a particular focus on structural factors. Four structural factors were considered in the model namely size, width of frontage, landscape index (shape of lot) and orientation. The model was developed using multiple regression analysis based on a sample of 148 lots of land transaction. From the analysis, only two variables i.e. size and landscape index were statistically significant at 0.05 level and successfully explained 74% of the variation in land value. Key words: land value; GIS; structural factor

Download Full-text

Urban Spatial Data Computing

Handbook of Research on Big Data Clustering and Machine Learning - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-0106-1.ch019 ◽

2020 ◽

pp. 409-431

Author(s):

Uma V. ◽

Jayanthi Ganapathy

Keyword(s):

Spatial Data ◽

Large Scale ◽

Underground Water ◽

Business Development ◽

Geographical Information ◽

Positioning System ◽

Global Positioning ◽

Urban Settlements ◽

Software Services ◽

Source Of Information

Urban spatial data is the source of information in analysing risks due to natural disaster, evacuation planning, risk mapping and assessments, etc. Global positioning system (GPS) is a satellite-based technology that is used to navigate on earth. Geographical information system (GIS) is a software system that facilitates software services to mankind in various application domains such as agriculture, ecology, forestry, geomorphology analysis in earthquake and landslides, laying of underground water pipe connection and demographic studies like population migration, urban settlements, etc. Thus, spatial and temporal relations of real-time activities can be analysed to predict the future activities like predicting places of interest. Time analysis of such activities helps in personalisation of activities or development of recommendation systems, which could suggest places of interest. Thus, GPS mapping with data analytics using GIS would pave way for commercial and business development in large scale.

Download Full-text

Integrated Cloud Computing Environment for Upstream Geoscience Workflows

10.2118/204848-ms ◽

2021 ◽

Author(s):

Murtadha Al-Habib ◽

Yasser Al-Ghamdi

Keyword(s):

High Performance ◽

Large Scale ◽

End Users ◽

Data Sets ◽

Production Environment ◽

Remote Visualization ◽

Test Environment ◽

Petroleum Resources ◽

Customized Production ◽

And Performance

Abstract Extensive computing resources are required to leverage todays advanced geoscience workflows that are used to explore and characterize giant petroleum resources. In these cases, high-performance workstations are often unable to adequately handle the scale of computing required. The workflows typically utilize complex and massive data sets, which require advanced computing resources to store, process, manage, and visualize various forms of the data throughout the various lifecycles. This work describes a large-scale geoscience end-to-end interpretation platform customized to run on a cluster-based remote visualization environment. A team of computing infrastructure and geoscience workflow experts was established to collaborate on the deployment, which was broken down into separate phases. Initially, an evaluation and analysis phase was conducted to analyze computing requirements and assess potential solutions. A testing environment was then designed, implemented and benchmarked. The third phase used the test environment to determine the scale of infrastructure required for the production environment. Finally, the full-scale customized production environment was deployed for end users. During testing phase, aspects such as connectivity, stability, interactivity, functionality, and performance were investigated using the largest available geoscience datasets. Multiple computing configurations were benchmarked until optimal performance was achieved, under applicable corporate information security guidelines. It was observed that the customized production environment was able to execute workflows that were unable to run on local user workstations. For example, while conducting connectivity, stability and interactivity benchmarking, the test environment was operated for extended periods to ensure stability for workflows that require multiple days to run. To estimate the scale of the required production environment, varying categories of users’ portfolio were determined based on data type, scale and workflow. Continuous monitoring of system resources and utilization enabled continuous improvements to the final solution. The utilization of a fit-for-purpose, customized remote visualization solution may reduce or ultimately eliminate the need to deploy high-end workstations to all end users. Rather, a shared, scalable and reliable cluster-based solution can serve a much larger user community in a highly performant manner.

Download Full-text

Development of framework for aggregation and visualization of three-dimensional (3D) spatial data

10.32920/14638815 ◽

2021 ◽

Author(s):

Mihal Miu ◽

Xiaokun Zhang ◽

M. Ali Akber Dewan ◽

Junye Wang

Keyword(s):

Spatial Data ◽

Large Scale ◽

Spatial Information ◽

Three Dimensional ◽

Geospatial Data ◽

Third Party ◽

Data Sources ◽

Multidimensional Data ◽

Multiple Sources ◽

Geospatial Information

Geospatial information plays an important role in environmental modelling, resource management, business operations, and government policy. However, very little or no commonality between formats of various geospatial data has led to difficulties in utilizing the available geospatial information. These disparate data sources must be aggregated before further extraction and analysis may be performed. The objective of this paper is to develop a framework called PlaniSphere, which aggregates various geospatial datasets, synthesizes raw data, and allows for third party customizations of the software. PlaniSphere uses NASA World Wind to access remote data and map servers using Web Map Service (WMS) as the underlying protocol that supports service-oriented architecture (SOA). The results show that PlaniSphere can aggregate and parses files that reside in local storage and conforms to the following formats: GeoTIFF, ESRI shape files, and KML. Spatial data retrieved using WMS from the Internet can create geospatial data sets (map data) from multiple sources, regardless of who the data providers are. The plug-in function of this framework can be expanded for wider uses, such as aggregating and fusing geospatial data from different data sources, by providing customizations to serve future uses, which the capacity of the commercial ESRI ArcGIS software is limited to add libraries and tools due to its closed-source architectures and proprietary data structures. Analysis and increasing availability of geo-referenced data may provide an effective way to manage spatial information by using large-scale storage, multidimensional data management, and Online Analytical Processing (OLAP) capabilities in one system.

Download Full-text

bíogo: a simple high-performance bioinformatics toolkit for the Go language

10.1101/005033 ◽

2014 ◽

Cited By ~ 6

Author(s):

R Daniel Kortschak ◽

David L Adelson

Keyword(s):

High Performance ◽

Large Scale ◽

Large Data ◽

Biological Data ◽

Data Sets ◽

Barriers To Entry ◽

Data Types ◽

Concurrent Processing ◽

Computationally Intensive ◽

And Performance

bíogo is a framework designed to ease development and maintenance of computationally intensive bioinformatics applications. The library is written in the Go programming language, a garbage-collected, strictly typed compiled language with built in support for concurrent processing, and performance comparable to C and Java. It provides a variety of data types and utility functions to facilitate manipulation and analysis of large scale genomic and other biological data. bíogo uses a concise and expressive syntax, lowering the barriers to entry for researchers needing to process large data sets with custom analyses while retaining computational safety and ease of code review. We believe bíogo provides an excellent environment for training and research in computational biology because of its combination of strict typing, simple and expressive syntax, and high performance.

Download Full-text