Streamlining geospatial data processing for isotopic landscape modeling

Geographic Information Systems (GIS) are available as stand-alone desktop applications as well as web platforms for vector- and raster-based geospatial data processing and visualization. While each approach offers certain advantages, their limitations motivate the development of hybrid systems that increase user productivity in interactive data analytics on multidimensional gridded data. Web-based applications are platform-independent; however, they require an internet connection to communicate with servers for data management and processing, which raises issues of performance, data integrity, and the handling and transfer of massive multidimensional raster data. Stand-alone desktop applications, on the other hand, can usually function without an internet connection, but they are platform-dependent, which makes their distribution and maintenance difficult. This paper presents RasterJS, a hybrid client-side web library for geospatial data processing that is built on the Progressive Web Application (PWA) architecture to operate seamlessly in both online and offline modes. A packaged version of this system, built with the Web Bundles API for offline access and distribution, is also presented. RasterJS uses the latest web technologies supported by modern web browsers, including the Service Workers API, Cache API, IndexedDB API, Notifications API, Push API, and Web Workers API, to bring geospatial analytics capabilities to large-scale raster data for client-side processing. Each of these technologies acts as a component of RasterJS, and together they give users a similar experience in both online and offline modes when performing geospatial analyses such as flow direction calculation with hydro-conditioning, raindrop flow tracking, and watershed delineation. A large-scale watershed analysis case study demonstrates the capabilities and limitations of the library. The framework could also be used for other use cases that rely on raster processing, including land use, agriculture, soil erosion, transportation, and population studies.
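
To make the raster operations named in the abstract more concrete, the following is a minimal Python sketch of flow direction calculation and raindrop flow tracking on a small elevation grid. It assumes the common D8 scheme (each cell drains to its steepest downslope neighbour), which the abstract does not confirm as the method used. It is not RasterJS code: RasterJS is a JavaScript library, the function names `d8_flow_direction` and `trace_raindrop` are hypothetical, and hydro-conditioning (for example pit filling) is left out.

```python
import math

# Offsets (row, col) to the eight neighbours used by the D8 scheme.
D8_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_flow_direction(dem):
    """For each cell, return the offset of its steepest downslope neighbour.

    Cells with no lower neighbour (pits and flats) are marked None.
    No hydro-conditioning is applied in this sketch.
    """
    rows, cols = len(dem), len(dem[0])
    directions = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in D8_OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    distance = math.hypot(dr, dc)
                    slope = (dem[r][c] - dem[nr][nc]) / distance
                    if slope > best_slope:
                        best_slope = slope
                        directions[r][c] = (dr, dc)
    return directions


def trace_raindrop(directions, start):
    """Follow flow directions from `start` until a cell with no outflow."""
    path = [start]
    r, c = start
    while directions[r][c] is not None:
        dr, dc = directions[r][c]
        r, c = r + dr, c + dc
        path.append((r, c))
    return path


# Illustrative 3x3 elevation grid that drains toward the lower-right corner.
dem = [[9, 8, 7],
       [8, 6, 5],
       [7, 5, 1]]
directions = d8_flow_direction(dem)
path = trace_raindrop(directions, (0, 0))
```

Because every step moves to a strictly lower cell, the trace always terminates. A watershed for an outlet cell could then be approximated as the set of cells whose traced paths pass through that outlet.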


Serverless Geospatial Data Processing Workflow System Design

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi11010020 ◽

2021 ◽

Vol 11 (1) ◽

pp. 20

Author(s):

Mete Ercan Pakdil ◽

Rahmi Nurhan Çelik

Keyword(s):

Data Processing ◽

System Design ◽

Application Programming Interface ◽

Geospatial Data ◽

Proof Of Concept ◽

The Public ◽

Workflow System ◽

Application Programming ◽

Open Geospatial Consortium ◽

Programming Interface

Geospatial data and related technologies have become an increasingly important part of data analysis processes and play a prominent role in most of them. The serverless paradigm has become the most popular and frequently used technology within cloud computing. This paper reviews the serverless paradigm and examines how it could be leveraged for geospatial data processing by using the open standards of the geospatial community. We propose a system design and architecture that handles complex geospatial data processing jobs with minimal human intervention and resource consumption using serverless technologies. To define and execute workflows in the system, we also propose new models for both workflow and task definitions. Moreover, the proposed system provides new web services based on the Open Geospatial Consortium (OGC) Application Programming Interface (API) Processes specification to provide interoperability with other geospatial applications, in anticipation that the specification will be more commonly used in the future. We implemented the proposed system on one of the public cloud providers as a proof of concept and evaluated it with sample geospatial workflows and cloud architecture best practices.
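
The abstract does not describe the proposed workflow and task definition models in detail. As a rough illustration of the general idea only, the Python sketch below models a workflow as a set of tasks with dependencies and derives an order in which they could run. The `Task` and `Workflow` classes, their fields, and the task names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    """Hypothetical task definition: a name plus the tasks it waits for."""
    name: str
    depends_on: list = field(default_factory=list)


@dataclass
class Workflow:
    """Hypothetical workflow definition: a named collection of tasks."""
    name: str
    tasks: list

    def execution_order(self):
        # Map each task to the set of tasks that must finish before it.
        graph = {task.name: set(task.depends_on) for task in self.tasks}
        # static_order() raises graphlib.CycleError on circular dependencies.
        return list(TopologicalSorter(graph).static_order())


# Illustrative three-step geospatial workflow (task names are made up).
workflow = Workflow(
    name="tile-imagery",
    tasks=[
        Task("tile", depends_on=["reproject"]),
        Task("reproject", depends_on=["fetch"]),
        Task("fetch"),
    ],
)
order = workflow.execution_order()
```

In a serverless setting, each task in such an order could be dispatched as a separate function invocation once its dependencies have completed, but that mapping is an assumption here rather than the paper's stated design.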


LOGISTIC TASKS SOLUTION ON BASE OF GEOSPATIAL DATA PROCESSING USING MODULE TRACKING ANALYST IN ARCGIS

Vestnik SSUGT (Siberian State University of Geosystems and Technologies) ◽

10.33764/2411-1759-2019-24-1-83-96 ◽

2019 ◽

Vol 24 (1) ◽

pp. 83-96

Author(s):

Andrei A. Basargin ◽

Petr Yu. Bugakov ◽

Stanislav Yu. Katsko ◽

...

Keyword(s):

Data Processing ◽

Geospatial Data


Data Processing at Scale

10.36227/techrxiv.14445468 ◽

2021 ◽

Author(s):

Raju Singh

Keyword(s):

New York ◽

Data Processing ◽

Hot Spot ◽

Single Point ◽

Geospatial Data ◽

Apache Spark ◽

Data Generation ◽

Transportation Industry ◽

Statistical Parameters ◽

Performance Times

Data generation and collection have gone through a series of improvements over the past several years. Now that both aspects have evolved, another dimension emerges: how to process data at scale and how to manage it. The relational DBMS has been a widely accepted approach to processing and managing data, but it has its own pros and cons; the constraints placed on data to prevent integrity violations are seen as a trade-off between performance and management. With advances in storage, compute, and network technology, we have reliably moved beyond the state of relational database management. It is not yet done. Exception handling has been very poor in traditional DB architectures, which have a single point of failure. Distributed systems, however, only multiply the failure points. Failure is expected, and hence the solution for availability is designed around these expected failures. Distributed computing adds capabilities such as performance, availability, and reliability. But that is not all. We are living in an era in which we communicate every now and then through different devices. Beyond this, we generate, collect, and manage data of varied types (mostly unstructured, multi-dimensional, and carrying a lot of noise and bias). NoSQL DBMSs, Apache Spark, and Hadoop come to the rescue. One area that exemplifies the use of big data is the transportation industry, which can encompass shipping, airline data, trucking, and, in the context we refer to, cabs. NYC taxi data is available as an open dataset that stores, among other things, geospatial data collected from individual taxis as they navigate the streets of New York City. Processing geospatial data at this scale is very time-consuming and resource-intensive, as anyone who has used ArcGIS on a large dataset can attest. Distributed and parallel data processing presents an opportunity for faster processing of this type of data. The Apache Spark framework is ideal for this task because it is highly efficient, with fast performance times. Additionally, it has built-in libraries and APIs that allow it to process SQL queries, which many users are likely to be familiar with given SQL's ubiquity. In this report, we demonstrate our approaches to hot spot analysis on the NYC taxi data. Hot-zone analysis performs a range join on rectangles and points to identify the boundaries from which most pickups happen. Hot-cell analysis uses statistical parameters to identify the zones while also considering time as an additional dimension.
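
The hot-zone step described above can be illustrated with a small single-machine sketch in plain Python. It joins each rectangle with the points that fall inside it and ranks the rectangles by pickup count. This is not the report's Spark implementation: the function names, the rectangle encoding `(x1, y1, x2, y2)`, and the sample coordinates are assumptions made for illustration, and a distributed engine would partition this join rather than run it as a nested loop.

```python
def point_in_rectangle(point, rect):
    """Return True if `point` (x, y) lies inside or on the edge of `rect`.

    `rect` is (x1, y1, x2, y2), two opposite corners in any order.
    """
    x, y = point
    x1, y1, x2, y2 = rect
    return (min(x1, x2) <= x <= max(x1, x2)
            and min(y1, y2) <= y <= max(y1, y2))


def hot_zones(rectangles, points):
    """Naive range join: count the points inside each named rectangle.

    Returns (name, count) pairs, hottest zone first, ties broken by name.
    """
    counts = {
        name: sum(point_in_rectangle(p, rect) for p in points)
        for name, rect in rectangles.items()
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up zones and pickup locations in arbitrary planar coordinates.
zones = {"A": (0, 0, 2, 2), "B": (1, 1, 3, 3)}
pickups = [(0.5, 0.5), (1.5, 1.5), (0.2, 1.9), (2.5, 2.5)]
ranking = hot_zones(zones, pickups)
```

A point that lies in overlapping rectangles is counted once per rectangle, which matches the usual semantics of a range join.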
