A Topology Based Spatio-Temporal Map Algebra for Big Data Analysis

Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 86 ◽  
Author(s):  
Sören Gebbert ◽  
Thomas Leppelt ◽  
Edzer Pebesma

Continental and global datasets based on Earth observations or computational models challenge existing map algebra approaches. The available datasets differ in their spatio-temporal extents and their spatio-temporal granularity, which makes it difficult to process them as time series data in map algebra expressions. To address this issue, we introduce a new map algebra approach that is topology based. This topology based map algebra uses spatio-temporal topological operators (STTOP and STTCOP) to specify spatio-temporal operations between topologically related map layers of different time series data. We have implemented several topology based map algebra tools in the open-source geoinformation system GRASS GIS and its open-source cloud processing engine actinia. We demonstrate the application of our topology based map algebra by solving real-world big data problems with a single algebraic expression. These problems included the massively parallel computation of the NDVI from a series of 100 Sentinel-2A scenes organized as Earth observation data cubes. The processing was performed and benchmarked on a many-core computer setup and in a distributed container environment. The design of our topology based map algebra allows us to deploy it as a standardized service in the EU Horizon 2020 project openEO.
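
As a hedged illustration of the kind of expression the abstract describes, the sketch below invokes the GRASS GIS temporal algebra from Python to derive an NDVI space-time raster dataset from red and near-infrared band stacks. The dataset names (S2_B04, S2_B08), the nprocs value, and the exact {operator, topology, result} syntax are assumptions based on the t.rast.algebra documentation, not code from the paper.

```python
# Conceptual sketch (not the authors' code): computing an NDVI time series with
# the GRASS GIS temporal algebra, assuming a GRASS session is already running
# and two space-time raster datasets "S2_B08" (NIR) and "S2_B04" (red) exist.
import grass.script as gs

# Spatio-temporal operators of the form {op, topology, result} combine map layers
# whose time intervals satisfy the given topological relation ("equal" here).
expression = (
    "ndvi = float(S2_B08 {-,equal,l} S2_B04) "
    "{/,equal,l} float(S2_B08 {+,equal,l} S2_B04)"
)

# basename and nprocs are assumed t.rast.algebra options; nprocs enables the
# parallel per-scene map calculations mentioned in the abstract.
gs.run_command("t.rast.algebra", expression=expression, basename="ndvi", nprocs=8)
```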

2015 ◽  
Author(s):  
Andrew MacDonald

PhilDB is an open-source time series database. It supports storage of time series datasets that are dynamic; that is, it records updates to existing values in a log as they occur. Recent open-source systems, such as InfluxDB and OpenTSDB, have been developed to indefinitely store long-period, high-resolution time series data. Unfortunately, they require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they have a ‘big data’ approach to storage and access. Other open-source projects for handling time series data that do not take the ‘big data’ approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking values that changed. Unlike ‘big data’ solutions, PhilDB has been designed for single-machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. PhilDB improves access to datasets in two ways. First, it uses fast reads, which make it practical to select data for analysis. Second, it uses simple read methods to minimise the effort required to extract data. PhilDB takes a unique approach to meta-data tracking: optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances as a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. This paper describes the general approach, architecture, and philosophy of the PhilDB software.
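
The "intelligent write" behaviour described above can be sketched conceptually with pandas: compare the incoming series with the stored one, keep the merged result, and append any superseded values to a log. This is a hedged illustration of the idea only, not PhilDB's actual API or storage layout; all names below are hypothetical.

```python
# Hypothetical sketch of a change-logging write: stored values are preserved in a
# log entry whenever an update overwrites them. Not PhilDB's implementation.
import pandas as pd


def logged_write(stored: pd.Series, incoming: pd.Series, log: list) -> pd.Series:
    """Merge incoming values into stored, logging any value that changes."""
    combined = stored.combine_first(incoming)  # keep stored values, add new dates
    changed = [ts for ts in incoming.index.intersection(stored.index)
               if stored[ts] != incoming[ts]]
    for ts in changed:
        # Record the old value and when it was superseded.
        log.append({"time": ts, "old": stored[ts], "new": incoming[ts],
                    "replaced_at": pd.Timestamp.now()})
        combined[ts] = incoming[ts]
    return combined


# Usage with synthetic data: the 2016-01-02 value is revised from 2.0 to 2.5,
# and the old value is preserved in the log rather than silently lost.
stored = pd.Series([1.0, 2.0], index=pd.to_datetime(["2016-01-01", "2016-01-02"]))
incoming = pd.Series([2.5, 3.0], index=pd.to_datetime(["2016-01-02", "2016-01-03"]))
log = []
updated = logged_write(stored, incoming, log)
```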


2016 ◽  
Vol 2 ◽  
pp. e52 ◽  
Author(s):  
Andrew MacDonald

PhilDB is an open-source time series database that supports storage of time series datasets that are dynamic; that is, it records updates to existing values in a log as they occur. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. It implements fast reads to make it practical to select data for analysis. Recent open-source systems have been developed to indefinitely store long-period high-resolution time series data without change logging. Unfortunately, such systems generally require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they have a ‘big data’ approach to storage and access. Other open-source projects for handling time series data that avoid the ‘big data’ approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking values that change. Unlike ‘big data’ solutions, PhilDB has been designed for single-machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB takes a unique approach to meta-data tracking: optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances when a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. While some existing systems come close to meeting the needs PhilDB addresses, none cover all the needs at once. PhilDB was written to fill this gap in existing solutions. This paper explores existing time series database solutions, discusses the motivation for PhilDB, describes the architecture and philosophy of the PhilDB software, and performs an evaluation between InfluxDB, PhilDB, and SciDB.
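
The optional attribute attachment described here can be sketched as a thin metadata layer: a series is first stored under a bare identifier, and attributes such as a source are attached later only when needed to tell instances apart. The sketch below is a hypothetical illustration of that idea, not PhilDB's schema or API; the identifier and attribute names are invented.

```python
# Hypothetical sketch of optional attribute attachment: time series instances are
# keyed by an identifier plus whatever attributes happen to be attached.
from dataclasses import dataclass, field


@dataclass
class TimeSeriesInstance:
    identifier: str                                  # minimal required metadata
    attributes: dict = field(default_factory=dict)   # optional, added on demand

    def attach(self, name: str, value: str) -> None:
        """Attach an attribute later, e.g. to distinguish a second data source."""
        self.attributes[name] = value

    def key(self) -> tuple:
        """Lookup key: the identifier plus its sorted attributes."""
        return (self.identifier, tuple(sorted(self.attributes.items())))


# A station record can be loaded with no attributes at all...
obs = TimeSeriesInstance("410730")
# ...and a forecast for the same station attached later with a distinguishing attribute.
fcst = TimeSeriesInstance("410730")
fcst.attach("source", "forecast_model_A")
assert obs.key() != fcst.key()
```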


2016 ◽  
Author(s):  
Andrew MacDonald

PhilDB is an open-source time series database that supports storage of time series datasets that are dynamic; that is, it records updates to existing values in a log as they occur. PhilDB eases loading of data for the user by utilising an intelligent data write method. It preserves existing values during updates and abstracts the update complexity required to achieve logging of data value changes. It implements fast reads to make it practical to select data for analysis. Recent open-source systems have been developed to indefinitely store long-period high-resolution time series data without change logging. Unfortunately, such systems generally require a large initial installation investment before use because they are designed to operate over a cluster of servers to achieve high-performance writing of static data in real time. In essence, they have a 'big data' approach to storage and access. Other open-source projects for handling time series data that avoid the 'big data' approach are also relatively new and are complex or incomplete. None of these systems gracefully handle revision of existing data while tracking values that change. Unlike 'big data' solutions, PhilDB has been designed for single-machine deployment on commodity hardware, reducing the barrier to deployment. PhilDB takes a unique approach to meta-data tracking: optional attribute attachment. This facilitates scaling the complexities of storing a wide variety of data. That is, it allows time series data to be loaded as time series instances with minimal initial meta-data, yet additional attributes can be created and attached to differentiate the time series instances when a wider variety of data is needed. PhilDB was written in Python, leveraging existing libraries. While some existing systems come close to meeting the needs PhilDB addresses, none cover all the needs at once. PhilDB was written to fill this gap in existing solutions. This paper explores existing time series database solutions, discusses the motivation for PhilDB, describes the architecture and philosophy of the PhilDB software, and performs a simple evaluation between InfluxDB, PhilDB, and SciDB.
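
Reading back from such a store has two views: the current state of the series, and the log of superseded values. Continuing the hypothetical sketches above (not PhilDB's API), a minimal read interface might look like the following.

```python
# Hypothetical read side of a change-logging store: callers either get the
# current series or roll back logged changes to see earlier values.
# Illustrative only; PhilDB's actual read methods may differ.
import pandas as pd


def read_current(store: dict, identifier: str) -> pd.Series:
    """Return the latest values for a series identifier."""
    return store[identifier]["current"]


def read_as_of(store: dict, identifier: str, as_of: pd.Timestamp) -> pd.Series:
    """Reconstruct the series as it looked at `as_of` by undoing later changes."""
    series = store[identifier]["current"].copy()
    # Undo newest changes first so repeatedly revised values end at their oldest state.
    for entry in sorted(store[identifier]["log"],
                        key=lambda e: e["replaced_at"], reverse=True):
        if entry["replaced_at"] > as_of:
            series[entry["time"]] = entry["old"]
    return series
```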


2020 ◽  
Vol 12 (22) ◽  
pp. 3798 ◽  
Author(s):  
Lei Ma ◽  
Michael Schmitt ◽  
Xiaoxiang Zhu

Recently, time series from optical satellite data have been used frequently in object-based land-cover classification. This poses a significant challenge to object-based image analysis (OBIA) owing to the complex spatio-temporal information contained in the time series data. This study evaluates object-based land-cover classification in the northern suburbs of Munich using time series from optical Sentinel data. Using a random forest classifier as the backbone, experiments were designed to analyze the impact of the segmentation scale, features (including spectral and temporal features), categories, frequency, and acquisition timing of optical satellite images. Based on our analyses, the following findings are reported: (1) Optical Sentinel images acquired over four seasons can make a significant contribution to the classification of agricultural areas, even though this contribution varies between spectral bands for the same period. (2) The use of time series data alleviates the issue of identifying the “optimal” segmentation scale. The findings of this study can provide a more comprehensive understanding of the effects of classification uncertainty on object-based dense multi-temporal image classification.
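
A hedged sketch of the classification backbone described here: per-object spectral-temporal features (for example, band statistics per season) fed to a random forest. The feature layout, number of classes, and data are assumptions for illustration, not the study's actual feature set.

```python
# Conceptual sketch (not the study's code): object-based classification with a
# random forest on per-object spectral-temporal features from four seasons.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Assume each image object is summarised by the mean of 10 Sentinel-2 bands in
# each of 4 seasons -> 40 features per object (purely synthetic values here).
n_objects, n_bands, n_seasons = 500, 10, 4
X = rng.random((n_objects, n_bands * n_seasons))
y = rng.integers(0, 5, size=n_objects)   # 5 hypothetical land-cover classes

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")

# Per-feature importances indicate which season/band combinations matter most,
# mirroring the abstract's per-season, per-band contribution analysis.
clf.fit(X, y)
importances = clf.feature_importances_.reshape(n_seasons, n_bands)
```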


2020 ◽  
Vol 496 (1) ◽  
pp. 629-637 ◽  
Author(s):  
Ce Yu ◽  
Kun Li ◽  
Shanjiang Tang ◽  
Chao Sun ◽  
Bin Ma ◽  
...  

Time series data of celestial objects are commonly used to study valuable and unexpected objects, such as extrasolar planets and supernovae, in time-domain astronomy. Due to the rapid growth of data volume, traditional manual methods are becoming infeasible for continuously analysing the accumulated observation data. To meet such demands, we designed and implemented a special tool named AstroCatR that can efficiently and flexibly reconstruct time series data from large-scale astronomical catalogues. AstroCatR can load original catalogue data from Flexible Image Transport System (FITS) files or databases, match each item to determine which object it belongs to, and finally produce time series data sets. To support high-performance parallel processing of large-scale data sets, AstroCatR uses an extract-transform-load (ETL) pre-processing module to create sky zone files and balance the workload. The matching module uses an overlapped indexing method and an in-memory reference table to improve accuracy and performance. The output of AstroCatR can be stored in CSV files or transformed into other formats as needed. At the same time, the module-based software architecture ensures the flexibility and scalability of AstroCatR. We evaluated AstroCatR with actual observation data from the three Antarctic Survey Telescopes (AST3). The experiments demonstrate that AstroCatR can efficiently and flexibly reconstruct all time series data by setting the relevant parameters and configuration files. Furthermore, the tool is approximately 3× faster than methods using relational database management systems at matching massive catalogues.
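
The core matching step, assigning each catalogue detection to a known object by sky position, can be sketched as a zone-indexed nearest-neighbour search. The sketch below is a hypothetical illustration of that idea using numpy; it is not AstroCatR's ETL, zone-file format, or overlapped-indexing implementation, and the zone size and match radius are assumed values.

```python
# Hypothetical sketch of positional cross-matching: detections are binned into
# declination zones, then matched to reference objects within a small radius.
import numpy as np

ZONE_HEIGHT_DEG = 0.5          # assumed zone size
MATCH_RADIUS_DEG = 1.0 / 3600  # assumed 1 arcsec match radius


def zone_of(dec_deg: np.ndarray) -> np.ndarray:
    """Map declinations (degrees) to integer zone ids."""
    return np.floor((dec_deg + 90.0) / ZONE_HEIGHT_DEG).astype(int)


def match(det_ra, det_dec, ref_ra, ref_dec):
    """For each detection (numpy arrays of degrees), return the index of the
    nearest reference object within MATCH_RADIUS_DEG, or -1 if none is close."""
    ref_zone = zone_of(ref_dec)
    matches = np.full(len(det_ra), -1)
    for i, (ra, dec) in enumerate(zip(det_ra, det_dec)):
        # Overlap neighbouring zones so objects near a zone boundary are not missed.
        candidates = np.where(np.abs(ref_zone - zone_of(np.array([dec]))[0]) <= 1)[0]
        if candidates.size == 0:
            continue
        # Small-angle separation; a full implementation would use spherical distance.
        d = np.hypot((ref_ra[candidates] - ra) * np.cos(np.radians(dec)),
                     ref_dec[candidates] - dec)
        if d.min() <= MATCH_RADIUS_DEG:
            matches[i] = candidates[np.argmin(d)]
    return matches
```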


2011 ◽  
Vol 12 (1) ◽  
pp. 119 ◽  
Author(s):  
Michael Lindner ◽  
Raul Vicente ◽  
Viola Priesemann ◽  
Michael Wibral
