A Survey on Big Data Processing Frameworks for Mobility Analytics

2021 ◽  
Vol 50 (2) ◽  
pp. 18-29
Author(s):  
Christos Doulkeridis ◽  
Akrivi Vlachou ◽  
Nikos Pelekis ◽  
Yannis Theodoridis

In the current era of big spatial data, the vast amount of mobility data produced by sensors, GPS-equipped devices, surveillance networks, radars, etc. poses new challenges related to mobility analytics. A cornerstone facilitator for performing mobility analytics at scale is the availability of big data processing frameworks and techniques tailored for spatial and spatio-temporal data. Motivated by this pressing need, in this paper we provide a survey of big data processing frameworks for mobility analytics. Particular focus is put on the underlying techniques: indexing, partitioning, and query processing are essential for enabling efficient and scalable data management. In this way, this report serves as a useful guide to state-of-the-art methods and modern techniques for scalable mobility data management and analytics.
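As a point of reference for the partitioning techniques the survey covers, the sketch below shows a minimal grid-based spatial partitioning of GPS points in plain Python; the cell size and the tuple-based point representation are illustrative assumptions, not taken from any particular surveyed system.

```python
from collections import defaultdict

def grid_partition(points, cell_size=0.5):
    """Assign (lon, lat) points to grid cells so each cell can be
    processed independently by a worker; cell_size is an assumed
    tuning parameter, expressed in degrees."""
    partitions = defaultdict(list)
    for lon, lat in points:
        cell = (int(lon // cell_size), int(lat // cell_size))
        partitions[cell].append((lon, lat))
    return partitions

# Example: three GPS fixes fall into two grid cells.
sample = [(23.72, 37.98), (23.74, 37.99), (24.90, 37.40)]
for cell, pts in grid_partition(sample).items():
    print(cell, len(pts))
```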

2020 ◽  
Vol 20 (3) ◽  
pp. 15-31
Author(s):  
Valentin Kisimov ◽  
Dorina Kabakchieva ◽  
Aleksandar Naydenov ◽  
Kamelia Stefanova

New challenges in the dynamically changing business environment require companies to undergo digital transformation and make more effective use of the Big Data generated in their expanding online business activities. A possible solution for solving real business problems concerning Big Data resources is proposed in this paper. The proposed Agile Elastic Desktop Corporate Architecture for Big Data is based on virtualizing unused desktop resources and organizing them to serve the needs of Big Data processing, thus saving the resources needed for additional infrastructure in an organization. The specific corporate business needs are analyzed within the developed R&D environment and, based on that, the unused desktop resources are customized and configured into the required Big Data tools. The R&D environment of the proposed Agile Elastic Desktop Corporate Architecture for Big Data could be implemented on the available unused resources of hundreds of desktops.


Algorithms ◽  
2020 ◽  
Vol 13 (8) ◽  
pp. 182
Author(s):  
Elias Dritsas ◽  
Andreas Kanavos ◽  
Maria Trigka ◽  
Gerasimos Vonitsanos ◽  
Spyros Sioutas ◽  
...  

Privacy preservation and anonymity have gained significant attention from the big data perspective. We take the view that forthcoming frameworks and theories will establish several solutions for privacy protection. k-anonymity is considered a key solution that has been widely employed to prevent data re-identification, and it concerns us in the context of this work. Data modeling has also gained significant attention from the big data perspective. It is believed that advancing distributed environments will provide users with several solutions for efficient spatio-temporal data management. GeoSpark is utilized in the current work as it is a key solution that has been widely employed for spatial data. Specifically, it works on top of Apache Spark, the main framework leveraged by the research community and organizations for big data transformation, processing and visualization. To this end, we focus on trajectory data representation so as to be applicable to the GeoSpark environment, and a GeoSpark-based approach is designed for the efficient management of real spatio-temporal data. The next step is to gain a deeper understanding of the data through the application of k nearest neighbor (k-NN) queries, either using indexing methods or otherwise. The k-anonymity set computation, which is the main component of privacy preservation evaluation and the main issue of our previous works, is evaluated in the GeoSpark environment. More to the point, the focus here is on the time cost of k-anonymity set computation along with vulnerability measurement. The extracted results are presented in tables and figures for visual inspection.
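The GeoSpark/Spark calls used by the authors are not reproduced here; as a hedged, plain-Python illustration of the underlying idea, the sketch below models a k-anonymity set as the result of a brute-force k-NN query around a queried location (the Euclidean distance metric and the sample points are assumptions for the example only).

```python
import math

def knn(points, query, k):
    """Brute-force k nearest neighbours of `query` among `points`
    (Euclidean distance on projected coordinates, an assumption)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def k_anonymity_set(points, query, k):
    """The k-anonymity set is modelled here as the k-NN result: the
    queried object is indistinguishable from the other k-1 members."""
    return set(knn(points, query, k))

points = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.2), (5.0, 5.0), (0.1, 0.9)]
print(k_anonymity_set(points, (0.0, 0.0), 3))
```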


Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and wide acceptance of data-driven applications in many aspects of our daily lives is generating vast volumes of diverse data, which can be collected and analyzed to support various valuable decisions. Management and processing of this big data is a challenge. The development and extensive use of highly distributed and scalable systems to process big data have been widely considered. New data management architectures (e.g., distributed file systems and NoSQL databases) are used in this context. However, features of big data such as their complexity and the demands of data analytics indicate that these concepts solve big data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of big data management systems is being considered. In this chapter, the authors discuss these trends, evaluate some current approaches to big data processing and analytics, identify the current challenges, and suggest possible research directions.


Big Data ◽  
2016 ◽  
pp. 2074-2097 ◽  
Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g. distributed file systems and NoSQL databases, are used in this context. However, features of Big Data such as their complexity and the demands of data analytics indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.


2018 ◽  
Vol 7 (10) ◽  
pp. 399 ◽  
Author(s):  
Junghee Jo ◽  
Kang-Woo Lee

With the rapid development of Internet of Things (IoT) technologies, the increasing volume and diversity of sources of geospatial big data have created challenges in storing, managing, and processing data. In addition to the general characteristics of big data, the unique properties of spatial data make the handling of geospatial big data even more complicated. To facilitate users implementing geospatial big data applications in a MapReduce framework, several big data processing systems have extended the original Hadoop to support spatial properties. Most of those platforms, however, have included spatial functionalities by embedding them as a form of plug-in. Although offering a convenient way to add new features to an existing system, the plug-in approach has several limitations. In particular, while executing spatial and nonspatial operations by alternating between the existing system and the plug-in, additional read and write overheads are added to the workflow, significantly reducing performance efficiency. To address this issue, we have developed Marmot, a high-performance geospatial big data processing system based on MapReduce. Marmot extends Hadoop at a low level to support seamless integration between spatial and nonspatial operations within a solid framework, allowing improved performance of geoprocessing workflows. This paper explains the overall architecture and data model of Marmot as well as the main algorithm for automatic construction of MapReduce jobs from a given spatial analysis task. To illustrate how Marmot transforms a sequence of operators for spatial analysis into map and reduce functions in a way that achieves better performance, this paper presents an example of spatial analysis retrieving the number of subway stations per city in Korea. This paper also experimentally demonstrates that Marmot generally outperforms SpatialHadoop, one of the leading plug-in-based spatial big data frameworks, particularly in dealing with complex and time-intensive queries involving a spatial index.
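Marmot's actual operators and APIs are not shown in the abstract; the following framework-agnostic sketch only illustrates the map/reduce structure of the cited example, counting subway stations per city, under the assumption that the input records (city, station_id) are the output of a prior spatial join.

```python
from itertools import groupby
from operator import itemgetter

# Assumed input: one (city, station_id) record per subway station,
# produced by a spatial join of stations against city boundaries.
records = [("Seoul", "S001"), ("Seoul", "S002"), ("Busan", "B001")]

def map_phase(record):
    city, _station = record
    yield (city, 1)                      # emit one count per station

def reduce_phase(city, counts):
    return (city, sum(counts))           # total stations per city

mapped = [kv for rec in records for kv in map_phase(rec)]
mapped.sort(key=itemgetter(0))           # shuffle: group by city
results = [reduce_phase(city, (c for _, c in group))
           for city, group in groupby(mapped, key=itemgetter(0))]
print(results)                           # [('Busan', 1), ('Seoul', 2)]
```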


Author(s):  
Sherif Sakr ◽  
Fuad Bajaber ◽  
Ahmed Barnawi ◽  
Abdulrahman Altalhi ◽  
Radwa Elshawi ◽  
...  

2019 ◽  
Vol 18 (32) ◽  
pp. 44-62
Author(s):  
Dalibor Bartoněk

We are witnessing great developments in digital information technologies. The situation extends to spatial data, which contain both attributive and localization features, the latter determining their position unambiguously within a given coordinate system. These changes have resulted in the rapid growth of digital data, significantly supported by technical advances in the devices that produce them. As technology for producing spatial data advances, methods and software for big data processing are falling behind. Paradoxically, only about 2% of the total volume of data is actually used. Big data processing often requires high-performance hardware and software, and only a few users possess the appropriate information infrastructure. The proportion of processed data would improve if big data could be processed by ordinary users. In geographical information systems (GIS), these problems arise when solving projects related to extensive territory or considerable secondary complexity, which require big data processing. This paper focuses on the creation and verification of methods that make it possible to process extensive GIS projects effectively using desktop hardware and software. The work concerns new, fast methods for functional reduction of the data volume, optimization of processing, edge detection in 3D, and automated vectorization.
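The paper's specific reduction methods are not detailed in the abstract; as a loose, hedged illustration of reducing the volume of 3D spatial data on desktop hardware, the sketch below thins a point set to one representative per voxel (the voxel size and input format are assumptions, not the author's method).

```python
def voxel_thin(points, voxel=1.0):
    """Keep one representative 3D point per voxel; the voxel size is
    an assumed parameter, in the units of the input coordinates."""
    kept = {}
    for x, y, z in points:
        key = (int(x // voxel), int(y // voxel), int(z // voxel))
        kept.setdefault(key, (x, y, z))   # first point in the voxel wins
    return list(kept.values())

cloud = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (2.5, 0.0, 1.1)]
print(voxel_thin(cloud))   # two points survive: one per occupied voxel
```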


2012 ◽  
Vol 13 (03n04) ◽  
pp. 1250009 ◽  
Author(s):  
CHANGQING JI ◽  
YU LI ◽  
WENMING QIU ◽  
YINGWEI JIN ◽  
YUJIE XU ◽  
...  

With the rapid growth of emerging applications like social networks, the semantic web, sensor networks and LBS (Location Based Service) applications, the variety and volume of data to be processed continue to increase rapidly. Effective management and processing of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry, and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the viewpoint of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms and distributed applications. Following the MapReduce parallel processing framework, we then introduce some MapReduce optimization strategies reported in the literature. Finally, we discuss open issues and challenges, and explore future research directions for big data processing in cloud computing environments.
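The surveyed optimization strategies are not enumerated in the abstract; one commonly reported strategy is local (in-mapper, combiner-style) aggregation, sketched here in plain Python under the assumption of a simple word-count workload.

```python
from collections import Counter

def mapper_with_local_aggregation(records):
    """In-mapper combining: aggregate counts locally before emitting,
    so far fewer key-value pairs cross the network during the shuffle."""
    local = Counter()
    for word in records:
        local[word] += 1
    for word, count in local.items():
        yield (word, count)              # one pair per distinct key

print(list(mapper_with_local_aggregation(["a", "b", "a", "a", "b"])))
# [('a', 3), ('b', 2)]  instead of five ('x', 1) pairs
```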

