A Survey on Big Data Processing Frameworks for Mobility Analytics

2021 ◽  
Vol 50 (2) ◽  
pp. 18-29
Author(s):  
Christos Doulkeridis ◽  
Akrivi Vlachou ◽  
Nikos Pelekis ◽  
Yannis Theodoridis

In the current era of big spatial data, the vast amount of mobility data produced by sensors, GPS-equipped devices, surveillance networks, radars, etc. poses new challenges related to mobility analytics. A cornerstone facilitator for performing mobility analytics at scale is the availability of big data processing frameworks and techniques tailored for spatial and spatio-temporal data. Motivated by this pressing need, in this paper we provide a survey of big data processing frameworks for mobility analytics. Particular focus is put on the underlying techniques: indexing, partitioning, and query processing are essential for enabling efficient and scalable data management. In this way, this report serves as a useful guide to state-of-the-art methods and modern techniques for scalable mobility data management and analytics.
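As a point of reference for the partitioning techniques the survey covers, the sketch below shows a minimal grid-based spatial partitioning of GPS points in plain Python; the cell size and the tuple-based point representation are illustrative assumptions, not taken from any particular surveyed system.

```python
from collections import defaultdict

def grid_partition(points, cell_size=0.5):
    """Assign (lon, lat) points to grid cells so each cell can be
    processed independently by a worker; cell_size is an assumed
    tuning parameter, expressed in degrees."""
    partitions = defaultdict(list)
    for lon, lat in points:
        cell = (int(lon // cell_size), int(lat // cell_size))
        partitions[cell].append((lon, lat))
    return partitions

# Example: three GPS fixes fall into two grid cells.
sample = [(23.72, 37.98), (23.74, 37.99), (24.90, 37.40)]
for cell, pts in grid_partition(sample).items():
    print(cell, len(pts))
```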

2020 ◽  
Vol 20 (3) ◽  
pp. 15-31
Author(s):  
Valentin Kisimov ◽  
Dorina Kabakchieva ◽  
Aleksandar Naydenov ◽  
Kamelia Stefanova

New challenges in the dynamically changing business environment require companies to undergo digital transformation and make more effective use of the Big Data generated in their expanding online business activities. A possible solution for solving real business problems concerning Big Data resources is proposed in this paper. The proposed Agile Elastic Desktop Corporate Architecture for Big Data is based on virtualizing unused desktop resources and organizing them to serve the needs of Big Data processing, thus saving the resources needed for additional infrastructure in an organization. The specific corporate business needs are analyzed within the developed R&D environment and, based on that, the unused desktop resources are customized and configured into the required Big Data tools. The R&D environment of the proposed Agile Elastic Desktop Corporate Architecture for Big Data could be implemented on the available unused resources of hundreds of desktops.


Algorithms ◽  
2020 ◽  
Vol 13 (8) ◽  
pp. 182
Author(s):  
Elias Dritsas ◽  
Andreas Kanavos ◽  
Maria Trigka ◽  
Gerasimos Vonitsanos ◽  
Spyros Sioutas ◽  
...  

Privacy preservation and anonymity have gained significant attention from the big data perspective. We take the view that forthcoming frameworks and theories will establish several solutions for privacy protection. k-anonymity is considered a key solution that has been widely employed to prevent data re-identification, and it concerns us in the context of this work. Data modeling has also gained significant attention from the big data perspective. It is believed that advancing distributed environments will provide users with several solutions for efficient spatio-temporal data management. GeoSpark is utilized in the current work as it is a key solution that has been widely employed for spatial data. Specifically, it works on top of Apache Spark, the main framework leveraged by the research community and organizations for big data transformation, processing and visualization. To this end, we focus on trajectory data representation so as to be applicable to the GeoSpark environment, and a GeoSpark-based approach is designed for the efficient management of real spatio-temporal data. The next step is to gain a deeper understanding of the data through the application of k nearest neighbor (k-NN) queries, either using indexing methods or otherwise. The k-anonymity set computation, which is the main component of privacy preservation evaluation and the main issue of our previous works, is evaluated in the GeoSpark environment. More to the point, the focus here is on the time cost of k-anonymity set computation along with vulnerability measurement. The extracted results are presented in tables and figures for visual inspection.
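The GeoSpark/Spark calls used by the authors are not reproduced here; as a hedged, plain-Python illustration of the underlying idea, the sketch below models a k-anonymity set as the result of a brute-force k-NN query around a queried location (the Euclidean distance metric and the sample points are assumptions for the example only).

```python
import math

def knn(points, query, k):
    """Brute-force k nearest neighbours of `query` among `points`
    (Euclidean distance on projected coordinates, an assumption)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]

def k_anonymity_set(points, query, k):
    """The k-anonymity set is modelled here as the k-NN result: the
    queried object is indistinguishable from the other k-1 members."""
    return set(knn(points, query, k))

points = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.2), (5.0, 5.0), (0.1, 0.9)]
print(k_anonymity_set(points, (0.0, 0.0), 3))
```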


Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and wide acceptance of data-driven applications in many aspects of our daily lives is generating vast volumes of diverse data, which can be collected and analyzed to support various valuable decisions. Management and processing of this big data is a challenge. The development and extensive use of highly distributed and scalable systems to process big data have been widely considered. New data management architectures (e.g., distributed file systems and NoSQL databases) are used in this context. However, features of big data such as their complexity and the demands of data analytics indicate that these concepts solve big data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of big data management systems is being considered. In this chapter, the authors discuss these trends, evaluate some current approaches to big data processing and analytics, identify the current challenges, and suggest possible research directions.


Big Data ◽  
2016 ◽  
pp. 2074-2097 ◽  
Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g. distributed file systems and NoSQL databases, are used in this context. However, features of Big Data such as their complexity and the demands of data analytics indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.


2018 ◽  
Vol 7 (10) ◽  
pp. 399 ◽  
Author(s):  
Junghee Jo ◽  
Kang-Woo Lee

With the rapid development of Internet of Things (IoT) technologies, the increasing volume and diversity of sources of geospatial big data have created challenges in storing, managing, and processing data. In addition to the general characteristics of big data, the unique properties of spatial data make the handling of geospatial big data even more complicated. To facilitate users implementing geospatial big data applications in a MapReduce framework, several big data processing systems have extended the original Hadoop to support spatial properties. Most of those platforms, however, have included spatial functionalities by embedding them as a form of plug-in. Although offering a convenient way to add new features to an existing system, the plug-in approach has several limitations. In particular, while executing spatial and nonspatial operations by alternating between the existing system and the plug-in, additional read and write overheads are added to the workflow, significantly reducing performance efficiency. To address this issue, we have developed Marmot, a high-performance geospatial big data processing system based on MapReduce. Marmot extends Hadoop at a low level to support seamless integration between spatial and nonspatial operations within a solid framework, allowing improved performance of geoprocessing workflows. This paper explains the overall architecture and data model of Marmot as well as the main algorithm for automatic construction of MapReduce jobs from a given spatial analysis task. To illustrate how Marmot transforms a sequence of operators for spatial analysis into map and reduce functions in a way that achieves better performance, this paper presents an example of spatial analysis retrieving the number of subway stations per city in Korea. This paper also experimentally demonstrates that Marmot generally outperforms SpatialHadoop, one of the leading plug-in-based spatial big data frameworks, particularly in dealing with complex and time-intensive queries involving a spatial index.
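Marmot's actual operators and APIs are not shown in the abstract; the following framework-agnostic sketch only illustrates the map/reduce structure of the cited example, counting subway stations per city, under the assumption that the input records (city, station_id) are the output of a prior spatial join.

```python
from itertools import groupby
from operator import itemgetter

# Assumed input: one (city, station_id) record per subway station,
# produced by a spatial join of stations against city boundaries.
records = [("Seoul", "S001"), ("Seoul", "S002"), ("Busan", "B001")]

def map_phase(record):
    city, _station = record
    yield (city, 1)                      # emit one count per station

def reduce_phase(city, counts):
    return (city, sum(counts))           # total stations per city

mapped = [kv for rec in records for kv in map_phase(rec)]
mapped.sort(key=itemgetter(0))           # shuffle: group by city
results = [reduce_phase(city, (c for _, c in group))
           for city, group in groupby(mapped, key=itemgetter(0))]
print(results)                           # [('Busan', 1), ('Seoul', 2)]
```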


Author(s):  
Sherif Sakr ◽  
Fuad Bajaber ◽  
Ahmed Barnawi ◽  
Abdulrahman Altalhi ◽  
Radwa Elshawi ◽  
...  

2019 ◽  
Vol 18 (32) ◽  
pp. 44-62
Author(s):  
Dalibor Bartoněk

We are witnessing great developments in digital information technologies. The situation extends to spatial data, which contain both attributive and localization features, the latter determining their position unambiguously within a given coordinate system. These changes have resulted in the rapid growth of digital data, significantly supported by technical advances in the devices that produce them. As technology for producing spatial data advances, methods and software for big data processing are falling behind. Paradoxically, only about 2% of the total volume of data is actually used. Big data processing often requires high-performance hardware and software, and only a few users possess the appropriate information infrastructure. The proportion of processed data would improve if big data could be processed by ordinary users. In geographical information systems (GIS), these problems arise when solving projects related to extensive territory or considerable secondary complexity, which require big data processing. This paper focuses on the creation and verification of methods that make it possible to process extensive GIS projects effectively using desktop hardware and software. The work concerns new, fast methods for functional reduction of the data volume, optimization of processing, edge detection in 3D, and automated vectorization.
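The paper's specific reduction methods are not detailed in the abstract; as a loose, hedged illustration of reducing the volume of 3D spatial data on desktop hardware, the sketch below thins a point set to one representative per voxel (the voxel size and input format are assumptions, not the author's method).

```python
def voxel_thin(points, voxel=1.0):
    """Keep one representative 3D point per voxel; the voxel size is
    an assumed parameter, in the units of the input coordinates."""
    kept = {}
    for x, y, z in points:
        key = (int(x // voxel), int(y // voxel), int(z // voxel))
        kept.setdefault(key, (x, y, z))   # first point in the voxel wins
    return list(kept.values())

cloud = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (2.5, 0.0, 1.1)]
print(voxel_thin(cloud))   # two points survive: one per occupied voxel
```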


2012 ◽  
Vol 13 (03n04) ◽  
pp. 1250009 ◽  
Author(s):  
CHANGQING JI ◽  
YU LI ◽  
WENMING QIU ◽  
YINGWEI JIN ◽  
YUJIE XU ◽  
...  

With the rapid growth of emerging applications like social networks, the semantic web, sensor networks and LBS (Location Based Service) applications, the variety and volume of data to be processed continue to increase rapidly. Effective management and processing of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry, and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the viewpoint of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms and distributed applications. Following the MapReduce parallel processing framework, we then introduce some MapReduce optimization strategies reported in the literature. Finally, we discuss open issues and challenges, and explore future research directions for big data processing in cloud computing environments.
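The surveyed optimization strategies are not enumerated in the abstract; one commonly reported strategy is local (in-mapper, combiner-style) aggregation, sketched here in plain Python under the assumption of a simple word-count workload.

```python
from collections import Counter

def mapper_with_local_aggregation(records):
    """In-mapper combining: aggregate counts locally before emitting,
    so far fewer key-value pairs cross the network during the shuffle."""
    local = Counter()
    for word in records:
        local[word] += 1
    for word, count in local.items():
        yield (word, count)              # one pair per distinct key

print(list(mapper_with_local_aggregation(["a", "b", "a", "a", "b"])))
# [('a', 3), ('b', 2)]  instead of five ('x', 1) pairs
```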

