A Scalable Big Data Framework for Real-Time Traffic Monitoring System

Author(s):  
Wilfried Yves Hamilton Adoni ◽  
Tarik Nahhal ◽  
Najib Ben Aoun ◽  
Moez Krichen ◽  
Mohammed Alzahrani

Abstract: In this paper, we present a scalable, real-time intelligent transportation system based on a big data framework. The proposed system uses existing data from road sensors to better understand traffic flow and traveler behavior and to increase road network performance. It is designed to process large-scale stream data in order to analyze traffic events such as incidents, crashes, and congestion. Experiments performed on the public transportation modes of the city of Casablanca, Morocco, show that the proposed system achieves significant time savings, gathers large-scale data from many road sensors, and is inexpensive in terms of hardware resource consumption.

Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of its accuracy and its ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yielded considerable improvement in most of the situations.
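The Poisson-based resampling mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual online-bagging scheme, in which each ensemble member trains on an incoming instance k times with k drawn from Poisson(λ). The function names `sample_poisson` and `resampling_weights` are hypothetical, and the sketch covers only the resampling step, not ARF itself or the measure ρ, which the abstract does not define.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1

def resampling_weights(n_learners, lam, rng):
    """Return, per ensemble member, how many times to train on one instance.

    Assumption: one independent Poisson(lam) draw per member, as in
    online bagging. A weight of 0 means the member skips the instance.
    """
    return [sample_poisson(lam, rng) for _ in range(n_learners)]

rng = random.Random(42)
# Hypothetical ensemble of 10 learners receiving a single instance.
weights = resampling_weights(n_learners=10, lam=1.0, rng=rng)
print(weights)
```

Since the mean of Poisson(λ) is λ, a larger λ makes each member train on each instance more often on average, which raises training cost. That is presumably why the choice of λ affects both accuracy and execution time.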


Author(s):  
M. Asif Naeem ◽  
Gillian Dobbie ◽  
Gerald Weber

In order to make timely and effective decisions, businesses need the latest information from big data warehouse repositories. To keep these repositories up to date, real-time data integration is required. An important phase in real-time data integration is data transformation where a stream of updates, which is huge in volume and infinite, is joined with large disk-based master data. Stream processing is an important concept in Big Data, since large volumes of data are often best processed immediately. A well-known algorithm called Mesh Join (MESHJOIN) was proposed to process stream data with disk-based master data, which uses limited memory. MESHJOIN is a candidate for a resource-aware system setup. The problem that the authors consider in this chapter is that MESHJOIN is not very selective. In particular, the performance of the algorithm is always inversely proportional to the size of the master data table. As a consequence, the resource consumption is in some scenarios suboptimal. They present an algorithm called Cache Join (CACHEJOIN), which performs asymptotically at least as well as MESHJOIN but performs better in realistic scenarios, particularly if parts of the master data are used with different frequencies. In order to quantify the performance differences, the authors compare both algorithms with a synthetic dataset of a known skewed distribution as well as TPC-H and real-life datasets.
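The benefit of caching frequently used master data, as described in the abstract above, can be illustrated with a small sketch. This is not CACHEJOIN or MESHJOIN as published: it simply puts an LRU cache in front of a simulated disk lookup, and the class name `CachedLookupJoin`, the dictionary standing in for disk-based master data, and the sample stream are all assumptions made for illustration.

```python
from collections import OrderedDict

class CachedLookupJoin:
    """Join stream tuples with master data through a small LRU cache."""

    def __init__(self, master, cache_size):
        self.master = master        # stands in for disk-based master data
        self.cache = OrderedDict()  # key -> master row, least recent first
        self.cache_size = cache_size
        self.disk_reads = 0         # counts simulated disk accesses

    def _lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.disk_reads += 1
        row = self.master.get(key)
        if row is not None:
            self.cache[key] = row
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict least recently used
        return row

    def join(self, stream):
        """Yield (key, payload, master_row) for stream tuples that match."""
        for key, payload in stream:
            row = self._lookup(key)
            if row is not None:
                yield key, payload, row

master = {i: f"product-{i}" for i in range(1000)}
# Skewed stream: key 1 appears far more often than the others.
stream = [(1, "sale"), (2, "sale"), (1, "sale"),
          (1, "refund"), (999, "sale"), (2, "sale")]
joiner = CachedLookupJoin(master, cache_size=2)
results = list(joiner.join(stream))
print(len(results), joiner.disk_reads)
```

With a skewed key distribution, repeated keys are served from memory, so the number of simulated disk reads stays below the number of stream tuples. With uniformly distributed keys and a small cache, most lookups would still go to disk.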


2017 ◽  
Vol 107 ◽  
pp. 418-426 ◽  
Author(s):  
Jiang Zeyu ◽  
Yu Shuiping ◽  
Zhou Mingduan ◽  
Chen Yongqiang ◽  
Liu Yi

Author(s):  
Byron J. Gajewski ◽  
Shawn M. Turner ◽  
William L. Eisele ◽  
Clifford H. Spiegelman

Although most traffic management centers collect intelligent transportation system (ITS) traffic monitoring data from local controllers in 20-s to 30-s intervals, the time intervals for archiving data vary considerably from 1 to 5, 15, or even 60 min. Presented are two statistical techniques that can be used to determine optimal aggregation levels for archiving ITS traffic monitoring data: the cross-validated mean square error and the F-statistic algorithm. Both techniques seek to determine the minimal sufficient statistics necessary to capture the full information contained within a traffic parameter distribution. The statistical techniques were applied to 20-s speed data archived by the TransGuide center in San Antonio, Texas. The optimal aggregation levels obtained by using the two algorithms produced reasonable and intuitive results—both techniques calculated optimal aggregation levels of 60 min or more during periods of low traffic variability. Similarly, both techniques calculated optimal aggregation levels of 1 min or less during periods of high traffic variability (e.g., congestion). A distinction is made between conclusions about the statistical techniques and how the techniques can or should be applied to ITS data archiving. Although the statistical techniques described may not be disputed, there is a wide range of possible aggregation solutions based on these statistical techniques. Ultimately, the aggregation solutions may be driven by nonstatistical parameters such as cost (e.g., “How much do we/the market value the data?”), ease of implementation, system requirements, and other constraints.
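The idea of scoring an aggregation level by cross-validated mean square error can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure, which the abstract does not spell out: each interval is summarized by its mean, and every reading is predicted from the mean of the other readings in its interval (leave-one-out). The names `loo_cv_mse` and `best_level` and the synthetic speed series are hypothetical.

```python
import random

def loo_cv_mse(readings, size):
    """Leave-one-out MSE when blocks of `size` readings are replaced by a mean."""
    if size < 2 or size > len(readings):
        raise ValueError("size must be between 2 and len(readings)")
    squared_errors = []
    # Trailing readings that do not fill a whole block are ignored.
    for start in range(0, len(readings) - size + 1, size):
        block = readings[start:start + size]
        total = sum(block)
        for value in block:
            held_out_mean = (total - value) / (size - 1)
            squared_errors.append((value - held_out_mean) ** 2)
    return sum(squared_errors) / len(squared_errors)

def best_level(readings, sizes):
    """Return the block size with the lowest leave-one-out MSE."""
    return min(sizes, key=lambda size: loo_cv_mse(readings, size))

# With 20-s readings, 3 / 15 / 45 / 180 readings span 1 / 5 / 15 / 60 min.
sizes = [3, 15, 45, 180]
rng = random.Random(0)
steady = [60.0 + rng.gauss(0.0, 2.0) for _ in range(720)]  # stable speeds plus noise
changing = [float(i) for i in range(720)]                  # steadily drifting values

print(loo_cv_mse(steady, 3), loo_cv_mse(steady, 180))
print(best_level(changing, sizes))
```

In this toy setup, long intervals score better on the stable series because averaging more readings reduces noise, while short intervals score better on the drifting series because a long interval averages over a changing mean. This mirrors the qualitative pattern reported in the abstract.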

