Method for Imputing Missing Data using Online Calibration for Urban Freeway Control

Author(s):  
Xu Wang ◽  
Yuechun Ge ◽  
Lei Niu ◽  
Yi He ◽  
Tony Z. Qiu

Real-time traffic control systems are widely implemented on roadways around the world as a measure to improve freeway mobility. However, these systems, which rely on data from road-side and on-road sensors and other electronic equipment, continue to suffer from missing and erroneous data. While many data imputation methods are documented in the related literature, traffic control systems still lack an imputation method that is applicable in practice, accurate in imputation, and simple in computation. In response, this paper puts forth a linear imputation model that considers both the temporal traffic trend and spatial detector correlations. To adapt the model to dynamic traffic variations, the imputation method was equipped with an online calibration module. The proposed imputation method was evaluated with field data from two stations on Whitemud Drive, a busy urban freeway in Edmonton, Alberta, Canada. The proposed model benefited from its time-of-day temporal trend and outperformed the previous model, which considers only spatial correlations. Moreover, the online calibration module was effective in improving imputation accuracy. Finally, the sensitivity of imputation performance was analyzed. The results show that the imputation with online calibration is more sensitive to missing data ratios than that with offline calibration. The sensitivity analysis revealed that imputation with online calibration is more suitable for online imputation in traffic control implementations.
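As a rough illustration of the idea in this abstract, the sketch below combines a time-of-day historical mean (temporal term) with the concurrent reading of a neighboring detector (spatial term) in a linear estimate, and recalibrates the two weights whenever the target detector reports a value. The class name, the two-term form, and the normalized least-mean-squares update are assumptions made for illustration; the abstract does not state the paper's actual model or calibration procedure.

```python
from dataclasses import dataclass


@dataclass
class OnlineLinearImputer:
    """Hypothetical two-term linear imputer (not the paper's exact model)."""
    w_temporal: float = 0.5   # weight on the time-of-day historical mean
    w_spatial: float = 0.5    # weight on the neighboring detector reading
    step: float = 0.5         # calibration step size, assumed in (0, 2)

    def predict(self, tod_mean, neighbor):
        # Linear estimate of the target detector's value.
        return self.w_temporal * tod_mean + self.w_spatial * neighbor

    def update(self, tod_mean, neighbor, observed):
        # Online calibration: one normalized LMS step toward the observation.
        error = observed - self.predict(tod_mean, neighbor)
        norm = tod_mean ** 2 + neighbor ** 2 + 1e-9
        self.w_temporal += self.step * error * tod_mean / norm
        self.w_spatial += self.step * error * neighbor / norm
        return error


def impute_stream(imputer, readings):
    """readings: iterable of (tod_mean, neighbor, observed_or_None)."""
    filled = []
    for tod_mean, neighbor, observed in readings:
        if observed is None:
            # Missing value: fill it with the current linear estimate.
            filled.append(imputer.predict(tod_mean, neighbor))
        else:
            # Value present: keep it and use it to recalibrate the weights.
            imputer.update(tod_mean, neighbor, observed)
            filled.append(observed)
    return filled
```

An offline-calibrated variant would fix the two weights once from historical data and skip the `update` call, which is the comparison the abstract refers to.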

2019 ◽  
Vol 6 (339) ◽  
pp. 73-98
Author(s):  
Małgorzata Aleksandra Misztal

The problem of incomplete data and its implications for drawing valid conclusions from statistical analyses is not tied to any particular scientific domain; it arises in economics, sociology, education, behavioural sciences and medicine. Almost all standard statistical methods presume that every object has information on every variable to be included in the analysis, and the typical approach to missing data is simply to delete the incomplete cases. However, this leads to inefficient and biased analysis results and is not recommended in the literature. The state-of-the-art technique for handling missing data is multiple imputation. In the paper, selected multiple imputation methods were considered. Special attention was paid to using principal components analysis (PCA) as an imputation method. The goal of the study was to assess the quality of PCA‑based imputations as compared to two other multiple imputation techniques: multivariate imputation by chained equations (MICE) and missForest. The comparison was made by artificially simulating different proportions (10–50%) and mechanisms of missing data using 10 complete data sets from the UCI repository of machine learning databases. Then, missing values were imputed with the use of MICE, missForest and the PCA‑based method (MIPCA). The normalised root mean square error (NRMSE) was calculated as a measure of imputation accuracy. On the basis of the conducted analyses, missForest can be recommended as a multiple imputation method providing the lowest rates of imputation errors for all types of missingness. PCA‑based imputation does not perform well in terms of accuracy.
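The evaluation protocol described above (remove values artificially, impute, score with NRMSE) can be sketched as follows. This is a minimal illustration under stated assumptions: `mask_mcar` and `nrmse` are hypothetical helper names, only the MCAR mechanism is simulated, and the NRMSE is normalised by the variance of the true values, which is one common convention and may differ from the one used in the paper.

```python
import random
import statistics


def mask_mcar(values, proportion, seed=0):
    """Remove a given proportion of values completely at random (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(proportion * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, sorted(missing_idx)


def nrmse(true_values, imputed_values):
    """RMSE over the removed entries, normalised by the variance of the truth."""
    squared_errors = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    mse = statistics.fmean(squared_errors)
    return (mse / statistics.pvariance(true_values)) ** 0.5
```

Under this normalisation, imputing every removed entry with the mean of the true removed values scores exactly 1, so values well below 1 indicate that a method does better than that naive baseline.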


2021 ◽  
Vol 7 (4) ◽  
pp. 565-583
Author(s):  
A. V. Banite ◽  
D. S. Deriaga ◽  
O. V. Leonenko ◽  
...  

The article is devoted to the prospects for improving traffic quality at a junction of the urban street and road network through the introduction of intelligent transport systems, especially automatic traffic control systems (ATCS). The paper analyzes the problems of implementing intelligent transport systems in urban conditions, taking into account the current regulatory framework. A classification of local automated traffic control systems according to the adaptability of traffic light regulation to the changing parameters of traffic flows is given. To assess the feasibility of introducing an ATCS, a technique is proposed that includes building simulation models to forecast more accurately the effect of introducing a local ATCS on the considered section of an urban street and road network. The application of the methodology is demonstrated using the intersection of Engels Avenue and Suzdal Avenue in St. Petersburg as an example. Two variants of traffic light phase control are analyzed: static and adapted according to the time of day. The influence of ATCS implementation on the average speed of vehicles and the characteristics of traffic jams at the junction in question was estimated based on simulation modeling in PTV Vissim. In accordance with the analysis, the prospects of introducing an adaptive local ATCS at the considered transport junction are described.


2021 ◽  
Vol 7 ◽  
pp. e619
Author(s):  
Khaled M. Fouad ◽  
Mahmoud M. Ismail ◽  
Ahmad Taher Azar ◽  
Mona M. Arafa

Real-world data analysis and processing using data mining techniques often face observations that contain missing values. The main challenge of mining datasets is the existence of missing values. The missing values in a dataset should be imputed using an imputation method to improve the accuracy and performance of data mining methods. Existing techniques use the k-nearest neighbors algorithm to impute missing values, but determining an appropriate k value can be challenging. Other existing imputation techniques are based on hard clustering algorithms. When records are not well separated, as in the case of missing data, hard clustering provides a poor description tool in many cases. In general, imputation based on similar records is more accurate than imputation based on all the records in the dataset. Improving the similarity among records can therefore improve the imputation performance. This paper proposes two numerical missing data imputation methods. A hybrid missing data imputation method, called KI, is proposed first; it incorporates k-nearest neighbors and iterative imputation algorithms. The best set of nearest neighbors for each missing record is discovered through record similarity by using the k-nearest neighbors algorithm (kNN). To improve the similarity, a suitable k value is estimated automatically for the kNN. The iterative imputation method is then used to impute the missing values of the incomplete records by using the global correlation structure among the selected records. An enhanced hybrid missing data imputation method, called FCKI, is then proposed as an extension of KI. It integrates fuzzy c-means, k-nearest neighbors, and iterative imputation algorithms to impute the missing data in a dataset. The fuzzy c-means algorithm is selected because records can belong to multiple clusters at the same time, which can further improve similarity. FCKI searches a cluster, instead of the whole dataset, to find the best k-nearest neighbors. It applies two levels of similarity to achieve a higher imputation accuracy. The performance of the proposed imputation techniques is assessed using fifteen datasets with varying missing ratios for three types of missing data: MCAR, MAR, and MNAR. These missing data types are generated in this work. Datasets of different sizes are used to validate the model. The proposed imputation techniques are compared with other missing data imputation methods by means of three measures: the root mean square error (RMSE), the normalized root mean square error (NRMSE), and the mean absolute error (MAE). The results show that the proposed methods achieve better imputation accuracy and require significantly less time than other missing data imputation methods.
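For orientation, the sketch below shows plain k-nearest-neighbor mean imputation, the building block that KI and FCKI start from. It is not the KI or FCKI algorithm: k is fixed rather than estimated automatically, there is no iterative imputation or fuzzy c-means stage, and the function name `knn_impute` and the distance over shared observed features are assumptions made for illustration.

```python
import math


def knn_impute(records, k=2):
    """Fill None entries with the mean of that feature over the k nearest donors."""

    def distance(a, b):
        # Root mean squared difference over features observed in both records.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(record) for record in records]
    for i, record in enumerate(records):
        for j, value in enumerate(record):
            if value is not None:
                continue
            # Donors are other records that have feature j observed.
            donors = [
                (distance(record, other), other[j])
                for m, other in enumerate(records)
                if m != i and other[j] is not None
            ]
            donors.sort(key=lambda pair: pair[0])
            nearest = [donor_value for _, donor_value in donors[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed
```

Restricting the donor search to records in the same cluster, as the abstract describes for FCKI, would amount to passing only that cluster's records to a function like this one.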


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses some advanced techniques for dealing with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data mechanisms are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithmic transformation was applied to all pollutant data in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR mechanism had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and, thus, can be considered appropriate for analyzing air quality data.


Author(s):  
Hatem Abou-Senna ◽  
Mohamed El-Agroudy ◽  
Mustapha Mouloua ◽  
Essam Radwan

The use of express lanes (ELs) in freeway traffic management has seen increasing popularity throughout the United States, particularly in Florida. These lanes are intended as an efficient transportation systems management and operations tool that provides more reliable trips. An important component of ELs is the channelizing devices used to delineate the separation between the ELs and the general-purpose lane. With the upcoming changes to the FHWA Manual on Uniform Traffic Control Devices, this study provided an opportunity to recommend changes affecting safety and efficiency on a nationwide level. It was important to understand the impacts on driver perception and performance in response to the color of the EL delineators. It was also valuable to understand the differences between demographics in responding to delineator colors under different driving conditions. A driving simulator was used to test the responses of several demographic groups to changes in marker color and driving conditions. Furthermore, participants were tested for several factors relevant to driving performance, including visual and subjective responses to the changes in colors and driving conditions. Impacts on driver perception were observed via eye-tracking technology with changes to time of day, visibility, traffic density, roadway surface type, and, crucially, color of the delineating devices. The analyses concluded that white was the optimal and most significant color for noticing delineators across the majority of subjective and performance measures, followed by yellow, with black being the least desirable.


2021 ◽  
pp. 147592172110219
Author(s):  
Huachen Jiang ◽  
Chunfeng Wan ◽  
Kang Yang ◽  
Youliang Ding ◽  
Songtao Xue

Wireless sensors are key components of structural health monitoring systems. During signal transmission, sensor failures are inevitable, and data loss is the most common type. The missing data problem poses a major challenge to subsequent damage detection and condition assessment and therefore deserves close attention. Conventional missing data imputation mostly adopts correlation-based methods, especially for strain monitoring data. However, such methods often require delicate model selection, and the correlations for vehicle-induced strains are much harder to capture than those for temperature-induced strains. In this article, a novel data-driven generative adversarial network (GAN) for imputing missing strain response is proposed. As opposed to traditional approaches, in which correlations between strains are explicitly modeled, the proposed method directly imputes the missing data by considering the spatial–temporal relationships with other strain sensors based on the remaining observed data. Furthermore, an intact and complete dataset is not required during the training process, which is a further advantage over model-based imputation methods. The proposed method is implemented and verified on a real concrete bridge. In order to demonstrate the applicability and robustness of the GAN, imputation for single and multiple sensors is studied. Results show that the proposed method achieves excellent imputation accuracy and efficiency.

