Daily Traffic Count Imputation for Bicycle and Pedestrian Traffic: Comparing Existing Methods with Machine Learning Approaches

Author(s):  
Josh Roll

Monitoring nonmotorized traffic is becoming increasingly common practice at local and state departments of transportation. These travel activity data are necessary to monitor the system and track progress toward active transportation policy and program goals. A common problem is that permanent count site data are often missing, making those sites less useful. Accurately estimating those missing records effectively increases the amount of data available, both for direct use as traffic monitoring metrics and for factoring short-term sites. Using nonmotorized traffic counts from several cities in Oregon, this research compared the ability of day-of-year (DOY) factors, a statistical model, and machine learning algorithms to accurately impute daily traffic records for annual traffic estimation. Based on exhaustive cross-validation experiments using scenarios in which data are not missing at random, this research concluded that random forest and DOY factor approaches could be used to impute daily counts for nonmotorized traffic, but each approach comes with tradeoffs. Though random forest performed best in many missing data scenarios, this method is complicated to estimate and apply. DOY factor-based methods are simpler to create and apply, and though more accurate in scenarios with significant amounts of missing data, they were less flexible given the need for data from neighboring count sites. Negative binomial regression was also found to work well in scenarios with moderate to low amounts of missing data. This work can inform nonmotorized traffic count programs needing vetted solutions for traffic data imputation.
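As a rough illustration of the DOY factor idea, the sketch below scales a neighboring site's daily pattern to the target site and uses it to fill gaps. It is a minimal sketch under stated assumptions, not the procedure evaluated in the study: the function names, the use of a single reference site with complete data, and the ratio-based scaling are all hypothetical choices made for illustration.

```python
from statistics import mean


def doy_factors(reference_counts):
    """Return day -> factor, where factor = daily count / mean daily count.

    `reference_counts` maps a day-of-year to the count at a neighboring
    reference site with complete data (an assumption of this sketch).
    """
    annual_mean = mean(reference_counts.values())
    return {day: count / annual_mean for day, count in reference_counts.items()}


def impute_with_doy_factors(target_counts, factors):
    """Fill missing (None) daily counts at the target site.

    Each observed day yields one estimate of the target site's mean daily
    count (count / factor). The average of those estimates is then scaled
    by the factor of every missing day.
    """
    estimates = [
        count / factors[day]
        for day, count in target_counts.items()
        if count is not None and factors.get(day, 0) > 0
    ]
    if not estimates:
        raise ValueError("no observed days overlap with usable factors")
    site_mean = mean(estimates)
    return {
        day: count if count is not None else site_mean * factors[day]
        for day, count in target_counts.items()
    }


reference = {1: 100, 2: 200, 3: 300}  # hypothetical reference-site counts
target = {1: 10, 2: None, 3: 30}      # day 2 is missing at the target site
filled = impute_with_doy_factors(target, doy_factors(reference))
```

The sketch also makes the tradeoff noted in the abstract concrete: a missing day can only be filled if the reference site has a count for that same day.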

2019 ◽  
Vol 10 (1) ◽  
pp. 129 ◽  
Author(s):  
Jonghak Lee ◽  
Taekwan Yoon ◽  
Sangil Kwon ◽  
Jongtae Lee

There have been numerous studies on traffic accidents and their severity, particularly in relation to weather conditions and road geometry. In these studies, traditional statistical methods have been employed, such as linear regression, logistic regression, and negative binomial regression modeling, which are the most common linear and non-linear regression analysis methods. In this research, machine learning architecture was applied to this problem using the random forest, artificial neural network, and decision tree techniques to ascertain the strengths and weaknesses of these methods. Three data sets were used: road geometry data, precipitation data, and traffic accident data over nine years corresponding to the Naebu Expressway, which is located in Seoul, Korea. For the model evaluation, three measures were employed: the out-of-bag estimate of error rate (OOB), mean square error (MSE), and root mean square error (RMSE). The low mean OOB, MSE, and RMSE observed in the results obtained using the proposed random forest model demonstrate its accuracy.
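For reference, two of the three evaluation measures named above can be computed as follows. This is a generic sketch of the standard MSE and RMSE definitions with made-up numbers, not the study's evaluation code. The out-of-bag estimate is omitted because it depends on how the forest was trained.

```python
import math


def mean_square_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_square_error(actual, predicted))


actual = [2, 0, 1, 3]     # hypothetical observed accident counts
predicted = [1, 0, 2, 5]  # hypothetical model predictions
mse = mean_square_error(actual, predicted)
rmse = root_mean_square_error(actual, predicted)
```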


2020 ◽  
pp. 1-24
Author(s):  
TINGBIN BIAN ◽  
JIN CHEN ◽  
QU FENG ◽  
JINGYI LI

We aim to compare econometric analyses with machine learning approaches in the context of the Singapore private property market using transaction data covering the period of 1995–2018. A hedonic model is employed to quantify the premiums of important attributes and amenities, with a focus on the premium of distance to the nearest Mass Rapid Transit (MRT) station. Meanwhile, an investigation using machine learning algorithms in three categories (LASSO, random forest and artificial neural networks) is conducted in the same context, giving deeper insights into the importance of the determinants of property prices. The results suggest that the MRT distance premium is significant and that moving 100 m closer from the mean distance point to the nearest MRT station would increase the overall transacted price by about 15,000 Singapore dollars (SGD). Machine learning approaches generally achieve higher prediction accuracy, and a heterogeneous property age premium is suggested by LASSO. Using the random forest algorithm, we find that property prices are mostly affected by key macroeconomic factors, such as the time of sale, as well as the size and floor level of the property. Finally, an appraisal of the different approaches is provided for researchers to utilize additional data sources and data-driven approaches to exploit potential causal effects in economic studies.


2021 ◽  
Author(s):  
Martin Seeliger ◽  
Marina Altmeyer ◽  
Andreas Ginau ◽  
Robert Schiestl ◽  
Jürgen Wunderlich

This paper presents the application of machine-learning techniques on pXRF data to establish a chronology for sediment cores around Tell Buto (Tell el-Fara'in) in the northwestern Nile Delta. As modern laboratories for dating techniques like OSL or ¹⁴C are rare in Egypt and sample export is restricted, we are facing a lack of opportunities to create a robust chronology, which is indispensable in modern Geoarchaeology.

Therefore, we present a new approach to transfer archaeological age information gained at the excavation at Buto to corings of the wider Buto area. Sediments of archaeological outcrops and pits with known age are measured using pXRF to create a geochemical “fingerprint” for several historic eras. Afterwards, these “fingerprints” are transferred to corings of the surrounding areas using machine-learning algorithms.

This paper presents 1) the application of three different machine-learning approaches (Neuronal Net, Random Forest, and C5.0 decision tree) to check if archaeological age information can be transferred to sediments far off the settlement mounds using pXRF data, 2) the comparison of all approaches and the evaluation of whether the easily anticipated decision tree and Random Forest show similar results to the “black-box system” Neuronal Net, and finally, 3) a case study that provides the results of Altmeyer et al. (in review) for Kom el-Gir, a further settlement mound a little north of Buto, with a chronostratigraphic framework based on this approach.

Reference:

Altmeyer, M., Seeliger, M., Ginau, A., Schiestl, R. & J. Wunderlich (in review): Reconstruction of former channel systems in the northwestern Nile Delta (Egypt) based on corings and electrical resistivity tomography (ERT). (Submitted to E & G Quaternary Science Journal).


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Tahia Tazin ◽  
Md Nur Alam ◽  
Nahian Nakiba Dola ◽  
Mohammad Sajibul Bari ◽  
Sami Bourouis ◽  
...  

Stroke is a medical disorder in which the blood arteries in the brain are ruptured, causing damage to the brain. When the supply of blood and other nutrients to the brain is interrupted, symptoms might develop. According to the World Health Organization (WHO), stroke is the greatest cause of death and disability globally. Early recognition of the various warning signs of a stroke can help reduce the severity of the stroke. Different machine learning (ML) models have been developed to predict the likelihood of a stroke occurring in the brain. This research uses a range of physiological parameters and machine learning algorithms, such as Logistic Regression (LR), Decision Tree (DT) Classification, Random Forest (RF) Classification, and Voting Classifier, to train four different models for reliable prediction. Random Forest was the best performing algorithm for this task with an accuracy of approximately 96 percent. The dataset used in the development of the method was the open-access Stroke Prediction dataset. The accuracy percentage of the models used in this investigation is significantly higher than that of previous studies, indicating that the models used in this investigation are more reliable. Numerous model comparisons have established their robustness, and the scheme can be deduced from the study analysis.


2020 ◽  
Author(s):  
Albert Morera ◽  
Juan Martínez de Aragón ◽  
José Antonio Bonet ◽  
Jingjing Liang ◽  
Sergio de-Miguel

Abstract Background: The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences requires advanced analytical methods and modelling tools. This study compares different statistical and machine learning models for predicting fungal productivity biogeographical patterns as a case study for the thorough assessment of the performance of alternative modelling approaches to provide accurate and ecologically-consistent predictions. Methods: We evaluated and compared the performance of two statistical modelling techniques, namely, generalized linear mixed models and geographically weighted regression, and four machine learning models, namely, random forest, extreme gradient boosting, support vector machine and deep learning to predict fungal productivity. We used a systematic methodology based on substitution, random, spatial and climatic blocking combined with principal component analysis, together with an evaluation of the ecological consistency of spatially-explicit model predictions. Results: Fungal productivity predictions were sensitive to the modelling approach and complexity. Moreover, the importance assigned to different predictors varied between machine learning modelling approaches. Decision tree-based models increased prediction accuracy by ~7% compared to other machine learning approaches and by more than 25% compared to statistical ones, and resulted in higher ecological consistency at the landscape level. Conclusions: Whereas a large number of predictors are often used in machine learning algorithms, in this study we show that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. When dealing with spatial-temporal data in the analysis of biogeographical patterns, climatic blocking is postulated as a highly informative technique to be used in cross-validation to assess the prediction error over larger scales. Random forest was the best approach for prediction both in sampling-like environments and in extrapolation beyond the spatial and climatic range of the modelling data.
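The blocking idea used in the cross-validation can be sketched as follows: every sample belonging to one block is held out together, so the model is always tested on a block it has not seen. This is a minimal illustration that assumes each sample already carries a block label. The function name and the example labels are hypothetical, and the study's actual block construction (spatial, climatic, with principal component analysis) is not reproduced here.

```python
from collections import defaultdict


def blocked_folds(block_ids):
    """Yield (block, train_indices, test_indices), holding out one block at a time."""
    members = defaultdict(list)
    for index, block in enumerate(block_ids):
        members[block].append(index)
    for block, test_indices in members.items():
        held_out = set(test_indices)
        train_indices = [i for i in range(len(block_ids)) if i not in held_out]
        yield block, train_indices, test_indices


# Hypothetical climatic block label for each of five samples.
blocks = ["wet", "wet", "dry", "cold", "cold"]
folds = list(blocked_folds(blocks))
```

Unlike random folds, no block contributes samples to both the training and the test side of a fold, which is what lets the error estimate reflect extrapolation to unseen conditions.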


Recent advancements in remote sensing platforms, from satellites to close-range Remotely Piloted Aircraft Systems (RPAS), have led to a growing demand for innovative image processing and classification tools. Machine learning approaches are a widely used group of data-driven inference tools that provide a broader scope when applied to remotely sensed data. In this paper, we apply different machine learning approaches to remote sensing images, using open source packages in R, to find out which algorithm achieves the best accuracy. We carried out a rigorous comparison of four machine learning algorithms: support vector machine, random forest, classification and regression tree, and naive Bayes. These algorithms are evaluated using classification accuracy, the Kappa index, and area under the curve as accuracy metrics. Ten runs are performed to obtain the variance in the results on the training set, and validation is carried out using k-fold cross-validation. The study identifies random forest as the best method based on the accuracy measures under different conditions. Random forest is efficient to train, highly stable with respect to variations in classification parameter values, and significantly more accurate than the other machine learning approaches trialed.
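The Kappa index mentioned above is presumably Cohen's kappa, which corrects raw agreement for the agreement expected by chance. Under that assumption, it can be computed from reference and predicted class labels as sketched below. The paper itself works in R. This Python version and its example labels are purely illustrative.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_totals = Counter(reference)
    pred_totals = Counter(predicted)
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(ref_totals[c] * pred_totals[c] for c in ref_totals) / (n * n)
    if expected == 1.0:
        return 1.0  # both label lists consist of one identical class
    return (observed - expected) / (1.0 - expected)


reference = ["water", "water", "forest", "forest"]  # hypothetical ground truth
predicted = ["water", "water", "forest", "water"]   # hypothetical classifier output
kappa = cohen_kappa(reference, predicted)
```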


2020 ◽  
Author(s):  
Javad Nazari ◽  
Parnia-Sadat Fathi ◽  
Nahid Sharahi ◽  
Majid Taheri ◽  
Payam Amini ◽  
...  

Abstract Background: Measles is a febrile illness and among the most infectious viral diseases in the world. Despite the availability of a safe, accessible, affordable and effective vaccine, measles continues to be a worldwide concern. Methods: This study uses machine learning and time series methods to assess factors that placed people at a higher risk of measles. This historical cohort study covered measles incidence in Markazi Province, in central Iran, from April 1997 to February 2020. Logistic regression, linear discriminant analysis, random forest, artificial neural network, bagging, support vector machine, and naïve Bayes were used for classification. Zero-inflated negative binomial regression for time series was used to assess the development of measles over time. Results: The prevalence of measles was 14.5% over the past 24 years, and a constant trend of almost zero cases was observed from 2002 to 2020. In order of importance, the independent variables were recent years, age, vaccination, rhinorrhea, male sex, contact with measles patients, cough, conjunctivitis, ethnicity, and fever. Younger age, lower probability of contact and no fever are associated with lower odds of zero cases. Only 7 new cases were forecasted for the next two years. Bagging and random forest were the most accurate classification methods. Conclusion: Even though the number of new cases has been almost zero in recent years, it has been shown that age and contact are responsible for non-occurrence of measles. New cases in 2021 and 2022 are most likely to occur in October and May.


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4605
Author(s):  
Ladislav Polak ◽  
Stanislav Rozum ◽  
Martin Slanina ◽  
Tomas Bravenec ◽  
Tomas Fryza ◽  
...  

The fingerprinting technique is a popular approach to reveal the location of persons, instruments or devices in an indoor environment. Typically based on signal strength measurement, a power level map is created first in the learning phase to align with measured values in the inference. Second, the location is determined by taking the point for which the recorded received power level is closest to the power level actually measured. The biggest limit of this technique is the reliability of power measurements, which may lack accuracy in many wireless systems. To this end, this work extends the power level measurement by using multiple anchors and multiple radio channels and, consequently, considers different approaches to aligning the actual measurements with the recorded values. The dataset is available online. This article focuses on the very popular radio technology Bluetooth Low Energy to explore the possible improvement of the system accuracy through different machine learning approaches. It shows how the accuracy–complexity trade-off influences the possible candidate algorithms on an example of three-channel Bluetooth received signal strength based fingerprinting in a one-dimensional environment with four static anchors and in a two-dimensional environment with the same set of anchors. We provide a literature survey to identify the machine learning algorithms applied in the literature and to show that the available studies cannot be compared directly. Then, we implement and analyze the performance of the four most popular supervised learning techniques, namely k Nearest Neighbors, Support Vector Machines, Random Forest, and Artificial Neural Network. In our scenario, the most promising machine learning technique is Random Forest, with classification accuracy over 99%.
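The basic fingerprinting step described above, picking the recorded point whose power levels are closest to the measured ones, can be sketched as a nearest-neighbor lookup. This is a minimal illustration assuming Euclidean distance over one RSSI value per anchor. The radio map values, positions, and function name are invented for the example and are not taken from the article's dataset.

```python
def locate(radio_map, measured):
    """Return the recorded position whose RSSI vector is closest to `measured`.

    `radio_map` maps a position to the list of received power levels (dBm)
    recorded there in the learning phase, one value per anchor.
    """
    def squared_distance(recorded):
        return sum((r - m) ** 2 for r, m in zip(recorded, measured))

    return min(radio_map, key=lambda position: squared_distance(radio_map[position]))


# Hypothetical one-dimensional radio map: position in metres -> RSSI at four anchors.
radio_map = {
    0.0: [-45, -60, -75, -88],
    5.0: [-60, -46, -61, -76],
    10.0: [-75, -61, -47, -62],
}
measured = [-58, -49, -63, -74]  # hypothetical reading taken during inference
position = locate(radio_map, measured)
```

This is effectively a one-nearest-neighbor classifier. The supervised techniques compared in the article replace this lookup with a trained model.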


2020 ◽  
Vol 25 (40) ◽  
pp. 4296-4302 ◽  
Author(s):  
Yuan Zhang ◽  
Zhenyan Han ◽  
Qian Gao ◽  
Xiaoyi Bai ◽  
Chi Zhang ◽  
...  

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises due to the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relative excess of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model in the prediction of inhibitors against K562 cells.


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis, the gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test, is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
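For readers unfamiliar with the three reported measures, the sketch below computes them from binary labels and model scores. It uses the standard definitions, with AUC calculated as the fraction of positive-negative pairs that the score ranks correctly. The example data and function names are hypothetical, and both classes are assumed to be present.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions (1 = positive)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """Probability that a random positive scores higher than a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 1, 0, 0]            # hypothetical test responses (1 = positive)
scores = [0.9, 0.8, 0.3, 0.2, 0.6]  # hypothetical predicted probabilities
predictions = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(labels, predictions)
area = auc(labels, scores)
```

Sensitivity and specificity depend on the chosen threshold (0.5 here, an arbitrary choice), whereas AUC summarizes ranking quality across all thresholds.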

