Investigating Effect of Driver-, Vehicle-, and Road-Related Factors on Location-Specific Crashes with Naturalistic Driving Data

Author(s):  
Grace Ashley ◽  
Osama A. Osman ◽  
Sherif Ishak ◽  
Julius Codjoe

According to NHTSA, traffic accidents cost the United States billions of U.S. dollars each year. Intersection accidents alone accounted for 23% of the 32,675 motor crash deaths in 2014. With the advent of the largest naturalistic driving data set in the United States collected by the SHRP2 Naturalistic Driving Study project, this study performs a crash-only analysis to identify driver-, vehicle-, and roadway-related factors that affect the driving risk at different location types using a machine learning tool. The study then analyzes the most important factors obtained from the machine learning analysis to identify how they affect crash risk. The results, in order of importance of variables, were driver behavior, locality, lane occupied, alignment, and through travel lanes. Also, drivers who violated traffic signals were four times more likely to be involved in a crash than drivers who did not. Those who violated stop signs were two times more likely to be involved in crashes than those who did not. Drivers performing visual-manual (VM) tasks at uncontrolled intersections were 2.7 times more likely to be involved in crashes than those who did not engage in these tasks. At nonintersections, drivers who performed VM tasks were 3.4 times more likely to be involved in crashes than drivers who did not. These findings add to the evidence that the establishment of safety awareness programs geared toward intersection safety is imperative.

Author(s):  
Christian M. Richard ◽  
James L. Brown ◽  
Randolph Atkins ◽  
Gautam Divekar

Speeding-related crashes continue to be a serious problem in the United States. A recently completed NHTSA project, Motivations for Speeding, collected data to address questions about driver speeding behavior. This naturalistic driving study used 1-Hz GPS units to collect data from 88 drivers in Seattle, Washington, to record how fast vehicles traveled on different roadways. The current project further developed this data set to redefine speeding in terms of speeding episodes, which were continuous periods in which drivers exceeded the posted speed limit by at least 10 mph. More than half of all study participants averaged less than one speeding episode per trip taken. Various characteristics of speeding episodes representing aspects such as duration, magnitude, variability, and overall form of speeding were examined. Cluster analyses conducted using these characteristics of speeding episodes identified six types of speeding. These included two types of speeding that occurred around speed-zone transitions (speeding up and slowing down), incidental speeding, casual speeding, cruising speeding, and aggressive speeding. Qualitative examination of the speeding types indicated that these types also differed in terms of the prevalence of additional risky situational characteristics.
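
The episode definition above (a continuous period at least 10 mph over the posted limit, sampled at 1 Hz) can be illustrated with a minimal Python sketch. The function name, the `Episode` fields, and the sample values are hypothetical and are not taken from the study; the code only shows one plausible way to segment a 1-Hz speed trace into episodes.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Episode:
    start: int         # index of the first sample in the episode (seconds at 1 Hz)
    duration: int      # number of consecutive samples, i.e. seconds at 1 Hz
    max_excess: float  # peak speed above the posted limit, in mph


def find_speeding_episodes(
    speeds_mph: Sequence[float],
    limits_mph: Sequence[float],
    threshold_mph: float = 10.0,
) -> List[Episode]:
    """Group consecutive samples that exceed the limit by >= threshold_mph."""
    n = min(len(speeds_mph), len(limits_mph))
    episodes: List[Episode] = []
    start = None
    peak = 0.0
    for i in range(n):
        excess = speeds_mph[i] - limits_mph[i]
        if excess >= threshold_mph:
            if start is None:
                # A new episode begins at this sample.
                start, peak = i, excess
            else:
                peak = max(peak, excess)
        elif start is not None:
            # The episode ended on the previous sample.
            episodes.append(Episode(start, i - start, peak))
            start = None
    if start is not None:
        # The trace ended while the driver was still speeding.
        episodes.append(Episode(start, n - start, peak))
    return episodes


# Hypothetical 7-second trace: the posted limit changes from 30 to 40 mph.
speeds = [30, 41, 45, 44, 38, 52, 30]
limits = [30, 30, 30, 30, 30, 40, 40]
episodes = find_speeding_episodes(speeds, limits)
```

Per-episode features such as `duration` and `max_excess` are the kind of characteristics that could then be passed to a cluster analysis; the features actually used in the study are not listed in the abstract.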


2017 ◽  
Vol 2659 (1) ◽  
pp. 204-211 ◽  
Author(s):  
Mengqiu Ye ◽  
Osama A. Osman ◽  
Sherif Ishak

Distracted driving has long been acknowledged as one of the main contributors to crashes in the United States. According to past studies, driving behavior proved to be influenced by the socioeconomic characteristics of drivers. However, few studies attempted to quantify that influence. This study proposed a crash risk index (CRI) to estimate the crash risk associated with the socioeconomic characteristics of drivers and their tendency to experience distracted driving. The analysis was conducted with data from the SHRP 2 Naturalistic Driving Study. The proposed CRI was developed on a grading system of three measures: the crash risk associated with performing secondary tasks during driving, the effect of socioeconomic attributes (e.g., age) on the likelihood of engagement in secondary tasks, and the effect of specific categories within each socioeconomic attribute (e.g., age older than 60) on the likelihood of engagement in secondary tasks. Logistic regression analysis was performed on the secondary tasks, socioeconomic attributes, and specific socioeconomic characteristics. The results identified the significant secondary tasks with high crash risk and the socioeconomic characteristics with significant effect on determining drivers’ involvement in secondary tasks in each tested parameter. These results were used to quantify the grading system measures and hence estimate the proposed CRI. This index indicates the relative crash risk associated with the socioeconomic characteristics of drivers and considers the possibility of engagement in secondary tasks. The proposed CRI and the associated grading system are plausible methods for estimating auto insurance premiums.


Author(s):  
Yulan Liang ◽  
John D. Lee ◽  
Lora Yekhshatyan

Objective: In this study, the authors used algorithms to estimate driver distraction and predict crash and near-crash risk on the basis of driver glance behavior using the data set of the 100-Car Naturalistic Driving Study. Background: Driver distraction has been a leading cause of motor vehicle crashes, but the relationship between distractions and crash risk lacks detailed quantification. Method: The authors compared 24 algorithms that varied according to how they incorporated three potential contributors to distraction (glance duration, glance history, and glance location) on how well the algorithms predicted crash risk. Results: Distraction estimated from driver eye-glance patterns was positively associated with crash risk. The algorithms incorporating ongoing off-road glance duration predicted crash risk better than did the algorithms incorporating glance history. Augmenting glance duration with other elements of glance behavior (1.5th power of duration and duration weighted by glance location) produced prediction performance similar to that of glance duration alone. Conclusions: The distraction level estimated by the algorithms that include current glance duration provides the most sensitive indicator of crash risk. Application: The results inform the design of algorithms to monitor driver state that support real-time distraction mitigation systems.
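
The abstract does not define the 24 algorithms, so the following Python sketch is only an illustration of two quantities it names: the ongoing off-road glance duration and its 1.5th power. The function names, the boolean on-road encoding, and the sampling rate are assumptions made for this example.

```python
from typing import List, Sequence


def ongoing_offroad_duration(on_road: Sequence[bool], sample_hz: float) -> List[float]:
    """Seconds the driver has continuously looked off-road at each sample."""
    durations: List[float] = []
    run = 0  # consecutive off-road samples so far
    for looking_at_road in on_road:
        run = 0 if looking_at_road else run + 1
        durations.append(run / sample_hz)
    return durations


def power_weighted(durations: Sequence[float], exponent: float = 1.5) -> List[float]:
    """Raise each duration to a power so that longer glances count more."""
    return [d ** exponent for d in durations]


# Hypothetical glance trace sampled at 1 Hz (True = eyes on the road).
trace = [True, False, False, False, True, False]
durations = ongoing_offroad_duration(trace, sample_hz=1.0)
weighted = power_weighted(durations)
```

The running duration resets to zero whenever the driver looks back at the road, which is what separates an "ongoing" glance measure from one based on glance history.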


Author(s):  
Samantha H. Haus ◽  
Ryan M. Anderson ◽  
Rini Sherony ◽  
Hampton C. Gabler

In the United States, fatalities from vehicle–bicycle crashes have been increasing since 2010. A total of 857 cyclists were struck and killed in 2018, an increase from 623 fatalities in 2010. One promising countermeasure is Automatic Emergency Braking (AEB), which can help prevent and/or mitigate many vehicle–bicycle crashes. AEB is a vehicle-based system that can detect and mitigate an impending crash. The goal of this study was to characterize U.S. vehicle–bicycle crashes and examine related factors to estimate AEB effectiveness. This study used a unique in-depth vehicle–bicycle crash dataset collected in southeast Michigan from 2011 to 2013 through a collaboration between the Washtenaw Area Transportation Study (WATS) and the Toyota Collaborative Research Center. The WATS database provides in-depth investigations of vehicle–bicycle crashes in the United States. The characteristics of the WATS vehicle–bicycle crashes were validated against the Fatality Analysis Reporting System and the General Estimates System. The WATS database cases were examined to estimate the potential effectiveness of AEB to prevent or mitigate vehicle–bicycle collisions. In 60% of the WATS cases, cyclists were in the road for more than 1 s before impact. Assuming that a hypothetical AEB system requires a minimum of 1 s for detection and brake activation, these collisions would potentially be avoided or mitigated. However, for the remaining cases with less than 1 s of time to react (40% of cases), that AEB system would be challenged to avoid or mitigate the collision.


2020 ◽  
Author(s):  
Xiaoqian Jiang ◽  
Lishan Yu ◽  
Hamisu M. Salihub ◽  
Deepa Dongarwar

BACKGROUND In the United States, state laws require birth certificates to be completed for all births, and federal law mandates national collection and publication of births and other vital statistics data. The National Center for Health Statistics (NCHS) has published the key statistics of birth data over the years. These data files, from as early as the 1970s, have been released and made publicly available. There are about 3 million new births each year, and every birth is a record in the data set described by hundreds of variables. The total data cover more than half of the current US population, making it an invaluable resource to study and examine birth epidemiology. Using such big data, researchers can ask interesting questions and study longitudinal patterns, for example, the impact of mothers' drinking status on infertility in metropolitan areas over the last decade, or the relationship between the biological father's education level and cesarean sections over the years. However, existing published data sets cannot directly support these research questions because the variables and their categories have been adjusted over time, which leaves the individually published data files fragmented. The information contained in the published data files is highly diverse, containing hundreds of variables each year. Besides minor adjustments like renaming and increasing variable categories, some major updates significantly changed the fields of statistics (including removal, addition, and modification of the variables), making the published data disconnected and ambiguous to use over multiple years. Researchers have previously reconstructed features to study temporal patterns, but the scale is limited (focusing only on a few variables of interest). Many have reinvented the wheel, and such reconstructions lack consistency as different researchers might use different criteria to harmonize variables, leading to inconsistent findings and limiting the reproducibility of research. There is no systematic effort to combine about five decades of data files into a database that includes every variable that has ever been released by NCHS.
OBJECTIVE To utilize machine learning techniques to combine the United States (US) natality data for the last five decades, with changing variables and factors, into a consistent database.
METHODS We developed a feasible and efficient deep-learning-based framework to harmonize data sets of live births in the US from 1970 to 2018. We constructed a graph based on the properties and elements of the databases, including variables, and applied a graph convolutional network (GCN) to the graph to learn embeddings for the nodes, where the learned embeddings implied the similarity of variables. We devised a novel loss function with a slack margin and a banlist mechanism (for a random walk) to learn the desired structure (two nodes sharing more information were more similar to each other). We developed an active learning mechanism to conduct the harmonization.
RESULTS We harmonized historical US birth data and resolved conflicts in ambiguous terms. From a total of 9,321 variables (i.e., 783 stemmed variables, from 1970 to 2018) we applied our model iteratively together with human review, obtaining 323 hyperchains of variables. Hyperchains for harmonization were composed of 201 stemmed variable pairs when considering any pairs of different stemmed variables changed over years. During the harmonization, the first round of our model provided 305 candidate stemmed variable pairs (based on the top-20 most similar variables of each variable according to the learned embeddings) and achieved recall and precision of 87.56% and 57.70%, respectively.
CONCLUSIONS Our harmonized graph neural network (HGNN) method provides a feasible and efficient way to connect relevant databases at a meta-level. By adapting to the properties and characteristics of databases, HGNN can learn patterns and search relations globally, which makes it powerful for discovering the similarity between variables among databases. Smart utilization of machine learning can significantly reduce the manual effort in database harmonization and integration of fragmented data into useful databases for future research.
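
The reported recall and precision can be related to the counts in the RESULTS with a short Python check. The abstract does not state the number of correct candidate pairs, so this sketch assumes that precision is measured against the 305 first-round candidates and recall against the 201 stemmed variable pairs, and then searches for an integer count of true positives that reproduces both percentages. The function names are hypothetical.

```python
from typing import List, Tuple


def precision_recall(true_positives: int, n_candidates: int, n_relevant: int) -> Tuple[float, float]:
    """Precision = TP / candidates proposed; recall = TP / relevant pairs."""
    return true_positives / n_candidates, true_positives / n_relevant


def consistent_true_positives(
    n_candidates: int,
    n_relevant: int,
    reported_precision: float,
    reported_recall: float,
    tol: float = 5e-5,  # half a unit in the fourth decimal place
) -> List[int]:
    """Integer TP counts that reproduce both reported values after rounding."""
    matches = []
    for tp in range(min(n_candidates, n_relevant) + 1):
        p, r = precision_recall(tp, n_candidates, n_relevant)
        if abs(p - reported_precision) < tol and abs(r - reported_recall) < tol:
            matches.append(tp)
    return matches


# Counts and percentages as reported in the abstract.
matches = consistent_true_positives(305, 201, 0.5770, 0.8756)
```

Under these assumptions a single count fits both figures, which suggests (but does not prove) that 176 of the 305 candidates were confirmed as correct pairs.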


2021 ◽  
Vol 14 (5) ◽  
pp. 472
Author(s):  
Tyler C. Beck ◽  
Kyle R. Beck ◽  
Jordan Morningstar ◽  
Menny M. Benjamin ◽  
Russell A. Norris

Roughly 2.8% of annual hospitalizations are a result of adverse drug interactions in the United States, representing more than 245,000 hospitalizations. Drug–drug interactions commonly arise from major cytochrome P450 (CYP) inhibition. Various approaches are routinely employed in order to reduce the incidence of adverse interactions, such as altering drug dosing schemes and/or minimizing the number of drugs prescribed; however, often, a reduction in the number of medications cannot be achieved without impacting therapeutic outcomes. Nearly 80% of drugs fail in development due to pharmacokinetic issues, underscoring the importance of examining cytochrome interactions during preclinical drug design. In this review, we examined the physicochemical and structural properties of small molecule inhibitors of CYPs 3A4, 2D6, 2C19, 2C9, and 1A2. Although CYP inhibitors tend to have distinct physicochemical properties and structural features, these descriptors alone are insufficient to predict major cytochrome inhibition probability and affinity. Machine learning based in silico approaches may be employed as a more robust and accurate way of predicting CYP inhibition. These various approaches are highlighted in the review.


Author(s):  
Anik Das ◽  
Mohamed M. Ahmed

Accurate lane-change prediction information in real time is essential to safely operate Autonomous Vehicles (AVs) on the roadways, especially at the early stage of AV deployment, where there will be an interaction between AVs and human-driven vehicles. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, driver, and roadway geometric characteristics using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed on six different sets of features, including Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naïve Bayes (NB). In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that, when all features were considered, the XGBoost model outperformed all other models, with the highest overall prediction accuracy (97%) and F1-score (95.5%). However, the highest overall prediction accuracy of 97.3% and F1-score of 95.9% were observed in the XGBoost model based on vehicle kinematics features. Moreover, it was found that XGBoost was the only model that achieved a reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set considering the practical implementation of the model. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.


2021 ◽  
pp. 1-4
Author(s):  
Mathieu D'Aquin ◽  
Stefan Dietze

The 29th ACM International Conference on Information and Knowledge Management (CIKM) was held online from the 19th to the 23rd of October 2020. CIKM is an annual computer science conference, focused on research at the intersection of information retrieval, machine learning, databases as well as semantic and knowledge-based technologies. Since it was first held in the United States in 1992, 28 conferences have been hosted in 9 countries around the world.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Richard Johnston ◽  
Xiaohan Yan ◽  
Tatiana M. Anderson ◽  
Edwin A. Mitchell

Abstract: The effect of altitude on the risk of sudden infant death syndrome (SIDS) has been reported previously, but with conflicting findings. We aimed to examine whether the risk of sudden unexpected infant death (SUID) varies with altitude in the United States. Data from the Centers for Disease Control and Prevention (CDC)’s Cohort Linked Birth/Infant Death Data Set for births between 2005 and 2010 were examined. County of birth was used to estimate altitude. Logistic regression and Generalized Additive Model (GAM) were used, adjusting for year, mother’s race, Hispanic origin, marital status, age, education and smoking, father’s age and race, number of prenatal visits, plurality, live birth order, and infant’s sex, birthweight and gestation. There were 25,305,778 live births over the 6-year study period. The total number of deaths from SUID in this period was 23,673 (rate = 0.94/1000 live births). In the logistic regression model there was a small, but statistically significant, increased risk of SUID associated with birth at > 8000 feet compared with < 6000 feet (aOR = 1.93; 95% CI 1.00–3.71). The GAM showed a similar increased risk over 8000 feet, but this was not statistically significant. Only 9245 (0.037%) of mothers gave birth at > 8000 feet during the study period and 10 deaths (0.042%) were attributed to SUID. The number of SUID deaths at this altitude in the United States is very small (10 deaths in 6 years).
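
The rate and the two percentages quoted above follow directly from the reported counts, as this short Python check shows. The variable and function names are chosen for this example only, and nothing beyond the four counts in the abstract is used.

```python
# Counts as reported in the abstract (births 2005 to 2010).
LIVE_BIRTHS = 25_305_778
SUID_DEATHS = 23_673
BIRTHS_ABOVE_8000_FT = 9_245
SUID_DEATHS_ABOVE_8000_FT = 10


def per_thousand(events: int, population: int) -> float:
    """Events per 1000 members of the population."""
    return events / population * 1000


def percent(part: int, whole: int) -> float:
    """Part as a percentage of the whole."""
    return part / whole * 100


# Overall SUID rate per 1000 live births.
suid_rate = per_thousand(SUID_DEATHS, LIVE_BIRTHS)

# Share of all births that occurred above 8000 feet.
share_of_births = percent(BIRTHS_ABOVE_8000_FT, LIVE_BIRTHS)

# Share of all SUID deaths that occurred above 8000 feet.
share_of_deaths = percent(SUID_DEATHS_ABOVE_8000_FT, SUID_DEATHS)

print(f"SUID rate: {suid_rate:.2f} per 1000 live births")
print(f"Births above 8000 ft: {share_of_births:.3f}% of all births")
print(f"SUID deaths above 8000 ft: {share_of_deaths:.3f}% of all SUID deaths")
```

Note that the two percentages have different denominators: 0.037% is a share of all live births, whereas 0.042% is a share of all SUID deaths.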

