Understanding Multi-Vehicle Collision Patterns on Freeways—A Machine Learning Approach

Generating meaningful inferences from crash data is vital to improving highway safety. Classic statistical methods are fundamental to crash data analysis and often regarded for their interpretability. However, given the complexity of crash mechanisms and associated heterogeneity, classic statistical methods, which lack versatility, might not be sufficient for granular crash analysis because of the high dimensional features involved in crash-related data. In contrast, machine learning approaches, which are more flexible in structure and capable of harnessing richer data sources available today, emerges as a suitable alternative. With the aid of new methods for model interpretation, the complex machine learning models, previously considered enigmatic, can be properly interpreted. In this study, two modern machine learning techniques, Linear Discriminate Analysis and eXtreme Gradient Boosting, were explored to classify three major types of multi-vehicle crashes (i.e., rear-end, same-direction sideswipe, and angle) occurred on Interstate 285 in Georgia. The study demonstrated the utility and versatility of modern machine learning methods in the context of crash analysis, particularly in understanding the potential features underlying different crash patterns on freeways.

Download Full-text

Pharmacy Impact on Covid-19 Vaccination Progress Using Machine Learning Approach

Journal of Pharmaceutical Research International ◽

10.9734/jpri/2021/v33i38a32076 ◽

2021 ◽

pp. 202-217

Author(s):

Shawni Dutta ◽

Upasana Mukherjee ◽

Samir Kumar Bandyopadhyay

Keyword(s):

Machine Learning ◽

Social Life ◽

Mean Squared Error ◽

Severe Depression ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Learning Approaches ◽

Human Beings ◽

Infected People ◽

Extreme Gradient Boosting

The novel coronavirus disease (COVID-19) has created immense threats to public health on various levels around the globe. The unpredictable outbreak of this disease and the pandemic situation are causing severe depression, anxiety and other mental as physical health related problems among the human beings. This deadly disease has put social, economic condition of the entire world into an enormous challenge. To combat against this disease, vaccination is essential as it will boost the immune system of human beings while being in the contact with the infected people. The vaccination process is thus necessary to confront the outbreak of COVID-19. The worldwide vaccination progress should be tracked to identify how fast the entire economic as well as social life will be stabilized. The monitor of the vaccination progress, a machine learning based Regressor model is approached in this study. This vaccination tracking process has been applied on the data starting from 14th December, 2020 to 24th April, 2021. A couple of ensemble based machine learning Regressor models such as Random Forest, Extra Trees, Gradient Boosting, AdaBoost and Extreme Gradient Boosting are implemented and their predictive performance are compared. The comparative study reveals that the Extra trees Regressor outperforms with minimized mean absolute error (MAE) of 6.465 and root mean squared error (RMSE) of 8.127. The uniqueness of this study relies on assessing as well as predicting vaccination intake progress by utilizing automated process offered by machine learning techniques. The innovative idea of the method is that the vaccination process and their priority are considered in the paper. Among several existing machine learning approaches, the ensemble based learning paradigms are employed in this study so that improved prediction efficiency can be delivered.

Download Full-text

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Download Full-text

Machine learning for metagenomics: methods and tools

Metagenomics ◽

10.1515/metgen-2016-0001 ◽

2017 ◽

Vol 1 (1) ◽

Cited By ~ 15

Author(s):

Hayssam Soueidan ◽

Macha Nikolski

Keyword(s):

Machine Learning ◽

Gene Prediction ◽

Full Range ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Unified Framework ◽

Comparative Metagenomics ◽

Learning Techniques ◽

Ngs Data ◽

Modern Machine

AbstractOwing to the complexity and variability of metagenomic studies, modern machine learning approaches have seen increased usage to answer a variety of question encompassing the full range of metagenomic NGS data analysis.We review here the contribution of machine learning techniques for the field of metagenomics, by presenting known successful approaches in a unified framework. This review focuses on five important metagenomic problems:OTU-clustering, binning, taxonomic proffiing and assignment, comparative metagenomics and gene prediction. For each of these problems, we identify the most prominent methods, summarize the machine learning approaches used and put them into perspective of similar methods.We conclude our review looking further ahead at the challenge posed by the analysis of interactions within microbial communities and different environments, in a field one could call “integrative metagenomics”.

Download Full-text

Machine learning techniques to predict daily rainfall amount

Journal Of Big Data ◽

10.1186/s40537-021-00545-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Chalachew Muluken Liyew ◽

Haileyesus Amsaya Melese

Keyword(s):

Machine Learning ◽

Pearson Correlation ◽

Daily Rainfall ◽

Learning Model ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Correlation Technique ◽

Learning Techniques ◽

Machine Learning Model ◽

Extreme Gradient Boosting

AbstractPredicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several types of research have been conducted using data mining and machine learning techniques of different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends on. Wise use of rainfall water should be planned and practiced in the country to minimize the problem of the drought and flood occurred in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables which were used as an input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and Mean absolute Error methods were used to measure the performance of the machine learning model. The result of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than others.

Download Full-text

Gradient Boosting Machine Learning to Improve Satellite-Derived Column Water Vapor Measurement Error

10.5194/amt-2019-308 ◽

2019 ◽

Cited By ~ 1

Author(s):

Allan C. Just ◽

Yang Liu ◽

Meytar Sorek-Hamer ◽

Johnathan Rush ◽

Michael Dorman ◽

...

Keyword(s):

Machine Learning ◽

Water Vapor ◽

Measurement Error ◽

Earth Science ◽

Atmospheric Correction ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Learning Approaches ◽

Sensing Applications ◽

Extreme Gradient Boosting

Abstract. The atmospheric products of the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm include column water vapor (CWV) at 1 km resolution, derived from daily overpasses of NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS) instruments aboard the Aqua and Terra satellites. We have recently shown that machine learning using extreme gradient boosting (XGBoost) can improve the estimation of MAIAC aerosol optical depth (AOD). Although MAIAC CWV is generally well validated (Pearson’s R > 0.97 versus CWV from AERONET sun photometers), it has not yet been assessed whether machine-learning approaches can further improve CWV. Using a novel spatiotemporal cross-validation approach to avoid overfitting, our XGBoost model with nine features derived from land use terms, date, and ancillary variables from the MAIAC retrieval, quantifies and can correct a substantial portion of measurement error relative to collocated measures at AERONET sites (26.9 % and 16.5 % decrease in Root Mean Square Error (RMSE) for Terra and Aqua datasets, respectively) in the Northeastern USA, 2000–2015. We use machine-learning interpretation tools to illustrate complex patterns of measurement error and describe a positive bias in MAIAC Terra CWV worsening in recent summertime conditions. We validate our predictive model on MAIAC CWV estimates at independent stations from the SuomiNet GPS network where our corrections decrease the RMSE by 19.7 % and 9.5 % for Terra and Aqua MAIAC CWV. Empirically correcting for measurement error with machine-learning algorithms is a post-processing opportunity to improve satellite-derived CWV data for Earth science and remote sensing applications.

Download Full-text

Gradient boosting machine learning to improve satellite-derived column water vapor measurement error

Atmospheric Measurement Techniques ◽

10.5194/amt-13-4669-2020 ◽

2020 ◽

Vol 13 (9) ◽

pp. 4669-4681

Author(s):

Allan C. Just ◽

Yang Liu ◽

Meytar Sorek-Hamer ◽

Johnathan Rush ◽

Michael Dorman ◽

...

Keyword(s):

Machine Learning ◽

Water Vapor ◽

Measurement Error ◽

Earth Science ◽

Atmospheric Correction ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Learning Approaches ◽

Sensing Applications ◽

Extreme Gradient Boosting

Abstract. The atmospheric products of the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm include column water vapor (CWV) at a 1 km resolution, derived from daily overpasses of NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instruments aboard the Aqua and Terra satellites. We have recently shown that machine learning using extreme gradient boosting (XGBoost) can improve the estimation of MAIAC aerosol optical depth (AOD). Although MAIAC CWV is generally well validated (Pearson's R > 0.97 versus CWV from AERONET sun photometers), it has not yet been assessed whether machine-learning approaches can further improve CWV. Using a novel spatiotemporal cross-validation approach to avoid overfitting, our XGBoost model, with nine features derived from land use terms, date, and ancillary variables from the MAIAC retrieval, quantifies and can correct a substantial portion of measurement error relative to collocated measurements at AERONET sites (26.9 % and 16.5 % decrease in root mean square error (RMSE) for Terra and Aqua datasets, respectively) in the Northeastern USA, 2000–2015. We use machine-learning interpretation tools to illustrate complex patterns of measurement error and describe a positive bias in MAIAC Terra CWV worsening in recent summertime conditions. We validate our predictive model on MAIAC CWV estimates at independent stations from the SuomiNet GPS network where our corrections decrease the RMSE by 19.7 % and 9.5 % for Terra and Aqua MAIAC CWV. Empirically correcting for measurement error with machine-learning algorithms is a postprocessing opportunity to improve satellite-derived CWV data for Earth science and remote sensing applications.

Download Full-text

Applying Deep Neural Networks and Ensemble Machine Learning Methods to Forecast Airborne Ambrosia Pollen

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph16111992 ◽

2019 ◽

Vol 16 (11) ◽

pp. 1992 ◽

Cited By ~ 6

Author(s):

Gebreab K. Zewdie ◽

David J. Lary ◽

Estelle Levetin ◽

Gemechu F. Garuma

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Land Surface ◽

Deep Neural Networks ◽

Airborne Pollen ◽

Training Data ◽

Gradient Boosting ◽

Learning Approaches ◽

Ambrosia Pollen ◽

Extreme Gradient Boosting

Allergies to airborne pollen are a significant issue affecting millions of Americans. Consequently, accurately predicting the daily concentration of airborne pollen is of significant public benefit in providing timely alerts. This study presents a method for the robust estimation of the concentration of airborne Ambrosia pollen using a suite of machine learning approaches including deep learning and ensemble learners. Each of these machine learning approaches utilize data from the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric weather and land surface reanalysis. The machine learning approaches used for developing a suite of empirical models are deep neural networks, extreme gradient boosting, random forests and Bayesian ridge regression methods for developing our predictive model. The training data included twenty-four years of daily pollen concentration measurements together with ECMWF weather and land surface reanalysis data from 1987 to 2011 is used to develop the machine learning predictive models. The last six years of the dataset from 2012 to 2017 is used to independently test the performance of the machine learning models. The correlation coefficients between the estimated and actual pollen abundance for the independent validation datasets for the deep neural networks, random forest, extreme gradient boosting and Bayesian ridge were 0.82, 0.81, 0.81 and 0.75 respectively, showing that machine learning can be used to effectively forecast the concentrations of airborne pollen.

Download Full-text

Improving Sports Outcome Prediction Process Using Integrating Adaptive Weighted Features and Machine Learning Techniques

Processes ◽

10.3390/pr9091563 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1563

Author(s):

Chi-Jie Lu ◽

Tian-Shyug Lee ◽

Chien-Chih Wang ◽

Wei-Jen Chen

Keyword(s):

Machine Learning ◽

Outcome Prediction ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Weighting ◽

Stochastic Gradient Boosting ◽

Basketball Game ◽

Extreme Gradient Boosting

Developing an effective sports performance analysis process is an attractive issue in sports team management. This study proposed an improved sports outcome prediction process by integrating adaptive weighted features and machine learning algorithms for basketball game score prediction. The feature engineering method is used to construct designed features based on game-lag information and adaptive weighting of variables in the proposed prediction process. These designed features are then applied to the five machine learning methods, including classification and regression trees (CART), random forest (RF), stochastic gradient boosting (SGB), eXtreme gradient boosting (XGBoost), and extreme learning machine (ELM) for constructing effective prediction models. The empirical results from National Basketball Association (NBA) data revealed that the proposed sports outcome prediction process could generate a promising prediction result compared to the competing models without adaptive weighting features. Our results also showed that the machine learning models with four game-lags information and adaptive weighting of power could generate better prediction performance.

Download Full-text

An Application of Natural Language Processing to Classify What Terrorists Say They Want

Social Sciences ◽

10.3390/socsci11010023 ◽

2022 ◽

Vol 11 (1) ◽

pp. 23

Author(s):

Raj Bridgelall

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

The Body ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Incentive Structure ◽

Human Decision Process ◽

Extreme Gradient Boosting

Knowing what perpetrators want can inform strategies to achieve safe, secure, and sustainable societies. To help advance the body of knowledge in counterterrorism, this research applied natural language processing and machine learning techniques to a comprehensive database of terrorism events. A specially designed empirical topic modeling technique provided a machine-aided human decision process to glean six categories of perpetrator aims from the motive text narrative. Subsequently, six different machine learning models validated the aim categories based on the accuracy of their association with a different narrative field, the event summary. The ROC-AUC scores of the classification ranged from 86% to 93%. The Extreme Gradient Boosting model provided the best predictive performance. The intelligence community can use the identified aim categories to help understand the incentive structure of terrorist groups and customize strategies for dealing with them.

Download Full-text

MACHINE LEARNING FOR SEA ICE MONITORING FROM SATELLITES

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w16-83-2019 ◽

2019 ◽

Vol XLII-2/W16 ◽

pp. 83-89

Author(s):

C. O. Dumitru ◽

V. Andrei ◽

G. Schwarz ◽

M. Datcu

Keyword(s):

Machine Learning ◽

Sea Ice ◽

Weather Conditions ◽

Representation Learning ◽

Image Understanding ◽

Machine Learning Techniques ◽

Polar Regions ◽

Learning Approaches ◽

Automated Processing ◽

Modern Machine

<p><strong>Abstract.</strong> Today, radar imaging from space allows continuous and wide-area sea ice monitoring under nearly all weather conditions. To this end, we applied modern machine learning techniques to produce ice-describing semantic maps of the polar regions of the Earth. Time series of these maps can then be exploited for local and regional change maps of selected areas. What we expect, however, are fully-automated unsupervised routine classifications of sea ice regions that are needed for the rapid and reliable monitoring of shipping routes, drifting and disintegrating icebergs, snowfall and melting on ice, and other dynamic climate change indicators. Therefore, we designed and implemented an automated processing chain that analyses and interprets the specific ice-related content of high-resolution synthetic aperture radar (SAR) images. We trained this system with selected images covering various use cases allowing us to interpret these images with modern machine learning approaches. In the following, we describe a system comprising representation learning, variational inference, and auto-encoders. Test runs have already demonstrated its usefulness and stability that can pave the way towards future artificial intelligence systems extending, for instance, the current capabilities of traditional image analysis by including content-related image understanding.</p>

Download Full-text