Machine Learning Approaches to Traffic Accident Analysis and Hotspot Prediction

Traffic accidents are one of the world's most important concerns, since they result in numerous casualties, injuries, and fatalities each year, as well as significant economic losses. There are many factors that are responsible for causing road accidents. If these factors can be better understood and predicted, it might be possible to take measures to mitigate the damage and its severity. The purpose of this work is to identify these factors using accident data from 2016 to 2019 from the district of Setúbal, Portugal. This work aims at developing models that can select a set of influential factors that may be used to classify the severity of an accident, supporting an analysis of the accident data. In addition, this study also proposes a predictive model for future road accidents based on past data. Various machine learning approaches are used to create these models. Supervised machine learning methods such as decision trees (DT), random forests (RF), logistic regression (LR), and naive Bayes (NB) are used, as well as unsupervised machine learning techniques including DBSCAN and hierarchical clustering. Results show that a rule-based model using the C5.0 algorithm is capable of accurately detecting the most relevant factors describing the severity of a road accident. Further, the results of the predictive model suggest the RF model could be a useful tool for forecasting accident hotspots.

Machine Learning Frameworks in Cancer Detection

E3S Web of Conferences ◽

10.1051/e3sconf/202129701073 ◽

2021 ◽

Vol 297 ◽

pp. 01073

Author(s):

Sabyasachi Pramanik ◽

K. Martin Sagayam ◽

Om Prakash Jena

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cancer Development ◽

Support Vector ◽

Learning Approaches ◽

Learning Techniques ◽

Fact Finding ◽

Risk Of Cancer

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and forecasting of cancer types have become essential in cancer research, since they may help to improve the clinical treatment of cancer survivors. The significance of categorizing cancer sufferers into higher- or lower-risk categories has prompted numerous research teams from the bioscience and genomics fields to investigate the utilization of machine learning (ML) algorithms in cancer diagnosis and treatment. Because of this, these methods have been used with the goal of simulating the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. These technologies include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision Trees and Support Vector Machines, which have already been extensively used in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. An overview of current machine learning approaches utilized in the simulation of cancer development is presented in this paper. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we offer the most current papers that have used these approaches to predict risk of cancer or patient outcomes in order to better understand cancer.

Machine Learning Approaches to Retrieve High-Quality, Clinically Relevant, Evidence from the Biomedical Literature: A Systematic Review (Preprint)

10.2196/preprints.30401 ◽

2021 ◽

Author(s):

Wael Abdelkader ◽

Tamara Navarro ◽

Rick Parrish ◽

Chris Cotoi ◽

Federico Germini ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Strong Evidence ◽

Clinical Care ◽

Biomedical Literature ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Learning Approaches ◽

High Quality ◽

Applied Machine Learning

BACKGROUND The rapid growth of the biomedical literature makes identifying strong evidence a time-consuming task. Applying machine learning to the process could be a viable solution that limits effort while maintaining accuracy. OBJECTIVE To summarize the nature and comparative performance of machine learning approaches that have been applied to retrieve high-quality evidence for clinical consideration from the biomedical literature. METHODS We conducted a systematic review of studies that applied machine learning techniques to identify high-quality clinical articles in the biomedical literature. Multiple databases were searched to July 2020. Extracted data focused on the applied machine learning model, steps in the development of the models, and model performance. RESULTS From 3918 retrieved studies, 10 met our inclusion criteria. All followed a supervised machine learning approach and applied, from a limited range of options, a high-quality standard for the training of their model. The results show that machine learning can achieve a sensitivity of 95% while maintaining a high precision of 86%. CONCLUSIONS Applying machine learning to distinguish studies with strong evidence for clinical care has the potential to decrease the workload of manually identifying these. The evidence base is active and evolving. Reported methods were variable across the studies but focused on supervised machine learning approaches. Performance may improve by applying more sophisticated approaches such as active learning, auto-machine learning, and unsupervised machine learning approaches.
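
The sensitivity and precision figures reported above are both ratios over a classifier's confusion counts. The following is a minimal sketch of how they are computed, assuming a binary "high-quality article" classifier; the function name and the counts are hypothetical and are not taken from the reviewed studies.

```python
from typing import Tuple


def sensitivity_and_precision(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (sensitivity, precision) from binary confusion counts."""
    if true_pos + false_neg == 0 or true_pos + false_pos == 0:
        raise ValueError("need at least one actual positive and one predicted positive")
    # Sensitivity (recall): share of truly high-quality articles that were retrieved.
    sensitivity = true_pos / (true_pos + false_neg)
    # Precision: share of retrieved articles that are truly high quality.
    precision = true_pos / (true_pos + false_pos)
    return sensitivity, precision


# Hypothetical counts, chosen only for illustration.
sens, prec = sensitivity_and_precision(true_pos=90, false_pos=10, false_neg=10)
print(f"sensitivity={sens:.2f}, precision={prec:.2f}")
```

High sensitivity keeps relevant articles from being missed, while high precision limits the number of irrelevant articles a human reviewer still has to screen.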

Comparison of Implicit vs. Explicit Regime Identification in Machine Learning Methods for Solar Irradiance Prediction

Energies ◽

10.3390/en13030689 ◽

2020 ◽

Vol 13 (3) ◽

pp. 689 ◽

Cited By ~ 6

Author(s):

Tyler McCandless ◽

Susan Dettling ◽

Sue Ellen Haupt

Keyword(s):

Machine Learning ◽

Solar Power ◽

Network Models ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Validation Dataset ◽

Prediction Errors ◽

Learning Approaches ◽

Power Prediction ◽

Neural Network Models

This work compares the solar power forecasting performance of tree-based methods that include implicit regime-based models to explicit regime separation methods that utilize both unsupervised and supervised machine learning techniques. Previous studies have shown an improvement utilizing a regime-based machine learning approach in a climate with diverse cloud conditions. This study compares the machine learning approaches for solar power prediction at the Shagaya Renewable Energy Park in Kuwait, which is in an arid desert climate characterized by abundant sunshine. The regime-dependent artificial neural network models undergo a comprehensive parameter and hyperparameter tuning analysis to minimize the prediction errors on a test dataset. The final results that compare the different methods are computed on an independent validation dataset. The results show that the tree-based method, the regression model tree approach, performs better than the explicit regime-dependent approach. These results appear to be a function of the predominantly sunny conditions that limit the ability of an unsupervised technique to separate regimes for which the relationship between the predictors and the predictand would differ for the supervised learning technique.

Extracting Hidden Patterns Within Road Accident Data Using Machine Learning Techniques

Information and Communication Technology - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-10-5508-9_2 ◽

2017 ◽

pp. 13-22 ◽

Cited By ~ 1

Author(s):

S. Vasavi

Keyword(s):

Machine Learning ◽

Road Accident ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Accident Data ◽

Hidden Patterns

Computational Statistics and Machine Learning Techniques for Effective Decision Making on Student’s Employment for Real-Time

Mathematics ◽

10.3390/math9111166 ◽

2021 ◽

Vol 9 (11) ◽

pp. 1166

Author(s):

Deepak Kumar ◽

Chaman Verma ◽

Pradeep Kumar Singh ◽

Maria Simona Raboaca ◽

Raluca-Andreea Felseghi ◽

...

Keyword(s):

Machine Learning ◽

Hybrid Approach ◽

T Test ◽

Job Placement ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

Computational Statistics ◽

The Impact

The present study accentuated a hybrid approach to evaluate the impact, association and discrepancies of demographic characteristics on a student’s job placement. The present study extracted several significant academic features that determine the Master of Business Administration (MBA) student placement and confirm the placed gender. This paper recommended a novel futuristic roadmap for students, parents, guardians, institutions, and companies to benefit at a certain level. Out of seven experiments, the first five experiments were conducted with deep statistical computations, and the last two experiments were performed with supervised machine learning approaches. On the one hand, the Support Vector Machine (SVM) outperformed others with the highest accuracy of 90% to predict the employment status. On the other hand, the Random Forest (RF) attained a maximum accuracy of 88% to recognize the gender of placed students. Further, several significant features are also recommended to identify the placement of gender and placement status. A statistical t-test at the 0.05 significance level showed that the student’s gender did not influence their offered salary during job placement or the MBA specializations Marketing and Finance (Mkt&Fin) and Marketing and Human Resource (Mkt&HR) (p > 0.05). Additionally, the result of the t-test also showed that gender did not affect students’ placement test percentage scores (p > 0.05), and that degree streams such as Science and Technology (Sci&Tech), Commerce and Management (Comm&Mgmt), and Others did not affect the offered salary (p > 0.05). Further, the χ² test revealed a significant association between a student’s course specialization and student’s placement status (p < 0.05). It also showed that there is no significant association between a student’s degree and placement status (p > 0.05).
The current study recommended automatic placement prediction with demographic impact identification for the higher educational universities and institutions that will help human communities (students, teachers, parents, institutions) to prepare for the future accordingly.
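
The χ² test of association mentioned above compares observed counts in a contingency table with the counts expected under independence. The sketch below computes the Pearson χ² statistic in plain Python, assuming a 2×2 table of specialization against placement status; the function name and the counts are hypothetical and are not the study's data.

```python
from typing import List, Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals: List[float] = [sum(row) for row in table]
    col_totals: List[float] = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = two specializations, columns = placed / not placed.
observed = [[30, 10], [20, 40]]
stat = chi_square_statistic(observed)
# For a 2x2 table (1 degree of freedom) the 0.05 critical value is about 3.841.
print(f"chi-square = {stat:.3f}, significant at 0.05: {stat > 3.841}")
```

A statistic above the critical value corresponds to p < 0.05, which is the sense in which the abstract reports a significant association.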

EVALUATION OF MACHINE LEARNING TECHNIQUES IN VINE LEAVES DISEASE DETECTION: A PRELIMINARY CASE STUDY ON FLAVESCENCE DORÉE

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-w8-151-2019 ◽

2019 ◽

Vol XLII-3/W8 ◽

pp. 151-156

Author(s):

J. Hruška ◽

T. Adão ◽

L. Pádua ◽

N. Guimarães ◽

E. Peres ◽

...

Keyword(s):

Machine Learning ◽

Ground Truth ◽

Machine Learning Techniques ◽

Economic Losses ◽

Learning Approaches ◽

Laboratory Methods ◽

Flavescence Dorée ◽

Damage Propagation ◽

Learning Techniques

Abstract. Vine culture is influenced by many factors, such as the weather, soil or topography, which are triggers to phytosanitary issues. Among them are some diseases that are responsible for major economic losses that can, however, be managed with timely interventions in the field, capable of leading to effective results by preventing damage propagation. While not all symptoms might present visible evidence, hyperspectral sensors can tackle this aspect with their ability for measuring hundreds of continuously sparse bands that range beyond the eye-perceptible spectrum. With this research line in mind, in this work a hyperspectral sensor was applied to analyse the spectral status of vine leaf samples, collected in three chronologically distinct campaigns, while costly and destructive laboratory methods were used to track Flavescence Dorée (FD) in the same samples, to provide ground truth information. Regarding data processing, machine learning approaches were used, in which several classifiers were selected to detect FD in hyperspectral images of vine leaves. The goal was to evaluate and find the most suitable classifier for this task.

A Survey on Extracting Hidden Patterns within Road Accident Data using Machine Learning Techniques

Communications on Applied Electronics ◽

10.5120/cae2016652455 ◽

2016 ◽

Vol 6 (4) ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

S. Vasavi

Keyword(s):

Machine Learning ◽

Road Accident ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Accident Data ◽

Hidden Patterns

Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

The Individual ◽

Vector Representations

Inspired by natural language processing techniques, we here introduce Mol2vec, which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
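
The step of encoding a compound by summing its substructure vectors can be illustrated with a toy example. This is a minimal sketch in plain Python and does not use the Mol2vec package's API. The substructure identifiers and the two-dimensional embeddings are hypothetical, and skipping unknown substructures is a simplifying assumption made here.

```python
from typing import Dict, List


def compound_vector(substructures: List[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Encode a compound as the element-wise sum of its substructure vectors."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for identifier in substructures:
        vector = embeddings.get(identifier)
        if vector is None:
            # Assumption for this sketch: substructures without an embedding are skipped.
            continue
        for k, value in enumerate(vector):
            total[k] += value
    return total


# Hypothetical pre-trained embeddings (real ones would be much higher-dimensional).
toy_embeddings = {"sub_a": [1.0, 0.0], "sub_b": [0.5, 2.0]}
# A compound described as a "sentence" of substructure identifiers.
print(compound_vector(["sub_a", "sub_b", "sub_a"], toy_embeddings))
```

The resulting fixed-length vector is what would then be passed as a feature vector to a supervised model.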

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽

10.2174/2213275912666191102162959 ◽

2020 ◽

Vol 28 (2) ◽

pp. 253-265 ◽

Cited By ~ 3

Author(s):

Gabriela Bitencourt-Ferreira ◽

Amauri Duarte da Silva ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Predictive Performance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Cyclin Dependent Kinase ◽

Learning Models ◽

Learning Techniques ◽

Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed to identify new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

IT Industry ◽

Knowledge Based ◽

Employee Attrition

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to the companies. The aim of this research is to develop a model to predict employee attrition and provide the organizations opportunities to address any issue and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from Human Resource databases of three IT companies in India, including their employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. Also, results show that the model performs better in predicting who will leave the firm as compared to predicting who will not leave the company.
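
Overall accuracy and per-class performance are both read off the same confusion matrix, which is how a model can reach a given accuracy while predicting one class better than the other. The sketch below shows the arithmetic. The function name is hypothetical, and the counts are invented for illustration and are not taken from the study.

```python
from typing import Dict


def attrition_metrics(leave_correct: int, leave_missed: int,
                      stay_wrong: int, stay_correct: int) -> Dict[str, float]:
    """Accuracy and per-class recall from a 2x2 confusion matrix.

    leave_correct: leavers predicted to leave (true positives)
    leave_missed:  leavers predicted to stay (false negatives)
    stay_wrong:    stayers predicted to leave (false positives)
    stay_correct:  stayers predicted to stay (true negatives)
    """
    total = leave_correct + leave_missed + stay_wrong + stay_correct
    return {
        "accuracy": (leave_correct + stay_correct) / total,
        "recall_leavers": leave_correct / (leave_correct + leave_missed),
        "recall_stayers": stay_correct / (stay_correct + stay_wrong),
    }


# Invented counts, for illustration only (not the study's data).
metrics = attrition_metrics(leave_correct=45, leave_missed=5, stay_wrong=10, stay_correct=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts the recall for leavers is higher than for stayers, even though a single accuracy figure summarizes both classes.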
