A Road Mishaps Analysis using Decision Tree and Random Forest Algorithms

Machine learning (ML) is the study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying on patterns instead. It is regarded as a subset of artificial intelligence. Here, the sample data are split into a training set and a test set. Road accidents are a major cause of death worldwide, and most of these deaths occur in middle-income countries. This study identifies the major factors behind road accidents using decision trees and random forests. A decision tree is a decision support tool with a tree-like model that contains only conditional control statements. A random forest corrects for the decision tree's tendency to overfit its training set. Both algorithms are used to predict accident severity and to identify contributing factors from drivers' personal information. The results indicate how likely road accidents are, as estimated by the machine learning algorithms.
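
As a rough illustration of the two ideas named in this abstract, the sketch below shows how a decision tree might score a candidate split with Gini impurity and how a random forest classifier combines per-tree predictions by majority vote. It uses only the Python standard library. The toy records (driver age and severity labels) and all function names are invented for illustration and do not come from the study. A real random forest also trains each tree on a bootstrap sample with random feature subsets, which is omitted here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

def majority_vote(predictions):
    """Combine per-tree predictions into one class label by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical toy data: driver age and accident severity.
ages = [19, 22, 25, 41, 47, 53]
severity = ["severe", "severe", "severe", "minor", "minor", "minor"]

threshold, impurity = best_threshold(ages, severity)
forest_prediction = majority_vote(["severe", "minor", "severe"])
```

On this toy data the split at age 25 separates the two classes completely, so its weighted impurity is zero.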

Fake News Classification Using Random Forest and Decision Tree (J48)

Al-Nahrain Journal of Science ◽

10.22401/anjs.23.4.09 ◽

2020 ◽

Vol 23 (4) ◽

pp. 49-55

Author(s):

Reham Jehad ◽

Suhad A. Yousif ◽

Keyword(s):

Random Forest ◽

Decision Tree ◽

Social Life ◽

Machine Learning Algorithms ◽

Fake News ◽

Tree Model ◽

Feature Extraction Method ◽

Full Dataset ◽

Dataset Size ◽

Benchmark Datasets

Fake news is a widespread phenomenon with considerable effects on social life, especially in the political domain. Creating fake news has become very easy because of the widespread use of the internet and social media. Detecting deceptive news is therefore a crucial problem that deserves attention, mainly because of challenges such as the limited number of benchmark datasets and the volume of news published every second. This research proposes using two machine learning algorithms, random forest and decision tree (J48), to detect fake news. The full dataset contains 20,761 samples, and the testing set contains 4,345 samples. Preprocessing starts with cleaning the data by removing unnecessary special characters, numbers, English letters, and white spaces, and ends with removing stop words. After that, the most popular feature extraction method (TF-IDF) is applied before the two classification algorithms. The results show that the decision tree model achieves the best accuracy, 89.11%, while the random forest achieves 84.97%.
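
The preprocessing and TF-IDF steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the paper's pipeline: the stop-word list, the sample headlines, and the function names are assumptions, and the cleaning rule here simply keeps lowercase ASCII letters. The weighting uses the plain form tf × log(N / df); library implementations often add smoothing, so their numbers would differ.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}  # tiny illustrative list

def preprocess(text):
    """Lowercase, drop digits and special characters, split on whitespace, remove stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical headlines, invented for illustration.
docs = [
    "Breaking: the senator is an alien!!!",
    "The senator voted in 2020 to pass the budget",
]
vectors = tf_idf(docs)
```

A term that occurs in every document, such as "senator" here, gets weight zero, which is why TF-IDF emphasizes words that distinguish one article from the rest.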

A Downscaling–Merging Scheme for Improving Daily Spatial Precipitation Estimates Based on Random Forest and Cokriging

Remote Sensing ◽

10.3390/rs13112040 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2040

Author(s):

Xin Yan ◽

Hua Chen ◽

Bingru Tian ◽

Sheng Sheng ◽

Jinxing Wang ◽

...

Keyword(s):

High Resolution ◽

Random Forest ◽

Spatial Resolution ◽

Daily Precipitation ◽

Machine Learning Algorithms ◽

Precipitation Data ◽

Large Area ◽

Tree Model ◽

Daily Precipitation Data ◽

Precipitation Estimates

High-spatial-resolution precipitation data are of great significance in many applications, such as ecology, hydrology, and meteorology. Acquiring high-precision, high-resolution precipitation data over a large area remains a great challenge. In this study, a downscaling–merging scheme based on random forest and cokriging is presented to address this problem. First, an enhanced decision tree model, based on the random forest machine learning algorithm, is used to refine the spatial resolution of satellite daily precipitation data to 0.01°. The downscaled satellite-based daily precipitation is then merged with gauge observations using the cokriging method. The scheme is applied to downscale the Global Precipitation Measurement Mission (GPM) daily precipitation product over the upstream part of the Hanjiang Basin. The experimental results indicate that (1) the downscaling model based on random forest can correctly spatially downscale the GPM daily precipitation data, retaining the accuracy of the original GPM data while greatly improving their spatial detail; (2) the GPM precipitation data can be downscaled on the seasonal scale; and (3) the merging method based on cokriging greatly improves the accuracy of the downscaled GPM daily precipitation data. This study provides an efficient scheme for generating high-resolution, high-quality daily precipitation data over a large area.

Modified Decision Tree Technique for Ransomware Detection at Runtime through API Calls

Scientific Programming ◽

10.1155/2020/8845833 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Faizan Ullah ◽

Qaisar Javaid ◽

Abdu Salam ◽

Masood Ahmad ◽

Nadeem Sarwar ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Feature Vector ◽

Machine Learning Algorithms ◽

The Novel ◽

Proposed Model ◽

Testing Accuracy ◽

Financial Losses

Ransomware (RW) is a distinctive variety of malware that encrypts files or locks the user's system, holding the files hostage and causing huge financial losses to users. In this article, we propose a new model that extracts novel features from the RW dataset and classifies files as RW or benign. The proposed model can detect a large number of RW samples from various families at runtime and scans the network, registry activities, and file system throughout execution. API-call sequences are used to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it with online machine learning algorithms to predict RW. To validate effectiveness and scalability, we test 78,550 recent malicious and benign samples and compare the model with random forest and AdaBoost; the testing accuracy reaches 99.56%.
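
One simple way to turn an API-call sequence into a fixed-length feature vector is to count how often each monitored call appears in a trace. The sketch below illustrates that idea only. The abstract does not list its fourteen features, so the watch list, the trace, and the function name are all hypothetical; the names are ordinary Windows API function names chosen for illustration.

```python
from collections import Counter

# Hypothetical watch list; the abstract does not enumerate its fourteen features.
MONITORED_CALLS = ["CreateFileW", "WriteFile", "DeleteFileW", "CryptEncrypt", "RegSetValueExW"]

def feature_vector(trace, monitored=MONITORED_CALLS):
    """Count how often each monitored API name appears in a call trace."""
    counts = Counter(trace)
    return [counts[name] for name in monitored]

# Invented trace for illustration only.
trace = ["CreateFileW", "CryptEncrypt", "WriteFile", "CryptEncrypt", "WriteFile", "DeleteFileW"]
vector = feature_vector(trace)
```

A vector of this kind could then be passed to any classifier; counting discards call order, so a sequence-aware representation would need a different encoding.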

Predicting Bank Operational Efficiency Using Machine Learning Algorithm: Comparative Study of Decision Tree, Random Forest, and Neural Networks

Advances in Fuzzy Systems ◽

10.1155/2020/8581202 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Peter Appiahene ◽

Yaw Marfo Missah ◽

Ussiph Najim

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Banking Sector ◽

Banking Industry ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Algorithm ◽

And Performance

The financial crisis that hit Ghana from 2015 to 2018 raised various issues about the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers' confidence, efficiency and performance analysis in the banking industry has become a pressing issue, because stakeholders need to detect the underlying causes of inefficiency. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks' efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance, using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggest that the decision tree (DT) with its C5.0 algorithm provided the best predictive model. It had 100% accuracy in predicting the 134-sample holdout dataset (30% of banks) and a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally the neural network, with 86.6% accuracy and a P value of 0.66. The study concluded that banks in Ghana can use these results to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.
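
The reported percentages can be checked against the holdout size. The sketch below assumes accuracy means correct predictions divided by the 134 holdout cases. The per-model counts of correct predictions are inferred from the percentages and are not stated in the abstract.

```python
HOLDOUT = 134  # holdout sample size stated in the abstract

def accuracy_pct(correct, total=HOLDOUT):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)

# Correct-prediction counts inferred from the reported percentages (assumed, not stated).
inferred_correct = {"decision tree (C5.0)": 134, "random forest": 132, "neural network": 116}
reported = {name: accuracy_pct(n) for name, n in inferred_correct.items()}
```

Under that assumption, 132 and 116 correct predictions reproduce the reported 98.5% and 86.6%. The holdout of 134 branches is about 30.2% of the 444 branches, in line with the stated 30%.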

Prediction of COVID-19 Risk in Public Areas Using IoT and Machine Learning

Electronics ◽

10.3390/electronics10141677 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1677

Author(s):

Ersin Elbasi ◽

Ahmet E. Topcu ◽

Shinu Mathew

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Bayes Classifier ◽

Social Distancing ◽

Public Areas ◽

IoT Devices

COVID-19 is a community-acquired infection with symptoms that resemble those of influenza and bacterial pneumonia. Creating an infection control policy involving isolation, disinfection of surfaces, and identification of contagions is crucial in eradicating such pandemics. Incorporating social distancing could also help stop the spread of community-acquired infections like COVID-19. Social distancing entails maintaining certain distances between people and reducing the frequency of contact between them. Meanwhile, the development of Internet of Things (IoT) devices has increased significantly, together with cyber-physical systems that connect with physical environments. Machine learning is strengthening current technologies by adding new approaches that use this surge of available IoT devices to solve problems quickly and correctly. We propose a new approach using machine learning algorithms for monitoring the risk of COVID-19 in public areas. Features extracted from IoT sensors are used as input for several machine learning algorithms, such as decision tree, neural network, naïve Bayes classifier, support vector machine, and random forest, to predict the risks of the COVID-19 pandemic and calculate the risk probability of public places. This research aims to find vulnerable populations and reduce the impact of the disease on certain groups using machine learning models. We build a model to calculate and predict the risk factors of populated areas. This model generates automated alerts for security authorities when anything abnormal is detected. Experimental results show high accuracy: 97.32% with random forest, 94.50% with decision tree, and 99.37% with the naïve Bayes classifier. These algorithms indicate great potential for crowd risk prediction in public areas.

Incorporating metadata in HIV transmission network reconstruction: A machine learning feasibility assessment

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009336 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009336

Author(s):

Sepideh Mazrouee ◽

Susan J. Little ◽

Joel O. Wertheim

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

HIV Transmission ◽

Genetic Data ◽

Network Reconstruction ◽

Machine Learning Algorithms ◽

Support Vector ◽

Transmission Network ◽

Viral Sequences

HIV molecular epidemiology estimates transmission patterns by clustering genetically similar viruses. The process involves connecting genetically similar genotyped viral sequences in a network, implying epidemiological transmissions. This technique relies on genotype data, which are collected only from HIV-diagnosed and in-care populations, and leaves many persons with HIV (PWH) who have no access to consistent care out of the tracking process. We use machine learning algorithms to learn the non-linear correlation patterns between patient metadata and transmissions between HIV-positive cases. This enables us to expand the transmission network reconstruction beyond the molecular network. We employed multiple commonly used supervised classification algorithms to analyze the San Diego Primary Infection Resource Consortium (PIRC) cohort dataset, consisting of genotypes and nearly 80 additional non-genetic features. First, we trained classification models to distinguish genetically unrelated individuals from related ones. Our results show that random forest and decision tree achieved over 80% in accuracy, precision, recall, and F1-score using only a subset of meta-features, including age, birth sex, sexual orientation, race, transmission category, estimated date of infection, and first viral load date, in addition to genetic data. Additionally, both algorithms achieved approximately 80% sensitivity and specificity. The Area Under Curve (AUC) is reported as 97% and 94% for the random forest and decision tree classifiers, respectively. Next, we extended the models to identify clusters of similar viral sequences. The support vector machine demonstrated an order-of-magnitude improvement in the accuracy of assigning sequences to the correct cluster compared to a dummy uniform random classifier. These results confirm that metadata carries important information about the dynamics of HIV transmission as embedded in transmission clusters.
Hence, novel computational approaches are needed to apply the non-trivial knowledge collected from inter-individual genetic information to metadata from PWH in order to expand the estimated transmissions. We note that feature extraction alone will not be effective in identifying patterns of transmission and will result in random clustering of the data, but its use in conjunction with genetic data and the right algorithm can contribute to expanding the reconstructed network beyond individuals with genetic data.
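
For reference, the evaluation measures named in this abstract can be computed from binary confusion-matrix counts and classifier scores as sketched below. The counts and scores are invented for illustration and are not the study's data. The AUC function uses the pairwise-ranking definition, which is quadratic in the number of samples and is only meant for small examples.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented counts and scores, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
area = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

Because AUC measures ranking quality across all thresholds while sensitivity and specificity describe one threshold, a classifier can report a high AUC alongside lower values of the threshold-based metrics.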

Machine Learning Algorithms for Visualization and Prediction Modeling of Boston Crime Data

10.20944/preprints202002.0108.v1 ◽

2020 ◽

Author(s):

Jiarui Yin ◽

Inikuro Afa Michael ◽

Iduabo John Afa

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Data Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Crime Data ◽

Detection Analysis ◽

Supervised Learning Algorithms ◽

Supervised Methods

Machine learning plays a key role in present-day crime detection, analysis, and prediction. The goal of this work is to propose methods for predicting crimes classified into different categories of severity. We implemented visualization and analysis of crime data statistics from recent years in the city of Boston. We then carried out a comparative study of two supervised learning algorithms, decision tree and random forest, based on the accuracy and processing time of the models. Predictions were made from geographical and temporal information, with the data split into training and test sets. The results show that random forest, as expected, gives a better result, with 1.54% higher accuracy than decision tree, although this comes at a cost of at least 4.37 times the processing time. The study opens the door to applying similar supervised methods in crime data analytics and other fields of data science.

Application of Bayesian Hyperparameter Optimized Random Forest and XGBoost Model for Landslide Susceptibility Mapping

Frontiers in Earth Science ◽

10.3389/feart.2021.712240 ◽

2021 ◽

Vol 9 ◽

Author(s):

Shibao Wang ◽

Jianqi Zhuang ◽

Jia Zheng ◽

Hongyu Fan ◽

Jiaxu Kong ◽

...

Keyword(s):

Random Forest ◽

Decision Tree ◽

Landslide Susceptibility ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Gradient Boosting ◽

The Loess Plateau ◽

Tree Model ◽

Validation Data ◽

Extreme Gradient Boosting

Landslides are widely distributed worldwide and often result in tremendous casualties and economic losses, especially in the Loess Plateau of China. Taking Wuqi County in the hinterland of the Loess Plateau as the research area, this study uses Bayesian hyperparameter optimization to tune random forest and extreme gradient boosting decision tree models for landslide susceptibility mapping, and compares the two optimized models. In addition, 14 landslide influencing factors are selected, and 734 landslides are obtained from field investigation and reports in the literature. The landslides were randomly divided into training data (70%) and validation data (30%). The hyperparameters of the random forest and extreme gradient boosting decision tree models were optimized using a Bayesian algorithm, and the optimal hyperparameters were then selected for landslide susceptibility mapping. Both models were evaluated and compared using the receiver operating characteristic curve and confusion matrix. The results show that the AUC values on the validation data for the Bayesian-optimized random forest and extreme gradient boosting decision tree models are 0.88 and 0.86, respectively, an improvement of 4% and 3%, indicating that the prediction performance of both models has improved. However, the random forest model has a higher predictive ability than the extreme gradient boosting decision tree model. Thus, hyperparameter optimization is of great significance in improving the prediction accuracy of the model, and the optimized model can generate a high-quality landslide susceptibility map.
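
A random 70/30 division of the 734 landslides could look like the sketch below. The seed, the rounding rule, and the use of integer stand-in IDs are assumptions made for reproducibility; the abstract does not state the exact counts in each subset or how the split was drawn.

```python
import random

def split_train_validation(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local generator, so global state is untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

landslide_ids = range(734)  # stand-in IDs for the 734 mapped landslides
train, validation = split_train_validation(landslide_ids)
```

With this rounding rule the split yields 514 training and 220 validation records. Keeping the validation records out of hyperparameter tuning is what makes the reported validation AUC a fair estimate.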

Data Driven Approach for Eye Disease Classification with Machine Learning

Applied Sciences ◽

10.3390/app9142789 ◽

2019 ◽

Vol 9 (14) ◽

pp. 2789 ◽

Cited By ~ 3

Author(s):

Sadaf Malik ◽

Nadia Kanwal ◽

Mamoona Naveed Asghar ◽

Mohammad Ali A. Sadiq ◽

Irfan Karamat ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Multiple Features ◽

Standard Format ◽

Free Data

Medical health systems have been concentrating on artificial intelligence techniques for speedy diagnosis. However, the recording of health data in a standard form still requires attention so that machine learning can be more accurate and reliable by considering multiple features. The aim of this study is to develop a general framework for recording diagnostic data in an international standard format to facilitate prediction of disease diagnosis based on symptoms using machine learning algorithms. Efforts were made to ensure error-free data entry by developing a user-friendly interface. Furthermore, multiple machine learning algorithms including Decision Tree, Random Forest, Naive Bayes and Neural Network algorithms were used to analyze patient data based on multiple features, including age, illness history and clinical observations. This data was formatted according to structured hierarchies designed by medical experts, whereas diagnosis was made as per the ICD-10 coding developed by the American Academy of Ophthalmology. Furthermore, the system is designed to evolve through self-learning by adding new classifications for both diagnosis and symptoms. The classification results from tree-based methods demonstrated that the proposed framework performs satisfactorily, given a sufficient amount of data. Owing to a structured data arrangement, the random forest and decision tree algorithms’ prediction rate is more than 90% as compared to more complex methods such as neural networks and the naïve Bayes algorithm.

ROAD SAFETY AUDIT: A CASE STUDY

Proceedings of International Structural Engineering and Construction ◽

10.14455/isec.res.2017.124 ◽

2017 ◽

Vol 4 (1) ◽

Cited By ~ 1

Author(s):

Cumhur Aydin ◽

Nura Balla

Keyword(s):

Road Safety ◽

Road Accidents ◽

Contributing Factors ◽

Middle Income ◽

The Road ◽

Middle Income Countries ◽

Road Projects ◽

Road Safety Audit ◽

Road Safety Audits

As a consequence of increasing traffic volume and mobility, road accidents have become a serious problem, especially in low- and middle-income countries. The number of road accidents in such countries tends to increase every year. Among the different factors contributing to road accidents, the road and its environment play an important role. Road safety audits and road safety inspections are used worldwide as tools to monitor and evaluate road projects and existing road sections from a safety perspective. In this study, after evaluating different safety auditing techniques applied around the world, a case study was carried out on a Nigerian road section. The expectations from such a study are: (i) to show the main safety deficiencies of Nigerian road sections; and (ii) to introduce a new tool that local road authorities can use to monitor their road sections. Based on this study, an audit report was prepared to summarize the findings and possible countermeasures.
