A Road Mishaps Analysis using Decision Tree and Random Forest Algorithms

Machine learning (ML) is the study of algorithms and statistical models that computer systems use to perform a specific task without explicit instructions, relying instead on patterns in data. It is regarded as a subset of artificial intelligence. In this work, the sample data is split into a training set and a test set. Road accidents are a major cause of deaths worldwide, and most of these deaths occur in middle-income countries. This study identifies the major factors behind road accidents using decision trees and random forests. A decision tree is a decision-support tool that uses a tree-like model containing only conditional control statements. A random forest corrects for a decision tree's tendency to overfit its training set. Here, both algorithms are used to determine the severity of road accidents and their contributing factors from drivers' personal information. The results indicate the likelihood of road accidents as estimated by the two machine learning algorithms.
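A decision tree grows by repeatedly choosing the split that best separates the classes. The sketch below shows one common splitting criterion, Gini impurity, for a single numeric feature. The abstract does not say which criterion the authors used, so treat this as an assumption; the function names and the toy values are hypothetical, not taken from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the split `x <= threshold` that
    minimises the size-weighted Gini impurity of the two child nodes.
    The threshold is None if no split improves on the unsplit node."""
    n = len(ys)
    best = (None, gini(ys))
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one numeric driver attribute and an accident-severity label.
xs = [1, 2, 3, 10, 11, 12]
ys = ["severe", "severe", "severe", "minor", "minor", "minor"]
print(best_split(xs, ys))  # (3, 0.0): a perfect split between 3 and 10
```

A full tree applies `best_split` recursively to each child node (and across all features) until the nodes are pure or a depth limit is reached; an unlimited-depth tree of this kind is what tends to overfit the training set.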

2020 ◽  
Vol 23 (4) ◽  
pp. 49-55
Author(s):  
Reham Jehad ◽  
Suhad A. Yousif ◽

Fake news is one of the most popular phenomena with considerable effects on our social life, especially in the political domain. Creating fake news has become very easy because of the widespread use of the internet and social media. The detection of misleading news is therefore a crucial problem that deserves considerable attention, mainly because of its challenges, such as the limited number of benchmark datasets and the volume of news published every second. This research proposes using two machine learning algorithms, random forest and decision tree (J48), to detect fake news. The full dataset contains 20,761 samples, and the test set contains 4,345 samples. Preprocessing starts by cleaning the data, removing unnecessary special characters, numbers, English letters, and white space, and finally removing stop words. Then the most popular feature extraction method, TF-IDF, is applied before the two proposed classification algorithms. The decision tree model achieved the best accuracy, 89.11%, while the random forest achieved 84.97%.
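The cleaning and TF-IDF steps can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: it assumes raw-frequency TF and unsmoothed IDF = log(N / df), the paper's exact TF-IDF variant is not stated, and the stop-word list is a tiny hypothetical placeholder.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def preprocess(text):
    """Replace digits and special characters with spaces, lower-case, and drop stop words."""
    text = re.sub(r"[^A-Za-z\s]", " ", text).lower()
    return [word for word in text.split() if word not in STOP_WORDS]


def tf_idf(docs):
    """TF-IDF weights for non-empty tokenised documents.

    TF  = count of the term in the document / document length
    IDF = log(number of documents / number of documents containing the term)
    Libraries such as scikit-learn use a smoothed IDF, so their values differ.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [preprocess("The election fraud claim!"), preprocess("Election results announced 2020")]
print(docs)  # [['election', 'fraud', 'claim'], ['election', 'results', 'announced']]
weights = tf_idf(docs)
print(weights[0]["election"])  # 0.0: the term occurs in every document
```

Terms that appear in every document get zero weight, so the classifier's features are dominated by words that distinguish one article from another.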


2021 ◽  
Vol 13 (11) ◽  
pp. 2040
Author(s):  
Xin Yan ◽  
Hua Chen ◽  
Bingru Tian ◽  
Sheng Sheng ◽  
Jinxing Wang ◽  
...  

High-spatial-resolution precipitation data are of great significance in many applications, such as ecology, hydrology, and meteorology, yet acquiring high-precision, high-resolution precipitation data over a large area remains a great challenge. In this study, a downscaling–merging scheme based on random forest and cokriging is presented to address this problem. First, an enhanced decision tree model based on the random forest machine learning algorithm is used to downscale satellite daily precipitation data to a spatial resolution of 0.01°. The downscaled satellite-based daily precipitation is then merged with gauge observations using the cokriging method. The scheme is applied to downscale the Global Precipitation Measurement Mission (GPM) daily precipitation product over the upstream part of the Hanjiang Basin. The experimental results indicate that (1) the random-forest-based downscaling model can correctly downscale the GPM daily precipitation data spatially, retaining the accuracy of the original GPM data while greatly improving its spatial detail; (2) the GPM precipitation data can be downscaled at the seasonal scale; and (3) the cokriging-based merging method greatly improves the accuracy of the downscaled GPM daily precipitation data. This study provides an efficient scheme for generating high-resolution, high-quality daily precipitation data over a large area.


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Faizan Ullah ◽  
Qaisar Javaid ◽  
Abdu Salam ◽  
Masood Ahmad ◽  
Nadeem Sarwar ◽  
...  

Ransomware (RW) is a distinctive variety of malware that encrypts files or locks the user's system, holding the files hostage and causing large financial losses for users. In this article, we propose a new model that extracts novel features from an RW dataset and classifies files as RW or benign. The proposed model can detect a large number of RW samples from various families at runtime, monitoring network activity, registry activity, and the file system throughout execution. API-call sequences were reused to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it with online machine learning algorithms to predict RW. To validate its effectiveness and scalability, we tested 78,550 recent malicious and benign samples and compared the results with random forest and AdaBoost; the testing accuracy reached 99.56%.
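The abstract does not name the online learners it uses. Purely as an assumption for illustration, the sketch below uses a perceptron, one of the simplest online algorithms, which updates its weights one sample at a time as feature vectors arrive at runtime instead of retraining on the whole dataset. The class name and the two-dimensional toy vectors are hypothetical; the paper's vectors have fourteen features.

```python
class OnlinePerceptron:
    """Hypothetical online classifier: labels are +1 (ransomware) or -1 (benign).

    Weights are updated one sample at a time, so the model can keep learning
    as new runtime feature vectors arrive, without retraining from scratch.
    """

    def __init__(self, n_features, learning_rate=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def partial_fit(self, x, y):
        """Update on a single (x, y) pair, but only when it is misclassified."""
        if self.predict(x) != y:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y


# Toy stream of two-dimensional vectors (the paper uses fourteen features).
stream = [((1.0, 0.0), 1), ((0.0, 1.0), -1)]
model = OnlinePerceptron(n_features=2)
for _ in range(5):  # a few passes over the toy stream
    for x, y in stream:
        model.partial_fit(x, y)
print([model.predict(x) for x, _ in stream])  # [1, -1]
```

Because each update touches only the current sample, the cost per new observation is constant, which is the property that makes online learners attractive for runtime detection.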


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues concerning the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers' confidence, efficiency and performance analysis in the banking industry has become a hot issue, because stakeholders need to detect the underlying causes of inefficiency within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks' efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance, using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggest that the decision tree (DT), using its C5.0 algorithm, provided the best predictive model: it achieved 100% accuracy in predicting the 134-branch holdout sample (30% of the branches), with a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally by the neural network, with 86.6% accuracy and a P value of 0.66. The study concludes that banks in Ghana can use its results to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.
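With one input and one output, the constant-returns-to-scale (CCR) DEA model has a closed form: each DMU's output/input ratio divided by the best ratio in the sample. The sketch below assumes that special case; the study's actual inputs and outputs are not listed in the abstract, and the branch figures are hypothetical. Turning the scores into class labels is shown only as one plausible way to pair DEA with a classifier, not as the authors' procedure.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR (constant returns to scale) DEA efficiency for one input and one output.

    In this special case the DEA linear programme reduces to each DMU's
    output/input ratio divided by the best ratio in the sample, so efficient
    DMUs score 1.0. Several inputs or outputs require an LP solver instead.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical branches: one input (e.g. operating cost) and one output (e.g. loans).
costs = [10.0, 20.0, 40.0]
loans = [50.0, 80.0, 200.0]
scores = ccr_efficiency_single(costs, loans)
print(scores)  # [1.0, 0.8, 1.0]

# One possible way to feed DEA into a classifier: label each DMU efficient or not.
labels = ["efficient" if s >= 1.0 - 1e-9 else "inefficient" for s in scores]
```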


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1677
Author(s):  
Ersin Elbasi ◽  
Ahmet E. Topcu ◽  
Shinu Mathew

COVID-19 is a community-acquired infection with symptoms that resemble those of influenza and bacterial pneumonia. Creating an infection control policy involving isolation, disinfection of surfaces, and identification of contagions is crucial in eradicating such pandemics. Social distancing, which entails maintaining certain distances between people and reducing the frequency of contact between them, could also help stop the spread of community-acquired infections like COVID-19. Meanwhile, the development of Internet of Things (IoT) devices has increased significantly, together with cyber-physical systems that connect with physical environments. Machine learning is strengthening current technologies with new approaches that use this surge of available IoT devices to solve problems quickly and correctly. We propose a new approach that uses machine learning algorithms to monitor the risk of COVID-19 in public areas. Features extracted from IoT sensors are used as input to several machine learning algorithms, including decision tree, neural network, naïve Bayes classifier, support vector machine, and random forest, to predict COVID-19 risks and calculate the risk probability of public places. This research aims to identify vulnerable populations and reduce the impact of the disease on certain groups using machine learning models. We build a model to calculate and predict the risk factors of populated areas; it generates automated alerts for security authorities whenever an abnormality is detected. Experimental results show high accuracy: 97.32% with random forest, 94.50% with decision tree, and 99.37% with the naïve Bayes classifier. These algorithms indicate great potential for crowd risk prediction in public areas.


2021 ◽  
Vol 17 (9) ◽  
pp. e1009336
Author(s):  
Sepideh Mazrouee ◽  
Susan J. Little ◽  
Joel O. Wertheim

HIV molecular epidemiology estimates transmission patterns by clustering genetically similar viruses. The process connects genetically similar genotyped viral sequences in a network, where links imply epidemiological transmissions. This technique relies on genotype data, which are collected only from diagnosed, in-care populations, and it leaves out of the tracking process many persons with HIV (PWH) who lack access to consistent care. We use machine learning algorithms to learn the non-linear correlation patterns between patient metadata and transmissions between HIV-positive cases, which enables us to expand transmission network reconstruction beyond the molecular network. We employed multiple commonly used supervised classification algorithms to analyze the San Diego Primary Infection Resource Consortium (PIRC) cohort dataset, consisting of genotypes and nearly 80 additional non-genetic features. First, we trained classification models to distinguish genetically unrelated individuals from related ones. Our results show that random forest and decision tree achieved over 80% in accuracy, precision, recall, and F1-score using, besides genetic data, only a subset of meta-features: age, birth sex, sexual orientation, race, transmission category, estimated date of infection, and first viral load date. Both algorithms also achieved approximately 80% sensitivity and specificity. The reported Area Under the Curve (AUC) is 97% for the random forest classifier and 94% for the decision tree classifier. Next, we extended the models to identify clusters of similar viral sequences. A support vector machine assigned sequences to the correct cluster with accuracy one order of magnitude higher than a dummy uniform random classifier. These results confirm that metadata carry important information about the dynamics of HIV transmission as embedded in transmission clusters.
Hence, novel computational approaches are needed to apply the non-trivial knowledge collected from inter-individual genetic information to metadata from PWH in order to expand the estimated transmissions. We note that feature extraction alone will not be effective in identifying patterns of transmission and will result in random clustering of the data, but its utilization in conjunction with genetic data and the right algorithm can contribute to the expansion of the reconstructed network beyond individuals with genetic data.
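Accuracy, precision, recall (also called sensitivity), specificity, and F1-score all derive from the four cells of a binary confusion matrix. A minimal sketch follows; the function name is hypothetical and the labels in the usage example are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # identical to sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Made-up labels: 1 = genetically related pair, 0 = unrelated pair.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Since recall and sensitivity are the same quantity, reporting both is redundant; specificity is the metric that captures performance on the negative class.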


Author(s):  
Jiarui Yin ◽  
Inikuro Afa Michael ◽  
Iduabo John Afa

Machine learning plays a key role in present-day crime detection, analysis, and prediction. The goal of this work is to propose methods for predicting crimes classified into different categories of severity. We visualized and analyzed crime statistics from recent years in the city of Boston. We then compared two supervised learning algorithms, decision tree and random forest, on the accuracy and processing time of models that predict crimes from geographical and temporal information, with the data split into training and test sets. As expected, random forest gives a better result, with 1.54% higher accuracy than decision tree, although at a cost of at least 4.37 times the processing time. The study opens the door to applying similar supervised methods in crime data analytics and other fields of data science.
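The extra processing time follows from how a random forest is built: it trains many trees, each on a bootstrap resample of the training data using a random subset of features, and predicts by majority vote. The sketch below illustrates that idea with depth-one trees (decision stumps) to stay short. Real random forests grow deeper trees and sample features at every split; nothing here reproduces the authors' implementation, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Fit a depth-one tree using only the given feature indices.
    Returns (feature, threshold, left_label, right_label)."""
    n = len(labels)
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # all sampled values identical: predict the majority class
        return (None, None, majority(labels), majority(labels))
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    if f is None or row[f] <= t:
        return left_label
    return right_label


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random feature subset for this tree (a stump has only one split).
        feats = rng.sample(range(len(rows[0])), max_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority(predict_stump(s, row) for s in forest)


# Hypothetical (temporal, spatial) feature pairs with a severity label.
rows = [(1, 10), (2, 11), (3, 12), (7, 20), (8, 21), (9, 22)]
labels = ["low", "low", "low", "high", "high", "high"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 5)), predict_forest(forest, (10, 30)))
```

Training cost grows roughly linearly with `n_trees`, which is consistent with a random forest taking several times longer than a single tree.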


2021 ◽  
Vol 9 ◽  
Author(s):  
Shibao Wang ◽  
Jianqi Zhuang ◽  
Jia Zheng ◽  
Hongyu Fan ◽  
Jiaxu Kong ◽  
...  

Landslides are widely distributed worldwide and often cause tremendous casualties and economic losses, especially in the Loess Plateau of China. Taking Wuqi County, in the hinterland of the Loess Plateau, as the research area, this study uses Bayesian hyperparameter optimization to tune random forest and extreme gradient boosting decision tree models for landslide susceptibility mapping and compares the two optimized models. Fourteen landslide influencing factors were selected, and 734 landslides were identified from field investigations and reports in the literature. The landslides were randomly divided into training data (70%) and validation data (30%). The hyperparameters of both models were optimized using a Bayesian algorithm, and the optimal hyperparameters were then used for landslide susceptibility mapping. Both models were evaluated and compared using the receiver operating characteristic (ROC) curve and the confusion matrix. The results show that the validation AUC values of the Bayesian-optimized random forest and extreme gradient boosting decision tree models are 0.88 and 0.86, respectively, improvements of 4% and 3%, indicating that optimization improved the prediction performance of both models. The random forest model, however, has higher predictive ability than the extreme gradient boosting decision tree model. Hyperparameter optimization is therefore of great significance for improving prediction accuracy, and the optimized models can generate high-quality landslide susceptibility maps.
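The 70/30 random split and the AUC metric can be sketched as follows. AUC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half. The seed, function names, and toy scores are assumptions for illustration only.

```python
import random


def train_validation_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and split it into training and validation parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(scores, labels):
    """Area under the ROC curve via its rank interpretation: the probability that a
    randomly chosen positive (label 1) scores higher than a randomly chosen negative
    (label 0). Ties count one half. O(P * N), fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train, validation = train_validation_split(range(734))  # 734 landslide records
print(len(train), len(validation))  # 514 220
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so validation values of 0.88 and 0.86 indicate good discrimination.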


2019 ◽  
Vol 9 (14) ◽  
pp. 2789 ◽  
Author(s):  
Sadaf Malik ◽  
Nadia Kanwal ◽  
Mamoona Naveed Asghar ◽  
Mohammad Ali A. Sadiq ◽  
Irfan Karamat ◽  
...  

Medical health systems have been concentrating on artificial intelligence techniques for speedy diagnosis. However, recording health data in a standard form still requires attention, so that machine learning can be more accurate and reliable by considering multiple features. The aim of this study is to develop a general framework for recording diagnostic data in an international standard format to facilitate prediction of disease diagnosis from symptoms using machine learning algorithms. Efforts were made to ensure error-free data entry by developing a user-friendly interface. Furthermore, multiple machine learning algorithms, including Decision Tree, Random Forest, Naive Bayes, and Neural Network algorithms, were used to analyze patient data based on multiple features, including age, illness history, and clinical observations. These data were formatted according to structured hierarchies designed by medical experts, whereas diagnoses were made according to the ICD-10 coding developed by the American Academy of Ophthalmology. Furthermore, the system is designed to evolve through self-learning by adding new classifications for both diagnoses and symptoms. The classification results from tree-based methods demonstrate that the proposed framework performs satisfactorily, given a sufficient amount of data. Owing to the structured data arrangement, the random forest and decision tree algorithms achieve prediction rates above 90%, compared with more complex methods such as neural networks and the naïve Bayes algorithm.


Author(s):  
Cumhur Aydin ◽  
Nura Balla

As a consequence of increasing traffic volume and mobility, road accidents have become a serious problem, especially in low- and middle-income countries, where the number of road accidents tends to increase every year. Among the factors contributing to road accidents, the road and its environment play an important role. Road safety audits and road safety inspections are tools used worldwide to monitor and evaluate road projects and existing road sections from a safety perspective. In this study, after evaluating different safety auditing techniques applied around the world, a case study was carried out on a Nigerian road section. The study aims (i) to show the main safety deficiencies of Nigerian road sections and (ii) to introduce a new tool that local road authorities can use to monitor their road sections. Based on this study, an audit report was prepared summarizing the findings and possible countermeasures.

