Modified Decision Tree Technique for Ransomware Detection at Runtime through API Calls

Scientific Programming ◽

10.1155/2020/8845833 ◽

2020 ◽

Vol 2020 ◽

pp. 1-10

Author(s):

Faizan Ullah ◽

Qaisar Javaid ◽

Abdu Salam ◽

Masood Ahmad ◽

Nadeem Sarwar ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Feature Vector ◽

Machine Learning Algorithms ◽

The Novel ◽

Proposed Model ◽

Testing Accuracy ◽

Financial Losses

Ransomware (RW) is a distinctive variety of malware that encrypts the files or locks the user’s system by keeping and taking their files hostage, which leads to huge financial losses to users. In this article, we propose a new model that extracts the novel features from the RW dataset and performs classification of the RW and benign files. The proposed model can detect a large number of RW from various families at runtime and scan the network, registry activities, and file system throughout the execution. API-call series was reutilized to represent the behavior-based features of RW. The technique extracts fourteen-feature vector at runtime and analyzes it by applying online machine learning algorithms to predict the RW. To validate the effectiveness and scalability, we test 78550 recent malign and benign RW and compare with the random forest and AdaBoost, and the testing accuracy is extended at 99.56%.

Get full-text (via PubEx)

174 A comparison of machine learning algorithms in the classification of beef steers finished in feedlot

Journal of Animal Science ◽

10.1093/jas/skaa278.231 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 126-127

Author(s):

Lucas S Lopes ◽

Christine F Baes ◽

Dan Tulpan ◽

Luis Artur Loyola Chardulo ◽

Otavio Machado Neto ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Final Decision ◽

Relevant Parameter ◽

Good Prediction ◽

Quality Traits ◽

C4.5 Decision Tree

Abstract The aim of this project is to compare some of the state-of-the-art machine learning algorithms on the classification of steers finished in feedlots based on performance, carcass and meat quality traits. The precise classification of animals allows for fast, real-time decision making in animal food industry, such as culling or retention of herd animals. Beef production presents high variability in its numerous carcass and beef quality traits. Machine learning algorithms and software provide an opportunity to evaluate the interactions between traits to better classify animals. Four different treatment levels of wet distiller’s grain were applied to 97 Angus-Nellore animals and used as features for the classification problem. The C4.5 decision tree, Naïve Bayes (NB), Random Forest (RF) and Multilayer Perceptron (MLP) Artificial Neural Network algorithms were used to predict and classify the animals based on recorded traits measurements, which include initial and final weights, sheer force and meat color. The top performing classifier was the C4.5 decision tree algorithm with a classification accuracy of 96.90%, while the RF, the MLP and NB classifiers had accuracies of 55.67%, 39.17% and 29.89% respectively. We observed that the final decision tree model constructed with C4.5 selected only the dry matter intake (DMI) feature as a differentiator. When DMI was removed, no other feature or combination of features was sufficiently strong to provide good prediction accuracies for any of the classifiers. We plan to investigate in a follow-up study on a significantly larger sample size, the reasons behind DMI being a more relevant parameter than the other measurements.

Get full-text (via PubEx)

Damage Classification of Composites Using Machine Learning

Volume 13: Safety Engineering, Risk, and Reliability Analysis ◽

10.1115/imece2019-11851 ◽

2019 ◽

Author(s):

Shweta Dabetwar ◽

Stephen Ekwaro-Osire ◽

João Paulo Dias

Keyword(s):

Machine Learning ◽

Composite Materials ◽

Random Forest ◽

Condition Monitoring ◽

Machine Learning Algorithms ◽

Support Vector ◽

Damage Classification ◽

Combining Data ◽

Ultrasonic Measurements

Abstract Composite materials have tremendous and ever-increasing applications in complex engineering systems; thus, it is important to develop non-destructive and efficient condition monitoring methods to improve damage prediction, thereby avoiding catastrophic failures and reducing standby time. Nondestructive condition monitoring techniques when combined with machine learning applications can contribute towards the stated improvements. Thus, the research question taken into consideration for this paper is “Can machine learning techniques provide efficient damage classification of composite materials to improve condition monitoring using features extracted from acousto-ultrasonic measurements?” In order to answer this question, acoustic-ultrasonic signals in Carbon Fiber Reinforced Polymer (CFRP) composites for distinct damage levels were taken from NASA Ames prognostics data repository. Statistical condition indicators of the signals were used as features to train and test four traditional machine learning algorithms such as K-nearest neighbors, support vector machine, Decision Tree and Random Forest, and their performance was compared and discussed. Results showed higher accuracy for Random Forest with a strong dependency on the feature extraction/selection techniques employed. By combining data analysis from acoustic-ultrasonic measurements in composite materials with machine learning tools, this work contributes to the development of intelligent damage classification algorithms that can be applied to advanced online diagnostics and health management strategies of composite materials, operating under more complex working conditions.

Get full-text (via PubEx)

Predicting Bank Operational Efficiency Using Machine Learning Algorithm: Comparative Study of Decision Tree, Random Forest, and Neural Networks

Advances in Fuzzy Systems ◽

10.1155/2020/8581202 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Peter Appiahene ◽

Yaw Marfo Missah ◽

Ussiph Najim

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Banking Sector ◽

Banking Industry ◽

Predictive Accuracy ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Machine Learning Algorithm ◽

And Performance

The financial crisis that hit Ghana from 2015 to 2018 has raised various issues with respect to the efficiency of banks and the safety of depositors’ in the banking industry. As part of measures to improve the banking sector and also restore customers’ confidence, efficiency and performance analysis in the banking industry has become a hot issue. This is because stakeholders have to detect the underlying causes of inefficiencies within the banking industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks’ efficiency and performance. Machine learning algorithms have also been viewed as a good tool to estimate various nonparametric and nonlinear problems. This paper presents a combined DEA with three machine learning approaches in evaluating bank efficiency and performance using 444 Ghanaian bank branches, Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning algorithm models were compared. The results suggested that the decision tree (DT) and its C5.0 algorithm provided the best predictive model. It had 100% accuracy in predicting the 134 holdout sample dataset (30% banks) and a P value of 0.00. The DT was followed closely by random forest algorithm with a predictive accuracy of 98.5% and a P value of 0.00 and finally the neural network (86.6% accuracy) with a P value 0.66. The study concluded that banks in Ghana can use the result of this study to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in R studio using R codes.

Get full-text (via PubEx)

Prediction of COVID-19 Risk in Public Areas Using IoT and Machine Learning

Electronics ◽

10.3390/electronics10141677 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1677

Author(s):

Ersin Elbasi ◽

Ahmet E. Topcu ◽

Shinu Mathew

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Bayes Classifier ◽

Social Distancing ◽

Public Areas ◽

Iot Devices

COVID-19 is a community-acquired infection with symptoms that resemble those of influenza and bacterial pneumonia. Creating an infection control policy involving isolation, disinfection of surfaces, and identification of contagions is crucial in eradicating such pandemics. Incorporating social distancing could also help stop the spread of community-acquired infections like COVID-19. Social distancing entails maintaining certain distances between people and reducing the frequency of contact between people. Meanwhile, a significant increase in the development of different Internet of Things (IoT) devices has been seen together with cyber-physical systems that connect with physical environments. Machine learning is strengthening current technologies by adding new approaches to quickly and correctly solve problems utilizing this surge of available IoT devices. We propose a new approach using machine learning algorithms for monitoring the risk of COVID-19 in public areas. Extracted features from IoT sensors are used as input for several machine learning algorithms such as decision tree, neural network, naïve Bayes classifier, support vector machine, and random forest to predict the risks of the COVID-19 pandemic and calculate the risk probability of public places. This research aims to find vulnerable populations and reduce the impact of the disease on certain groups using machine learning models. We build a model to calculate and predict the risk factors of populated areas. This model generates automated alerts for security authorities in the case of any abnormal detection. Experimental results show that we have high accuracy with random forest of 97.32%, with decision tree of 94.50%, and with the naïve Bayes classifier of 99.37%. These algorithms indicate great potential for crowd risk prediction in public areas.

Get full-text (via PubEx)

Incorporating metadata in HIV transmission network reconstruction: A machine learning feasibility assessment

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009336 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009336

Author(s):

Sepideh Mazrouee ◽

Susan J. Little ◽

Joel O. Wertheim

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Hiv Transmission ◽

Genetic Data ◽

Network Reconstruction ◽

Machine Learning Algorithms ◽

Support Vector ◽

Transmission Network ◽

Viral Sequences

HIV molecular epidemiology estimates the transmission patterns from clustering genetically similar viruses. The process involves connecting genetically similar genotyped viral sequences in the network implying epidemiological transmissions. This technique relies on genotype data which is collected only from HIV diagnosed and in-care populations and leaves many persons with HIV (PWH) who have no access to consistent care out of the tracking process. We use machine learning algorithms to learn the non-linear correlation patterns between patient metadata and transmissions between HIV-positive cases. This enables us to expand the transmission network reconstruction beyond the molecular network. We employed multiple commonly used supervised classification algorithms to analyze the San Diego Primary Infection Resource Consortium (PIRC) cohort dataset, consisting of genotypes and nearly 80 additional non-genetic features. First, we trained classification models to determine genetically unrelated individuals from related ones. Our results show that random forest and decision tree achieved over 80% in accuracy, precision, recall, and F1-score by only using a subset of meta-features including age, birth sex, sexual orientation, race, transmission category, estimated date of infection, and first viral load date besides genetic data. Additionally, both algorithms achieved approximately 80% sensitivity and specificity. The Area Under Curve (AUC) is reported 97% and 94% for random forest and decision tree classifiers respectively. Next, we extended the models to identify clusters of similar viral sequences. Support vector machine demonstrated one order of magnitude improvement in accuracy of assigning the sequences to the correct cluster compared to dummy uniform random classifier. These results confirm that metadata carries important information about the dynamics of HIV transmission as embedded in transmission clusters. Hence, novel computational approaches are needed to apply the non-trivial knowledge collected from inter-individual genetic information to metadata from PWH in order to expand the estimated transmissions. We note that feature extraction alone will not be effective in identifying patterns of transmission and will result in random clustering of the data, but its utilization in conjunction with genetic data and the right algorithm can contribute to the expansion of the reconstructed network beyond individuals with genetic data.

Get full-text (via PubEx)

Machine Learning Algorithms for Visualization and Prediction Modeling of Boston Crime Data

10.20944/preprints202002.0108.v1 ◽

2020 ◽

Author(s):

Jiarui Yin ◽

Inikuro Afa Michael ◽

Iduabo John Afa

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Data Science ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Crime Data ◽

Detection Analysis ◽

Supervised Learning Algorithms ◽

Supervised Methods

Machine learning plays a key role in present day crime detection, analysis and prediction. The goal of this work is to propose methods for predicting crimes classified into different categories of severity. We implemented visualization and analysis of crime data statistics in recent years in the city of Boston. We then carried out a comparative study between two supervised learning algorithms, which are decision tree and random forest based on the accuracy and processing time of the models to make predictions using geographical and temporal information provided by splitting the data into training and test sets. The result shows that random forest as expected gives a better result by 1.54% more accuracy in comparison to decision tree, although this comes at a cost of at least 4.37 times the time consumed in processing. The study opens doors to application of similar supervised methods in crime data analytics and other fields of data science

Get full-text (via PubEx)

Automatic Classification of Hypertension Types Based on Personal Features by Machine Learning Algorithms

Mathematical Problems in Engineering ◽

10.1155/2020/2742781 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Majid Nour ◽

Kemal Polat

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Random Forest ◽

Decision Tree ◽

Systolic Blood Pressure ◽

Diastolic Blood Pressure ◽

Decision Tree Classifier ◽

Tree Classifier ◽

C4.5 Decision Tree

Hypertension (high blood pressure) is an important disease seen among the public, and early detection of hypertension is significant for early treatment. Hypertension is depicted as systolic blood pressure higher than 140 mmHg or diastolic blood pressure higher than 90 mmHg. In this paper, in order to detect the hypertension types based on the personal information and features, four machine learning (ML) methods including C4.5 decision tree classifier (DTC), random forest, linear discriminant analysis (LDA), and linear support vector machine (LSVM) have been used and then compared with each other. In the literature, we have first carried out the classification of hypertension types using classification algorithms based on personal data. To further explain the variability of the classifier type, four different classifier algorithms were selected for solving this problem. In the hypertension dataset, there are eight features including sex, age, height (cm), weight (kg), systolic blood pressure (mmHg), diastolic blood pressure (mmHg), heart rate (bpm), and BMI (kg/m2) to explain the hypertension status and then there are four classes comprising the normal (healthy), prehypertension, stage-1 hypertension, and stage-2 hypertension. In the classification of the hypertension dataset, the obtained classification accuracies are 99.5%, 99.5%, 96.3%, and 92.7% using the C4.5 decision tree classifier, random forest, LDA, and LSVM. The obtained results have shown that ML methods could be confidently used in the automatic determination of the hypertension types.

Get full-text (via PubEx)

Data Driven Approach for Eye Disease Classification with Machine Learning

Applied Sciences ◽

10.3390/app9142789 ◽

2019 ◽

Vol 9 (14) ◽

pp. 2789 ◽

Cited By ~ 3

Author(s):

Sadaf Malik ◽

Nadia Kanwal ◽

Mamoona Naveed Asghar ◽

Mohammad Ali A. Sadiq ◽

Irfan Karamat ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Multiple Features ◽

Standard Format ◽

Free Data

Medical health systems have been concentrating on artificial intelligence techniques for speedy diagnosis. However, the recording of health data in a standard form still requires attention so that machine learning can be more accurate and reliable by considering multiple features. The aim of this study is to develop a general framework for recording diagnostic data in an international standard format to facilitate prediction of disease diagnosis based on symptoms using machine learning algorithms. Efforts were made to ensure error-free data entry by developing a user-friendly interface. Furthermore, multiple machine learning algorithms including Decision Tree, Random Forest, Naive Bayes and Neural Network algorithms were used to analyze patient data based on multiple features, including age, illness history and clinical observations. This data was formatted according to structured hierarchies designed by medical experts, whereas diagnosis was made as per the ICD-10 coding developed by the American Academy of Ophthalmology. Furthermore, the system is designed to evolve through self-learning by adding new classifications for both diagnosis and symptoms. The classification results from tree-based methods demonstrated that the proposed framework performs satisfactorily, given a sufficient amount of data. Owing to a structured data arrangement, the random forest and decision tree algorithms’ prediction rate is more than 90% as compared to more complex methods such as neural networks and the naïve Bayes algorithm.

Get full-text (via PubEx)

A Machine Learning-Based Identification of Genes Affecting the Pharmacokinetics of Tacrolimus Using the DMETTM Plus Platform

International Journal of Molecular Sciences ◽

10.3390/ijms21072517 ◽

2020 ◽

Vol 21 (7) ◽

pp. 2517

Author(s):

Jeong-An Gim ◽

Yonghan Kwon ◽

Hyun A Lee ◽

Kyeong-Ryoon Lee ◽

Soohyun Kim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Therapeutic Index ◽

Immunosuppressive Drug ◽

Machine Learning Algorithms ◽

Nucleotide Polymorphisms ◽

Lasso Regression ◽

Single Nucleotide ◽

Selection Operator

Tacrolimus is an immunosuppressive drug with a narrow therapeutic index and larger interindividual variability. We identified genetic variants to predict tacrolimus exposure in healthy Korean males using machine learning algorithms such as decision tree, random forest, and least absolute shrinkage and selection operator (LASSO) regression. rs776746 (CYP3A5) and rs1137115 (CYP2A6) are single nucleotide polymorphisms (SNPs) that can affect exposure to tacrolimus. A decision tree, when coupled with random forest analysis, is an efficient tool for predicting the exposure to tacrolimus based on genotype. These tools are helpful to determine an individualized dose of tacrolimus.

Get full-text (via PubEx)

Classification of Agriculture Farm Machinery Using Machine Learning and Internet of Things

Symmetry ◽

10.3390/sym13030403 ◽

2021 ◽

Vol 13 (3) ◽

pp. 403

Author(s):

Muhammad Waleed ◽

Tai-Won Um ◽

Tariq Kamal ◽

Syed Muhammad Usman

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Farm Machinery ◽

Learning Techniques

In this paper, we apply the multi-class supervised machine learning techniques for classifying the agriculture farm machinery. The classification of farm machinery is important when performing the automatic authentication of field activity in a remote setup. In the absence of a sound machine recognition system, there is every possibility of a fraudulent activity taking place. To address this need, we classify the machinery using five machine learning techniques—K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). For training of the model, we use the vibration and tilt of machinery. The vibration and tilt of machinery are recorded using the accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator and cultivator. The preliminary analysis on the collected data revealed that the farm machinery (when in operation) showed big variations in vibration and tilt, but observed similar means. Additionally, the accuracies of vibration-based and tilt-based classifications of farm machinery show good accuracy when used alone (with vibration showing slightly better numbers than the tilt). However, the accuracies improve further when both (the tilt and vibration) are used together. Furthermore, all five machine learning algorithms used for classification have an accuracy of more than 82%, but random forest was the best performing. The gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while the gradient boosting takes the most time.

Get full-text (via PubEx)