Benchmark-Based Reference Model for Evaluating Botnet Detection Tools Driven by Traffic-Flow Analytics

Botnets are some of the most recurrent cyber-threats, which take advantage of the wide heterogeneity of endpoint devices at the Edge of emerging communication environments to enable fraud and other adversarial tactics, including malware, data leaks or denial of service. There have been significant research advances in the development of accurate botnet detection methods based on supervised analysis, but assessing the accuracy and performance of such detection methods requires a clear evaluation model in order to enforce proper defensive strategies. To contribute to the mitigation of botnets, this paper introduces a novel evaluation scheme grounded on supervised machine learning algorithms that enable the detection and discrimination of different botnet families in real operational environments. The proposal relies on observing, understanding and inferring the behavior of each botnet family based on network indicators measured at flow level. The evaluation methodology comprises six phases that allow building a detection model against botnet-related malware distributed through the network, for which five supervised classifiers were instantiated for further comparison: Decision Tree, Random Forest, Gaussian Naive Bayes, Support Vector Machine and K-Neighbors. The experimental validation was performed on two public datasets of real botnet traffic: CIC-AWS-2018 and ISOT HTTP Botnet. Given the heterogeneity of the datasets, optimizing the analysis with the Grid Search algorithm improved the classification results of the instantiated algorithms. An exhaustive evaluation demonstrated the adequacy of the proposal and indicated that the Random Forest and Decision Tree models are the most suitable for detecting different botnet specimens among the chosen algorithms. They exhibited higher precision rates while analyzing a large number of samples with less processing time. The variety of testing scenarios was thoroughly assessed and reported to set baseline results for future benchmark analysis targeting flow-based behavioral patterns.
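The abstract above mentions tuning classifiers with Grid Search. As a rough illustration only, the sketch below runs an exhaustive grid search over two hyperparameters of a tiny k-nearest-neighbor classifier using a hold-out validation set. The flow features, labels, parameter grid and function names are all hypothetical and are not taken from the paper; the authors' actual pipeline, datasets and tooling are not reproduced here.

```python
from collections import Counter
from itertools import product


def knn_predict(train, query, k, p):
    """Majority vote among the k training points closest to `query`.

    Distance is the Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean).
    """
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, valid, k, p):
    """Fraction of validation samples classified correctly."""
    hits = sum(knn_predict(train, x, k, p) == y for x, y in valid)
    return hits / len(valid)


def grid_search(train, valid, grid):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(train, valid, **params)
        if score > best_score:  # ties keep the first combination found
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical flow-level features: (mean packet size in bytes, flow duration in s)
TRAIN = [
    ((60.0, 0.20), "bot"), ((64.0, 0.30), "bot"), ((58.0, 0.25), "bot"),
    ((900.0, 4.0), "benign"), ((1100.0, 5.5), "benign"), ((950.0, 3.8), "benign"),
]
VALID = [((62.0, 0.22), "bot"), ((1000.0, 4.5), "benign")]
GRID = {"k": [1, 3, 5], "p": [1, 2]}  # assumed search space

best_params, best_score = grid_search(TRAIN, VALID, GRID)
print(best_params, best_score)
```

In practice a grid search is usually combined with cross-validation rather than a single hold-out split, and the score being maximized can be any metric of interest (precision, recall, F1) instead of accuracy.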

Encrypted DNP3 Traffic Classification Using Supervised Machine Learning Algorithms

Machine Learning and Knowledge Extraction ◽

10.3390/make1010022 ◽

2019 ◽

Vol 1 (1) ◽

pp. 384-399 ◽

Cited By ~ 2

Author(s):

Thais de Toledo ◽

Nunzio Torrisi

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Decision Tree ◽

Smart Grids ◽

Learning Algorithms ◽

Electric Utility ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Communication Link

The Distributed Network Protocol (DNP3) is predominantly used by the electric utility industry and, consequently, in smart grids. The Peekaboo attack was created to compromise DNP3 traffic, in which a man-in-the-middle on a communication link can capture and drop selected encrypted DNP3 messages by using support vector machine learning algorithms. The communication networks of smart grids are an important part of their infrastructure, so it is critical to keep this communication secure and reliable. The main contribution of this paper is to compare the use of machine learning techniques to classify messages of the same protocol exchanged in encrypted tunnels. The study considers four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms: decision tree, nearest-neighbor, support vector machine, and naive Bayes. The results obtained show that it is possible to extend a Peekaboo attack over multiple substations, using a decision tree learning algorithm, and to gather significant information from a system that communicates using encrypted DNP3 traffic.

Detecting Real-Time Fall of Elderly People Using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39635 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1913-1918

Author(s):

Prathima P

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Elderly People ◽

Fall Detection ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

False Alarms ◽

Severe Injuries

Abstract: Falls are a significant national health issue for elderly people, generally resulting in severe injuries when the person lies on the floor for an extended period without any aid after a serious fall. Thus, elders need to be cared for very attentively. A supervised machine learning based fall detection approach using an accelerometer and a gyroscope is devised. The system detects falls by classifying different actions as fall or non-fall events, and the caretaker is alerted as soon as the person falls. The public dataset SisFall, with an efficient class of features, is used to identify falls. The Random Forest (RF) and Support Vector Machine (SVM) machine learning algorithms are employed to detect falls with fewer false alarms. The SVM algorithm obtains a higher accuracy, 99.23%, than the RF algorithm. Keywords: Fall detection, Machine learning, Supervised classification, SisFall, Activities of daily living, Wearable sensors, Random Forest, Support Vector Machine

Comparative Analysis of Network flow-based Botnet Detection Methods Using Supervised Machine Learning Algorithms

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/229952020 ◽

2020 ◽

Vol 9 (5) ◽

pp. 8498-8503

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Network Flow ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Detection Methods ◽

Botnet Detection

Incorporating metadata in HIV transmission network reconstruction: A machine learning feasibility assessment

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009336 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009336

Author(s):

Sepideh Mazrouee ◽

Susan J. Little ◽

Joel O. Wertheim

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Hiv Transmission ◽

Genetic Data ◽

Network Reconstruction ◽

Machine Learning Algorithms ◽

Support Vector ◽

Transmission Network ◽

Viral Sequences

HIV molecular epidemiology estimates transmission patterns by clustering genetically similar viruses. The process involves connecting genetically similar genotyped viral sequences in a network that implies epidemiological transmissions. This technique relies on genotype data, which is collected only from HIV-diagnosed and in-care populations and leaves many persons with HIV (PWH) who have no access to consistent care out of the tracking process. We use machine learning algorithms to learn the non-linear correlation patterns between patient metadata and transmissions between HIV-positive cases. This enables us to expand the transmission network reconstruction beyond the molecular network. We employed multiple commonly used supervised classification algorithms to analyze the San Diego Primary Infection Resource Consortium (PIRC) cohort dataset, consisting of genotypes and nearly 80 additional non-genetic features. First, we trained classification models to distinguish genetically unrelated individuals from related ones. Our results show that random forest and decision tree achieved over 80% in accuracy, precision, recall, and F1-score using only a subset of meta-features, including age, birth sex, sexual orientation, race, transmission category, estimated date of infection, and first viral load date, besides genetic data. Additionally, both algorithms achieved approximately 80% sensitivity and specificity. The Area Under Curve (AUC) is reported as 97% and 94% for the random forest and decision tree classifiers, respectively. Next, we extended the models to identify clusters of similar viral sequences. Support vector machine demonstrated a one order of magnitude improvement in the accuracy of assigning sequences to the correct cluster compared to a dummy uniform random classifier. These results confirm that metadata carries important information about the dynamics of HIV transmission as embedded in transmission clusters. Hence, novel computational approaches are needed to apply the non-trivial knowledge collected from inter-individual genetic information to metadata from PWH in order to expand the estimated transmissions. We note that feature extraction alone will not be effective in identifying patterns of transmission and will result in random clustering of the data, but its use in conjunction with genetic data and the right algorithm can contribute to the expansion of the reconstructed network beyond individuals with genetic data.
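The abstract above reports accuracy, precision, recall, F1-score, sensitivity and specificity. For readers unfamiliar with how these relate, here is a minimal sketch that derives all of them from the four confusion-matrix counts of a binary classifier. The labels and predictions are made-up illustration data, not values from the PIRC cohort, and the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary classification metrics from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up example: 1 = genetically related pair, 0 = unrelated pair
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here the counts are 3 true positives, 5 true negatives, 1 false positive and 1 false negative, so accuracy is 0.8 while precision, recall and F1 are all 0.75. AUC is not covered by this sketch because it requires ranked scores rather than hard predictions.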

Predictive model construction for prediction of soil fertility using decision tree machine learning algorithm

Kongunadu Research Journal ◽

10.26524/krj.2021.5 ◽

2021 ◽

Vol 8 (1) ◽

pp. 30-35

Author(s):

Jayalakshmi R ◽

Savitha Devi M

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Soil Fertility ◽

Learning Algorithms ◽

Crop Productivity ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Severe Problem ◽

Agriculture Sector

The agriculture sector is recognized as the backbone of the Indian economy and plays a crucial role in the growth of the nation's economy. It depends on weather and other environmental aspects. Some of the factors on which agriculture is reliant are soil, climate, flooding, fertilizers, temperature, precipitation, crops, insecticides, and herb. Soil fertility depends on these factors and is hence difficult to predict. Moreover, the agriculture sector in India is facing the severe problem of increasing crop productivity. Farmers lack essential knowledge of the nutrient content of the soil and of the selection of the crop best suited to the soil, and they also lack efficient methods for predicting crops well in advance so that appropriate methods can be used to improve crop productivity. This paper presents different supervised machine learning algorithms, such as Decision tree, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), to predict the fertility of soil based on the macro-nutrient and micro-nutrient status found in the dataset. The supervised machine learning algorithms are applied to the training dataset and tested with the test dataset, and the algorithms are implemented using the R Tool. The performance of these algorithms is analyzed using different evaluation metrics such as mean absolute error, cross-validation, and accuracy. Result analysis shows that the Decision tree produced the best accuracy, 99%, with a very low mean square error (MSE) rate.
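The abstract above evaluates models with mean absolute error, mean square error and accuracy. As a small worked illustration, the sketch below computes all three on made-up ordinal fertility levels. The data, the level encoding and the function names are assumptions for illustration; the paper itself reports using the R Tool, whereas Python is used here.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2 over all samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted exactly right."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical fertility levels encoded as integers (1 = low ... 4 = very high)
y_true = [3, 2, 4, 1]
y_pred = [3, 1, 4, 3]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(mae, mse, acc)
```

The absolute errors are 0, 1, 0 and 2, giving an MAE of 0.75 and an MSE of 1.25; two of four predictions are exact, so accuracy is 0.5. MSE penalizes the single two-level miss more heavily than MAE does.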

Classification of Agriculture Farm Machinery Using Machine Learning and Internet of Things

Symmetry ◽

10.3390/sym13030403 ◽

2021 ◽

Vol 13 (3) ◽

pp. 403

Author(s):

Muhammad Waleed ◽

Tai-Won Um ◽

Tariq Kamal ◽

Syed Muhammad Usman

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Farm Machinery ◽

Learning Techniques

In this paper, we apply multi-class supervised machine learning techniques to classify agricultural farm machinery. The classification of farm machinery is important when performing automatic authentication of field activity in a remote setup. In the absence of a sound machine recognition system, there is every possibility of fraudulent activity taking place. To address this need, we classify the machinery using five machine learning techniques: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). For training the model, we use the vibration and tilt of the machinery, recorded using accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator and cultivator. The preliminary analysis of the collected data revealed that the farm machinery (when in operation) showed large variations in vibration and tilt but had similar means. Additionally, the vibration-based and tilt-based classifications of farm machinery show good accuracy when used alone (with vibration showing slightly better numbers than tilt). However, the accuracies improve further when tilt and vibration are used together. Furthermore, all five machine learning algorithms used for classification have an accuracy of more than 82%, with random forest performing best. Gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while gradient boosting takes the most.

Leveraging Machine Learning Algorithms For Zero-Day Ransomware Attack

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8694.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4104-4107

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

Supervised Learning Algorithms ◽

Microsoft Windows

Large-scale global cyberattacks by encryption ransomware have affected countries and businesses worldwide, with millions of dollars lost in ransom payments. This type of malware encrypts user files, extracts user files, and demands high ransoms for the decryption keys. An attacker can use different types of ransomware to steal a victim's files. Examples of ransomware attacks include Scareware, Mobile ransomware, WannaCry, CryptoLocker, and Zero-Day ransomware attacks. A zero-day vulnerability is a software security flaw that is known to the software vendor but has no patch in place to fix it. Machine learning algorithms are already used to detect encryption ransomware. This work is based on the analysis of a large number of PE file samples (benign software and ransomware) and uses supervised machine learning algorithms to detect zero-day attacks. The work was carried out and evaluated on the Microsoft Windows operating system (the OS most attacked by encryption ransomware). We used four supervised learning algorithms: Random Forest Classifier, K-Nearest Neighbor, Support Vector Machine and Logistic Regression. Tests show almost no false positives and 99.5% accuracy with the random forest algorithm.

A Novel Approach for Detecting DGA-Based Botnets in DNS Queries Using Machine Learning Techniques

Journal of Computer Networks and Communications ◽

10.1155/2021/4767388 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Ali Soleymani ◽

Fatemeh Arabgol

Keyword(s):

Machine Learning ◽

Random Forest ◽

Text Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Detection Accuracy ◽

Domain Name ◽

Botnet Detection ◽

Learning Techniques

In today’s security landscape, advanced threats are becoming increasingly difficult to detect as the pattern of attacks expands. Classical approaches that rely heavily on static matching, such as blacklisting or regular expression patterns, may lack flexibility or be unreliable in detecting malicious data in system data. This is where machine learning techniques can show their value and provide new insights and higher detection rates. This research investigated the behavior of botnets that use domain-flux techniques to hide command and control channels. It also describes the machine learning algorithm and text mining used to analyze the network DNS protocol and identify botnets. For this purpose, extracted and labeled domain name datasets containing healthy and infected DGA botnet data were used. Data preprocessing techniques based on a text-mining approach were applied to explore domain name strings with n-gram analysis and principal component analysis (PCA). Performance is improved by extracting statistical features with PCA. The performance of the proposed model was evaluated using different machine learning classifiers such as decision tree, support vector machine, random forest, and logistic regression. Experimental results show that the random forest algorithm can be used effectively in botnet detection and has the best botnet detection accuracy.
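The abstract above mentions n-gram analysis of domain name strings. To illustrate the general idea only, the sketch below extracts character bigrams from the leftmost label of a domain and turns them into a count vector over a vocabulary. The domains, the choice of n = 2, the decision to look only at the leftmost label and the function names are all assumptions made for this example; the paper's actual preprocessing and its PCA step are not reproduced.

```python
from collections import Counter


def char_ngrams(domain, n=2):
    """Overlapping character n-grams of the leftmost label of `domain`."""
    name = domain.lower().split(".")[0]  # simplification: ignore TLD and other labels
    return [name[i:i + n] for i in range(len(name) - n + 1)]


def build_vocabulary(domains, n=2):
    """Map every n-gram seen in `domains` to a fixed column index."""
    grams = sorted({g for d in domains for g in char_ngrams(d, n)})
    return {gram: index for index, gram in enumerate(grams)}


def vectorize(domain, vocab, n=2):
    """Count vector of the domain's n-grams; unseen n-grams are dropped."""
    vector = [0] * len(vocab)
    for gram, count in Counter(char_ngrams(domain, n)).items():
        if gram in vocab:
            vector[vocab[gram]] = count
    return vector


# Hypothetical samples: one ordinary-looking name, one random-looking name
corpus = ["google.com", "xkqzvbn.net"]
vocab = build_vocabulary(corpus)
bigrams = char_ngrams("google.com")
vector = vectorize("google.com", vocab)
print(bigrams)
print(vector)
```

Count vectors like these grow with the vocabulary, which is one reason a dimensionality reduction step such as PCA is often applied before training a classifier.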

IoT Sensing for Reality-Enhanced Serious Games, a Fuel-Efficient Drive Use Case

Sensors ◽

10.3390/s21103559 ◽

2021 ◽

Vol 21 (10) ◽

pp. 3559

Author(s):

Rana Massoud ◽

Riccardo Berta ◽

Stefan Poslad ◽

Alessandro De Gloria ◽

Francesco Bellotti

Keyword(s):

Machine Learning ◽

Random Forest ◽

Performance Assessment ◽

Serious Games ◽

Reference Model ◽

Personal Data ◽

Real Data ◽

Machine Learning Algorithms ◽

Support Vector ◽

Driver Performance

Internet of Things technologies are spurring new types of instructional games, namely reality-enhanced serious games (RESGs), that support training directly in the field. This paper investigates a key feature of RESGs, i.e., user performance evaluation using real data, and studies an application of RESGs for promoting fuel-efficient driving, using fuel consumption as an indicator of driver performance. In particular, we propose a reference model for supporting a novel smart sensing dataflow involving the combination of two modules, based on machine learning, to be employed in RESGs in parallel and in real-time. The first module concerns quantitative performance assessment, while the second one targets verbal recommendation. For the assessment module, we compared the performance of three well-established machine learning algorithms: support vector regression, random forest and artificial neural networks. The experiments show that random forest achieves a slightly better performance assessment correlation than the others but requires a higher inference time. The instant recommendation module, implemented using fuzzy logic, triggers advice when inefficient driving patterns are detected. The dataflow has been tested with data from the enviroCar public dataset, exploiting on board diagnostic II (OBD II) standard vehicular interface information. The data covers various driving environments and vehicle models, which makes the system robust for real-world conditions. The results show the feasibility and effectiveness of the proposed approach, attaining a high estimation correlation (R² = 0.99, with random forest) and punctual verbal feedback to the driver. An important word of caution concerns users’ privacy, as the modules rely on sensitive personal data, and provide information that by no means should be misused.

Coronary Illness Prediction Using Random Forest Classifier

10.3233/apc210285 ◽

2021 ◽

Author(s):

Rekha G ◽

Shanthini B ◽

Ranjith Kumar V

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbor ◽

Heart Diseases ◽

Surrogate Data ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Techniques

Heart diseases, or Cardiovascular Diseases (CVDs), have been the main cause of death worldwide in recent years and have become the most dangerous disease in India and the entire world. The UCI repository is used to calculate the accuracy of machine learning algorithms for predicting coronary illness, such as k-nearest neighbor, decision tree, linear regression, and support vector machine. Different indications, such as chest pain and fast heartbeat, are referenced. Large datasets, which are not available in medical and clinical research, are required in order to apply deep learning techniques. Surrogate data is generated from the Cleveland dataset. The predicted results show that there is an improvement in classification accuracy. Heart disease is one of the most challenging diseases to diagnose, and it is the most recognized killer in the present day. Using machine learning algorithms, this paper predicts coronary illness. Here, we use various machine learning algorithms such as Support Vector Machine, Random Forest, KNN, Naive Bayes, Decision Tree and LR.
