Detection of Username Enumeration Attack on SSH Protocol: Machine Learning Approach

Over the last two decades (2000–2020), the Internet has rapidly evolved, resulting in symmetrical and asymmetrical Internet consumption patterns and billions of users worldwide. With the immense rise of the Internet, attacks and malicious behaviors pose a huge threat to our computing environment. Brute-force attack is among the most prominent and commonly used attacks, achieved out using password-attack tools, a wordlist dictionary, and a usernames list—obtained through a so-called an enumeration attack. In this paper, we investigate username enumeration attack detection on SSH protocol by using machine-learning classifiers. We apply four asymmetrical classifiers on our generated dataset collected from a closed-environment network to build machine-learning-based models for attack detection. The use of several machine-learners offers a wider investigation spectrum of the classifiers’ ability in attack detection. Additionally, we investigate how beneficial it is to include or exclude network ports information as features-set in the process of learning. We evaluated and compared the performances of machine-learning models for both cases. The models used are k-nearest neighbor (K-NN), naïve Bayes (NB), random forest (RF) and decision tree (DT) with and without ports information. Our results show that machine-learning approaches to detect SSH username enumeration attacks were quite successful, with KNN having an accuracy of 99.93%, NB 95.70%, RF 99.92%, and DT 99.88%. Furthermore, the results improve when using ports information.

Download Full-text

A hybrid machine learning approach of fuzzy-rough-k-nearest neighbor, latent semantic analysis, and ranker search for efficient disease diagnosis

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211820 ◽

2021 ◽

pp. 1-16

Author(s):

Sunil Kumar Jha ◽

Ninoslav Marina ◽

Jinwei Wang ◽

Zulfiqar Ahmad

Keyword(s):

Machine Learning ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Nearest Neighbor ◽

Disease Diagnosis ◽

Learning Approach ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Machine Learning Approach ◽

Hybrid Machine

Machine learning approaches have a valuable contribution in improving competency in automated decision systems. Several machine learning approaches have been developed in the past studies in individual disease diagnosis prediction. The present study aims to develop a hybrid machine learning approach for diagnosis predictions of multiple diseases based on the combination of efficient feature generation, selection, and classification methods. Specifically, the combination of latent semantic analysis, ranker search, and fuzzy-rough-k-nearest neighbor has been proposed and validated in the diagnosis prediction of the primary tumor, post-operative, breast cancer, lymphography, audiology, fertility, immunotherapy, and COVID-19, etc. The performance of the proposed approach is compared with single and other hybrid machine learning approaches in terms of accuracy, analysis time, precision, recall, F-measure, the area under ROC, and the Kappa coefficient. The proposed hybrid approach performs better than single and other hybrid approaches in the diagnosis prediction of each of the selected diseases. Precisely, the suggested approach achieved the maximum recognition accuracy of 99.12%of the primary tumor, 96.45%of breast cancer Wisconsin, 94.44%of cryotherapy, 93.81%of audiology, and significant improvement in the classification accuracy and other evaluation metrics in the recognition of the rest of the selected diseases. Besides, it handles the missing values in the dataset effectively.

Download Full-text

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally the number of cancer patients and deaths are continuing to increase yearly, and cancer has, therefore, become one of the world's highest causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drugs activity, some machine learning approaches such as Linear Discriminant Analysis (LDA), Principal components analysis (PCA), Support Vector Machine (SVM), Random forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and the examples of their applications in anticancer drugs design are listed. Results: Machine learning contributes a lot to anticancer drugs design and helps researchers by saving time and is cost effective. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drugs activity prediction are discussed, and the anticancer drugs research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Download Full-text

Assessing the Relation between Mud Components and Rheology for Loss Circulation Prevention Using Polymeric Gels: A Machine Learning Approach

Energies ◽

10.3390/en14051377 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1377

Author(s):

Musaab I. Magzoub ◽

Raj Kiran ◽

Saeed Salehi ◽

Ibnelwaleed A. Hussein ◽

Mustafa S. Nasser

Keyword(s):

Machine Learning ◽

Rheological Properties ◽

Nearest Neighbor ◽

Drilling Fluid ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Wide Range ◽

Machine Learning Approach ◽

Drilling Operations

The traditional way to mitigate loss circulation in drilling operations is to use preventative and curative materials. However, it is difficult to quantify the amount of materials from every possible combination to produce customized rheological properties. In this study, machine learning (ML) is used to develop a framework to identify material composition for loss circulation applications based on the desired rheological characteristics. The relation between the rheological properties and the mud components for polyacrylamide/polyethyleneimine (PAM/PEI)-based mud is assessed experimentally. Four different ML algorithms were implemented to model the rheological data for various mud components at different concentrations and testing conditions. These four algorithms include (a) k-Nearest Neighbor, (b) Random Forest, (c) Gradient Boosting, and (d) AdaBoosting. The Gradient Boosting model showed the highest accuracy (91 and 74% for plastic and apparent viscosity, respectively), which can be further used for hydraulic calculations. Overall, the experimental study presented in this paper, together with the proposed ML-based framework, adds valuable information to the design of PAM/PEI-based mud. The ML models allowed a wide range of rheology assessments for various drilling fluid formulations with a mean accuracy of up to 91%. The case study has shown that with the appropriate combination of materials, reasonable rheological properties could be achieved to prevent loss circulation by managing the equivalent circulating density (ECD).

Download Full-text

High-Speed and Accurate Meat Composition Imaging by Mechanically-Flexible Electrical Impedance Tomography With k-Nearest Neighbor and Fuzzy k-Means Machine Learning Approaches

IEEE Access ◽

10.1109/access.2021.3064315 ◽

2021 ◽

Vol 9 ◽

pp. 38792-38801

Author(s):

P. N. Darma ◽

M. Takei

Keyword(s):

Machine Learning ◽

Electrical Impedance Tomography ◽

High Speed ◽

Electrical Impedance ◽

Nearest Neighbor ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Impedance Tomography ◽

Meat Composition

Download Full-text

The Classification of Skateboarding Tricks : A Transfer Learning and Machine Learning Approach

Mekatronika ◽

10.15282/mekatronika.v2i2.6683 ◽

2020 ◽

Vol 2 (2) ◽

pp. 1-12

Author(s):

Muhammad Nur Aiman Shapiee ◽

Muhammad Ar Rahim Ibrahim ◽

Muhammad Amirul Abdullah ◽

Rabiu Muazu Musa ◽

Noor Azuan Abu Osman ◽

...

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Nearest Neighbor ◽

Olympic Games ◽

Learning Approach ◽

K Nearest Neighbor ◽

Test Dataset ◽

Machine Learning Approach ◽

Competitive Games

The skateboarding scene has arrived at new statures, particularly with its first appearance at the now delayed Tokyo Summer Olympic Games. Hence, attributable to the size of the game in such competitive games, progressed creative appraisal approaches have progressively increased due consideration by pertinent partners, particularly with the enthusiasm of a more goal-based assessment. This study purposes for classifying skateboarding tricks, specifically Frontside 180, Kickflip, Ollie, Nollie Front Shove-it, and Pop Shove-it over the integration of image processing, Trasnfer Learning (TL) to feature extraction enhanced with tradisional Machine Learning (ML) classifier. A male skateboarder performed five tricks every sort of trick consistently and the YI Action camera captured the movement by a range of 1.26 m. Then, the image dataset were features built and extricated by means of three TL models, and afterward in this manner arranged to utilize by k-Nearest Neighbor (k-NN) classifier. The perception via the initial experiments showed, the MobileNet, NASNetMobile, and NASNetLarge coupled with optimized k-NN classifiers attain a classification accuracy (CA) of 95%, 92% and 90%, respectively on the test dataset. Besides, the result evident from the robustness evaluation showed the MobileNet+k-NN pipeline is more robust as it could provide a decent average CA than other pipelines. It would be demonstrated that the suggested study could characterize the skateboard tricks sufficiently and could, over the long haul, uphold judges decided for giving progressively objective-based decision.

Download Full-text

PREDICTION OF CORONARY ARTERY DISEASE BASED ON ENSEMBLE LEARNING APPROACHES AND CO-EXPRESSED OBSERVATIONS

Journal of Mechanics in Medicine and Biology ◽

10.1142/s0219519416400108 ◽

2016 ◽

Vol 16 (01) ◽

pp. 1640010 ◽

Cited By ~ 3

Author(s):

YING-TSANG LO ◽

HAMIDO FUJITA ◽

TUN-WEN PAI

Keyword(s):

Machine Learning ◽

Coronary Artery Disease ◽

Coronary Artery ◽

Nearest Neighbor ◽

Prediction Method ◽

Medical Decision ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Artery Disease ◽

Voting Mechanism

Background: Coronary artery disease (CAD) is one of the most representative cardiovascular diseases. Early and accurate prediction of CAD based on physiological measurements can reduce the risk of heart attack through medicine therapy, healthy diet, and regular physical activity. Methods:Four heart disease datasets from the UC Irvine Machine Learning Repository were combined and re-examined to remove incomplete entries, and a total of 822 cases were utilized in this study. Seven machine learning methods, including Naïve Bayes, artificial neural networks (ANNs), sequential minimal optimization (SMO), k-nearest neighbor (KNN), AdaBoost, J48, and random forest, were adopted to analyze the collected datasets for CAD prediction. By combining co-expressed observations and an ensemble voting mechanism, we designed and evaluated a new medical decision classifier for CAD prediction. The TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) algorithm was applied to determine the best prediction method for CAD diagnosis. Results: Features of systolic blood pressure, cholesterol, heart rate, and ST depression are considered to be the most significant differences between patients with and without CADs. We show that the prediction capability of seven machine learning classifiers can be enhanced by integrating combinations of observed co-expressed features. Finally, compared to the use of any single classifier, the proposed voting mechanism achieved optimal performance according to TOPSIS.

Download Full-text

Machine Learning Based Indoor Localization using Wi-Fi Fingerprinting

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2133.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 502-506

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Indoor Localization ◽

The Internet ◽

Location Prediction ◽

K Nearest Neighbor ◽

Data Types ◽

Specific Location ◽

Web Access

The aim of indoor localization is to locate the objects inside a location wirelessly. This paper reports the models that predict the location along with floor and coordinates from the WAPs (Web Access Points) signal strengths of a user who connects to the internet at a specific location which had three locations. Starting with the cleaning of data, then assigning attributes into proper data types, making subset of dataset for each location, examining each column, and normalizing WAPs rows in order to build models. Different algorithms have been used to predict the location, floor, and coordinates of a logged in user. The models that have been used in this paper are k-Nearest Neighbor (k-NN) for location prediction, random forest for floor prediction and regression with k-NN for coordinate prediction.

Download Full-text

Sentiment Analysis for Iraqis Dialect in Social Media

Iraqi Journal of Information & Communications Technology ◽

10.31987/ijict.1.2.17 ◽

2018 ◽

Vol 1 (2) ◽

pp. 24-32

Author(s):

Lamiaa Abd Habeeb

Keyword(s):

Machine Learning ◽

Social Media ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Learning Approach ◽

Machine Learning Algorithm ◽

K Nearest Neighbor ◽

Multinomial Models ◽

Ensemble Machine Learning ◽

Machine Learning Approach

In this paper, we designed a system that extract citizens opinion about Iraqis government and Iraqis politicians through analyze their comments from Facebook (social media network). Since the data is random and contains noise, we cleaned the text and builds a stemmer to stem the words as much as possible, cleaning and stemming reduced the number of vocabulary from 28968 to 17083, these reductions caused reduction in memory size from 382858 bytes to 197102 bytes. Generally, there are two approaches to extract users opinion; namely, lexicon-based approach and machine learning approach. In our work, machine learning approach is applied with three machine learning algorithm which are; Naïve base, K-Nearest neighbor and AdaBoost ensemble machine learning algorithm. For Naïve base, we apply two models; Bernoulli and Multinomial models. We found that, Naïve base with Multinomial models give highest accuracy.

Download Full-text

Machine Learning Approach to Differentiation of Peripheral Schwannomas and Neurofibromas: A Multi-Center Study

Neuro-Oncology ◽

10.1093/neuonc/noab211 ◽

2021 ◽

Author(s):

Michael Zhang ◽

Elizabeth Tong ◽

Sam Wong ◽

Forrest Hamrick ◽

Maryam Mohammadzadeh ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Clinical Features ◽

Nearest Neighbor ◽

Motor Deficit ◽

Support Vector ◽

Spontaneous Pain ◽

Learning Approaches ◽

Imaging Data ◽

K Nearest Neighbor

Abstract Background Non-invasive differentiation between schwannomas and neurofibromas is important for appropriate management, preoperative counseling, and surgical planning, but has proven difficult using conventional imaging. The objective of this study was to develop and evaluate machine learning approaches for differentiating peripheral schwannomas from neurofibromas. Methods We assembled a cohort of schwannomas and neurofibromas from 3 independent institutions and extracted high-dimensional radiomic features from gadolinium-enhanced, T1-weighted MRI using the PyRadiomics package on Quantitative Imaging Feature Pipeline. Age, sex, neurogenetic syndrome, spontaneous pain, and motor deficit were recorded. We evaluated the performance of 6 radiomics-based classifier models with and without clinical features and compared model performance against human expert evaluators. Results 107 schwannomas and 59 neurofibroma were included. The primary models included both clinical and imaging data. The accuracy of the human evaluators (0.765) did not significantly exceed the no-information rate (NIR), whereas the Support Vector Machine (0.929), Logistic Regression (0.929), and Random Forest (0.905) classifiers exceeded the NIR. Using the method of DeLong, the AUC for the Logistic Regression (AUC=0.923) and K Nearest Neighbor (AUC=0.923) classifiers was significantly greater than the human evaluators (AUC=0.766; p = 0.041). Conclusions The radiomics-based classifiers developed here proved to be more accurate and had a higher AUC on the ROC curve than expert human evaluators. This demonstrates that radiomics using routine MRI sequences and clinical features can aid in differentiation of peripheral schwannomas and neurofibromas.

Download Full-text

Classification Algorithms for Determining Handwritten Digit

Iraqi Journal for Electrical And Electronic Engineering ◽

10.37917/ijeee.12.1.10 ◽

2016 ◽

Vol 12 (1) ◽

pp. 96-102 ◽

Cited By ~ 3

Author(s):

Hayder AL-Behadili

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Paper Analysis ◽

Classification Algorithms ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Data Intensive ◽

Knowledge Based ◽

Computer Scientists ◽

Mathematical Techniques

Data-intensive science is a critical science paradigm that interferes with all other sciences. Data mining (DM) is a powerful and useful technology with wide potential users focusing on important meaningful patterns and discovers a new knowledge from a collected dataset. Any predictive task in DM uses some attribute to classify an unknown class. Classification algorithms are a class of prominent mathematical techniques in DM. Constructing a model is the core aspect of such algorithms. However, their performance highly depends on the algorithm behavior upon manipulating data. Focusing on binarazaition as an approach for preprocessing, this paper analysis and evaluates different classification algorithms when construct a model based on accuracy in the classification task. The Mixed National Institute of Standards and Technology (MNIST) handwritten digits dataset provided by Yann LeCun has been used in evaluation. The paper focuses on machine learning approaches for handwritten digits detection. Machine learning establishes classification methods, such as K-Nearest Neighbor(KNN), Decision Tree (DT), and Neural Networks (NN). Results showed that the knowledge-based method, i.e. NN algorithm, is more accurate in determining the digits as it reduces the error rate. The implication of this evaluation is providing essential insights for computer scientists and practitioners for choosing the suitable DM technique that fit with their data.

Download Full-text