scholarly journals Performance Evaluation of Machine Learning Techniques for Identifying Forged and Phony Uniform Resource Locators (URLs)

2019 ◽  
Vol 16 (4) ◽  
pp. 155-169
Author(s):  
N. A. Azeez ◽  
A. A. Ajayi

Since the invention of Information and Communication Technology (ICT), there has been a great shift from the erstwhile traditional approach of handling information across the globe to the usage of this innovation. The application of this initiative cut across almost all areas of human endeavours. ICT is widely utilized in education and production sectors as well as in various financial institutions. It is of note that many people are using it genuinely to carry out their day to day activities while others are using it to perform nefarious activities at the detriment of other cyber users. According to several reports which are discussed in the introductory part of this work, millions of people have become victims of fake Uniform Resource Locators (URLs) sent to their mails by spammers. Financial institutions are not left out in the monumental loss recorded through this illicit act over the years. It is worth mentioning that, despite several approaches currently in place, none could confidently be confirmed to provide the best and reliable solution. According to several research findings reported in the literature, researchers have demonstrated how machine learning algorithms could be employed to verify and confirm compromised and fake URLs in the cyberspace. Inconsistencies have however been noticed in the researchers’ findings and also their corresponding results are not dependable based on the values obtained and conclusions drawn from them. Against this backdrop, the authors carried out a comparative analysis of three learning algorithms (Naïve Bayes, Decision Tree and Logistics Regression Model) for verification of compromised, suspicious and fake URLs and determine which is the best of all based on the metrics (F-Measure, Precision and Recall) used for evaluation. Based on the confusion metrics measurement, the result obtained shows that the Decision Tree (ID3) algorithm achieves the highest values for recall, precision and f-measure. It unarguably provides efficient and credible means of maximizing the detection of compromised and malicious URLs. Finally, for future work, authors are of the opinion that two or more supervised learning algorithms can be hybridized to form a single effective and more efficient algorithm for fake URLs verification.Keywords: Learning-algorithms, Forged-URL, Phoney-URL, performance-comparison

2021 ◽  
Vol 11 (15) ◽  
pp. 6728
Author(s):  
Muhammad Asfand Hafeez ◽  
Muhammad Rashid ◽  
Hassan Tariq ◽  
Zain Ul Abideen ◽  
Saud S. Alotaibi ◽  
...  

Classification and regression are the major applications of machine learning algorithms which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed, however, it is still evolving over time. This paper presents a novel and robust classifier based on a decision tree and tabu search algorithms, respectively. In the aim of improving performance, our proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the used tabu search algorithm is responsible to balance the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering. The experimental results were obtained using our proposed classifier based on the built-in sci-kit learn library in Python. The extensive analysis for the performance comparison was presented using Big O and statistical analysis for conventional supervised machine learning algorithms. Moreover, the performance comparison to optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms and the area under receiver operating characteristic (AUROC) for proposed method of 0.95 reveals that the proposed classifier algorithm is convenient for large datasets.


Author(s):  
Madhuri Maru ◽  
Saket Swarndeep

Breast cancer represents one of the diseases that make a high number of deaths every year. It is the most common type of all cancers and the main cause of women's deaths worldwide. Classification and data mining methods are an effective way to classify data. Especially in medical field, where those methods are widely used in diagnosis and analysis to make decisions. Here, a common misconception is that predictive analytics and machine learning are the same thing where in predictive analysis is a statistical learning and machine learning is pattern recognition and explores the notion that algorithms can learn from and make predictions on data. In this paper, we are addressing the problem of predictive analysis by adding machine learning techniques for better prediction of breast cancer. In this, a performance comparison between different machine learning algorithms: Support Vector Machine (SVM), Decision Tree (C4.5), Naive Bayes (NB) and k Nearest Neighbors (k-NN) on the Wisconsin Breast Cancer (original) datasets is conducted. The main objective is to assess the correctness in classifying data with respect to efficiency and effectiveness of hybrid algorithm in terms of accuracy, precision, sensitivity and specificity.


Materials ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1089
Author(s):  
Sung-Hee Kim ◽  
Chanyoung Jeong

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures with different anodization processes. We produced a total of 100 samples, and we assessed changes in TiO2 nanostructures’ thicknesses by performing anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltage differences ranging from 10 V to 100 V at various anodization durations. We found that the thicknesses of TiO2 nanostructures are dependent on anodization voltages under time differences. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed based on the different experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to predict classification for binary and multiclass classification. For binary classification, random forest and gradient boosting algorithm had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high prediction on estimating the presence of pore. In contrast, decision tree and three ensemble methods had a relatively higher performance for multiclass classification, with an accuracy rate greater than 0.79. The weakest algorithm used was k-nearest neighbors for both binary and multiclass classifications. We believe that these results show that we can apply machine learning techniques to predict surface quality improvement, leading to smart manufacturing technology to better control color appearance, super-hydrophobicity, super-hydrophilicity or batter efficiency.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 126-127
Author(s):  
Lucas S Lopes ◽  
Christine F Baes ◽  
Dan Tulpan ◽  
Luis Artur Loyola Chardulo ◽  
Otavio Machado Neto ◽  
...  

Abstract The aim of this project is to compare some of the state-of-the-art machine learning algorithms on the classification of steers finished in feedlots based on performance, carcass and meat quality traits. The precise classification of animals allows for fast, real-time decision making in animal food industry, such as culling or retention of herd animals. Beef production presents high variability in its numerous carcass and beef quality traits. Machine learning algorithms and software provide an opportunity to evaluate the interactions between traits to better classify animals. Four different treatment levels of wet distiller’s grain were applied to 97 Angus-Nellore animals and used as features for the classification problem. The C4.5 decision tree, Naïve Bayes (NB), Random Forest (RF) and Multilayer Perceptron (MLP) Artificial Neural Network algorithms were used to predict and classify the animals based on recorded traits measurements, which include initial and final weights, sheer force and meat color. The top performing classifier was the C4.5 decision tree algorithm with a classification accuracy of 96.90%, while the RF, the MLP and NB classifiers had accuracies of 55.67%, 39.17% and 29.89% respectively. We observed that the final decision tree model constructed with C4.5 selected only the dry matter intake (DMI) feature as a differentiator. When DMI was removed, no other feature or combination of features was sufficiently strong to provide good prediction accuracies for any of the classifiers. We plan to investigate in a follow-up study on a significantly larger sample size, the reasons behind DMI being a more relevant parameter than the other measurements.


2018 ◽  
Vol 7 (2.8) ◽  
pp. 684 ◽  
Author(s):  
V V. Ramalingam ◽  
Ayantan Dandapath ◽  
M Karthik Raja

Heart related diseases or Cardiovascular Diseases (CVDs) are the main reason for a huge number of death in the world over the last few decades and has emerged as the most life-threatening disease, not only in India but in the whole world. So, there is a need of reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine Learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and the professionals in the diagnosis of heart related diseases. This paper presents a survey of various models based on such algorithms and techniques andanalyze their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), NaïveBayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found very popular among the researchers.


Author(s):  
M. M. Ata ◽  
K. M. Elgamily ◽  
M. A. Mohamed

The presented paper proposes an algorithm for palmprint recognition using seven different machine learning algorithms. First of all, we have proposed a region of interest (ROI) extraction methodology which is a two key points technique. Secondly, we have performed some image enhancement techniques such as edge detection and morphological operations in order to make the ROI image more suitable for the Hough transform. In addition, we have applied the Hough transform in order to extract all the possible principle lines on the ROI images. We have extracted the most salient morphological features of those lines; slope and length. Furthermore, we have applied the invariant moments algorithm in order to produce 7 appropriate hues of interest. Finally, after performing a complete hybrid feature vectors, we have applied different machine learning algorithms in order to recognize palmprints effectively. Recognition accuracy have been tested by calculating precision, sensitivity, specificity, accuracy, dice, Jaccard coefficients, correlation coefficients, and training time. Seven different supervised machine learning algorithms have been implemented and utilized. The effect of forming the proposed hybrid feature vectors between Hough transform and Invariant moment have been utilized and tested. Experimental results show that the feed forward neural network with back propagation has achieved about 99.99% recognition accuracy among all tested machine learning techniques.


Student Performance Management is one of the key pillars of the higher education institutions since it directly impacts the student’s career prospects and college rankings. This paper follows the path of learning analytics and educational data mining by applying machine learning techniques in student data for identifying students who are at the more likely to fail in the university examinations and thus providing needed interventions for improved student performance. The Paper uses data mining approach with 10 fold cross validation to classify students based on predictors which are demographic and social characteristics of the students. This paper compares five popular machine learning algorithms Rep Tree, Jrip, Random Forest, Random Tree, Naive Bayes algorithms based on overall classifier accuracy as well as other class specific indicators i.e. precision, recall, f-measure. Results proved that Rep tree algorithm outperformed other machine learning algorithms in classifying students who are at more likely to fail in the examinations.


Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

The machine learning algorithms are capable of managing multi-dimensional data under the dynamic environment. Despite its so many vital features, there are some challenges to overcome. The machine learning algorithms still requires some additional mechanisms or procedures for predicting a large number of new classes with managing privacy. The deficiencies show the reliable use of a machine learning algorithm relies on human experts because raw data may complicate the learning process which may generate inaccurate results. So the interpretation of outcomes with expertise in machine learning mechanisms is a significant challenge in the machine learning algorithm. The machine learning technique suffers from the issue of high dimensionality, adaptability, distributed computing, scalability, the streaming data, and the duplicity. The main issue of the machine learning algorithm is found its vulnerability to manage errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how can be reduced the computational complexity of machine learning algorithms by finding how to make predictions using an improved algorithm.


Sign in / Sign up

Export Citation Format

Share Document