Hybrid Machine Learning: A Tool to Detect Phishing Attacks in Communication Networks

Author(s):  
Ademola Philip Abidoye ◽  
Boniface Kabaso

Phishing is a cyber-attack that uses disguised email as a weapon and has been on the rise in recent times. An innocent Internet user who clicks on a fraudulent link may fall victim to divulging personal information such as a credit card PIN, login credentials, banking information, and other sensitive data. There are many ways in which attackers can trick victims into revealing their personal information. In this article, we select important phishing-URL features that attackers use to trick Internet users into taking the attacker's desired action. We use two machine learning techniques to accurately classify our data sets, and we compare the performance of other related techniques with our scheme. The experimental results show that the approach is highly effective in detecting phishing URLs, attaining an accuracy of 97.8% with a 1.06% false positive rate, a 0.5% false negative rate, and an error rate of 0.3%. The proposed scheme performs better than the other selected related work, which shows that our approach can be used in real-time applications for detecting phishing URLs.
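As a rough illustration (the abstract does not list the paper's exact features), lexical phishing-URL features of the kind selected here can be extracted like this; the specific feature set below is an assumption, not the authors':

```python
from urllib.parse import urlparse

def extract_url_features(url: str) -> dict:
    """Extract a few lexical features commonly used in phishing-URL
    detection (illustrative set; not the exact features of the paper)."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),                          # phishing URLs tend to be long
        "num_dots": host.count("."),                     # many subdomains look suspicious
        "has_at_symbol": "@" in url,                     # '@' can obscure the real host
        "has_ip_host": host.replace(".", "").isdigit(),  # raw-IP hosts are a red flag
        "uses_https": parsed.scheme == "https",
    }

feats = extract_url_features("http://192.168.0.1/secure-login@bank")
```

A feature dictionary like this would then be vectorized and fed to the classifiers.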

2021 ◽  
Author(s):  
Prasannavenkatesan Theerthagiri ◽  
Usha Ruby A ◽  
Vidya J

Abstract Diabetes mellitus is a chronic disease that may cause many complications. Machine learning algorithms are used to diagnose and predict diabetes, and learning-based algorithms play a vital role in supporting decision making in disease diagnosis and prediction. In this paper, traditional classification algorithms and neural-network-based machine learning are investigated for the diabetes dataset. Various performance measures, covering different aspects, are also evaluated for the K-nearest neighbor, Naive Bayes, extra trees, decision tree, radial basis function, and multilayer perceptron algorithms. This supports the future estimation of patients suffering from diabetes. The results of this work show that the multilayer perceptron algorithm gives the highest prediction accuracy, with the lowest MSE of 0.19. The MLP also gives the lowest false positive rate and false negative rate, with the highest area under the curve of 86%.
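A minimal sketch of this kind of classifier comparison, using scikit-learn defaults on a synthetic stand-in for the diabetes data (the dataset, feature count, and settings here are assumptions, not the authors' setup):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic binary-classification data standing in for the diabetes dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "mlp": MLPClassifier(max_iter=1000, random_state=0),
}
# Test-set accuracy for each classifier, as a basis for comparison.
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```

On real patient data one would also compute FPR, FNR, MSE, and AUC, as the paper does, rather than accuracy alone.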


Phishing is a cyber-attack that is socially engineered to trick naive online users into revealing sensitive information such as user data, login credentials, social security numbers, and banking information. Attackers fool Internet users by posing as a legitimate webpage to retrieve personal information, or by sending emails posing as reputable companies or businesses. Phishing exploits several vulnerabilities effectively, and no single solution protects users from all of them. A classification/prediction model is designed based on heuristic features extracted from the website domain, URL, web protocol, and source code, to eliminate the drawbacks of existing anti-phishing techniques. In the model, we combine existing solutions such as blacklisting and whitelisting, heuristics, and visual-similarity checks, which provides a higher level of security. We use the model with different machine learning algorithms, namely Logistic Regression, Decision Trees, K-Nearest Neighbours, and Random Forests, and compare the results to find the most efficient machine learning framework.
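The blacklist/whitelist-plus-ML layering described above might be sketched as follows; the domain lists and the classifier stub are hypothetical placeholders, not the paper's actual data or model:

```python
# Hypothetical layered check: known-good and known-bad domains are handled
# by list lookups, everything else falls through to a trained classifier.
WHITELIST = {"bank.example.com"}
BLACKLIST = {"secure-bank-login.example.net"}

def ml_classifier(url: str) -> str:
    # Stand-in for a trained model (e.g. Random Forest on heuristic features).
    return "phishing" if "login" in url and "-" in url else "legitimate"

def classify_url(url: str, host: str) -> str:
    if host in WHITELIST:
        return "legitimate"      # known-good domains bypass the model
    if host in BLACKLIST:
        return "phishing"        # known-bad domains are blocked outright
    return ml_classifier(url)    # heuristic/ML layer handles the rest
```

The list lookups are cheap and precise; the ML layer covers the previously unseen URLs that lists alone miss.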


Phishing attacks have risen by 209% in the last 10 years according to Anti-Phishing Working Group (APWG) statistics [19]. Machine learning is commonly used to detect phishing attacks. Researchers have traditionally judged phishing detection models by either accuracy or F1-score; however, in this paper we argue that a single metric alone will never correlate to a successful deployment of a machine learning phishing detection model. This is because every machine learning model has an inherent trade-off between its False Positive Rate (FPR) and False Negative Rate (FNR). Tuning this trade-off is important, since a higher or lower FPR/FNR will impact the user acceptance rate of any deployment of a phishing detection model. Models with a high FPR tend to block users from accessing legitimate webpages, whereas a model with a high FNR allows users to inadvertently access phishing webpages. Either extreme may cause a user base to complain (due to blocked pages) or to fall victim to phishing attacks. Depending on the security needs of a deployment (a secure vs. a relaxed setting), phishing detection models should be tuned accordingly. In this paper, we demonstrate two effective techniques for tuning the trade-off between FPR and FNR: varying the class distribution of the training data and adjusting the probabilistic prediction threshold. We demonstrate both techniques on a data set of 50,000 phishing and 50,000 legitimate sites, performing all experiments with three common machine learning algorithms, namely Random Forest, Logistic Regression, and Neural Networks. Using these techniques, we are able to regulate a model's FPR/FNR. We observed that among the three algorithms, Neural Networks performed best, achieving an F1-score of 0.98 with corresponding FPR and FNR values of 0.0003 and 0.0198, respectively.
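The second technique, adjusting the probabilistic prediction threshold, can be sketched as follows; the synthetic scores stand in for a model's predict_proba output, so the exact FPR/FNR values are illustrative:

```python
import numpy as np

def fpr_fnr(y_true, scores, threshold):
    """FPR and FNR of thresholded probabilistic predictions (1 = phishing)."""
    y_pred = (scores >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tp = np.sum((y_pred == 1) & (y_true == 1))
    return fp / (fp + tn), fn / (fn + tp)

# Toy scores standing in for a trained model's predicted phishing probabilities.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
scores = np.concatenate([rng.beta(2, 5, 50), rng.beta(5, 2, 50)])

strict = fpr_fnr(y, scores, 0.8)   # secure setting: fewer false positives
relaxed = fpr_fnr(y, scores, 0.2)  # relaxed setting: fewer false negatives
```

Raising the threshold can only shrink the set of pages flagged as phishing, so FPR is non-increasing and FNR non-decreasing in the threshold; this is exactly the knob a deployment tunes.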


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Vijaya Geeta Dharmavaram

Purpose Today, online shopping and online business have become a new norm, especially in the current pandemic scenario. With more businesses running online, cyber criminals are coming up with different tactics to steal identities and sensitive information such as credit card and banking credentials, either for personal monetary gain or to sell on the dark Web. One form of such attack seen in recent times is the formjacking attack. This paper aims to review the current scenario of the formjacking attack and its modus operandi. The paper also provides certain countermeasures that can be adopted by users and website owners. Design/methodology/approach The paper mainly focuses on the modus operandi of the formjacking attack to understand the severity of the problem. Based on the way the attack is carried out, some guidelines to be followed are provided. Later, a brief review of machine learning techniques is furnished to understand how they may help as a secure defense mechanism. Findings Formjacking attacks have been on the rise in the past two years, especially during the holiday season. Cyber criminals have been using smart tactics to carry out these attacks, which are very difficult to detect. Machine learning techniques may prove to be effective in combating them. Originality/value The formjacking attack is not just a concern for the customers who may lose their sensitive data; the onus also lies on the companies themselves to ensure they protect their customers' data from theft. Not much research is found regarding the formjacking attack, as it is a relatively new form of attack. The paper reviews this attack and provides some measures that can be followed. It also provides a few guidelines which can be used for further research in devising a security tool to mitigate this problem.


Information has become increasingly important to people, associations, and organizations, and consequently, protecting this sensitive information in relational databases has become a critical issue. However, despite conventional security mechanisms, attacks directed at databases still occur. Therefore, an intrusion detection system (IDS) built specifically for the database, one that can provide protection from all possible malicious users, is necessary. In this paper, we present the Principal Component Analysis (PCA) technique with weighted voting for the task of anomaly detection. PCA is a graph-based technique suitable for modeling clustering problems, and weighted voting improves its capabilities by adjusting the voting influence of each tree. Experiments demonstrate that Random Forest (RF) with weighted voting shows more consistently dominant performance, as well as better error rates with an increasing number of trees, compared to conventional classification approaches. Moreover, it outperforms all other state-of-the-art data mining algorithms in terms of false positive rate and false negative rate.
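A minimal sketch of the weighted-voting idea: instead of a plain majority across trees, each tree's ballot carries a weight (here assumed to come from something like per-tree validation accuracy, which is an assumption about how the weights are obtained):

```python
def weighted_vote(tree_preds, tree_weights):
    """Combine per-tree predictions, weighting each tree's ballot.

    tree_preds: list of class labels, one per tree in the ensemble.
    tree_weights: matching per-tree weights (e.g. validation accuracies).
    """
    totals = {}
    for label, weight in zip(tree_preds, tree_weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)  # label with the largest weighted mass

# One reliable tree (weight 0.9) outvotes two weak trees (0.3 each):
label = weighted_vote(["attack", "normal", "normal"], [0.9, 0.3, 0.3])
```

Note that a plain majority vote would have returned "normal" here; the weighting is what lets reliable trees dominate.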


Bangladesh is a densely populated country where a large portion of citizens live in poverty. In Bangladesh, a significant portion of higher education is accomplished at private universities. In this twenty-first century, these students of higher education are highly mobile and different from earlier generations. Thus, retaining existing students has become a great challenge for many private universities in Bangladesh. Early prediction of the total number of registered students in a semester can help in this regard, as it has a direct impact on a private university in terms of budget, marketing strategy, and sustainability. In this paper, we have predicted the number of students registered in a semester at a private university by following several machine learning approaches. We have applied seven prominent classifiers, namely SVM, Naive Bayes, Logistic, JRip, J48, Multilayer Perceptron, and Random Forest, on a data set of more than a thousand students of a private university in Bangladesh, where each record contains five attributes. First, all data are preprocessed. The preprocessed data are then separated into training and testing sets, and all the classifiers are trained and tested. Since a suitable classifier is required to solve the problem, the performances of all seven classifiers need to be thoroughly assessed. We have therefore computed six performance metrics, i.e. accuracy, sensitivity, specificity, precision, false positive rate (FPR), and false negative rate (FNR), for each of the seven classifiers and compared them. We have found that SVM outperforms all the other classifiers, achieving 85.76% accuracy, whereas Random Forest achieved the lowest accuracy of 79.65%.
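All six metrics above follow directly from the binary confusion matrix; a small generic helper (not the authors' code) makes the definitions concrete:

```python
def classification_metrics(tp, fp, fn, tn):
    """The six metrics named above, from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
        "fpr": fp / (fp + tn),           # false positive rate = 1 - specificity
        "fnr": fn / (fn + tp),           # false negative rate = 1 - sensitivity
    }

m = classification_metrics(tp=40, fp=5, fn=10, tn=45)
```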


2019 ◽  
Vol 8 (4) ◽  
pp. 4887-4893

Financial Crisis Prediction (FCP) is one of the most complicated and widely anticipated problems in the context of corporate organizations, small- to large-scale industries, investors, banking organizations, and government agencies, so it is important to design a framework and methodology for the early prediction of financial crises. Earlier methods are reviewed through the various works on statistical techniques applied to the problem; however, these are not sufficient to predict the results in an intelligent and automated manner. The major objective of this paper is to enhance the early prediction of financial crisis in any organization using machine learning models, namely the Multilayer Perceptron (MLP), Radial Basis Function (RBF) Network, Logistic Regression (LR), and Deep Learning (DL) methods, and to conduct a comparative analysis to determine the best method for FCP. The testing is conducted with globalized benchmark datasets, namely the German, Weislaw, and Polish datasets. The testing is performed in both the WEKA and Rapid Miner frameworks, and accuracies and other performance measures, such as False Positive Rate (FPR), False Negative Rate (FNR), Precision, Recall, F-score, and Kappa, are obtained to determine the best-performing algorithm, one that can intelligently identify a financial crisis before it actually occurs in an organization. The DL, MLP, LR, and RBF Network algorithms achieved accuracies of 96%, 72.10%, 75.20%, and 74% on the German dataset; 91.25%, 85.83%, 83.75%, and 73.75% on the Weislaw dataset; and 99.70%, 96.30%, 96.21%, and 96.14% on the Polish dataset, respectively. It is evident from all the predictive results and the analytics in Rapid Miner that Deep Learning (DL) is the best classifier and performer among the machine learning models compared.
This method will enhance future predictions and provide efficient solutions for financial crisis prediction.
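Among the measures listed above, Kappa is the least self-explanatory: it is the observed agreement between predicted and actual labels, corrected for chance agreement. For a binary confusion matrix it can be computed as follows (a generic sketch, not tied to the paper's WEKA/Rapid Miner tooling):

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n  # observed agreement (i.e. accuracy)
    # Chance agreement from the marginals: P(both say pos) + P(both say neg).
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (po - pe) / (1 - pe)

kappa = cohens_kappa(tp=40, fp=5, fn=10, tn=45)  # ≈ 0.70
```

Kappa of 0 means no better than chance, 1 means perfect agreement, which is why it complements raw accuracy on imbalanced crisis data.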


Author(s):  
Lebohang Radebe ◽  
Daniëlle C M van der Kaay ◽  
Jonathan D Wasserman ◽  
Anna Goldenberg

Abstract Background Papillary thyroid carcinoma is the most common endocrine malignancy. Since most nodules are benign, the challenge for the clinician is to identify those most likely to harbour malignancy while limiting exposure to surgical risks among those with benign nodules. Methods Random Forests (augmented to select features based on our clinical measure of interest), in conjunction with interpretable rule sets, were used on demographic, ultrasound, and biopsy data of thyroid nodules from children <18 years at a tertiary pediatric hospital. Accuracy, False Positive Rate (FPR), False Negative Rate (FNR), and Area Under the Receiver Operator Curve (AUROC) are reported. Results Our models predict non-benign cytology and malignant histology better than historical outcomes. Specifically, we expect a 68.04% improvement in FPR, an 11.90% increase in accuracy, and a 24.85% increase in AUROC for biopsy predictions in 67 patients (28 with benign and 39 with non-benign histology). We expect a 23.22% decrease in FPR, a 32.19% increase in accuracy, and a 3.84% decrease in AUROC for surgery prediction in 53 patients (42 with benign and 11 with non-benign histology). This improvement comes at the expense of the FNR: we expect 10.27% of patients with malignancy would be discouraged from undergoing biopsy, and 11.67% from surgery. Given the small number of patients, these improvements are estimates and have not been tested on an independent test set. Conclusions This work presents a first attempt at developing an interpretable machine-learning-based clinical tool to aid clinicians. Future work will involve sourcing more data and developing probabilistic estimates for predictions.


2022 ◽  
Vol 4 ◽  
Author(s):  
Qasem Abu Al-Haija

With the prompt revolution and emergence of smart, self-reliant, and low-power devices, the Internet of Things (IoT) has inconceivably expanded and impacted almost every real-life application. Nowadays, for example, machines and devices such as cars, unmanned aerial vehicles (UAVs), and medical devices are fully reliant on computer control through their own programmable interfaces. With this increased use of IoT, attack capabilities have increased in response, making it imperative that new methods be developed to detect attacks launched against IoT devices and gateways. These attacks are usually aimed at accessing, changing, or destroying sensitive information; extorting money from users; or interrupting normal business processes. In this research, we present a new, efficient, and generic top-down architecture for intrusion detection and classification in IoT networks using non-traditional machine learning. The proposed architecture can be customized and used for intrusion detection/classification incorporating any IoT cyber-attack dataset, such as the CICIDS dataset, the MQTT dataset, and others. Specifically, the proposed system is composed of three subsystems: a feature engineering (FE) subsystem, a feature learning (FL) subsystem, and a detection and classification (DC) subsystem. All subsystems are thoroughly described and analyzed in this article. Accordingly, the proposed architecture employs deep learning models to enable the detection of slightly mutated attacks on IoT networks with high detection/classification accuracy, for IoT traffic obtained either from a real-time system or from a pre-collected dataset.
Since this work combines systems engineering (SE) techniques, machine learning technology, and the field of IoT cybersecurity, the collective cooperation of the three fields has successfully yielded a systematically engineered system that can be implemented with high-performance trajectories.
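The FE → FL → DC flow might be sketched schematically as follows; the concrete feature set, the normalization standing in for learned representations, and the toy detector rule are all assumptions for illustration, not the paper's actual subsystems:

```python
def feature_engineering(packet: dict) -> list[float]:
    # FE subsystem: turn a raw traffic record into numeric features.
    return [packet["bytes"], packet["duration"], float(packet["is_tcp"])]

def feature_learning(features: list[float]) -> list[float]:
    # FL subsystem: stand-in normalization for a learned representation.
    peak = max(features) or 1.0
    return [f / peak for f in features]

def detect(features: list[float]) -> str:
    # DC subsystem: stand-in rule for a trained deep model's decision,
    # e.g. "large burst of traffic in a very short window".
    return "attack" if features[0] > 0.9 and features[1] < 0.1 else "benign"

def pipeline(packet: dict) -> str:
    # Top-down composition of the three subsystems.
    return detect(feature_learning(feature_engineering(packet)))
```

The value of the decomposition is that each stage can be swapped independently, e.g. retraining the DC model per dataset (CICIDS, MQTT) while reusing FE/FL.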



