Detection of Drive-by Download Attacks Using Machine Learning Approach

2017 ◽  
Vol 11 (4) ◽  
pp. 16-28 ◽  
Author(s):  
Monther Aldwairi ◽  
Musaab Hasan ◽  
Zayed Balbahaith

Drive-by download refers to attacks that automatically download malware to a user's computer without the user's knowledge or consent. This type of attack is accomplished by exploiting vulnerabilities in web browsers and their plugins. The damage may include data leakage leading to financial loss. Traditional antivirus and intrusion detection systems are not effective against such attacks. Researchers have proposed many detection approaches, mostly based on passive blacklisting; a few have proposed dynamic classification techniques, which suffer from clear shortcomings. In this paper, we propose a novel approach to detect web pages infected with drive-by downloads based on features extracted from their source code. We test 23 different machine learning classifiers on a data set of 5,435 web pages and, based on detection accuracy, select the top five to build our detection model. The approach is expected to serve as a base for implementing and developing anti-drive-by-download programs. We develop a graphical user interface program that allows the end user to examine a URL before visiting the website. The Bagged Trees classifier exhibited the highest accuracy of 90.1%, with a 96.24% true positive rate and a 26.07% false positive rate.
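
A minimal sketch of the classifier-ranking step, assuming scikit-learn, a pre-built feature matrix X (features extracted from page source code) and labels y. The candidate list is illustrative; the abstract does not enumerate the 23 classifiers tested.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def rank_classifiers(X, y, top_k=5):
    """Cross-validate candidate models and keep the top_k by mean accuracy."""
    candidates = {
        "bagged_trees": BaggingClassifier(DecisionTreeClassifier()),
        "random_forest": RandomForestClassifier(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(),
    }
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```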


2021 ◽  
pp. bjophthalmol-2020-318188
Author(s):  
Shotaro Asano ◽  
Hiroshi Murata ◽  
Yuri Fujino ◽  
Takehiro Yamashita ◽  
Atsuya Miki ◽  
...  

Background/Aim: To investigate the clinical validity of the Guided Progression Analysis definition (GPAD) and cluster-based definition (CBD) with the Humphrey Field Analyzer 10-2 test in diagnosing glaucomatous visual field (VF) progression, and to introduce a novel definition with optimised specificity by combining the 'any-location' and 'cluster-based' approaches (hybrid definition).
Methods: 64,400 stable glaucomatous VFs were simulated from 664 pairs of 10-2 tests (10 sets × 10 VF series × 664 eyes; data set 1). Using these simulated VFs, the specificity to detect progression and the effects of changing the parameters (number of test locations or consecutive VF tests, and percentile cut-off values) were investigated. The hybrid definition was designed as the combination where the specificity was closest to 95.0%. Subsequently, another 5000 actual glaucomatous 10-2 tests from 500 eyes (10 VFs each) were collected (data set 2), and the accuracy (sensitivity, specificity and false positive rate) and the time needed to detect VF progression were evaluated.
Results: The specificity values calculated using data set 1 with GPAD and CBD were 99.6% and 99.8%, respectively. Using data set 2, the hybrid definition had a higher sensitivity than GPAD and CBD, without detriment to the specificity or false positive rate. The hybrid definition also detected progression significantly earlier than GPAD and CBD (at 3.1 years vs 4.2 years and 4.1 years, respectively).
Conclusions: GPAD and CBD had specificities of 99.6% and 99.8%, respectively. A novel hybrid definition (with a specificity of 95.5%) had higher sensitivity and enabled earlier detection of progression.
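
As a rough illustration of how an 'any-location' criterion and a 'cluster-based' criterion can be combined into one hybrid test, the sketch below assumes per-location progression flags have already been computed (e.g., pointwise slopes tested against percentile cut-offs); the thresholds and the OR combination are assumptions, not the paper's optimised parameters.

```python
import numpy as np

def hybrid_progression(flags: np.ndarray, clusters: list,
                       any_min: int = 4, cluster_min: int = 2) -> bool:
    """flags: boolean array with one entry per 10-2 test location,
    True where that location shows significant deterioration;
    clusters: lists of location indices forming anatomical clusters."""
    any_location = flags.sum() >= any_min            # 'any-location' criterion
    cluster_based = any(sum(flags[i] for i in c) >= cluster_min
                        for c in clusters)           # 'cluster-based' criterion
    return bool(any_location or cluster_based)       # assumed OR combination
```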


Author(s):  
Shashidhara Bola

A new method is proposed to classify lung nodules as benign or malignant. The method is based on an analysis of lung nodule shape, contour, and texture for better classification. The data set consists of 39 lung nodules from 39 patients: 19 benign and 20 malignant. Lung regions are segmented using morphological operators, and lung nodules are detected based on shape and area features. The proposed algorithm was tested on LIDC (Lung Image Database Consortium) datasets and the results were found to be satisfactory. The method's ability to distinguish benign from malignant nodules was evaluated using receiver operating characteristic (ROC) analysis; it achieved an area under the ROC curve of 0.903, which reduces the false positive rate.
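
A minimal sketch of the ROC evaluation step, assuming scikit-learn and malignancy scores from a trained shape/contour/texture classifier; the feature pipeline itself is not reproduced here.

```python
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate_nodule_classifier(y_true, y_score):
    """y_true: 0 = benign, 1 = malignant; y_score: predicted malignancy scores."""
    auc = roc_auc_score(y_true, y_score)           # the paper reports ~0.903
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return auc, fpr, tpr, thresholds
```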


Computers ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 35 ◽  
Author(s):  
Xuan Dau Hoang ◽  
Ngoc Tuong Nguyen

Defacement attacks have long been considered one of the prime threats to the websites and web applications of companies, enterprises, and government organizations. They can bring serious consequences to website owners, including immediate interruption of website operations and damage to the owner's reputation, which may result in huge financial losses. Many solutions have been researched and deployed for monitoring and detecting website defacement attacks, such as those based on checksum comparison, diff comparison, DOM tree analysis, and complicated algorithms. However, some solutions only work on static websites and others demand extensive computing resources. This paper proposes a hybrid defacement detection model that combines machine learning-based detection with signature-based detection. The machine learning-based component first constructs a detection profile using training data of both normal and defaced web pages, then uses the profile to classify monitored web pages as either normal or attacked. It can effectively detect defacements on both static and dynamic pages. The signature-based component, in turn, boosts the model's processing performance for common types of defacements. Extensive experiments show that our model produces an overall accuracy of more than 99.26% and a false positive rate of about 0.27%. Moreover, our model is suitable for implementing a real-time website defacement monitoring system because it does not demand extensive computing resources.
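
A conceptual sketch of the hybrid flow under stated assumptions: a cheap signature pass handles common defacement types, and the ML profile classifies everything else. The signature list and the extract_features helper are hypothetical placeholders, not the paper's actual rule set.

```python
import re

# Hypothetical signatures for common defacement types.
DEFACEMENT_SIGNATURES = [re.compile(p, re.I)
                         for p in (r"hacked by", r"owned by", r"defaced")]

def detect_defacement(html: str, ml_model, extract_features) -> bool:
    # Signature-based pass: fast, boosts throughput on common defacements.
    if any(sig.search(html) for sig in DEFACEMENT_SIGNATURES):
        return True
    # ML-based pass: profile-based classification, works on static
    # and dynamic pages alike.
    return bool(ml_model.predict([extract_features(html)])[0])
```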


MENDEL ◽  
2019 ◽  
Vol 25 (2) ◽  
pp. 1-10 ◽  
Author(s):  
Ivan Zelinka ◽  
Eslam Amer

Current commercial antivirus detection engines still rely on signature-based methods. However, with the huge increase in the number of new malware samples, such detection methods are no longer adequate. In this paper, we introduce a malware detection model based on ensemble learning. The model is trained using a minimal number of significant features extracted from the file header. Evaluations show that the ensemble models slightly outperform individual classification models. Experimental evaluations show that our model can predict unseen malware with an accuracy of 0.998 and a false positive rate of 0.002. The paper also compares the performance of the proposed model with that of different machine learning techniques. We emphasize the use of machine learning-based approaches to replace conventional signature-based methods.
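
A minimal sketch under assumptions: the paper trains an ensemble on a small set of file-header features, but its exact feature list and base learners are not given here; the PE-header fields and estimators below are illustrative. Requires the pefile package.

```python
import pefile
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def header_features(path: str) -> list:
    """Extract a few illustrative numeric features from a PE file header."""
    pe = pefile.PE(path, fast_load=True)
    return [pe.FILE_HEADER.NumberOfSections,
            pe.FILE_HEADER.Characteristics,
            pe.OPTIONAL_HEADER.SizeOfCode,
            pe.OPTIONAL_HEADER.AddressOfEntryPoint]

# Soft voting averages the base learners' predicted probabilities.
ensemble = VotingClassifier(estimators=[
    ("rf", RandomForestClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier()),
], voting="soft")
# Usage: ensemble.fit(X_train, y_train); ensemble.predict(X_test)
```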


Phishing attacks have risen by 209% in the last 10 years, according to Anti-Phishing Working Group (APWG) statistics [19]. Machine learning is commonly used to detect phishing attacks. Researchers have traditionally judged phishing detection models by accuracy or F1-score alone; however, in this paper we argue that no single metric correlates with a successful deployment of a machine learning phishing detection model. This is because every machine learning model has an inherent trade-off between its False Positive Rate (FPR) and False Negative Rate (FNR). Tuning this trade-off matters because a higher or lower FPR/FNR will affect the user acceptance rate of any deployed phishing detection model. When models have a high FPR, they tend to block users from accessing legitimate web pages, whereas a model with a high FNR allows users to inadvertently access phishing web pages. Either extreme may cause a user base to complain (due to blocked pages) or fall victim to phishing attacks. Depending on the security needs of a deployment (a secure vs a relaxed setting), phishing detection models should be tuned accordingly. In this paper, we demonstrate two effective techniques for tuning the trade-off between FPR and FNR: varying the class distribution of the training data and adjusting the probabilistic prediction threshold. We demonstrate both techniques on a data set of 50,000 phishing and 50,000 legitimate sites, performing all experiments with three common machine learning algorithms: Random Forest, Logistic Regression, and Neural Networks. Using these techniques, we are able to regulate a model's FPR/FNR. Among the three algorithms, Neural Networks performed best, achieving an F1-score of 0.98 with corresponding FPR and FNR values of 0.0003 and 0.0198, respectively.
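
A minimal sketch of the second technique, adjusting the probabilistic prediction threshold; the threshold value is illustrative and would be swept on a validation set in practice. Assumes a scikit-learn-style classifier with predict_proba.

```python
import numpy as np

def predict_with_threshold(model, X, threshold=0.5):
    """Label a page as phishing only if P(phishing) exceeds the threshold."""
    proba = model.predict_proba(X)[:, 1]   # probability of the phishing class
    return (proba >= threshold).astype(int)

# Raising the threshold above 0.5 lowers the FPR (fewer legitimate pages
# blocked) at the cost of a higher FNR; lowering it does the opposite.
```

The first technique, varying the class distribution of the training data, shifts the same trade-off at training time rather than at prediction time.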


2020 ◽  
Vol 493 (2) ◽  
pp. 1842-1854
Author(s):  
Haitao Lin ◽  
Xiangru Li ◽  
Ziying Luo

ABSTRACT Investigating schemes based on machine learning (ML) methods for detecting pulsars is an active topic as data volumes grow exponentially in modern surveys. To improve detection performance, the input features of an ML model should be investigated carefully. Existing ML-based pulsar detection research uses mainly two kinds of feature design: empirical features and statistical features. Due to the combinational effects of multiple features, however, the available features contain redundant and even irrelevant components, which can reduce the accuracy of a pulsar detection model. It is therefore essential to select a subset of relevant features from the set of available candidate features, a process known as feature selection. In this work, two feature selection algorithms, Grid Search (GS) and Recursive Feature Elimination (RFE), are proposed to improve detection performance by removing redundant and irrelevant features. The algorithms were evaluated on the Southern High Time Resolution Universe survey (HTRU-S) with five pulsar detection models. The experimental results verify the effectiveness and efficiency of our proposed feature selection algorithms. With GS, a model with only two features reaches a recall rate as high as 99 per cent and a false positive rate (FPR) as low as 0.65 per cent; with RFE, another model with only three features achieves a recall rate of 99 per cent and an FPR of 0.16 per cent in pulsar candidate classification. Furthermore, this work investigates the number of features required as well as the pulsars misclassified by our models.
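
A minimal sketch of the RFE step, assuming scikit-learn, a candidate feature matrix X (empirical plus statistical features) and labels y; the base estimator is illustrative, not one of the paper's five detection models.

```python
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

def select_features(X, y, n_features=3):
    """Recursively drop the weakest feature until n_features remain."""
    selector = RFE(LogisticRegression(max_iter=1000),
                   n_features_to_select=n_features)
    selector.fit(X, y)
    return selector.support_, selector.ranking_  # kept mask, elimination order
```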


2015 ◽  
Vol 9 (3) ◽  
pp. 21-40 ◽  
Author(s):  
Rui Wang ◽  
Zhiyong Zhang ◽  
Lei Ju ◽  
Zhiping Jia

Software-Defined Networking (SDN) and OpenFlow have brought a promising architecture for future networks. However, SDN still faces many security challenges. To protect SDN from Distributed Denial-of-Service (DDoS) flooding attacks, this paper extends OpenFlow's flow entry counters, adds a mark action, and proposes an entropy-based distributed attack detection model together with a novel IP traceback and source filtering response mechanism in SDN with OpenFlow-based Deterministic Packet Marking. The scheme detects the attack at the destination and filters the malicious traffic at the source, and it can be easily implemented in an SDN controller program or in software or programmable switches such as Open vSwitch and NetFPGA. The experimental results show that this scheme can detect the attack quickly, achieves high detection accuracy with a low false positive rate, shields the victim from attack traffic, and prevents the attacker from consuming resources and bandwidth on the intermediate links.
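
A minimal sketch of the entropy signal behind the detection model, computed over the destination addresses seen in one observation window; the window size and alarm threshold are assumptions.

```python
from collections import Counter
from math import log2

def address_entropy(dst_ips):
    """Shannon entropy of destination addresses in one observation window."""
    counts = Counter(dst_ips)
    total = len(dst_ips)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A flooding attack concentrates traffic on the victim, so destination
# entropy drops sharply below its baseline; crossing an operator-chosen
# threshold raises the alarm at the destination-side detector.
```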


Author(s):  
Nur Syuhada Selamat ◽  
Fakariah Hani Mohd Ali

Currently, the volume of malware grows faster each year and poses a serious global security threat. The number of malware programs has increased at an alarming rate since computers became interconnected in the 1990s, and many protections have been built to fight them. Unfortunately, current technology is no longer effective against more advanced malware, as malware authors craft their programs to evade anti-virus detection. In current research, Machine Learning (ML) techniques have become popular among researchers for malware detection. In this paper, we propose a defense system that compares three ML algorithms and selects the best one based on malware detection accuracy. The results indicate that the Decision Tree algorithm achieves the best detection accuracy compared to the other classifiers, with 99% accuracy and a 0.021% False Positive Rate (FPR) on a relatively small dataset.
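
A minimal sketch of the comparison step, assuming scikit-learn and a labelled feature matrix; the abstract names only the winning Decision Tree, so the other two candidates here are assumptions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def compare_models(X_train, y_train, X_test, y_test):
    """Fit each candidate and report accuracy and false positive rate."""
    models = {"decision_tree": DecisionTreeClassifier(),
              "naive_bayes": GaussianNB(),     # assumed candidate
              "knn": KNeighborsClassifier()}   # assumed candidate
    results = {}
    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
        results[name] = {"accuracy": accuracy_score(y_test, y_pred),
                         "fpr": fp / (fp + tn)}
    return results
```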


2021 ◽  
Vol 7 ◽  
pp. e640
Author(s):  
Saif Al-mashhadi ◽  
Mohammed Anbar ◽  
Iznan Hasbullah ◽  
Taief Alaa Alamiedy

Botnets can simultaneously control millions of Internet-connected devices to launch damaging cyber-attacks that pose significant threats to the Internet. In a botnet, bot-masters communicate with the command and control server using various communication protocols. One widely used communication protocol is the Domain Name System (DNS) service, an essential Internet service. Bot-masters utilise Domain Generation Algorithms (DGA) and fast-flux techniques to avoid static blacklists and reverse engineering while remaining flexible. However, a botnet's DNS communication generates anomalous DNS traffic throughout the botnet life cycle, and such anomalies are considered an indicator of the presence of DNS-based botnets in the network. Although several approaches have been proposed to detect botnets based on DNS traffic analysis, the problem remains challenging, in part because significant features and rules that contribute to the detection of DNS-based botnets have not been considered. Therefore, this paper examines the abnormality of DNS traffic during the botnet life cycle to extract significant enriched features. These features are further analysed using two machine learning algorithms, and the union of the two algorithms' outputs forms a novel hybrid rule-based detection model. Two benchmark datasets are used to evaluate the performance of the proposed approach in terms of detection accuracy and false positive rate. The experimental results show that the proposed approach achieves 99.96% accuracy and a 1.6% false positive rate, outperforming other state-of-the-art DNS-based botnet detection approaches.
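
A conceptual sketch of the hybrid idea, uniting rule-based flags with an ML classifier's output; the two rules shown (domain length and character entropy, typical DGA indicators) and their thresholds are illustrative assumptions, not the paper's derived rule set.

```python
from collections import Counter
from math import log2

def domain_entropy(domain: str) -> float:
    """Character-level Shannon entropy; DGA names tend to score high."""
    counts = Counter(domain)
    return -sum((c / len(domain)) * log2(c / len(domain))
                for c in counts.values())

def is_botnet_dns(domain: str, ml_model, features) -> bool:
    rule_hit = len(domain) > 20 or domain_entropy(domain) > 3.5  # assumed rules
    ml_hit = bool(ml_model.predict([features])[0])
    return rule_hit or ml_hit   # union of the two detectors
```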

