Towards a Multi-Layered Phishing Detection

Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4540
Author(s):  
Kieran Rendall ◽  
Antonia Nisioti ◽  
Alexios Mylonas

Phishing is one of the most common threats that users face while browsing the web. In the current threat landscape, a targeted phishing attack (i.e., spear phishing) often constitutes the first action of a threat actor during an intrusion campaign. To tackle this threat, many data-driven approaches have been proposed, which mostly rely on the use of supervised machine learning under a single-layer approach. However, such approaches are resource-demanding and, thus, their deployment in production environments is infeasible. Moreover, most previous works utilise a feature set that can be easily tampered with by adversaries. In this paper, we investigate the use of a multi-layered detection framework in which a potential phishing domain is classified multiple times by models using different feature sets. In our work, an additional classification takes place only when the initial one scores below a predefined confidence level, which is set by the system owner. We demonstrate our approach by implementing a two-layered detection system, which uses supervised machine learning to identify phishing attacks. We evaluate our system with a dataset consisting of active phishing attacks and find that its performance is comparable to the state of the art.
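The escalation rule in this abstract (a further classification runs only when the previous one falls below an owner-defined confidence level) can be sketched as a simple cascade. This is a minimal illustration under stated assumptions, not the authors' implementation. The two layer functions and the feature names `has_ip`, `url_length` and `form_to_external` are hypothetical stand-ins for models trained on different feature sets, and `threshold` plays the role of the system owner's confidence level.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A layer maps a feature dict to an estimated probability that the domain is phishing.
Layer = Callable[[Dict[str, float]], float]


@dataclass
class Decision:
    label: str          # "phishing" or "legitimate"
    confidence: float   # confidence in the chosen label, in [0.5, 1.0]
    layers_used: int    # number of layers evaluated before stopping


def classify(features: Dict[str, float], layers: List[Layer], threshold: float) -> Decision:
    """Evaluate layers in order, stopping at the first sufficiently confident one."""
    if not layers:
        raise ValueError("at least one layer is required")
    decision = None
    for i, layer in enumerate(layers, start=1):
        p = layer(features)
        label = "phishing" if p >= 0.5 else "legitimate"
        decision = Decision(label, max(p, 1.0 - p), i)
        if decision.confidence >= threshold:
            break  # confident enough: the remaining layers are never run
    # If no layer reaches the threshold, the last layer's verdict is returned.
    return decision


# Hypothetical first layer: a cheap model over URL-derived features.
def url_layer(f: Dict[str, float]) -> float:
    return min(1.0, 0.2 + 0.5 * f.get("has_ip", 0.0) + 0.002 * f.get("url_length", 0.0))


# Hypothetical second layer: a model over a different (page-content) feature set.
def content_layer(f: Dict[str, float]) -> float:
    return 0.9 if f.get("form_to_external", 0.0) else 0.1


if __name__ == "__main__":
    layers = [url_layer, content_layer]
    print(classify({"has_ip": 1, "url_length": 100}, layers, threshold=0.8))
    print(classify({"url_length": 50, "form_to_external": 1}, layers, threshold=0.8))
```

Raising `threshold` sends more domains to the second, presumably costlier, layer; lowering it saves resources at the risk of accepting less certain first-layer verdicts. Falling back to the last layer's verdict when no layer is confident is one possible policy among several (another is to keep the most confident layer's verdict).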

2019 ◽  
Vol 8 (3) ◽  
pp. 5626-5629

Attacks of many types can disrupt networks and websites. Phishing attacks (PAs) are a type of attack that targets websites, can damage them, and may result in data loss. Much research has been done to prevent such attacks. To address this problem, this paper implements an integrated phishing attack detection system that uses an SVM classifier to detect phishing websites. Phishing is a cyber attack that can damage a website and may also deliver a virus. Two parameters, identity and security, determine the final phishing detection rate. Phishing attacks also occur on various banking and e-commerce websites. This paper uses the UCI machine learning phishing dataset, which consists of 32 attributes. The proposed algorithm is applied to this dataset and its performance is reported.


2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Automatic detection of negated content is often a prerequisite for information extraction systems in various domains. This task is especially important in the biomedical domain, where negation plays a key role. In this work, two main contributions are proposed. First, we work with languages that have received little attention so far: Brazilian Portuguese and French. For these two languages, we developed new corpora that have been manually annotated to mark negation cues and their scope. Second, we propose automatic methods based on supervised machine learning for detecting negation cues and their scopes. The methods prove robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on state-of-the-art English data, where it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.


2021 ◽  
Vol 35 (1) ◽  
pp. 11-21
Author(s):  
Himani Tyagi ◽  
Rajendra Kumar

IoT is characterized by communication between things (devices) that constantly share data, analyze it, and make decisions while connected to the internet. This interconnected architecture attracts cyber criminals who seek to expose IoT systems to failure. It is therefore imperative to develop a system that can accurately and automatically detect anomalies and attacks occurring in IoT networks. In this paper, an Intrusion Detection System (IDS) based on a novel feature set extracted from the BoT-IoT dataset is developed that can swiftly, accurately and automatically differentiate benign from malicious traffic. Instead of using available feature reduction techniques such as PCA, which can change the core meaning of variables, a unique feature set of only seven lightweight features is developed; this set is IoT-specific and independent of attack traffic. The results of the study demonstrate the effectiveness of these seven engineered features in detecting four widely varied attack types: DDoS, DoS, Reconnaissance, and Information Theft. Furthermore, the study demonstrates the applicability and efficiency of supervised machine learning algorithms (KNN, LR, SVM, MLP, DT, RF) for IoT security. The performance of the proposed system is validated using accuracy, precision, recall, F-score and ROC. Although the Decision Tree (99.9%) and Random Forest (99.9%) classifiers achieve the same accuracy, other metrics such as training and testing time show that Random Forest performs comparatively better.
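Accuracy, precision, recall and F-score, as listed in this abstract, all follow from the binary confusion matrix. The sketch below uses the standard definitions and assumes "malicious" is the positive class; the counts in the example are invented for illustration and do not reproduce the study's results. ROC analysis needs per-sample scores rather than hard labels, so it is left out here.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "malicious") -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Invented, imbalanced example: 100 malicious flows among 1000.
    print(binary_metrics(tp=90, fp=10, fn=10, tn=890))
```

With these invented counts, accuracy is about 0.98 while precision, recall and F1 are all about 0.9, which shows why reporting several metrics (plus training and testing time) is more informative than accuracy alone when two classifiers tie on accuracy.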


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Fisnik Dalipi ◽  
Sule Yildirim Yayilgan ◽  
Alemayehu Gebremedhin

We present a data-driven supervised machine learning (ML) model to predict the heat load of buildings in a district heating system (DHS). Although ML has been used for heat load prediction in the literature, existing solutions are quite problem-specific, which makes it hard to select one that qualifies as a solution for our case. We therefore compared and evaluated three ML algorithms within a framework on operational data from a DH system in order to generate the required prediction model. The algorithms examined are Support Vector Regression (SVR), Partial Least Squares (PLS), and Random Forest (RF). We use data collected from buildings at several locations over a period of 29 weeks. To assess heat load prediction accuracy, we evaluate the proposed algorithms using mean absolute error (MAE), mean absolute percentage error (MAPE), and the correlation coefficient, and compare their performance to determine which algorithm is most accurate. The comparison indicates that, for DH heat load prediction, the SVR method presented in this paper is the most efficient of the three, and remains so when compared with other methods found in the literature.
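For reference, the three accuracy measures named in this abstract can be computed as below. This sketch uses the standard definitions, reports MAPE in percent, and assumes the correlation coefficient is Pearson's r (the abstract does not name the variant). The heat-load values are invented.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: the average of |y - y_hat|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined if any actual value is 0."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between actual and predicted values."""
    _check(actual, predicted)
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    # The 1/n factors cancel in the ratio, so root sums of squares suffice.
    norm_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    if norm_a == 0 or norm_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (norm_a * norm_p)


if __name__ == "__main__":
    # Invented heat loads (kW) and model predictions.
    actual = [100.0, 200.0, 400.0]
    predicted = [110.0, 190.0, 400.0]
    print(mae(actual, predicted), mape(actual, predicted), pearson_r(actual, predicted))
```

MAE is in the units of the load, MAPE is scale-free (which helps when comparing buildings of different sizes), and the correlation coefficient measures how well predictions track the shape of the load curve rather than its absolute error.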


2021 ◽  
Vol 42 (12) ◽  
pp. 124101
Author(s):  
Thomas Hirtz ◽  
Steyn Huurman ◽  
He Tian ◽  
Yi Yang ◽  
Tian-Ling Ren

In a world where data is increasingly important for making breakthroughs, microelectronics is a field where data is sparse and hard to acquire. Only a few entities have the infrastructure required to automate the fabrication and testing of semiconductor devices, and this infrastructure is crucial for generating enough data to use new information technologies. This situation creates a gap between most researchers and industry. To address this issue, this paper introduces a widely applicable approach for creating custom datasets using simulation tools and parallel computing. The multi-I–V curves we obtained were processed simultaneously by convolutional neural networks, which enabled us to predict a full set of device characteristics with a single inference. We demonstrate the potential of this approach through two concrete examples of useful deep learning models trained on the generated data. We believe that this work can act as a bridge between state-of-the-art data-driven methods and more classical semiconductor research, such as device engineering, yield engineering or process monitoring. Moreover, this research gives anyone the opportunity to start experimenting with deep neural networks and machine learning in microelectronics without the need for expensive experimental infrastructure.


2020 ◽  
Vol 50 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Changwon Suh ◽  
Clyde Fare ◽  
James A. Warren ◽  
Edward O. Pyzer-Knapp

Machine learning, applied to chemical and materials data, is transforming the field of materials discovery and design, yet significant work is still required to take full advantage of machine learning algorithms, tools, and methods. Here, we review the community's accomplishments to date and assess the maturity of state-of-the-art, data-intensive research activities that combine perspectives from materials science and chemistry. We focus on three major themes (learning to see, learning to estimate, and learning to search materials) to show how advanced computational learning technologies are being used rapidly and successfully to solve materials and chemistry problems. Additionally, we discuss a clear path toward a future where data-driven approaches to materials discovery and design are standard practice.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1777
Author(s):  
Muhammad Ali ◽  
Stavros Shiaeles ◽  
Gueltoum Bendiab ◽  
Bogdan Ghita

Detecting and mitigating modern malware is critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to techniques used by attackers, such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting IT infrastructure against such threats and for ensuring security. In this paper, we investigate an alternative method for malware detection based on N-grams and machine learning. We use a dynamic analysis technique to extract Indicators of Compromise (IOCs) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel alternative for identifying the most significant N-gram features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine learning algorithms. The results show that Logistic Regression, with a score of 98.4%, provides the best classification accuracy of the classifiers used.
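The N-gram and TF-IDF feature-ranking step can be illustrated on toy data. This sketch rests on several assumptions not stated in the abstract: each sample is a sequence of tokens (here invented API-call names standing in for dynamic-analysis output), TF-IDF uses the common form tf × log(N / df), and N-grams are ranked by their maximum TF-IDF across samples. It is one plausible formulation, not necessarily the paper's.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """All contiguous n-token windows of a sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs: List[Sequence[str]], n: int) -> List[Dict[NGram, float]]:
    """Per-document TF-IDF scores: (count / total n-grams) * log(N / document frequency)."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    df: Counter = Counter()
    for c in counts:
        df.update(c.keys())  # each n-gram counted once per document
    num_docs = len(docs)
    scores = []
    for c in counts:
        total = sum(c.values())
        scores.append({g: (k / total) * math.log(num_docs / df[g]) for g, k in c.items()})
    return scores


def top_ngrams(docs: List[Sequence[str]], n: int, k: int) -> List[NGram]:
    """Rank n-grams by their best TF-IDF score in any document; ties broken alphabetically."""
    best: Dict[NGram, float] = {}
    for doc_scores in tfidf(docs, n):
        for g, s in doc_scores.items():
            best[g] = max(best.get(g, 0.0), s)
    return sorted(best, key=lambda g: (-best[g], g))[:k]


if __name__ == "__main__":
    # Invented API-call traces for three samples.
    docs = [
        ["OpenFile", "ReadFile", "CloseFile"],
        ["OpenFile", "ReadFile", "WriteFile", "RegSetValue"],
        ["OpenFile", "ReadFile", "CloseFile"],
    ]
    print(top_ngrams(docs, n=2, k=3))
```

The bigram (`OpenFile`, `ReadFile`) appears in every sample, so its IDF and score are zero and it ranks last, whereas bigrams unique to one trace rank highest. The selected N-grams would then serve as input features for a classifier such as Logistic Regression.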


2020 ◽  
Vol 12 (1) ◽  
pp. 8
Author(s):  
Brandon Hansen ◽  
Cody Coleman ◽  
Yi Zhang ◽  
Maria Seale

The manner in which a prognostics problem is framed is critical for enabling its solution by the proper method. Recently, data-driven prognostics techniques have demonstrated enormous potential when used alone, or as part of a hybrid solution in conjunction with physics-based models. Historical maintenance data constitutes a critical element for the use of a data-driven approach to prognostics, such as supervised machine learning. The historical data is used to create training and testing data sets to develop the machine learning model. Supervised machine learning methods require categorical classes for prediction; however, faults of interest in US Army Ground Vehicle Maintenance Records appear as natural language text descriptions rather than a finite set of discrete labels. Transforming linguistically complex data into a set of prognostics classes is necessary for utilizing supervised machine learning approaches for prognostics. Manually labeling fault description instances is effective but extremely time-consuming; thus, an automated approach to labeling is preferred. The approach described in this paper examines key aspects of the fault text relevant to enabling automatic labeling. A method was developed based on the hypothesis that a given fault description could be generalized into a category. This method uses various natural language processing (NLP) techniques and a priori knowledge of ground vehicle faults to assign classes to the maintenance fault descriptions. The core component of the method used in this paper is a Word2Vec word-embedding model. Word embeddings are used in conjunction with a token-oriented rule-based data structure for document classification. This methodology tags text with user-provided classes using a corpus of similar text fields as its training set.
With classes of faults reliably assigned to a given description, supervised machine learning with these classes can be applied using related maintenance information that preceded the fault. This method was developed for labeling US Army Ground Vehicle Maintenance Records, but it is general enough to be applied to any natural language data set accompanied by a priori knowledge of its contents for consistent labeling. In addition to applications in machine learning, the generated labels also support general summarization and case-by-case analysis of faults. The maintenance components of interest for the current application are alternators and gaskets, with future development directed towards determining the remaining useful life (RUL) of these components based on the labeled data.
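The core idea of generalizing a free-text fault description into one of a few user-provided classes with word embeddings can be sketched as follows. This is a simplified stand-in, not the paper's token-oriented rule-based structure. The 3-dimensional vectors are made up (in practice they would come from a Word2Vec model trained on the maintenance corpus), the seed words for each class are hypothetical, and a description is assigned to the class whose seed-word centroid has the highest cosine similarity with the average vector of the description's known words.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical toy embeddings; a real system would load trained Word2Vec vectors.
EMBEDDINGS: Dict[str, Vector] = {
    "alternator": [0.9, 0.1, 0.0],
    "charging": [0.8, 0.2, 0.1],
    "voltage": [0.7, 0.1, 0.2],
    "gasket": [0.1, 0.9, 0.1],
    "leak": [0.2, 0.8, 0.0],
    "oil": [0.1, 0.7, 0.3],
}

# Hypothetical user-provided classes, each described by a few seed words.
CLASS_SEEDS: Dict[str, List[str]] = {
    "alternator": ["alternator", "charging"],
    "gasket": ["gasket", "leak"],
}


def mean_vector(words: Sequence[str]) -> Optional[Vector]:
    """Average the vectors of words that have embeddings; None if none do."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def label(description: str) -> Optional[str]:
    """Assign the most similar class, or None when no word is recognized."""
    tokens = [t.strip(".,;:").lower() for t in description.split()]
    doc = mean_vector(tokens)
    if doc is None:
        return None  # nothing recognized: leave for manual review
    best_class, best_sim = None, -1.0
    for cls, seeds in CLASS_SEEDS.items():
        centroid = mean_vector(seeds)
        if centroid is None:
            continue
        sim = cosine(doc, centroid)
        if sim > best_sim:
            best_class, best_sim = cls, sim
    return best_class


if __name__ == "__main__":
    for text in ["Low voltage, alternator not charging", "Oil leak at gasket", "Replaced seat cover"]:
        print(text, "->", label(text))
```

Returning `None` for descriptions with no recognized words mirrors the practical need to route unclassifiable records to a human rather than forcing a label; a rule layer like the one the abstract describes could add further checks before a label is accepted.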

