A hybrid firefly and support vector machine classifier for phishing email detection

Kybernetes ◽  
2016 ◽  
Vol 45 (6) ◽  
pp. 977-994 ◽  
Author(s):  
Oluyinka Aderemi Adewumi ◽  
Ayobami Andronicus Akinyelu

Purpose – Phishing is one of the major challenges faced by the world of e-commerce today. As a result of phishing attacks, billions of dollars have been lost by many companies and individuals. The global impact of phishing attacks will continue to increase, and thus a more efficient phishing detection technique is required. The purpose of this paper is to investigate and report the use of a nature-inspired machine learning (ML) approach to the classification of phishing e-mails. Design/methodology/approach – ML-based techniques have been shown to be efficient in detecting phishing attacks. In this paper, the firefly algorithm (FFA) was integrated with a support vector machine (SVM) with the primary aim of developing an improved phishing e-mail classifier (known as FFA_SVM), capable of accurately detecting new phishing patterns as they occur. From a data set consisting of 4,000 phishing and ham e-mails, a set of features, suitable for phishing e-mail detection, was extracted and used to construct the hybrid classifier. Findings – The FFA_SVM was applied to a data set consisting of up to 4,000 phishing and ham e-mails. Simulation experiments were performed to evaluate and compare the performance of the classifier. The tests yielded a classification accuracy of 99.94 percent, a false positive rate of 0.06 percent and a false negative rate of 0.04 percent. Originality/value – To the best of the authors’ knowledge, the hybrid algorithm has not previously been applied, as in this work, to the classification and detection of phishing e-mail.
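The abstract does not say how the FFA and the SVM are coupled. One common arrangement, assumed here purely for illustration, is to let the firefly algorithm search over SVM hyperparameters and score each candidate by a validation error. The sketch below shows only the firefly search loop; `toy_validation_error`, the bounds, and all parameter values are hypothetical stand-ins and are not taken from the paper.

```python
import math
import random
from typing import Callable, List, Tuple


def firefly_minimize(
    objective: Callable[[List[float]], float],
    bounds: List[Tuple[float, float]],
    n_fireflies: int = 15,
    n_iterations: int = 50,
    alpha: float = 0.2,   # scale of the random step
    beta0: float = 1.0,   # attractiveness at zero distance
    gamma: float = 1.0,   # light absorption coefficient
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` inside `bounds` with a basic firefly search."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    costs = [objective(x) for x in swarm]
    best_index = min(range(n_fireflies), key=costs.__getitem__)
    best_x, best_cost = list(swarm[best_index]), costs[best_index]

    for _ in range(n_iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is brighter (lower cost)
                    dist_sq = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * dist_sq)
                    for d, (lo, hi) in enumerate(bounds):
                        step = beta * (swarm[j][d] - swarm[i][d])
                        step += alpha * (rng.random() - 0.5)
                        # Move toward j and clamp to the search bounds.
                        swarm[i][d] = min(hi, max(lo, swarm[i][d] + step))
                    costs[i] = objective(swarm[i])
                    if costs[i] < best_cost:
                        best_x, best_cost = list(swarm[i]), costs[i]
    return best_x, best_cost


def toy_validation_error(params: List[float]) -> float:
    """Hypothetical stand-in for an SVM validation error over (log10 C, log10 gamma)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


search_bounds = [(-3.0, 3.0), (-5.0, 1.0)]
best_params, best_error = firefly_minimize(toy_validation_error, search_bounds)
print(best_params, best_error)
```

In a real pipeline the toy objective would be replaced by a function that trains an SVM with the candidate parameters and returns its cross-validated error, which is far more expensive per evaluation.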

Phishing attacks have risen by 209% in the last 10 years according to the Anti Phishing Working Group (APWG) statistics [19]. Machine learning is commonly used to detect phishing attacks. Researchers have traditionally judged phishing detection models by either accuracy or F1-score; however, in this paper we argue that a single metric alone will never correlate with a successful deployment of a machine learning phishing detection model. This is because every machine learning model has an inherent trade-off between its False Positive Rate (FPR) and False Negative Rate (FNR). Tuning the trade-off is important since a higher or lower FPR/FNR will impact the user acceptance rate of any deployment of a phishing detection model. Models with a high FPR tend to block users from accessing legitimate webpages, whereas a model with a high FNR will allow users to inadvertently access phishing webpages. Either one of these extremes may cause a user base to complain (due to blocked pages) or fall victim to phishing attacks. Phishing detection models should be tuned according to the security needs of a deployment (secure vs relaxed setting). In this paper, we demonstrate two effective techniques to tune the trade-off between FPR and FNR: varying the class distribution of the training data and adjusting the probabilistic prediction threshold. We demonstrate both techniques on a data set of 50,000 phishing and 50,000 legitimate sites, performing all experiments with three common machine learning algorithms, namely Random Forest, Logistic Regression, and Neural Networks. Using our techniques we are able to regulate a model’s FPR/FNR. We observed that among the three algorithms we used, Neural Networks performed best, yielding a higher F1-score of 0.98 with corresponding FPR/FNR values of 0.0003 and 0.0198 respectively.
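The second technique named above, adjusting the probabilistic prediction threshold, can be illustrated with a minimal sketch. The scores and labels below are hypothetical and are not the paper's data; the function name `fpr_fnr` is likewise an illustrative choice. A sample is flagged as phishing when its predicted probability is at or above the threshold.

```python
from typing import List, Tuple


def fpr_fnr(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (FPR, FNR) when scores >= threshold are flagged as phishing (label 1)."""
    fp = fn = tp = tn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        elif predicted == 1 and label == 1:
            tp += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical predicted phishing probabilities and true labels (1 = phishing).
scores = [0.05, 0.20, 0.40, 0.60, 0.30, 0.55, 0.80, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

for threshold in (0.25, 0.50, 0.75):
    fpr, fnr = fpr_fnr(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 moves the FPR from 0.50 to 0.00 while the FNR rises from 0.00 to 0.50, which is the trade-off a deployment would tune toward its "secure" or "relaxed" setting.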


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yun Zuo ◽  
Jianyuan Lin ◽  
Xiangxiang Zeng ◽  
Quan Zou ◽  
Xiangrong Liu

Abstract Background Carbonylation is a non-enzymatic irreversible protein post-translational modification, and refers to the side chain of amino acid residues being attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by reactive oxygen species is involved in the etiology and pathophysiological processes of aging, neurodegenerative diseases, inflammation, diabetes, amyotrophic lateral sclerosis, Huntington’s disease, and tumors. Current experimental approaches used to predict carbonylation sites are expensive, time-consuming, and limited in protein processing abilities. Computational prediction of the carbonylation residue location in protein post-translational modifications enhances the functional characterization of proteins. Results In this study, an integrated classifier algorithm, CarSite-II, was developed to identify K, P, R, and T carbonylated sites. The resampling method K-means similarity-based undersampling and the synthetic minority oversampling technique (SMOTE-KSU) were incorporated to balance the proportions of K, P, R, and T carbonylated training samples. Next, the integrated classifier system Rotation Forest uses “support vector machine” subclassifications to divide three types of feature spaces into several subsets. CarSite-II achieved Matthews correlation coefficient (MCC) values of 0.2287/0.3125/0.2787/0.2814, false positive rate values of 0.2628/0.1084/0.1383/0.1313, and false negative rate values of 0.2252/0.0205/0.0976/0.0608 for K/P/R/T carbonylation sites, respectively, by tenfold cross-validation. On our independent test dataset, CarSite-II yielded MCC values of 0.6358/0.2910/0.4629/0.3685, false positive rate values of 0.0165/0.0203/0.0188/0.0094, and false negative rate values of 0.1026/0.1875/0.2037/0.3333 for K/P/R/T carbonylation sites. The results show that CarSite-II achieves remarkably better performance than all currently available prediction tools.
Conclusion The related results revealed that CarSite-II achieved better performance than the five currently available programs, and revealed the usefulness of the SMOTE-KSU resampling approach and integration algorithm. For the convenience of experimental scientists, the web tool of CarSite-II is available at http://47.100.136.41:8081/
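For readers less familiar with the three metrics reported above, the following sketch shows how MCC, false positive rate, and false negative rate follow from the four confusion-matrix counts using their standard definitions. The counts are hypothetical and are not taken from the CarSite-II experiments; `confusion_metrics` is an illustrative name.

```python
import math
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute FPR, FNR and Matthews correlation coefficient from confusion counts."""
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"fpr": fpr, "fnr": fnr, "mcc": mcc}


# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=90, fp=20, tn=80, fn=10)
print(metrics)
```

Unlike the two error rates, MCC uses all four counts at once, which is why it is often preferred on class-imbalanced data such as modification-site prediction.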


2018 ◽  
Vol 61 (2) ◽  
pp. 469-479 ◽  
Author(s):  
Chao Zhou ◽  
Chuanheng Sun ◽  
Kai Lin ◽  
Daming Xu ◽  
Qiang Guo ◽  
...  

Abstract. In aquaculture, almost all images collected of an aquaculture scene contain reflections, which often affect the results and accuracy of machine vision. Classifying these images and obtaining images of interest are key to subsequent image processing. The purpose of this study was to identify useful images and remove images that had a substantial effect on the results of image processing for computer vision in aquaculture. In this study, a method for classification of reflective frames based on image texture and a support vector machine (SVM) was proposed for an actual aquaculture site. Objectives of this study were to: (1) develop an algorithm to improve the speed of the method and to ensure that the method has a high classification accuracy, (2) design an algorithm to improve the intelligence and adaptability of the classification, and (3) demonstrate the performance of the method. The results show that the average classification accuracy, false positive rate, and false negative rate for two types of reflective frames (type I and II) were 96.34%, 4.65%, and 2.23%, respectively. In addition, the running time was very low (1.25 s). This strategy also displayed considerable adaptability and could be used to obtain useful images or remove images that have substantial effects on the accuracy of image processing results, thereby improving the applicability of computer vision in aquaculture. Keywords: Aquaculture, Genetic algorithm, Gray level-gradient co-occurrence matrix, Principal component analysis, Reflection frame, Support vector machine.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Veronika Kurilová ◽  
Jozef Goga ◽  
Miloš Oravec ◽  
Jarmila Pavlovičová ◽  
Slavomír Kajan

Abstract Hard exudates are one of the main clinical findings in the retinal images of patients with diabetic retinopathy. Detecting them early significantly impacts the treatment of underlying diseases; therefore, there is a need for automated systems with high reliability. We propose a novel method for identifying and localising hard exudates in retinal images. To achieve fast image pre-scanning, a support vector machine (SVM) classifier was combined with a faster region-based convolutional neural network (faster R-CNN) object detector for the localisation of exudates. Rapid pre-scanning filtered out exudate-free samples using a feature vector extracted from the pre-trained ResNet-50 network. Subsequently, the remaining samples were processed using a faster R-CNN detector for detailed analysis. When evaluating all the exudates as individual objects, the SVM classifier reduced the false positive rate by 29.7% and marginally increased the false negative rate by 16.2%. When evaluating all the images, we recorded a 50% reduction in the false positive rate, without any decrease in the number of false negatives. The interim results suggested that pre-scanning the samples using the SVM prior to implementing the deep-network object detector could simultaneously improve and speed up the current hard exudates detection method, especially when there is a paucity of training data.


2015 ◽  
Vol 15 (1) ◽  
pp. 329-351 ◽  
Author(s):  
David E. Kalist ◽  
Daniel Y. Lee ◽  
Stephen J. Spurr

Abstract This study uses a large data set to analyze and predict recidivism of juvenile offenders in Pennsylvania. We employ a split-population duration model to determine the effect of covariates on (1) the probability of failure, defined as a second referral to juvenile court, and (2) the time to failure, given that it occurs. A test of the predictive power of our estimates finds a false positive rate of 18.5% and a false negative rate of 20.7%, which compares favorably to the performance of other models in the literature.


PEDIATRICS ◽  
1981 ◽  
Vol 68 (1) ◽  
pp. 144-145
Author(s):  
Lachlan Ch De Crespigny ◽  
Hugh P. Robinson

We read with interest the report which suggested that the diagnosis of cerebroventricular hemorrhage ([CVH] including both subependymal [SEH] and intraventricular) with real time ultrasound was unreliable.1 Ultrasound, when compared with computed tomography scans, had a 35% false-positive rate and a 21% false-negative rate. In our institution over a 12-month period more than 200 premature babies have been examined (ADR real time linear array scanner with a 7-MHz transducer).


1989 ◽  
Vol 75 (2) ◽  
pp. 156-162 ◽  
Author(s):  
Sandro Sulfaro ◽  
Francesco Querin ◽  
Luigi Barzan ◽  
Mario Lutman ◽  
Roberto Comoretto ◽  
...  

Sixty-six whole-organ sectioned laryngopharyngectomy specimens removed for cancer during a seven-year period were uniformly examined to determine the accuracy of preoperative high resolution computerized tomography (CT) for detection of cartilaginous involvement. Our results indicate that CT has a high overall specificity (88.2%) but a low sensitivity (47.1%); we observed a high false-negative rate (26.5%) and a fairly low false-positive rate (5.9%). Massive cartilage destruction was easily assessed by CT, whereas both small macroscopic and microscopic neoplastic foci of cartilaginous invasion were missed on CT scans. Moreover, false-positive cases were mainly due to proximity of the tumor to the cartilage. Clinical implications of these results are discussed.


Biomolecules ◽  
2019 ◽  
Vol 9 (12) ◽  
pp. 809
Author(s):  
Miguel Carrasco ◽  
Patricio Toledo ◽  
Nicole D. Tischler

Segmentation is one of the most important stages in the 3D reconstruction of macromolecule structures in cryo-electron microscopy. Due to the variability of macromolecules and the low signal-to-noise ratio of the structures present, there is no generally satisfactory solution to this process. This work proposes a new unsupervised particle picking and segmentation algorithm based on the composition of two well-known image filters: Anisotropic (Perona–Malik) diffusion and non-negative matrix factorization. This study focused on keyhole limpet hemocyanin (KLH) macromolecules which offer both a top view and a side view. Our proposal was able to detect both types of views and separate them automatically. In our experiments, we used 30 images from the KLH dataset of 680 positive classified regions. The true positive rate was 95.1% for top views and 77.8% for side views. The false negative rate was 14.3%. Although the false positive rate was high at 21.8%, it can be lowered with a supervised classification technique.


Symmetry ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 380 ◽  
Author(s):  
Kai Ye

When the key features of a network intrusion signal are identified with the GA-RBF algorithm (which uses a genetic algorithm to optimize the radial basis function), the pre-processing of the network intrusion signal data is neglected, which increases the noise in the network signal data and reduces the accuracy of key feature recognition. Therefore, a key feature recognition algorithm for network intrusion signals based on a neural network and a support vector machine is proposed. A principal component neural network (PCNN) is used to extract the characteristics of the network intrusion signal, and a support vector machine multi-classifier is constructed. The feature extraction result is input into the support vector machine classifier. By combining the PCNN and SVM (Support Vector Machine) algorithms, the key features of network intrusion signals are identified. The experimental results show that the algorithm has the advantages of high precision and a low false positive rate, and that the recognition time for key features of the R2L data set (R2L is a common type of network intrusion attack) is only 3.18 ms.


1974 ◽  
Vol 39 (1) ◽  
pp. 95-100 ◽  
Author(s):  
Allan Gerson

To assess the validity and reliability of the Hooper Visual Organization Test, 68 Ss, of whom 16 were clinically and psychometrically determined to be suffering from organic brain damage, 19 had functional disorders, and 33 were without organic or functional disorders (normal), were given the test. The instrument was shown to be reliable (r = .80); however, clear-cut discriminations between groups were not achieved. There were significant differences in scores between normal and damaged groups and between functional and damaged Ss, but not between functional and normal Ss. The qualitative signs said to aid in differentiations were totally absent from all protocols. Performance was affected in part by IQ and other aspects of recognition of meaning. There was a 19% false negative rate for the functionals and a 51% false positive rate for normals. The conclusion was that this device is of dubious clinical value.

