Intelligent phishing detection scheme using deep learning algorithms

PurposePhishing attacks have evolved in recent years due to high-tech-enabled economic growth worldwide. The rise in all types of fraud loss in 2019 has been attributed to the increase in deception scams and impersonation, as well as to sophisticated online attacks such as phishing. The global impact of phishing attacks will continue to intensify, and thus, a more efficient phishing detection method is required to protect online user activities. To address this need, this study focussed on the design and development of a deep learning-based phishing detection solution that leveraged the universal resource locator and website content such as images, text and frames.Design/methodology/approachDeep learning techniques are efficient for natural language and image classification. In this study, the convolutional neural network (CNN) and the long short-term memory (LSTM) algorithm were used to build a hybrid classification model named the intelligent phishing detection system (IPDS). To build the proposed model, the CNN and LSTM classifier were trained by using 1m universal resource locators and over 10,000 images. Then, the sensitivity of the proposed model was determined by considering various factors such as the type of feature, number of misclassifications and split issues.FindingsAn extensive experimental analysis was conducted to evaluate and compare the effectiveness of the IPDS in detecting phishing web pages and phishing attacks when applied to large data sets. The results showed that the model achieved an accuracy rate of 93.28% and an average detection time of 25 s.Originality/valueThe hybrid approach using deep learning algorithm of both the CNN and LSTM methods was used in this research work. On the one hand, the combination of both CNN and LSTM was used to resolve the problem of a large data set and higher classifier prediction performance. Hence, combining the two methods leads to a better result with less training time for LSTM and CNN architecture, while using the image, frame and text features as a hybrid for our model detection. The hybrid features and IPDS classifier for phishing detection were the novelty of this study to the best of the authors' knowledge.

Download Full-text

Classification of electrocardiogram signal using an ensemble of deep learning models

Data Technologies and Applications ◽

10.1108/dta-05-2020-0108 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Saroj Kumar Pandey ◽

Rekh Ram Janghel

Keyword(s):

Deep Learning ◽

Majority Voting ◽

Classification Model ◽

World Health ◽

Product Rule ◽

Learning Models ◽

Data Set ◽

Raw Data ◽

Content Type ◽

Heartbeat Classification

PurposeAccording to the World Health Organization, arrhythmia is one of the primary causes of deaths across the globe. In order to reduce mortality rate, cardiovascular disease should be properly identified and the proper treatment for the same should be immediately provided to the patients. The objective of this paper was to implement a better heartbeat classification model which will work better than the other implemented heartbeat classification methods.Design/methodology/approachIn this paper, the ensemble of two deep learning models is proposed to classify the MIT-BIH arrhythmia database into four different classes according to ANSI-AAMI standards. First, a convolutional neural network (CNN) model is used to classify heartbeats on a raw data set. Secondly, four features (wavelets, R-R intervals, morphological and higher-order statistics) are extracted from the data set and then applied to a long short-term memory (LSTM) model to classify the heartbeats. Finally, the ensemble of CNN and LSTM model with sum rule, product rule and majority voting has been used to identify the heartbeat classes.FindingsAmong these, the highest accuracy obtained is 98.58% using ensemble method with product rule. The results show that the ensemble of CNN and BLSTM has offered satisfactory performance compared to other techniques discussed in this study.Originality/valueIn this study, we have developed a new combination of two deep learning models to enhance the performance of arrhythmia classification using segmentation of input ECG signals. The contributions of this study are as follows: First, a deep CNN model is built to classify ECG heartbeat using a raw data set. Second, four types of features (R-R interval, HOS, morphological and wavelet) were extracted from the raw data set and then applied to the bidirectional LSTM model to classify the ECG heartbeat. Third, combination rules (sum rules, product rules and majority voting rules) were tested to ensure the accumulated probabilities of the CNN and LSTM models.

Download Full-text

Simulation of crowd management using deep learning algorithm

International Journal of Web Information Systems ◽

10.1108/ijwis-04-2021-0045 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Ibtehal Talal Nafea

Keyword(s):

Deep Learning ◽

Image Classification ◽

Learning Algorithm ◽

Real Life ◽

Traffic Light ◽

Data Set ◽

Content Type ◽

Crowd Management ◽

Deep Learning Algorithm ◽

Proposed Model

Purpose This study aims to propose a new simulation approach for a real-life large and complex crowd management which takes into account deep learning algorithm. Moreover, the proposed model also determines the crowd level and also sends an alarm to avoid the crowd from exceeding its limit. Also, the model estimates crowd density in the pictures through which the study evaluates the deep learning algorithm approach to address the problem of crowd congestion. Furthermore, the suggested model comprises of two main components. The first takes the images of the moving crowd and classifies them into five categories such as “heavily crowded, crowded, semi-crowded, light crowded and normal,” whereas the second one comprises of colour warnings (five). The colour of these lights depends upon the results of the process of classification. The paper is structured as follows. Section 2 describes the theoretical background; Section 3 suggests the proposed approach followed by convolutional neural network (CNN) algorithm in Section 4. Sections 5 and 6 explain the data set and parameters as well as modelling network. Experiment, results and simulation evaluation are explained in Sections 7 and 8. Finally, this paper ends with conclusion which is Section 9 of this paper. Design/methodology/approach This paper addresses the issue of large-scale crowd management by exploiting the techniques and algorithms of simulation and deep learning. It focuses on a real-life case study of Hajj pilgrimage in Saudi Arabia that exhibits intricate pattern of crowd management. Hajj pilgrimage includes performing Umrah along with hajj that involves several steps which is a sacred prayer of Muslims performed at different time span of the year. Muslims from all over the world visit the holy city of Mecca to perform Tawaf that is one of the stages included in the performance of Hajj or Umrah, it is an obligatory step in prayer. Accordingly, all pilgrims require visiting Mataf to perform Tawaf. It is essential to control the crowd performing Tawaf systematically in a constrained place to avoid any mishap. This study proposed a model for crowd management system by using image classification and a system of alarm to manage millions of people during Hajj. This proposed system highly depends on the adequate data set used to train CNN which is a deep learning technique and has recently drawn the attention of the research community as well as the industry in changing applications of image classification and the recognition of speed. The purpose is to train the model with mapped image data, making it available to be used in classifying the crowd into five categories like crowded, heavily crowded, semi-crowded, normal and light-crowded. The results produce adequate signals as they prove to be helpful in terms of monitoring the pilgrims which shows its usefulness. Findings After the first attempt of adding the first convolutional layer with 32 filters, the accuracy is not good and stands out at about 55%. Therefore, the algorithm is further improved by adding the second layer with 64 filters. This attempt is a success as it gives more improved results with an accuracy of 97%. After using the dropout fraction as a 0.5 to prevent overfitting, the test and training accuracy of 98% is achieved which is acceptable training and testing accuracy. Originality/value This study has proposed a model to solve the problem related to estimation of the level of congestion to avoid any accidents from happening because of it. This can be applied to the monitoring schemes that are used during Hajj, especially in crowd management during Tawaf. The model works as such that it activates an alarm when the default crowd limit exceeds. In this way, chances of the crowd reaching a dangerous level are reduced which minimizes the potential accidents that might take place. The model has a traffic light system, the appearance of red light means that the number of pilgrims in a particular area has exceeded its default limit and then it alerts to stop the migration of people to that particular area. The yellow light indicates that the number of pilgrims entering and leaving a particular area has equalized, then the pilgrims are suggested to slower their pace. Finally, the green light shows that the level of the crowd in a particular area is low and that the pilgrims can move freely in that area. The proposed model is simple and user friendly as it uses the most common traffic light system which makes it easier for the pilgrims to understand and follow accordingly.

Download Full-text

A New Text Classification Model Based on Contrastive Word Embedding for Detecting Cybersecurity Intelligence in Twitter

Electronics ◽

10.3390/electronics9091527 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1527 ◽

Cited By ~ 1

Author(s):

Han-Sub Shin ◽

Hyuk-Yoon Kwon ◽

Seung-Jin Ryu

Keyword(s):

Deep Learning ◽

Text Classification ◽

Area Under The Curve ◽

Word Embedding ◽

Classification Model ◽

Data Set ◽

Feature Vectors ◽

Model Based ◽

Proposed Model ◽

The Difference

Detecting cybersecurity intelligence (CSI) on social media such as Twitter is crucial because it allows security experts to respond cyber threats in advance. In this paper, we devise a new text classification model based on deep learning to classify CSI-positive and -negative tweets from a collection of tweets. For this, we propose a novel word embedding model, called contrastive word embedding, that enables to maximize the difference between base embedding models. First, we define CSI-positive and -negative corpora, which are used for constructing embedding models. Here, to supplement the imbalance of tweet data sets, we additionally employ the background knowledge for each tweet corpus: (1) CVE data set for CSI-positive corpus and (2) Wikitext data set for CSI-negative corpus. Second, we adopt the deep learning models such as CNN or LSTM to extract adequate feature vectors from the embedding models and integrate the feature vectors into one classifier. To validate the effectiveness of the proposed model, we compare our method with two baseline classification models: (1) a model based on a single embedding model constructed with CSI-positive corpus only and (2) another model with CSI-negative corpus only. As a result, we indicate that the proposed model shows high accuracy, i.e., 0.934 of F1-score and 0.935 of area under the curve (AUC), which improves the baseline models by 1.76∼6.74% of F1-score and by 1.64∼6.98% of AUC.

Download Full-text

Human Activity Recognition using Fourier Transform Inspired Deep Learning Combination Model

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327908666180727123657 ◽

2019 ◽

Vol 9 (1) ◽

pp. 16-31

Author(s):

Kyungkoo Jun

Keyword(s):

Fourier Transform ◽

Deep Learning ◽

Short Term Memory ◽

Window Size ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Proposed Model ◽

Testing Data ◽

Labeling Scheme

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is helped by Long Short-Term Memory (LSTM) which captures the temporal dependency from the signal and then produces encoded sequences. The sequences, once arranged into the 2D array, can represent the fingerprints of the signals. The benefit of such transformation is that we can exploit the recent advances of the deep learning models for the image classification such as Convolutional Neural Network (CNN). Results: The proposed model, as a result, is the combination of LSTM and CNN. We evaluate the model over two data sets. For the first data set, which is more standardized than the other, our model outperforms previous works or at least equal. In the case of the second data set, we devise the schemes to generate training and testing data by changing the parameters of the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% for some cases. We also analyze the effect of the parameters on the performance.

Download Full-text

User password repetitive patterns analysis and visualization

Information and Computer Security ◽

10.1108/ics-06-2015-0026 ◽

2016 ◽

Vol 24 (1) ◽

pp. 93-115 ◽

Cited By ~ 2

Author(s):

Xiaoying Yu ◽

Qi Liao

Keyword(s):

Security Policy ◽

Large Data ◽

Efficient Algorithms ◽

Privacy And Security ◽

Data Set ◽

Web Based ◽

Content Type ◽

Word Cloud ◽

Individual Privacy ◽

Password Security

Purpose – Passwords have been designed to protect individual privacy and security and widely used in almost every area of our life. The strength of passwords is therefore critical to the security of our systems. However, due to the explosion of user accounts and increasing complexity of password rules, users are struggling to find ways to make up sufficiently secure yet easy-to-remember passwords. This paper aims to investigate whether there are repetitive patterns when users choose passwords and how such behaviors may affect us to rethink password security policy. Design/methodology/approach – The authors develop a model to formalize the password repetitive problem and design efficient algorithms to analyze the repeat patterns. To help security practitioners to analyze patterns, the authors design and implement a lightweight, Web-based visualization tool for interactive exploration of password data. Findings – Through case studies on a real-world leaked password data set, the authors demonstrate how the tool can be used to identify various interesting patterns, e.g. shorter substrings of the same type used to make up longer strings, which are then repeated to make up the final passwords, suggesting that the length requirement of password policy does not necessarily increase security. Originality/value – The contributions of this study are two-fold. First, the authors formalize the problem of password repetitive patterns by considering both short and long substrings and in both directions, which have not yet been considered in past. Efficient algorithms are developed and implemented that can analyze various repeat patterns quickly even in large data set. Second, the authors design and implement four novel visualization views that are particularly useful for exploration of password repeat patterns, i.e. the character frequency charts view, the short repeat heatmap view, the long repeat parallel coordinates view and the repeat word cloud view.

Download Full-text

A GM(1,N)-based economic cybernetics model for the high-tech industries in China

Kybernetes ◽

10.1108/k-10-2013-0227 ◽

2014 ◽

Vol 43 (5) ◽

pp. 672-685 ◽

Cited By ~ 22

Author(s):

Zheng-Xin Wang

Keyword(s):

Differential Equation ◽

Patent Application ◽

Small Sample ◽

Total Output ◽

High Tech ◽

Data Set ◽

Content Type ◽

Fixed Assets ◽

Grey Differential Equation ◽

Medical Equipments

Purpose – The purpose of this paper is to propose an economic cybernetics model based on the grey differential equation GM(1,N) for China's high-tech industries and provide the necessary support to assist high-tech industries management departments with their policy making. Design/methodology/approach – Based on the principle of grey differential equation GM(1,N), the grey differential equations of five high-tech industries in China are established using the net fixed assets, labor quantity and patent application quantity as cybernetics variables. After the discretization and first-order subtraction reduction to the simultaneous equation of the five grey models, a linear cybernetics model is resulted in. The structure parameters in the cybernetics system show explicit economic significance and can be identified through least square principle. At last, the actual data in 2004-2010 are introduced to empirically analyze the high-tech industrial system in China. Findings – The cybernetics system for China's high-tech industries are stable, observable, and controllable. On the whole, China's high-tech industries show higher output coefficients of the patent application quantity than those of net fixed assets and labor quantity. This suggests that China's industry development mainly depends on technological innovation rather than capital or labor inputs. It is expected that the total output value of China's high-tech industries will grow at an average annual rate of 15 percent in 2011-2015, with contributions of pharmaceuticals, aircraft and spacecraft, electronic and telecommunication equipments, computers and office equipments, medical equipments and meters by 21, 16, 13, 10, and 28 percent, respectively. In addition, pharmaceuticals, as well as medical equipments and meters, present upward proportions in the gross of Chinese high-tech industries significantly. Electronic and telecommunication equipments, plus computers and office equipments exhibit an obvious decreasing proportion. The proportion of the output value of aircraft and spacecraft is basically stable. Practical implications – Empirical analysis results are helpful for related management departments to formulate reasonable industrial policies to keep the sustained and stable development of the high-tech industries in China. Originality/value – Based on the grey differential equation GM(1,N), this research puts forward an economic cybernetics model for the high-tech industries in China. This model is applicable to the economic system with small sample data set.

Download Full-text

An Empirical Bargaining Model with Left-Digit Bias: A Study on Auto Loan Monthly Payments

Management Science ◽

10.1287/mnsc.2020.3923 ◽

2021 ◽

Author(s):

Zhenling Jiang

Keyword(s):

Interest Rate ◽

Interest Rates ◽

Large Data ◽

The United States ◽

Nash Bargaining ◽

Bargaining Model ◽

Data Set ◽

Proposed Model ◽

Lower Interest Rate ◽

The Empirical Analysis

This paper studies price bargaining when both parties have left-digit bias when processing numbers. The empirical analysis focuses on the auto finance market in the United States, using a large data set of 35 million auto loans. Incorporating left-digit bias in bargaining is motivated by several intriguing observations. The scheduled monthly payments of auto loans bunch at both $9- and $0-ending digits, especially over $100 marks. In addition, $9-ending loans carry a higher interest rate, and $0-ending loans have a lower interest rate. We develop a Nash bargaining model that allows for left-digit bias from both consumers and finance managers of auto dealers. Results suggest that both parties are subject to this basic human bias: the perceived difference between $9- and the next $0-ending payments is larger than $1, especially between $99- and $00-ending payments. The proposed model can explain the phenomena of payments bunching and differential interest rates for loans with different ending digits. We use counterfactuals to show a nuanced impact of left-digit bias, which can both increase and decrease the payments. Overall, bias from both sides leads to a $33 increase in average payment per loan compared with a benchmark case with no bias. This paper was accepted by Matthew Shum, marketing.

Download Full-text

The relationship between risk, capital and efficiency in Indian banking: Does ownership matter?

Journal of Financial Economic Policy ◽

10.1108/jfep-05-2018-0074 ◽

2019 ◽

Vol 11 (2) ◽

pp. 218-231

Author(s):

Sanjukta Sarkar ◽

Rudra Sensarma ◽

Dipasha Sharma

Keyword(s):

Public Sector ◽

Credit Risk ◽

Large Data ◽

Foreign Banks ◽

Risk Capital ◽

Lower Efficiency ◽

Data Set ◽

Content Type ◽

Balance Sheets ◽

Indian Banks

Purpose This paper aims to examine the interplay between risk, capital and efficiency of Indian banks and study how their relationship differs across different ownership types. Design/methodology/approach Panel regression techniques are used to analyze a large data set of all Indian scheduled commercial banks operating during the period 2008-2016. Findings The results show that lower efficiency is associated with higher credit risk in the case of public sector and old private sector banks (”bad management hypothesis”). However, higher efficiency leads to higher credit risk in the case of foreign banks (“cost skimping hypothesis”). The authors further find that the more efficient institutions among public sector hold more capital. Finally, they find that the better-capitalized banks among those in the public sector have lower risks on their balance sheets (“moral hazard hypothesis”). Originality/value There is a paucity of papers on the interplay between risk, capital and efficiency of banks in emerging economies. This paper is the first to study the inter-relationship between risk, capital and efficiency of Indian banks across ownership groups using a number of different measures of risk.

Download Full-text

Which professional skills value more under digital transformation?

Journal of Economic Studies ◽

10.1108/jes-08-2021-0432 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Sofia Paklina ◽

Elena Shakina

Keyword(s):

Text Mining ◽

Labour Market ◽

Empirical Evidence ◽

Labour Force ◽

Sample Selection ◽

Large Data ◽

Hedonic Pricing ◽

Data Set ◽

Content Type ◽

Computing Skills

PurposeThis study seeks to explore the demand side of the labour market influenced by the digital revolution. It aims at identifying the new composition of skills and their value as implicitly manifested by employers when they look for the new labour force. The authors analyse the returns to computing skills based on text mining techniques applied to the job advertisements.Design/methodology/approachThe methodology is based on the hedonic pricing model with the Heckman correction to overcome the sample selection bias. The empirical part is based on a large data set that includes more than 9m online vacancies on one of the biggest job boards in Russia from 2006 to 2018.FindingsEmpirical evidence for both negative and positive returns to computing skills and their monetary values is found. Importantly, the authors also have found both complementary and substitutional effects within and between non-domain (basic) and domain (advanced) subgroups of computing skills.Originality/valueApart from the empirical evidence on the value of professional computing skills and their interrelations, this study provides the important methodological contribution on applying the hedonic procedure and text mining to the field of human resource management and labour market research.

Download Full-text

A method of locating the 3D centers of retroreflectors based on deep learning

Industrial Robot the international journal of robotics research and application ◽

10.1108/ir-09-2020-0186 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

BinBin Zhang ◽

Fumin Zhang ◽

Xinghua Qu

Keyword(s):

Deep Learning ◽

Data Augmentation ◽

Measurement Techniques ◽

Recognition Rate ◽

Detection Algorithm ◽

Large Field ◽

Small Scale ◽

Data Set ◽

Content Type ◽

Ellipse Detection

Purpose Laser-based measurement techniques offer various advantages over conventional measurement techniques, such as no-destructive, no-contact, fast and long measuring distance. In cooperative laser ranging systems, it’s crucial to extract center coordinates of retroreflectors to accomplish automatic measurement. To solve this problem, this paper aims to propose a novel method. Design/methodology/approach We propose a method using Mask RCNN (Region Convolutional Neural Network), with ResNet101 (Residual Network 101) and FPN (Feature Pyramid Network) as the backbone, to localize retroreflectors, realizing automatic recognition in different backgrounds. Compared with two other deep learning algorithms, experiments show that the recognition rate of Mask RCNN is better especially for small-scale targets. Based on this, an ellipse detection algorithm is introduced to obtain the ellipses of retroreflectors from recognized target areas. The center coordinates of retroreflectors in the camera coordinate system are obtained by using a mathematics method. Findings To verify the accuracy of this method, an experiment was carried out: the distance between two retroreflectors with a known distance of 1,000.109 mm was measured, with 2.596 mm root-mean-squar error, meeting the requirements of the coarse location of retroreflectors. Research limitations/implications The research limitations/implications are as follows: (i) As the data set only has 200 pictures, although we have used some data augmentation methods such as rotating, mirroring and cropping, there is still room for improvement in the generalization ability of detection. (ii) The ellipse detection algorithm needs to work in relatively dark conditions, as the retroreflector is made of stainless steel, which easily reflects light. Originality/value The originality/value of the article lies in being able to obtain center coordinates of multiple retroreflectors automatically even in a cluttered background; being able to recognize retroreflectors with different sizes, especially for small targets; meeting the recognition requirement of multiple targets in a large field of view and obtaining 3 D centers of targets by monocular model-based vision.

Download Full-text