Detection of phishing websites using a novel twofold ensemble model

Purpose Phishing is one of the major threats affecting businesses worldwide in current times. Organizations and customers face the hazards arising out of phishing attacks because attackers gain anonymous access to sensitive details. Such attacks often result in substantial financial losses. Thus, there is a need for effective intrusion detection techniques to identify and possibly nullify the effects of phishing. Classifying phishing and non-phishing web content is a critical task in information security protocols, and foolproof mechanisms have yet to be implemented in practice. The purpose of the current study is to present an ensemble machine learning model for classifying phishing websites. Design/methodology/approach A publicly available data set comprising 10,068 instances of phishing and legitimate websites was used to build the classifier model. Feature extraction was performed using a group of methods, and the relevant features extracted were used to build the model. A twofold ensemble learner was developed by feeding the results of a random forest (RF) classifier into a feedforward neural network (NN). The performance of the ensemble classifier was validated using k-fold cross-validation. The twofold ensemble learner was implemented as a user-friendly, interactive decision support system for classifying websites as phishing or legitimate. Findings Experimental simulations were performed to assess and compare the performance of the ensemble classifiers. Statistical tests showed that the RF_NN model gave superior performance, with an accuracy of 93.41 per cent and a minimal mean squared error of 0.000026. Research limitations/implications The research data set used in this study is publicly available and easy to analyze. Comparative analysis with other real-time data sets of recent origin must be performed to ensure generalization of the model against various security breaches. 
Detection should also cover other variants of phishing threats rather than focusing only on phishing websites. Originality/value To the best of the authors' knowledge, the twofold ensemble model has not been applied to the classification of phishing websites in any previous study.
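The abstract validates the RF-to-NN ensemble with k-fold cross-validation and reports accuracy and mean squared error. A minimal stdlib Python sketch of those evaluation pieces follows; the function names are hypothetical, shuffled folds are an assumption, and the RF and NN models themselves are not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k shuffled folds; return (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold sizes differ by at most one: the first n % k folds get one extra index.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    splits = []
    for f in range(k):
        test = folds[f]
        # Training indices are every fold except the held-out one.
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((train, test))
    return splits


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_score: Sequence[float]) -> float:
    """Mean squared difference, e.g. between 0/1 labels and predicted probabilities."""
    return sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)
```

In a twofold setup, each (train, test) pair would be used to fit the RF on the training indices, pass its outputs to the NN, and score the NN on the test indices; the reported figure would then be the mean over the k folds.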


Tunicate swarm algorithm-trained multi-layered perceptron for data centre energy demand forecasting and relative percentage contribution analysis of input parameters

Journal of Engineering Design and Technology ◽

10.1108/jedt-10-2020-0436 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Oluwafemi Ajayi ◽

Reolyn Heymann

Keyword(s):

Neural Network ◽

Energy Management ◽

Energy Demand ◽

Mean Squared Error ◽

Data Set ◽

Content Type ◽

Demand Pattern ◽

The Neural Network ◽

Input Parameters ◽

Demand Profile

Purpose Energy management is critical to data centres (DCs), mainly because they are high energy-consuming facilities and demand for their services continues to rise with the rapidly increasing global demand for cloud and other technological services. Unless innovative steps are taken to drive effective energy management systems, this projected sectoral growth is expected to translate into increased energy demand from a sector that is already considered a major energy consumer. The purpose of this study is to provide insights into the expected energy demand of the DC and the impact each measured parameter has on the building's energy demand profile. This serves as a basis for the design of an effective energy management system. Design/methodology/approach This study proposes a novel application of the tunicate swarm algorithm (TSA) to training an artificial neural network model used to predict the energy demand of a DC. The objective is to find the optimal weights and biases of the model while avoiding challenges commonly faced when using the backpropagation algorithm. The model implementation is based on historical energy consumption data from an anonymous DC operator in Cape Town, South Africa. The data set consists of variables such as ambient temperature, ambient relative humidity, chiller output temperature and computer room air conditioning air supply temperature, which serve as inputs to a neural network designed to predict the DC's hourly energy consumption for July 2020. After preprocessing, the total number of samples for each variable was 564. An 80:20 ratio was used to split the data set into training and testing sets, giving 452 samples for training and 112 samples for testing. A weights-based approach was also used to analyze the relative impact of the model's input parameters on the DC's energy demand pattern. 
Findings The performance of the proposed model was compared with that of neural network models trained using state-of-the-art algorithms such as moth flame optimization, the whale optimization algorithm and the ant lion optimizer. The analysis showed that the proposed TSA outperformed the other methods in training the model in terms of mean squared error, root mean squared error, mean absolute error, mean absolute percentage error and prediction accuracy. Analyzing the relative percentage contribution of the model's input parameters based on the weights of the neural network also shows that the ambient temperature of the DC has the highest impact on the building's energy demand pattern. Research limitations/implications The proposed model can be applied to other complex engineering problems such as regression and classification. The methodology for optimizing the multi-layered perceptron neural network can also be applied to other forms of neural networks for improved performance. Practical implications Based on the forecasted energy demand of the DC and an understanding of how the input parameters affect the building's energy demand pattern, neural networks can be deployed to optimize the DC's cooling systems for reduced energy cost. Originality/value The use of TSA for optimizing the weights and biases of a neural network is novel. The application context of this study, DCs, remains largely untapped in the literature, leaving many gaps for further research. The proposed prediction model can be further applied to other regression and classification tasks. Another contribution of this study is the analysis of the neural network's input parameters, which provides insight into the extent to which each parameter influences the DC's energy demand profile.
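The abstract names its error metrics and a "weights-based approach" without giving formulas. A minimal sketch follows, assuming the standard metric definitions and assuming Garson's algorithm (a common weights-based method for a single-hidden-layer MLP with one output neuron) as the contribution analysis; the authors' exact procedure may differ, and the function names are hypothetical. The TSA training itself is not reproduced.

```python
import math
from typing import List, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """MSE, RMSE, MAE and MAPE (in per cent); MAPE assumes no zero targets."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


def garson_importance(w_in: List[List[float]], w_out: Sequence[float]) -> List[float]:
    """Garson's relative percentage contribution of each input (assumed method).

    w_in[i][j]: weight from input i to hidden neuron j
    w_out[j]:   weight from hidden neuron j to the single output neuron
    """
    n_in, n_hidden = len(w_in), len(w_out)
    shares = [0.0] * n_in
    for j in range(n_hidden):
        # Share of hidden neuron j's signal that comes from each input.
        contrib = [abs(w_in[i][j] * w_out[j]) for i in range(n_in)]
        total = sum(contrib)
        if total == 0:
            continue  # neuron j carries no signal to the output; skip it
        for i in range(n_in):
            shares[i] += contrib[i] / total
    grand = sum(shares)
    if grand == 0:
        raise ValueError("all weights are zero")
    return [100.0 * s / grand for s in shares]
```

With the four inputs listed in the abstract (ambient temperature, ambient relative humidity, chiller output temperature and CRAC air supply temperature), `w_in` would have four rows, one per input, and the returned percentages sum to 100.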


Factors influencing the data sharing behavior of researchers in sociology and political science

Journal of Documentation ◽

10.1108/jd-09-2017-0126 ◽

2018 ◽

Vol 74 (5) ◽

pp. 1053-1073 ◽

Cited By ~ 6

Author(s):

Wolfgang Zenk-Möltgen ◽

Esra Akdeniz ◽

Alexia Katsanidou ◽

Verena Naßhoven ◽

Ebru Balaban

Keyword(s):

Political Science ◽

Data Sharing ◽

Behavioral Control ◽

Statistical Tests ◽

Data Availability ◽

Impact Factors ◽

Data Set ◽

Content Type ◽

Data Policy ◽

Sharing Behavior

Purpose Open data and data sharing should improve the transparency of research. The purpose of this paper is to investigate how different institutional and individual factors affect the data sharing behavior of authors of research articles in sociology and political science. Design/methodology/approach Desktop research analyzed attributes of sociology and political science journals (n=262) from their websites. A second data set of articles (n=1,011; published 2012-2014) was derived from ten of the main journals (five from each discipline), and the data sharing stated in these articles was examined. A survey of the authors used the Theory of Planned Behavior to examine motivations, behavioral control, and perceived norms for sharing data. Statistical tests (Spearman's ρ, χ²) examined correlations and associations. Findings Although many journals have a data policy for their authors (78 percent in sociology, 44 percent in political science), only around half of the empirical articles stated that the data were available, and the data could be accessed for only 37 percent of the articles. Journals with higher impact factors, journals with a stated data policy, and younger journals were more likely to offer data availability. Of the authors surveyed, 446 responded (44 percent). Statistical analysis indicated that authors' attitudes, reported past behavior, social norms, and perceived behavioral control affected their intentions to share data. Research limitations/implications Fewer than 50 percent of the authors contacted responded to the survey. Results indicate that data sharing would improve if journals had explicit data sharing policies, but authors also need support from other institutions (their universities, funding councils, and professional associations) to improve data management skills and infrastructures. 
Originality/value This paper builds on previous similar research in sociology and political science and explains some of the barriers to data sharing in social sciences by combining journal policies, published articles, and authors’ responses to a survey.
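The study relies on Spearman's ρ and χ² tests. A minimal stdlib sketch of the two test statistics follows, assuming ties are given average ranks and the χ² statistic is computed without continuity correction; function names are hypothetical. It does not compute p-values; in practice `scipy.stats.spearmanr` and `scipy.stats.chi2_contingency` return those as well (note that the latter applies Yates' correction by default when there is one degree of freedom).

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def chi_square(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)
```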


Automated brain tumor segmentation from multimodal MRI data based on Tamura texture feature and an ensemble SVM classifier

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/ijicc-04-2019-0031 ◽

2019 ◽

Vol 12 (4) ◽

pp. 466-480

Author(s):

Li Na ◽

Xiong Zhiyong ◽

Deng Tianqi ◽

Ren Kai

Keyword(s):

Brain Tumor ◽

Brain Tumors ◽

Superior Performance ◽

Support Vector ◽

Svm Classifier ◽

Data Set ◽

Original Solution ◽

Content Type ◽

Tumor Region ◽

The Brain

Purpose The precise segmentation of brain tumors is the most crucial step in their diagnosis and treatment. Because of noise, uneven gray levels, blurred boundaries and edema around the tumor, features in the tumor region of brain images are indistinct, which poses a problem for diagnostics. The paper aims to discuss these issues. Design/methodology/approach In this paper, the authors propose an original segmentation solution using Tamura texture features and an ensemble Support Vector Machine (SVM) structure. In the proposed technique, 124 features are extracted for each voxel, including Tamura texture features and grayscale features. These features are then ranked using the SVM-Recursive Feature Elimination method, which is also adopted to optimize the parameters of the Radial Basis Function kernel of the SVMs. Finally, the bagging random sampling method is used to construct the ensemble SVM classifier, based on a weighted voting mechanism, to classify voxel types. Findings The experiments are conducted on the BraTS2015 sample data set. They demonstrate that Tamura texture is very useful in the segmentation of brain tumors, especially the line-likeness feature. The superior performance of the proposed ensemble SVM classifier is demonstrated by comparison with single SVM classifiers as well as other methods. Originality/value The authors propose an original solution for segmentation using Tamura texture and an ensemble SVM structure.
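The final step combines bagging with weighted voting. The abstract does not say how the voting weights are set, so this minimal sketch assumes each member is weighted by its out-of-bag accuracy. `train_fn` is a hypothetical placeholder standing in for fitting one RBF-kernel SVM on the selected voxel features, and all function names are illustrative.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Model = Callable[[object], Hashable]  # hypothetical: a fitted classifier as a function


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Draw n indices with replacement (one bagging replicate)."""
    return [rng.randrange(n) for _ in range(n)]


def weighted_vote(votes: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the label whose supporting members have the largest total weight."""
    score = defaultdict(float)
    for label, w in zip(votes, weights):
        score[label] += w
    return max(score, key=score.get)


def fit_bagged(train_fn: Callable[[list, list], Model], X: Sequence, y: Sequence,
               n_members: int, seed: int = 0) -> List[Tuple[Model, float]]:
    """Train n_members models on bootstrap replicates and weight each one.

    Assumption: a member's weight is its accuracy on its out-of-bag samples.
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = bootstrap_indices(len(X), rng)
        model = train_fn([X[i] for i in idx], [y[i] for i in idx])
        oob = sorted(set(range(len(X))) - set(idx))
        if oob:
            weight = sum(model(X[i]) == y[i] for i in oob) / len(oob)
        else:
            weight = 1.0  # no out-of-bag samples: fall back to an equal weight
        members.append((model, weight))
    return members


def predict_bagged(members: List[Tuple[Model, float]], x) -> Hashable:
    """Classify x by weighted vote over all ensemble members."""
    return weighted_vote([m(x) for m, _ in members], [w for _, w in members])
```

Out-of-bag accuracy is chosen here only because bagging provides it for free; any other validation score could be substituted as the weight.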


KiwiSaver fund performance and asset allocation policy

Pacific Accounting Review ◽

10.1108/par-06-2018-0044 ◽

2019 ◽

Vol 31 (2) ◽

pp. 232-257

Author(s):

Huong Dieu Dang

Keyword(s):

Asset Allocation ◽

Tracking Error ◽

Excess Return ◽

Superior Performance ◽

Data Set ◽

Content Type ◽

Allocation Policy ◽

Holding Period ◽

Management Fees ◽

The Impact

Purpose This paper aims to examine the performance and benchmark asset allocation policy of 70 KiwiSaver funds categorised as growth, balanced or conservative over the period October 2007-June 2016. The study focuses on the sources of returns variability across time and returns variation among funds. Design/methodology/approach Each fund is benchmarked against a portfolio of eight indices representing eight invested asset classes. Three measures were used to examine the after-fee benchmark-adjusted performance of each fund: excess return, cumulative abnormal return and holding period returns difference. Tracking error and active share were used to capture managers' deviations from their benchmarks. Findings On average, funds underperform their respective benchmarks, with mean quarterly excess returns (after management fees) of −0.15 per cent (growth), −0.63 per cent (balanced) and −0.83 per cent (conservative). Benchmark returns variability, on average, explains 43-78 per cent of a fund's across-time returns variability, and this is primarily driven by the fund's exposure to global capital markets. Differences in benchmark policies, on average, account for 18.8-39.3 per cent of among-fund returns variation, while differences in fees and security selection may explain the rest. About 61 per cent of balanced and 47 per cent of growth fund managers make selection bets against their benchmarks. There is no consistent evidence that more actively managed funds deliver higher after-fee risk-adjusted performance. Superior performance is often due to randomness. Originality/value This study uses a unique data set gathered directly from KiwiSaver managers and captures the long-term strategic asset allocation target that underlies the investment management process in practice. The study represents the first attempt to examine the impact of benchmark asset allocation policy on KiwiSaver funds' returns variability across time and returns variation among funds.
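The abstract lists its performance and deviation measures without formulas. A minimal sketch follows under stated assumptions: the benchmark is the policy-weighted return of the indices, cumulative abnormal return is the simple sum of per-period excess returns, tracking error is the (non-annualised) sample standard deviation of excess returns, and active share is computed on asset-class weights. The paper's exact definitions may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def benchmark_return(weights: Sequence[float], index_returns: Sequence[float]) -> float:
    """Policy-weighted return of the benchmark portfolio for one period."""
    return sum(w * r for w, r in zip(weights, index_returns))


def excess_returns(fund: Sequence[float], bench: Sequence[float]) -> List[float]:
    """Per-period fund return minus benchmark return."""
    return [f - b for f, b in zip(fund, bench)]


def tracking_error(fund: Sequence[float], bench: Sequence[float]) -> float:
    """Sample standard deviation of per-period excess returns."""
    ex = excess_returns(fund, bench)
    mean = sum(ex) / len(ex)
    return math.sqrt(sum((e - mean) ** 2 for e in ex) / (len(ex) - 1))


def cumulative_abnormal_return(fund: Sequence[float], bench: Sequence[float]) -> float:
    """Sum of per-period excess returns."""
    return sum(excess_returns(fund, bench))


def holding_period_return_difference(fund: Sequence[float], bench: Sequence[float]) -> float:
    """Compounded fund return minus compounded benchmark return."""
    def hpr(returns: Sequence[float]) -> float:
        return math.prod(1 + r for r in returns) - 1
    return hpr(fund) - hpr(bench)


def active_share(fund_weights: Sequence[float], bench_weights: Sequence[float]) -> float:
    """Half the sum of absolute weight differences (0 = benchmark-like, 1 = no overlap)."""
    return 0.5 * sum(abs(f - b) for f, b in zip(fund_weights, bench_weights))
```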


Scoring goals in multiple fields

Sport Business and Management An International Journal ◽

10.1108/sbm-11-2016-0072 ◽

2017 ◽

Vol 7 (2) ◽

pp. 197-215 ◽

Cited By ~ 3

Author(s):

Petros Parganas ◽

Roman Liasko ◽

Christos Anagnostopoulos

Keyword(s):

Team Performance ◽

Statistical Tests ◽

Profit Maximization ◽

Economic Research ◽

Professional Football ◽

Linear Modeling ◽

Data Set ◽

Content Type ◽

Professional Team Sports ◽

Football Clubs

Purpose Professional football clubs currently strive for a number of concurrent goals, ranging from on-field success to profit maximization to fan expansion and engagement. The purpose of this paper, theoretically informed by social penetration theory, is to analyze the economics behind such goals and examine the association between team performance, commercial success, and social media followers in professional team sports. Design/methodology/approach A data set on 20 European professional football clubs, combining financial (revenues and costs), sporting, and digital-reach measures for three consecutive football seasons (2013/2014 to 2015/2016), was used. In addition to a descriptive analysis of these data, the study applies a range of correlation tests and linear modeling techniques to obtain quantitative results. Findings The results indicate that all three main sources of club revenue (match-day, commercial/sponsorship, and broadcasting) are positive drivers of Facebook followers. Staff investments (staff costs) are also positively related to Facebook followers, albeit to a lesser extent, while higher-ranked clubs seem to follow a constant approach in terms of their revenue and cost structure. Originality/value This study seeks to bridge communication research and sport economics research, providing evidence that Facebook followers are part of the cyclical phenomenon of team revenues and team performance. In doing so, it initiates a debate on the relationship between the digital expansion of a football club and its sporting and financial indicators.
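As one illustration of the linear modeling mentioned above, here is a minimal single-predictor least-squares sketch, for example regressing Facebook followers on one revenue source. The function name and the choice of a single predictor are assumptions; the paper's actual model specifications are not given in the abstract, and no data from the study are used.

```python
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a + b*x; returns (intercept a, slope b, R squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot
```

With a single predictor, R squared equals the square of the Pearson correlation between x and y, which links this fit to the correlation tests the study also reports.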


Automatic catalog of RR Lyrae from ∼14 million VVV light curves: How far can we go with traditional machine-learning?

Astronomy and Astrophysics ◽

10.1051/0004-6361/202038314 ◽

2020 ◽

Vol 642 ◽

pp. A58

Author(s):

J. B. Cabral ◽

F. Ramos ◽

S. Gurovich ◽

P. M. Granitto

Keyword(s):

Machine Learning ◽

Model Selection ◽

Broad Band ◽

Ensemble Classifier ◽

Light Curves ◽

Ensemble Classifiers ◽

Data Set ◽

Rr Lyrae ◽

Selection Step ◽

Sampling Procedures

Context. The creation of a 3D map of the bulge using RR Lyrae (RRL) is one of the main goals of the VISTA Variables in the Via Lactea (VVV) and VVV(X) surveys. The overwhelming number of sources under analysis undoubtedly requires the use of automatic procedures. In this context, previous studies have introduced the use of machine learning (ML) methods for the task of variable star classification. Aims. Our goal is to develop and test an entirely automatic ML-based procedure for the identification of RRLs in the VVV Survey. This automatic procedure is meant to be used to generate reliable catalogs integrated over several tiles in the survey. Methods. Following the reconstruction of light curves, we extracted a set of period- and intensity-based features, which were already defined in previous works. We also used, for the first time, a new subset of useful color features. We discuss in considerable detail all the steps needed to define our fully automatic pipeline, namely the selection of quality measurements, sampling procedures, classifier setup, and model selection. Results. We were able to construct an ensemble classifier with an average recall of 0.48 and an average precision of 0.86 over 15 tiles. We also made all our processed data sets available and published a catalog of candidate RRLs. Conclusions. Perhaps most interestingly, from a classification perspective based on photometric broad-band data, our results indicate that color is an informative feature type for the RRL objective class that should always be considered in automatic ML classification methods. We also argue that recall and precision, in both tables and curves, are high-quality metrics for this highly imbalanced problem. Furthermore, we show for our VVV data set that, to obtain good estimates, it is important to rely more on the original distribution than on reduced samples with an artificial balance. 
Finally, we show that the use of ensemble classifiers helps resolve the crucial model selection step and that most errors in the identification of RRLs are related to low-quality observations of some sources or to the increased difficulty in resolving the RRL-C type given the data.
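The results are reported as precision and recall averaged over 15 tiles, and the authors argue these metrics suit a highly imbalanced problem. A minimal sketch follows, assuming the tile average is an unweighted mean of per-tile scores; the function names and dictionary keys are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple


def precision_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     positive: Hashable = 1) -> Tuple[float, float]:
    """Precision and recall for the positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_over_tiles(results: Dict[str, Tuple[Sequence, Sequence]]) -> Tuple[float, float]:
    """Unweighted mean of per-tile precision and recall (assumed averaging scheme)."""
    per_tile = [precision_recall(t, p) for t, p in results.values()]
    n = len(per_tile)
    return sum(p for p, _ in per_tile) / n, sum(r for _, r in per_tile) / n
```

The imbalance point can be seen directly: on a tile with 1 RRL among 100 sources, a classifier that labels everything as non-RRL scores 99 per cent accuracy but zero recall.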


Ensemble Classification Approach for Sarcasm Detection

Behavioural Neurology ◽

10.1155/2021/9731519 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Jyoti Godara ◽

Isha Batra ◽

Rajni Aron ◽

Mohammad Shabaz

Keyword(s):

Logistic Regression ◽

Decision Tree ◽

Research Work ◽

Ensemble Classifier ◽

Ensemble Classification ◽

Ensemble Model ◽

Ensemble Classifiers ◽

Text Data ◽

Proposed Model ◽

Pca Algorithm

Cognitive science is a field that focuses on analyzing the human brain through the application of data mining (DM). Databases are used to gather and store large volumes of data, and authenticated information is extracted from them using various measures. This research focuses on detecting sarcasm in text data. It introduces a sarcasm detection scheme based on principal component analysis (PCA), the K-means algorithm, and ensemble classification. Four ensemble classifiers are designed for detecting sarcasm. The first ensemble classifier (SKD) combines SVM, KNN, and decision tree classifiers. The second (SLD) combines SVM, logistic regression, and decision tree classifiers. The third (MLD) combines MLP, logistic regression, and decision tree classifiers, and the last (SLM) combines MLP, logistic regression, and SVM. The proposed model is implemented in Python and tested on five data sets of different sizes, and the performance of the models is evaluated on various metrics.
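The abstract does not say how K-means is combined with PCA and the ensembles, so this sketch covers only the K-means step itself: Lloyd's algorithm on generic numeric vectors (assumed here to be, for example, PCA-reduced text features). The function names and the random initialisation are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, init: Optional[Sequence[Point]] = None,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    if init is not None:
        centroids = [tuple(c) for c in init]
    else:
        centroids = [tuple(p) for p in random.Random(seed).sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```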


An ensemble-based model for prediction of academic performance of students in undergrad professional course

Journal of Engineering Design and Technology ◽

10.1108/jedt-11-2018-0204 ◽

2019 ◽

Vol 17 (4) ◽

pp. 769-781 ◽

Cited By ~ 1

Author(s):

Preet Kamal ◽

Sachin Ahuja

Keyword(s):

At Risk ◽

Academic Performance ◽

Family Income ◽

Peer Pressure ◽

Influential Factors ◽

Ensemble Model ◽

Data Set ◽

Content Type ◽

Factors Affecting ◽

Behavioural Factors

Purpose The purpose of this paper is to develop a prediction model to study the factors affecting the academic performance of students pursuing an undergraduate professional course (BCA). For this purpose, an ensemble model of decision tree, gradient boosting and Naïve Bayes techniques is created to achieve the most accurate results. Monitoring students' academic performance has emerged as an essential field, as it plays a vital role in the development and growth of students' critical and cognitive thinking. If students' academic performance during the initial years of their degree can be predicted, stakeholders such as government, policymakers and academicians can be helped to devise significant remedial strategies. This practice can go a long way in shaping the ideologies of young minds, enhancing pedagogical practices and reframing curricula. This study aims to identify positive steps that can be taken to enhance future endeavours in the field of education. Design/methodology/approach A questionnaire was prepared specifically to identify influential factors affecting the academic performance of the students. Its specific areas of investigation were the demographic, social, academic and behavioural factors that influence student performance. An ensemble model was then built from the three techniques based on their accuracy rates. A 10-fold cross-validation technique was applied to assess the fitness of the results obtained from the proposed ensemble model. Findings The ensemble model provides efficient and accurate prediction of student performance and helps identify students who are at risk of failing or dropping out. The previous semester's academic performance has a significant impact on current academic performance, along with other factors (such as number of siblings and distance of the university from the student's residence). 
Any major mishap during the past year also affects academic performance, as do habit-based behavioural factors such as consumption of alcohol and tobacco. Research limitations/implications Although the existing model considers aspects related to a student's family income and academic indicators, it tends to ignore major factors such as the influence of peer pressure, self-study habits and time devoted to study after college hours. This paper attempts to examine these factors in predicting the academic performance of the students. The need of the hour is to develop innovative models to assess and advance the present educational set-up. The ensemble model is best suited to studying all the factors needed to accomplish a robust and reliable model. Originality/value The present model is developed using classification and regression algorithms. The model achieves 99 per cent accuracy on the existing data set and is able to identify the influential factors affecting academic performance. As early detection of at-risk students is possible with the proposed model, preventive and corrective measures can be proposed to improve the overall academic performance of the students.
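Questionnaire answers of this kind are largely categorical, which suits the Naïve Bayes component of the ensemble. A minimal sketch of a categorical Naïve Bayes classifier with add-alpha (Laplace) smoothing follows; it covers only that one component, the class name and example feature values are hypothetical, and the abstract does not state how the three models' outputs are combined.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


class CategoricalNB:
    """Naive Bayes for categorical features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> "CategoricalNB":
        self.classes = sorted(set(y), key=str)
        self.class_counts = Counter(y)
        self.n = len(y)
        # Distinct values seen for each feature (used in the smoothing denominator).
        self.values = [set(row[j] for row in X) for j in range(len(X[0]))]
        # counts[(class, feature index)][value] = co-occurrence count
        self.counts = defaultdict(Counter)
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(label, j)][v] += 1
        return self

    def _log_posterior(self, row: Sequence[Hashable], c: Hashable) -> float:
        # log P(c) + sum_j log P(x_j | c), up to a shared normalising constant.
        score = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            num = self.counts[(c, j)][v] + self.alpha
            den = self.class_counts[c] + self.alpha * len(self.values[j])
            score += math.log(num / den)
        return score

    def predict(self, row: Sequence[Hashable]) -> Hashable:
        return max(self.classes, key=lambda c: self._log_posterior(row, c))
```

Working in log space avoids numerical underflow when many questionnaire items are multiplied together.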


Hyperparameter tuning of AdaBoost algorithm for social spammer identification

International Journal of Pervasive Computing and Communications ◽

10.1108/ijpcc-09-2020-0130 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Krithiga R. ◽

Ilavarasan E.

Keyword(s):

Social Networks ◽

Optimization Algorithm ◽

Online Social Networks ◽

Whale Optimization Algorithm ◽

Ensemble Classifiers ◽

Classification Problems ◽

Data Set ◽

Content Type ◽

Adaboost Algorithm ◽

Whale Optimization

Purpose The purpose of this paper is to improve performance on the spammer identification problem in online social networks. Researchers have previously performed hyperparameter tuning to enhance the performance of classifiers. The AdaBoost algorithm belongs to the class of ensemble classifiers and is widely applied to binary classification problems. A single algorithm may not yield accurate results; however, ensembles of classifiers built from multiple models have been successfully applied to many classification tasks. The search space for an optimal set of parameter values is vast, so enumerating all possible combinations is not feasible. Hence, a hybrid modified whale optimization algorithm for spam profile detection (MWOA-SPD) model is proposed to find optimal values for these parameters. Design/methodology/approach In this work, the hyperparameters of AdaBoost are fine-tuned and the algorithm is applied to identifying spammers in social networks. The AdaBoost algorithm linearly combines several weak classifiers to produce a stronger one. The proposed MWOA-SPD model hybridizes the whale optimization algorithm and the salp swarm algorithm. Findings The technique is applied to a manually constructed Twitter data set and compared with existing optimization and hyperparameter tuning methods. The results indicate that the proposed method outperforms the existing techniques in terms of accuracy and computational efficiency. Originality/value The proposed method reduces the server load by excluding complex features and retaining only lightweight features. It aids in identifying spammers at an earlier stage, thereby offering users a propitious environment.
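To make "linearly combines several weak classifiers" concrete, here is a minimal sketch of discrete AdaBoost with decision stumps, exposing the two hyperparameters (`n_estimators`, `learning_rate`) that a search such as MWOA-SPD would tune. The MWOA-SPD hybrid itself is not reproduced; the function names are hypothetical, and scaling each weight by `learning_rate` is one common shrinkage convention, assumed here.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def _stump_predict(stump: Stump, x: Sequence[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def fit_adaboost(X: Sequence[Sequence[float]], y: Sequence[int], n_estimators: int = 50,
                 learning_rate: float = 1.0) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with decision stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_estimators):
        best, best_err = None, float("inf")
        # Exhaustive search over features, observed thresholds and both polarities.
        for j in range(len(X[0])):
            for thr in sorted(set(row[j] for row in X)):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if _stump_predict((j, thr, pol), xi) != yi)
                    if err < best_err:
                        best, best_err = (j, thr, pol), err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        if eps >= 0.5:
            break  # no weak learner better than chance on the weighted data
        alpha = learning_rate * 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * _stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_adaboost(ensemble: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    """Sign of the weighted (linear) combination of the weak classifiers."""
    score = sum(alpha * _stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

A tuner would wrap `fit_adaboost` in an objective that scores validation accuracy for a candidate (`n_estimators`, `learning_rate`) pair, which is the role the metaheuristic plays in the abstract.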


Entrepreneurial orientation and firm performance in different environmental settings

Journal of Small Business and Enterprise Development ◽

10.1108/jsbed-09-2015-0132 ◽

2016 ◽

Vol 23 (3) ◽

pp. 703-727 ◽

Cited By ~ 43

Author(s):

Galina Shirokova ◽

Karina Bogatyreva ◽

Tatiana Beliaeva ◽

Sheila Puffer

Keyword(s):

Firm Performance ◽

Entrepreneurial Orientation ◽

Environmental Variable ◽

Superior Performance ◽

Cross Sectional ◽

Data Set ◽

Content Type ◽

Market Growth ◽

Environmental Hostility ◽

Different Levels

Purpose – The purpose of this paper is to explore the relationship between entrepreneurial orientation (EO) and firm performance across different levels of environmental hostility and market growth. The contingency approach of two-way interactions of EO with each environmental variable is contrasted with the configurational approach of three-way interactions of EO simultaneously with different levels of both environmental variables. Design/methodology/approach – Hierarchical regression analysis is applied to the pooled data set of 163 Finnish and Russian small- and medium-sized enterprises, and supplemented with post hoc analysis of the differences in regression slopes across environmental configurations. Findings – Results show that EO is directly and positively associated with firm performance. However, the strength and direction of this relationship vary by configurations of the external environment variables. Firms achieve superior performance when adopting EO in environments with high levels of both hostility and market growth. In contrast, in favorable environments with low hostility and high market growth, EO adoption leads to lower firm performance. Research limitations/implications – The study contributes to the EO literature by demonstrating different effects of EO on firm performance across various environmental configurations. It uses cross-sectional data from two countries. Replication studies using different samples may further corroborate the results. Practical implications – In order to take advantage of opportunities and achieve better performance, managers of firms should analyze multiple elements of the environment concurrently and align EO to those conditions. Originality/value – The configurations of environmental hostility and market growth, representing both favorable and unfavorable elements of business context, have not been previously investigated together in one model of the EO-performance relationship.
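The three-way interaction model and the post hoc slope comparison can be sketched as follows, assuming mean-centred predictors (a common but not stated choice). If performance is modelled as b0 + b1·EO + b2·H + b3·G + b4·EO·H + b5·EO·G + b6·H·G + b7·EO·H·G, the slope of EO at given hostility H and growth G is b1 + b4·H + b5·G + b7·H·G. The function names and any coefficient values are hypothetical; fitting the regression itself is left to a statistics library.

```python
from statistics import mean
from typing import Dict, List, Sequence


def centered(xs: Sequence[float]) -> List[float]:
    """Subtract the sample mean from every value."""
    m = mean(xs)
    return [x - m for x in xs]


def interaction_terms(eo: Sequence[float], hostility: Sequence[float],
                      growth: Sequence[float]) -> Dict[str, List[float]]:
    """Mean-centred predictors plus the two- and three-way product terms."""
    e, h, g = centered(eo), centered(hostility), centered(growth)
    return {
        "EO": e, "H": h, "G": g,
        "EOxH": [a * b for a, b in zip(e, h)],
        "EOxG": [a * b for a, b in zip(e, g)],
        "HxG": [a * b for a, b in zip(h, g)],
        "EOxHxG": [a * b * c for a, b, c in zip(e, h, g)],
    }


def simple_slope_of_eo(coef: Dict[str, float], h: float, g: float) -> float:
    """Partial derivative of predicted performance with respect to EO at (h, g)."""
    return coef["EO"] + coef["EOxH"] * h + coef["EOxG"] * g + coef["EOxHxG"] * h * g
```

In hierarchical regression the main effects, the two-way terms and the three-way term would typically be entered in successive blocks; evaluating `simple_slope_of_eo` at, for example, one standard deviation above and below the mean of each moderator gives the four environmental configurations compared in the abstract.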
