Hybrid Rule-Based Solution for Phishing URL Detection Using Convolutional Neural Network

Wireless Communications and Mobile Computing ◽

10.1155/2021/8241104 ◽

2021 ◽

Vol 2021 ◽

pp. 1-24

Author(s):

Youness Mourtaji ◽

Mohammed Bouhorma ◽

Daniyal Alghazzawi ◽

Ghadah Aldabbagh ◽

Abdullah Alghamdi

Keyword(s):

Deep Learning ◽

High Efficiency ◽

Personal Information ◽

Support Vector ◽

Learning Models ◽

Rule Based ◽

Common Purpose ◽

Behavioral Method ◽

And Control ◽

Similarity Method

Phishing has become a common threat, with many individuals and webpages observed to be attacked by phishers. The common purpose of phishing activities is to obtain users’ personal information for illegitimate use. Considering the growing intensity of the issue, this study aims to develop a new hybrid rule-based solution that incorporates six different algorithm models to efficiently detect and control phishing. The study incorporates 37 features extracted from six different methods including the blacklist method, lexical and host method, content method, identity method, identity similarity method, visual similarity method, and behavioral method. Furthermore, a comparative analysis was undertaken between machine learning models, namely CART (decision trees), SVM (support vector machines), and KNN (K-nearest neighbors), and deep learning models such as MLP (multilayer perceptron) and CNN (convolutional neural networks). The findings indicated that the method was effective in analysing the URL stress through different viewpoints, supporting the validity of the model. The highest accuracy was obtained by the deep learning models, with values of 97.945 for the CNN model and 93.216 for the MLP model. The study therefore concludes that the new hybrid solution must be implemented at a practical level to reduce phishing activities, due to its high efficiency and accuracy.

Product Review Ranking in e-Commerce using Urgency Level Classification Approach

Jurnal Online Informatika ◽

10.15575/join.v5i2.612 ◽

2020 ◽

Vol 5 (2) ◽

pp. 212

Author(s):

Hamdi Ahmad Zuhri ◽

Nur Ulfa Maulidevi

Keyword(s):

Deep Learning ◽

Classification Model ◽

Support Vector ◽

Learning Models ◽

Classification Approach ◽

Value Range ◽

High Bias ◽

Product Domains ◽

Urgency Level ◽

Bayesian Support

Review ranking is useful to give users a better experience. Review ranking studies commonly use the upvote value, which does not represent urgency and therefore causes problems in prediction. In contrast, manual labeling over a range as wide as the upvote value range introduces high bias and inconsistency. The proposed solution is to use a classification approach to rank reviews, where the labels are ordinal urgency classes. The experiment involved shallow learning models (Logistic Regression, Naïve Bayesian, Support Vector Machine, and Random Forest) and deep learning models (LSTM and CNN). In constructing a classification model, the problem is broken down into several binary classifications that predict tendencies of urgency depending on the separation of classes. The results show that deep learning models outperform the other models in classification and ranking evaluation. In addition, the review data used tend to contain vocabulary of certain product domains, so further research is needed on data with more diverse vocabulary.
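
The abstract does not say exactly how the ordinal problem is split into binary ones. A minimal sketch of one common scheme, assumed here purely for illustration, trains one binary classifier per class boundary ("is the urgency above class k?") and then recombines the predicted probabilities. The function names and the ranking score are hypothetical and not taken from the paper.

```python
def ordinal_to_binary_targets(label, num_classes):
    """Encode an ordinal label (0..K-1) as K-1 binary targets.

    Target k answers the question "is the label greater than class k?".
    """
    return [1 if label > k else 0 for k in range(num_classes - 1)]


def combine_binary_probs(probs_greater):
    """Turn P(y > k), k = 0..K-2, into a distribution over K classes.

    P(y = c) = P(y > c-1) - P(y > c), with P(y > -1) = 1 and P(y > K-1) = 0.
    Negative differences (inconsistent classifiers) are clipped to 0.
    """
    num_classes = len(probs_greater) + 1
    class_probs = []
    for c in range(num_classes):
        upper = probs_greater[c - 1] if c > 0 else 1.0
        lower = probs_greater[c] if c < num_classes - 1 else 0.0
        class_probs.append(max(upper - lower, 0.0))
    return class_probs


def urgency_score(probs_greater):
    """Expected class index, usable as a ranking key (assumed, not from the paper)."""
    return sum(probs_greater)


def rank_reviews(reviews):
    """Sort (review_id, probs_greater) pairs from most to least urgent."""
    return sorted(reviews, key=lambda item: urgency_score(item[1]), reverse=True)
```

With this encoding, any binary model (shallow or deep) can be plugged in per boundary, and the summed probabilities give a single score for ordering reviews.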

Efficient Deep Learning Models for DGA Domain Detection

Security and Communication Networks ◽

10.1155/2021/8887881 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Juhong Namgung ◽

Siwoon Son ◽

Yang-Sae Moon

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Ensemble Model ◽

Learning Models ◽

Short Term ◽

Domain Names ◽

Additional Information ◽

Domain Sequence ◽

Long Short Term Memory ◽

And Control

In recent years, cyberattacks using command and control (C&C) servers have significantly increased. To hide their C&C servers, attackers often use a domain generation algorithm (DGA), which automatically generates domain names for the C&C servers. Accordingly, extensive research on DGA domain detection has been conducted. However, existing methods cannot accurately detect continuously generated DGA domains and can easily be evaded by an attacker. Recently, long short-term memory (LSTM)-based deep learning models have been introduced to detect DGA domains in real time using only domain names, without feature extraction or additional information. In this paper, we propose an efficient DGA domain detection method based on bidirectional LSTM (BiLSTM), which learns bidirectional information as opposed to the unidirectional information learned by LSTM. We further maximize the detection performance with a convolutional neural network (CNN) + BiLSTM ensemble model using an attention mechanism, which allows the model to learn both local and global information in a domain sequence. Experimental results show that existing CNN and LSTM models achieved F1-scores of 0.9384 and 0.9597, respectively, while the proposed BiLSTM and ensemble models achieved higher F1-scores of 0.9618 and 0.9666, respectively. In addition, the ensemble model achieved the best performance for most DGA domain classes, enabling more accurate DGA domain detection than existing models.
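
For readers comparing the reported F1-scores, a short sketch of how a binary F1-score follows from confusion counts may help. The function name and the example counts are illustrative assumptions; the abstract does not state how the scores were averaged across DGA classes.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """Binary F1-score: the harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    if predicted_pos == 0 or actual_pos == 0:
        return 0.0
    precision = true_pos / predicted_pos
    recall = true_pos / actual_pos
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, it is a reasonable choice when benign domains vastly outnumber DGA domains.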

Short-Term Forecasting of Photovoltaic Solar Power Production Using Variational Auto-Encoder Driven Deep Learning Approach

Applied Sciences ◽

10.3390/app10238400 ◽

2020 ◽

Vol 10 (23) ◽

pp. 8400 ◽

Cited By ~ 1

Author(s):

Abdelkader Dairi ◽

Fouzi Harrou ◽

Ying Sun ◽

Sofiane Khadraoui

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Solar Power ◽

Power Production ◽

Superior Performance ◽

Support Vector ◽

Learning Models ◽

Short Term ◽

Learning Methods ◽

Short Term Forecasting

The accurate modeling and forecasting of the power output of photovoltaic (PV) systems are critical to efficiently managing their integration in smart grids, delivery, and storage. This paper intends to provide efficient short-term forecasting of solar power production using a Variational AutoEncoder (VAE) model. Adopting the VAE-driven deep learning model is expected to improve forecasting accuracy because of its suitable performance in time-series modeling and flexible nonlinear approximation. Both single- and multi-step-ahead forecasts are investigated in this work. Data from two grid-connected plants (a 243 kW parking lot canopy array in the US and a 9 MW PV system in Algeria) are employed to show the investigated deep learning models’ performance. Specifically, the forecasting outputs of the proposed VAE-based forecasting method have been compared with seven deep learning methods, namely recurrent neural network, long short-term memory (LSTM), bidirectional LSTM, convolutional LSTM network, gated recurrent units, stacked autoencoder, and restricted Boltzmann machine, and two commonly used machine learning methods, namely logistic regression and support vector regression. The results of this investigation demonstrate the satisfying performance of deep learning techniques in forecasting solar power and point out that the VAE consistently performed better than the other methods. The results also confirmed the superior performance of deep learning models compared to the two considered baseline machine learning models.

The Unreasonable Effectiveness of the Baseline: Discussing SVMs in Legal Text Classification

10.3233/faia210317 ◽

2021 ◽

Author(s):

Benjamin Clavié ◽

Marc Alphonsus

Keyword(s):

Deep Learning ◽

Language Processing ◽

Text Classification ◽

Traditional Approach ◽

Error Reduction ◽

Support Vector ◽

Learning Models ◽

Legal Text ◽

Classification Tasks ◽

Legal Domain

We aim to highlight an interesting trend to contribute to the ongoing debate around advances within legal Natural Language Processing. Recently, the focus for most legal text classification tasks has shifted towards large pre-trained deep learning models such as BERT. In this paper, we show that a more traditional approach based on Support Vector Machine classifiers reaches competitive performance with deep learning models. We also highlight that error reduction obtained by using specialised BERT-based models over baselines is noticeably smaller in the legal domain when compared to general language tasks. We discuss some hypotheses for these results to support future discussions.

Evaluating Deep Learning models for predicting ALK-5 inhibition

PLoS ONE ◽

10.1371/journal.pone.0246126 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0246126

Author(s):

Gabriel Z. Espinoza ◽

Rafaela M. Angelo ◽

Patricia R. Oliveira ◽

Kathia M. Honorio

Keyword(s):

Neural Network ◽

Biological Activity ◽

Deep Learning ◽

Deep Neural Network ◽

External Validation ◽

Machine Learning Techniques ◽

Coefficient Of Determination ◽

Support Vector ◽

Learning Models ◽

Alk 5

Computational methods have been widely used in drug design. The recent developments in machine learning techniques and the ever-growing chemical and biological databases are fertile ground for discoveries in this area. In this study, we evaluated the performance of Deep Learning models in comparison to Random Forest and Support Vector Regression for predicting the biological activity (pIC50) of ALK-5 inhibitors as candidates to treat cancer. The generalization power of the models was assessed by internal and external validation procedures. A deep neural network model obtained the best performance in this comparative study, achieving a coefficient of determination of 0.658 on the external validation set, with mean square error and mean absolute error of 0.373 and 0.450, respectively. Additionally, the relevance of the chemical descriptors for the prediction of biological activity was estimated using Permutation Importance. We can conclude that the forecast model obtained by the deep neural network is suitable for the problem and can be employed to predict the biological activity of new ALK-5 inhibitors.
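
The three regression metrics quoted above can be computed as in the following sketch, which uses the standard textbook definitions. The function name and the toy inputs in the usage note are illustrative; the paper may have computed its metrics with a library rather than by hand.

```python
def regression_metrics(y_true, y_pred):
    """Return (r2, mse, mae) for two equal-length sequences of numbers.

    r2  = 1 - SS_res / SS_tot  (coefficient of determination)
    mse = mean of squared errors
    mae = mean of absolute errors
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("r2 is undefined when y_true is constant")
    r2 = 1 - ss_res / ss_tot
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    return r2, mse, mae
```

For example, `regression_metrics([1, 2, 3], [1, 2, 4])` gives an R² of 0.5 with MSE and MAE both equal to 1/3.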

Sentiment Analysis and Topic Modeling on Tweets about Online Education during COVID-19

Applied Sciences ◽

10.3390/app11188438 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8438

Author(s):

Muhammad Mujahid ◽

Ernesto Lee ◽

Furqan Rustam ◽

Patrick Bernard Washington ◽

Saleem Ullah ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Online Education ◽

Sentiment Analysis ◽

Topic Modeling ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

E Learning ◽

Machine Learning Models

Amid the worldwide COVID-19 pandemic lockdowns, the closure of educational institutes led to an unprecedented rise in online learning. To limit the impact of COVID-19 and curb its spread, educational institutions closed their campuses immediately and academic activities were moved to e-learning platforms. The effectiveness of e-learning is a critical concern for both students and parents, specifically in terms of its suitability for students and teachers and its technical feasibility with respect to different social scenarios. Such concerns must be reviewed from several aspects before e-learning can be adopted at such a large scale. This study endeavors to investigate the effectiveness of e-learning by analyzing the sentiments of people about e-learning. Due to the recent rise of social media as an important mode of communication, people’s views can be found on platforms such as Twitter, Instagram, Facebook, etc. This study uses a Twitter dataset containing 17,155 tweets about e-learning. Machine learning and deep learning approaches have shown their suitability, capability, and potential for image processing, object detection, and natural language processing tasks, and text analysis is no exception. Machine learning approaches have been largely used both for annotation and for text and sentiment analysis. Keeping in view the adequacy and efficacy of machine learning models, this study adopts TextBlob, VADER (Valence Aware Dictionary for Sentiment Reasoning), and SentiWordNet to analyze the polarity and subjectivity scores of the tweets’ text. Furthermore, bearing in mind that machine learning models display high classification accuracy, various machine learning models have been used for sentiment classification. Two feature extraction techniques, TF-IDF (Term Frequency-Inverse Document Frequency) and BoW (Bag of Words), have been used to effectively build and evaluate the models. All the models have been evaluated in terms of various important performance metrics such as accuracy, precision, recall, and F1 score. The results reveal that the random forest and support vector machine classifiers achieve the highest accuracy of 0.95 when used with BoW features. Performance comparison is carried out for the results of TextBlob, VADER, and SentiWordNet, as well as the classification results of machine learning models and deep learning models such as CNN (Convolutional Neural Network), LSTM (Long Short Term Memory), CNN-LSTM, and Bi-LSTM (Bidirectional-LSTM). Additionally, topic modeling is performed to find the problems associated with e-learning, which indicates that uncertainty about the campus opening date, children’s difficulty in grasping online education, and a lack of efficient networks for online education are the top three problems.
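
To make the two feature extraction techniques concrete, here is a minimal sketch of BoW and TF-IDF vectors in plain Python. It assumes whitespace tokenization and the unsmoothed textbook weighting tf × log(N / df); the study's actual preprocessing and weighting variant are not given in the abstract, and library implementations typically apply smoothing and normalization.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Sorted list of every distinct token across the documents."""
    return sorted({token for doc in docs for token in tokenize(doc)})


def bow_vector(doc, vocab):
    """Bag of Words: raw count of each vocabulary term in the document."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    """TF-IDF: raw term count scaled by log(N / document frequency)."""
    num_docs = len(docs)
    bows = [bow_vector(doc, vocab) for doc in docs]
    doc_freq = [sum(1 for bow in bows if bow[i] > 0) for i in range(len(vocab))]
    # A term absent from every document gets weight 0 instead of dividing by zero.
    idf = [math.log(num_docs / df) if df else 0.0 for df in doc_freq]
    return [[count * weight for count, weight in zip(bow, idf)] for bow in bows]
```

In this variant a term that appears in every document receives a weight of zero, which is the intended effect: it carries no information for telling the documents apart.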

Improving Accuracy of Tomato Plant Disease Diagnosis Based on Deep Learning With Explicit Control of Hidden Classes

Frontiers in Plant Science ◽

10.3389/fpls.2021.682230 ◽

2021 ◽

Vol 12 ◽

Author(s):

Alvaro Fuentes ◽

Sook Yoon ◽

Mun Haeng Lee ◽

Dong Sun Park

Keyword(s):

Deep Learning ◽

Tomato Plant ◽

Plant Disease ◽

High Efficiency ◽

Recognition Rate ◽

Disease Diagnosis ◽

Plant Diseases ◽

Improve Model ◽

Improving Accuracy ◽

And Control

Recognizing plant diseases is a major challenge in agriculture, and recent works based on deep learning have shown high efficiency in addressing problems directly related to this area. Nonetheless, weak performance has been observed when a model trained on a particular dataset is evaluated in new greenhouse environments. Therefore, in this work, we take a step towards these issues and present a strategy to improve model accuracy by applying techniques that can help refine the model’s generalization capability to deal with complex changes in new greenhouse environments. We propose a paradigm called “control to target classes.” The core of our approach is to train and validate a deep learning-based detector using target and control classes on images collected in various greenhouses. Then, we apply the generated features for testing the inference of the system on data from new greenhouse conditions where the goal is to detect target classes exclusively. Therefore, by having explicit control over inter- and intra-class variations, our model can distinguish data variations that make the system more robust when applied to new scenarios. Experiments demonstrate the effectiveness and efficiency of the proposed approach on our extended tomato plant diseases dataset with 14 classes, from which 5 are target classes and the rest are control classes. Our detector achieves a recognition rate of target classes of 93.37% mean average precision on the inference dataset. Finally, we believe that our study offers valuable guidelines for researchers working in plant disease recognition with complex input data.

Quantifying Seagrass Distribution in Coastal Water with Deep Learning Models

Remote Sensing ◽

10.3390/rs12101581 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1581 ◽

Cited By ~ 2

Author(s):

Daniel Perez ◽

Kazi Islam ◽

Victoria Hill ◽

Richard Zimmerman ◽

Blake Schaeffer ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Transfer Learning ◽

Satellite Images ◽

Support Vector ◽

Learning Approach ◽

Learning Models ◽

Learning Techniques ◽

The World ◽

New Locations

Coastal ecosystems are critically affected by seagrass, both economically and ecologically. However, reliable seagrass distribution information is lacking in nearly all parts of the world because of the excessive costs associated with its assessment. In this paper, we develop two deep learning models for automatic seagrass distribution quantification based on 8-band satellite imagery. Specifically, we implemented a deep capsule network (DCN) and a deep convolutional neural network (CNN) to assess seagrass distribution through regression. The DCN model first determines whether seagrass is present in the image through classification. Second, if seagrass is present in the image, it quantifies the seagrass through regression. During training, the regression and classification modules are jointly optimized to achieve end-to-end learning. The CNN model is strictly trained for regression in seagrass and non-seagrass patches. In addition, we propose a transfer learning approach to transfer knowledge in the deep models trained at one location to perform seagrass quantification at a different location. We evaluate the proposed methods on three WorldView-2 satellite images taken from the coastal area in Florida. Experimental results show that the proposed DCN and CNN models performed similarly and achieved much better results than a linear regression model and a support vector machine. We also demonstrate that using transfer learning techniques for the quantification of seagrass significantly improved the results compared to directly applying the deep models to new locations.

Explaining Deep Learning Models Through Rule-Based Approximation and Visualization

IEEE Transactions on Fuzzy Systems ◽

10.1109/tfuzz.2020.2999776 ◽

2020 ◽

pp. 1-1

Author(s):

Eduardo Almeida Soares ◽

Plamen P Angelov ◽

Bruno Costa ◽

Marcos Castro ◽

Subramanya Nageshrao ◽

...

Keyword(s):

Deep Learning ◽

Learning Models ◽

Rule Based

Short-Term Firm-Level Energy-Consumption Forecasting for Energy-Intensive Manufacturing: A Comparison of Machine Learning and Deep Learning Models

Algorithms ◽

10.3390/a13110274 ◽

2020 ◽

Vol 13 (11) ◽

pp. 274 ◽

Cited By ~ 1

Author(s):

Andrea Maria N. C. Ribeiro ◽

Pedro Rafael X. do Carmo ◽

Iago Richard Rodrigues ◽

Djamel Sadok ◽

Theo Lynn ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Energy Consumption ◽

Mitigation Measures ◽

Support Vector ◽

Level Energy ◽

Learning Models ◽

Short Term ◽

Case Site

To minimise environmental impact, to avoid regulatory penalties, and to improve competitiveness, energy-intensive manufacturing firms require accurate forecasts of their energy consumption so that precautionary and mitigation measures can be taken. Deep learning is widely touted as a superior analytical technique to traditional artificial neural networks, machine learning, and other classical time-series models due to its high dimensionality and problem-solving capabilities. Despite this, research on its application in demand-side energy forecasting is limited. We compare two benchmarks (Autoregressive Integrated Moving Average (ARIMA) and an existing manual technique used at the case site) against three deep-learning models (simple Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU)) and two machine-learning models (Support Vector Regression (SVR) and Random Forest) for short-term load forecasting (STLF) using data from a Brazilian thermoplastic resin manufacturing plant. We use the grid search method to identify the best configurations for each model and then use Diebold–Mariano testing to confirm the results. The results suggest that the legacy approach used at the case site is the worst performing and that the GRU model outperformed all other models tested.
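
As a rough illustration of the Diebold–Mariano comparison mentioned above, the sketch below computes the statistic for one-step-ahead forecasts under squared-error loss, using a normal approximation for the p-value. These choices (loss function, horizon of 1, no small-sample correction) are assumptions made for brevity; the abstract does not state which variant the authors used.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Diebold-Mariano test for equal accuracy of two forecast error series.

    Assumes one-step-ahead forecasts and squared-error loss, so only the
    lag-0 autocovariance of the loss differential is needed.
    Returns (dm_statistic, two_sided_p_value). A positive statistic means
    model A has the larger average loss.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two error series of equal length >= 2")
    # Loss differential between the two models at each time step.
    diffs = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    gamma0 = sum((d - mean_diff) ** 2 for d in diffs) / n
    if gamma0 == 0:
        raise ValueError("loss differential has zero variance")
    dm_stat = mean_diff / math.sqrt(gamma0 / n)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|dm|)).
    p_value = math.erfc(abs(dm_stat) / math.sqrt(2))
    return dm_stat, p_value
```

A small p-value suggests the difference in forecast accuracy between the two models is unlikely to be due to chance alone; for multi-step horizons the variance estimate would also need autocovariance terms at higher lags.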
