ABLE: Attention Based Learning for Enzyme Classification

Mapping Intimacies ◽

10.1101/2020.11.12.380246 ◽

2020 ◽

Author(s):

Nallapareddy Mohan Vamsi ◽

Rohit Dwivedula

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Protein Data Bank ◽

Primary Structure ◽

Data Bank ◽

Statistical Testing ◽

Learning Models ◽

Proposed Model ◽

Negative Class ◽

Enzyme Class

Classifying proteins into their respective enzyme classes is of interest to researchers for a variety of reasons. The open-source Protein Data Bank (PDB) contains more than 160,000 structures, with more being added every day. This paper proposes an attention-based bidirectional LSTM model (ABLE), trained on data oversampled with SMOTE, that classifies a protein into one of the six enzyme classes or a negative class using only its primary structure, given as a FASTA sequence string. We achieve the highest F1-score, 0.834, using our proposed model on a dataset of proteins from the PDB. We baseline our model against seventeen other machine learning and deep learning models, including CNN, LSTM, BiLSTM and GRU. We perform extensive experimentation and statistical testing to corroborate our results.
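The oversampling step named in this abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. The abstract does not say how sequences were turned into numeric vectors, so the two-dimensional points, the function name `smote`, and its parameters are assumptions made purely for illustration, not the authors' pipeline.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class feature vectors (hypothetical values).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
new_points = smote(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region the minority class already occupies.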


A Tweet Sentiment Classification Approach Using a Hybrid Stacked Ensemble Technique

Information ◽

10.3390/info12090374 ◽

2021 ◽

Vol 12 (9) ◽

pp. 374

Author(s):

Babacar Gaye ◽

Dezheng Zhang ◽

Aziguli Wulamu

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Short Term Memory ◽

State Of The Art ◽

Accuracy Score ◽

Learning Models ◽

Proposed Model

With the extensive availability of social media platforms, Twitter has become a significant tool for gathering people's views, opinions, attitudes, and emotions towards certain entities. Within this frame of reference, sentiment analysis of tweets has become one of the most fascinating research areas in the field of natural language processing. A variety of techniques have been devised for sentiment analysis, but there is still room for improvement where the accuracy and efficacy of the system are concerned. This study proposes a novel approach that exploits the advantages of the lexical dictionary, machine learning, and deep learning classifiers. We classified the tweets based on the sentiments extracted by TextBlob using a stacked ensemble of three long short-term memory (LSTM) networks as base classifiers and logistic regression (LR) as a meta classifier. The proposed model proved to be effective and time-saving since it does not require feature extraction, as LSTM extracts features without any human intervention. We compared our proposed approach with conventional machine learning models such as logistic regression, AdaBoost, and random forest, and with state-of-the-art deep learning models. Experiments were conducted on the sentiment140 dataset and were evaluated in terms of accuracy, precision, recall, and F1 score. Empirical results showed that our proposed approach achieved state-of-the-art results, with an accuracy score of 99%.


An Effective Phishing Detection Model Based on Character Level Convolutional Neural Network from URL

Electronics ◽

10.3390/electronics9091514 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1514

Author(s):

Ali Aljofey ◽

Qingshan Jiang ◽

Qiang Qu ◽

Mingqing Huang ◽

Jean-Pierre Niyigena

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Convolutional Neural Network ◽

Third Party ◽

Learning Models ◽

Phone Calls ◽

Learning Techniques ◽

Proposed Model ◽

Phishing Detection

Phishing is one of the easiest forms of cybercrime, aiming to entice people into giving accurate information such as account IDs, bank details, and passwords. This type of cyberattack is usually triggered by emails, instant messages, or phone calls. Existing anti-phishing techniques are mainly based on source code features, which require scraping the content of web pages, and on third-party services, which slow down the classification of phishing URLs. Although machine learning techniques have lately been used to detect phishing, they require substantial manual feature engineering and are not adept at detecting emerging phishing attacks. Due to the recent rapid development of deep learning techniques, many deep learning-based methods have also been introduced to enhance classification performance. In this paper, a fast deep learning-based solution is proposed, which uses a character-level convolutional neural network (CNN) to detect phishing from the URL of the website. The proposed model does not require the retrieval of target website content or the use of any third-party services. It captures information and sequential patterns of URL strings without requiring prior knowledge about phishing, and then uses the sequential pattern features for fast classification of the actual URL. For evaluation, comparisons are provided between different traditional machine learning models and deep learning models using various feature sets such as hand-crafted, character embedding, character-level TF-IDF, and character-level count vector features. According to the experiments, the proposed model achieved an accuracy of 95.02% on our dataset and accuracies of 98.58%, 95.46%, and 95.22% on benchmark datasets, outperforming existing phishing URL models.
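A character-level model of this kind needs each URL converted into a fixed-length sequence of integer character indices before it reaches an embedding or convolutional layer. The following is a minimal sketch of that preprocessing step only. The alphabet, the maximum length, the reserved indices, and the name `encode_url` are assumptions for illustration and are not taken from the paper.

```python
import string

# Assumed alphabet: lowercase letters, digits and ASCII punctuation.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation
PAD, UNKNOWN = 0, 1  # reserved indices for padding and out-of-alphabet characters
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode_url(url, max_len=200):
    """Map a URL to a fixed-length list of integer character indices."""
    # Lowercase, truncate to max_len, then look up each character.
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN) for ch in url.lower()[:max_len]]
    # Right-pad short URLs so every encoded URL has the same length.
    return indices + [PAD] * (max_len - len(indices))


encoded = encode_url("http://example.com/login", max_len=32)
```

No page content or third-party lookup is involved: the URL string alone determines the model input, which is the property the abstract emphasises.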


A deep learning-based quality assessment model of collaboratively edited documents: A case study of Wikipedia

Journal of Information Science ◽

10.1177/0165551519877646 ◽

2019 ◽

pp. 016555151987764

Author(s):

Ping Wang ◽

Xiaodan Li ◽

Renli Wu

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Complete Information ◽

Classification Performance ◽

Machine Learning Algorithms ◽

Assessment Model ◽

Learning Models ◽

Proposed Model

Wikipedia is becoming increasingly critical in helping people obtain information and knowledge. Its leading advantage is that users can not only access information but also modify it. However, this presents a challenging issue: how can we measure the quality of a Wikipedia article? The existing approaches assess Wikipedia quality by statistical models or traditional machine learning algorithms. However, their performance is not satisfactory. Moreover, most existing models fail to extract complete information from articles, which degrades the model’s performance. In this article, we first survey related works and summarise a comprehensive feature framework. Then, state-of-the-art deep learning models are introduced and applied to assess Wikipedia quality. Finally, a comparison among deep learning models and traditional machine learning models is conducted to validate the effectiveness of the proposed model. The models are compared extensively in terms of their training and classification performance. Moreover, the importance of each feature and the importance of different feature sets are analysed separately.


An Ensemble Deep Neural Network Model for Onion-Routed Traffic Detection to Boost Cloud Security

International Journal of Grid and High Performance Computing ◽

10.4018/ijghpc.2021010101 ◽

2021 ◽

Vol 13 (1) ◽

pp. 1-17

Author(s):

Shamik Tiwari

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Cloud Security ◽

Classification Model ◽

Ensemble Model ◽

Machine Learning Method ◽

Learning Models ◽

Anonymous Network ◽

Proposed Model ◽

Traffic Detection

Anonymous network communication over onion routing networks such as Tor is used to guard the privacy of the sender by encrypting all messages in the overlay network. These days, onion-routed communications are used not only for legitimate purposes; cyber offenders also misuse onion routing for port scanning, hacking, exfiltration of stolen data, and other types of online fraud. These cybercrime attempts pose a serious threat to cloud security. Deep learning is a highly effective machine learning method for prediction and classification. Ensembling multiple models is an effective approach to increasing the efficiency of learning models. In this work, an ensemble deep learning-based classification model is proposed to distinguish communication through Tor from non-Tor network traffic. Three different deep learning models are combined to form the ensemble model. The proposed model is also compared with other machine learning models. Classification results show the superiority of the proposed model over the other models.


Deep Convolution Neural Network Model for Credit-Card Fraud Detection and Alert

Journal of Artificial Intelligence and Capsule Networks - September 2019 ◽

10.36548/jaicn.2021.2.003 ◽

2021 ◽

Vol 3 (2) ◽

pp. 101-112

Author(s):

Joy Iong-Zong Chen ◽

Kong-Long Lai

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Credit Card ◽

Fraud Detection ◽

Financial Fraud ◽

Detection Accuracy ◽

Learning Models ◽

Proposed Model ◽

Deep Convolution Neural Network

With the exponential increase in internet usage, numerous organisations, including those in the financial industry, have moved their services online. The global growth in financial fraud is causing massive financial losses. Advanced financial fraud detection systems are therefore needed to actively detect risks such as illegal transactions and irregular attacks. In recent years, these issues have largely been tackled by means of data mining and machine learning techniques. However, these techniques still need several improvements in identifying unknown attack patterns, big data analytics, and computation speed. This paper proposes a financial fraud detection scheme based on a Deep Convolution Neural Network (DCNN). The technique can improve detection accuracy when large volumes of data are involved. The proposed model is compared with existing machine learning models, an auto-encoder model, and other deep learning models on a real-time credit card fraud dataset. In the experiments, the proposed model achieved a detection accuracy of 99% over a time duration of 45 seconds.


An Optimized Hybrid Deep Learning Model to Detect COVID-19 Misleading Information

Computational Intelligence and Neuroscience ◽

10.1155/2021/9615034 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Bader Alouffi ◽

Abdullah Alharbi ◽

Radhya Sahal ◽

Hager Saleh

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Short Term Memory ◽

Learning Model ◽

Dense Layer ◽

Fake News ◽

Learning Models ◽

Proposed Model ◽

Deep Learning Model ◽

Machine Learning Models

Fake news is challenging to detect because it mixes accurate and inaccurate information from reliable and unreliable sources. Social media is not always a trustworthy data source, especially during the COVID-19 outbreak, when fake news has spread widely. The best way to deal with this is early detection. Accordingly, in this work, we propose a hybrid deep learning model that uses a convolutional neural network (CNN) and long short-term memory (LSTM) to detect COVID-19 fake news. The proposed model consists of the following layers: an embedding layer, a convolutional layer, a pooling layer, an LSTM layer, a flatten layer, a dense layer, and an output layer. For the experiments, three COVID-19 fake news datasets are used to evaluate six machine learning models, two deep learning models, and our proposed model. The machine learning models are DT, KNN, LR, RF, SVM, and NB, while the deep learning models are CNN and LSTM. Four metrics are used to validate the results: accuracy, precision, recall, and F1-measure. The conducted experiments show that the proposed model outperforms the six machine learning models and the two deep learning models. Consequently, the proposed system is capable of detecting COVID-19 fake news effectively.
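The four evaluation measures listed in this abstract (accuracy, precision, recall, and F1) all follow from the counts of true and false positives and negatives. The sketch below shows the standard binary definitions. The labels are invented for illustration and the function name `classification_metrics` is hypothetical, so the numbers say nothing about the paper's results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 1 = fake news, 0 = genuine news.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case there are three true positives, one false positive, one false negative, and three true negatives, so all four measures come out to 0.75.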


A Deep Learning BiLSTM Encoding-Decoding Model for COVID-19 Pandemic Spread Forecasting

Fractal and Fractional ◽

10.3390/fractalfract5040175 ◽

2021 ◽

Vol 5 (4) ◽

pp. 175

Author(s):

Ahmed I. Shahin ◽

Sultan Almotairi

Keyword(s):

Machine Learning ◽

Time Series ◽

Deep Learning ◽

Time Series Forecasting ◽

Machine Learning Algorithms ◽

Forecasting Model ◽

Learning Models ◽

Forecasting Models ◽

Proposed Model ◽

Death Cases

The COVID-19 pandemic has spread widely, with an increasing infection rate, across more than 200 countries. Governments need to record confirmed, recovered, and death cases to track the present state and to predict future cases. Based on predictions of future cases, governments can impose opening and closing procedures to save human lives by slowing the spread of the pandemic. There are several forecasting models for pandemic time series based on statistical processing and machine learning algorithms. Deep learning has proven to be an excellent tool for time series forecasting problems. This paper proposes a deep learning time-series prediction model to forecast the confirmed, recovered, and death cases. Our proposed network is based on an encoding-decoding deep learning network. Moreover, we optimize the selection of our proposed network's hyper-parameters. Our proposed forecasting model was applied first to Saudi Arabia and then to other countries. Our study covers two categories of countries that have witnessed different spread waves this year. In our experiments, we compared our proposed model with other time-series forecasting models, which totaled fifteen prediction models: three statistical models, three deep learning models, seven machine learning models, and one Prophet model. The accuracy of our proposed forecasting model was assessed using several statistical evaluation criteria. It achieved the lowest error values and the highest R-squared value, 0.99. Our proposed model may help policymakers improve control of the pandemic's spread, and our method can be generalized to other time series forecasting tasks.
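The R-squared value quoted in this abstract is the coefficient of determination, which is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the observed series. A minimal sketch follows. The case counts and forecasts are invented for illustration and are unrelated to the paper's data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error of the forecast.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented daily case counts and forecasts.
actual = [100, 120, 150, 190, 240]
predicted = [102, 118, 153, 188, 241]
score = r_squared(actual, predicted)
```

A forecast that matches every observation scores 1.0, while one that always predicts the mean of the observed series scores 0.0, so values close to 1 indicate that the forecast explains most of the variation in the series.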


Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.


Deep Learning in Disease Diagnosis: Models and Datasets

Current Bioinformatics ◽

10.2174/1574893615999201002124021 ◽

2020 ◽

Vol 15 ◽

Author(s):

Deeksha Saxena ◽

Mohammed Haris Siddiqui ◽

Rajnish Kumar

Keyword(s):

Biological Sciences ◽

Machine Learning ◽

Deep Learning ◽

Disease Diagnosis ◽

Learning Models ◽

Data Types ◽

Related Data ◽

Abstract Level ◽

Experimental Validations ◽

Selection Of

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined so that the representation is raised from lower levels to much more abstract ones. Though DL is used widely in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but at times the two are used individually as well. DL seems to be a better platform than machine learning, as it does not require intermediate feature extraction and works well with larger datasets. DL is one of the most discussed fields among scientists and researchers these days for diagnosing and solving various biological problems. However, deep learning models need some improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets that are used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed, and tabulated. Types of datasets and some of the popular disease-related data sources for DL were highlighted. Results: We have analyzed the frequently used DL methods and data types, and discussed some of the recent deep learning models used for solving different biological problems. Conclusion: The review presents useful insights about DL methods, data types, and the selection of DL models for disease diagnosis.


An Investigation into the Application of Deep Learning in the Detection and Mitigation of DDOS Attack on SDN Controllers

Technologies ◽

10.3390/technologies9010014 ◽

2021 ◽

Vol 9 (1) ◽

pp. 14

Author(s):

James Dzisi Gadze ◽

Akua Acheampomaa Bamfo-Asante ◽

Justice Owusu Agyemang ◽

Henry Nunoo-Mensah ◽

Kwasi Adu-Boahen Opare

Keyword(s):

Deep Learning ◽

Network Architecture ◽

Learning Algorithm ◽

Single Point ◽

Denial Of Service ◽

Specific Work ◽

Learning Models ◽

Ddos Attack ◽

Deep Learning Algorithm ◽

Proposed Model

Software-Defined Networking (SDN) is a new paradigm that revolutionizes the idea of a software-driven network through the separation of control and data planes. It addresses the problems of traditional network architecture. Nevertheless, this architecture is exposed to several security threats, e.g., the distributed denial of service (DDoS) attack, which is hard to contain in such software-based networks. The concept of a centralized controller in SDN makes it a single point of attack as well as a single point of failure. In this paper, two deep learning-based models, long short-term memory (LSTM) and convolutional neural network (CNN), are investigated. The paper illustrates their feasibility and efficiency in detecting and mitigating DDoS attacks. It focuses on TCP, UDP, and ICMP flood attacks that target the controller. The performance of the models was evaluated based on accuracy, recall, and true negative rate. We compared the performance of the deep learning models with classical machine learning models. We further provide details on the time taken to detect and mitigate the attack. Our results show that RNN LSTM is a viable deep learning algorithm that can be applied in the detection and mitigation of DDoS in the SDN controller. Our proposed model produced an accuracy of 89.63%, which outperformed linear-based models such as SVM (86.85%) and Naive Bayes (82.61%). Although KNN, which is a linear-based model, outperformed our proposed model (achieving an accuracy of 99.4%), our proposed model provides a good trade-off between precision and recall, which makes it suitable for DDoS classification. In addition, it was observed that the split ratio of the training and testing datasets can affect the performance of a deep learning algorithm in a specific task. The model achieved the best performance with a 70/30 split, compared with 80/20 and 60/40 split ratios.
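The split-ratio comparison mentioned at the end of this abstract can be set up with a small helper that shuffles the labelled records once and cuts them at the chosen fraction. This is only a sketch of the data-partitioning step. The helper name `train_test_split`, the fixed seed, and the placeholder records are assumptions, and the abstract does not describe how the authors actually partitioned their data.

```python
import random


def train_test_split(samples, train_fraction, seed=0):
    """Shuffle a copy of the samples and split it at the given fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps splits reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


flows = list(range(100))  # placeholder for labelled traffic records
# Build the three train/test partitions compared in the abstract.
splits = {ratio: train_test_split(flows, ratio) for ratio in (0.6, 0.7, 0.8)}
train_70, test_30 = splits[0.7]
```

Using the same seed for every ratio means the three partitions differ only in where the cut falls, which makes a comparison between split ratios easier to interpret.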
