Compression of Deep Learning Models for Text: A Survey

Manish Gupta; Puneet Agrawal

doi:10.1145/3487045

Compression of Deep Learning Models for Text: A Survey

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3487045 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-55

Author(s):

Manish Gupta ◽

Puneet Agrawal

Keyword(s):

Deep Learning ◽

Language Processing ◽

Short Term Memory ◽

Response Times ◽

Tensor Decomposition ◽

Learning Models ◽

Knowledge Distillation ◽

Gated Recurrent Units ◽

Work Done ◽

Small Models

In recent years, the fields of natural language processing (NLP) and information retrieval (IR) have made tremendous progress thanks to deep learning models like Recurrent Neural Networks (RNNs), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTMs) networks, and Transformer [ 121 ] based models like Bidirectional Encoder Representations from Transformers (BERT) [ 24 ], Generative Pre-training Transformer (GPT-2) [ 95 ], Multi-task Deep Neural Network (MT-DNN) [ 74 ], Extra-Long Network (XLNet) [ 135 ], Text-to-text transfer transformer (T5) [ 96 ], T-NLG [ 99 ], and GShard [ 64 ]. But these models are humongous in size. On the other hand, real-world applications demand small model size, low response times, and low computational power wattage. In this survey, we discuss six different types of methods (Pruning, Quantization, Knowledge Distillation (KD), Parameter Sharing, Tensor Decomposition, and Sub-quadratic Transformer-based methods) for compression of such models to enable their deployment in real industry NLP projects. Given the critical need of building applications with efficient and small models, and the large amount of recently published work in this area, we believe that this survey organizes the plethora of work done by the “deep learning for NLP” community in the past few years and presents it as a coherent story.

Download Full-text

Empirical Evaluation of Shallow and Deep Learning Classifiers for Arabic Sentiment Analysis

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3466171 ◽

2022 ◽

Vol 21 (1) ◽

pp. 1-25

Author(s):

Ali Bou Nassif ◽

Abdollah Masoud Darya ◽

Ashraf Elnagar

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Empirical Evaluation ◽

Detailed Comparison ◽

Learning Models ◽

Similar Work ◽

Learning Classifiers ◽

Arabic Sentiment Analysis ◽

Gated Recurrent Units

This work presents a detailed comparison of the performance of deep learning models such as convolutional neural networks, long short-term memory, gated recurrent units, their hybrids, and a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. Additionally, the comparison includes state-of-the-art models such as the transformer architecture and the araBERT pre-trained model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are some of the largest publicly available datasets for Arabic reviews. Results showed deep learning outperforming shallow learning for binary and multi-label classification, in contrast with the results of similar work reported in the literature. This discrepancy in outcome was caused by dataset size as we found it to be proportional to the performance of deep learning models. The performance of deep and shallow learning techniques was analyzed in terms of accuracy and F1 score. The best performing shallow learning technique was Random Forest followed by Decision Tree, and AdaBoost. The deep learning models performed similarly using a default embedding layer, while the transformer model performed best when augmented with araBERT.

Download Full-text

A Tweet Sentiment Classification Approach Using a Hybrid Stacked Ensemble Technique

Information ◽

10.3390/info12090374 ◽

2021 ◽

Vol 12 (9) ◽

pp. 374

Author(s):

Babacar Gaye ◽

Dezheng Zhang ◽

Aziguli Wulamu

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Short Term Memory ◽

State Of The Art ◽

Accuracy Score ◽

Learning Models ◽

Proposed Model

With the extensive availability of social media platforms, Twitter has become a significant tool for the acquisition of peoples’ views, opinions, attitudes, and emotions towards certain entities. Within this frame of reference, sentiment analysis of tweets has become one of the most fascinating research areas in the field of natural language processing. A variety of techniques have been devised for sentiment analysis, but there is still room for improvement where the accuracy and efficacy of the system are concerned. This study proposes a novel approach that exploits the advantages of the lexical dictionary, machine learning, and deep learning classifiers. We classified the tweets based on the sentiments extracted by TextBlob using a stacked ensemble of three long short-term memory (LSTM) as base classifiers and logistic regression (LR) as a meta classifier. The proposed model proved to be effective and time-saving since it does not require feature extraction, as LSTM extracts features without any human intervention. We also compared our proposed approach with conventional machine learning models such as logistic regression, AdaBoost, and random forest. We also included state-of-the-art deep learning models in comparison with the proposed model. Experiments were conducted on the sentiment140 dataset and were evaluated in terms of accuracy, precision, recall, and F1 Score. Empirical results showed that our proposed approach manifested state-of-the-art results by achieving an accuracy score of 99%.

Download Full-text

Forex exchange rate forecasting using deep recurrent neural networks

Digital Finance ◽

10.1007/s42521-020-00019-x ◽

2020 ◽

Vol 2 (1-2) ◽

pp. 69-96 ◽

Cited By ~ 1

Author(s):

Alexander Jakob Dautel ◽

Wolfgang Karl Härdle ◽

Stefan Lessmann ◽

Hsin-Vonn Seow

Keyword(s):

Neural Network ◽

Deep Learning ◽

Exchange Rate ◽

Language Processing ◽

Short Term Memory ◽

Recurrent Network ◽

Network Architectures ◽

Exchange Rate Forecasting ◽

Trading Model ◽

Gated Recurrent Units

Abstract Deep learning has substantially advanced the state of the art in computer vision, natural language processing, and other fields. The paper examines the potential of deep learning for exchange rate forecasting. We systematically compare long short-term memory networks and gated recurrent units to traditional recurrent network architectures as well as feedforward networks in terms of their directional forecasting accuracy and the profitability of trading model predictions. Empirical results indicate the suitability of deep networks for exchange rate forecasting in general but also evidence the difficulty of implementing and tuning corresponding architectures. Especially with regard to trading profit, a simpler neural network may perform as well as if not better than a more complex deep neural network.

Download Full-text

Extracting Family History of Patients From Clinical Narratives: Exploring an End-to-End Solution With Deep Learning Models (Preprint)

10.2196/preprints.22982 ◽

2020 ◽

Author(s):

Xi Yang ◽

Hansi Zhang ◽

Xing He ◽

Jiang Bian ◽

Yonghui Wu

Keyword(s):

Deep Learning ◽

Family History ◽

Information Extraction ◽

Language Processing ◽

Conditional Random Fields ◽

Short Term Memory ◽

Majority Voting ◽

Learning Models ◽

Concept Extraction ◽

End To End

BACKGROUND Patients’ family history (FH) is a critical risk factor associated with numerous diseases. However, FH information is not well captured in the structured database but often documented in clinical narratives. Natural language processing (NLP) is the key technology to extract patients’ FH from clinical narratives. In 2019, the National NLP Clinical Challenge (n2c2) organized shared tasks to solicit NLP methods for FH information extraction. OBJECTIVE This study presents our end-to-end FH extraction system developed during the 2019 n2c2 open shared task as well as the new transformer-based models that we developed after the challenge. We seek to develop a machine learning–based solution for FH information extraction without task-specific rules created by hand. METHODS We developed deep learning–based systems for FH concept extraction and relation identification. We explored deep learning models including long short-term memory-conditional random fields and bidirectional encoder representations from transformers (BERT) as well as developed ensemble models using a majority voting strategy. To further optimize performance, we systematically compared 3 different strategies to use BERT output representations for relation identification. RESULTS Our system was among the top-ranked systems (3 out of 21) in the challenge. Our best system achieved micro-averaged F1 scores of 0.7944 and 0.6544 for concept extraction and relation identification, respectively. After challenge, we further explored new transformer-based models and improved the performances of both subtasks to 0.8249 and 0.6775, respectively. For relation identification, our system achieved a performance comparable to the best system (0.6810) reported in the challenge. CONCLUSIONS This study demonstrated the feasibility of utilizing deep learning methods to extract FH information from clinical narratives.

Download Full-text

ASA: A framework for Arabic sentiment analysis

Journal of Information Science ◽

10.1177/0165551519849516 ◽

2019 ◽

Vol 46 (4) ◽

pp. 544-559 ◽

Cited By ~ 4

Author(s):

Ahmed Oussous ◽

Fatima-Zahra Benjelloun ◽

Ayoub Ait Lahcen ◽

Samir Belfkih

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Short Term Memory ◽

Research Area ◽

Support Vector ◽

Learning Models ◽

Arabic Natural Language Processing ◽

Arabic Sentiment Analysis

Sentiment analysis (SA), also known as opinion mining, is a growing important research area. Generally, it helps to automatically determine if a text expresses a positive, negative or neutral sentiment. It enables to mine the huge increasing resources of shared opinions such as social networks, review sites and blogs. In fact, SA is used by many fields and for various languages such as English and Arabic. However, since Arabic is a highly inflectional and derivational language, it raises many challenges. In fact, SA of Arabic text should handle such complex morphology. To better handle these challenges, we decided to provide the research community and Arabic users with a new efficient framework for Arabic Sentiment Analysis (ASA). Our primary goal is to improve the performance of ASA by exploiting deep learning while varying the preprocessing techniques. For that, we implement and evaluate two deep learning models namely convolutional neural network (CNN) and long short-term memory (LSTM) models. The framework offers various preprocessing techniques for ASA (including stemming, normalisation, tokenization and stop words). As a result of this work, we first provide a new rich and publicly available Arabic corpus called Moroccan Sentiment Analysis Corpus (MSAC). Second, the proposed framework demonstrates improvement in ASA. In fact, the experimental results prove that deep learning models have a better performance for ASA than classical approaches (support vector machines, naive Bayes classifiers and maximum entropy). They also show the key role of morphological features in Arabic Natural Language Processing (NLP).

Download Full-text

Performance Analysis of Hybrid Deep Learning Models with Attention Mechanism Positioning and Focal Loss for Text Classification

Scientific Programming ◽

10.1155/2021/2420254 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Sunil Kumar Prabhakar ◽

Harikumar Rajaguru ◽

Dong-Ok Won

Keyword(s):

Deep Learning ◽

Hybrid Model ◽

Language Processing ◽

Text Classification ◽

Classification Accuracy ◽

Short Term Memory ◽

Learning Algorithm ◽

Attention Mechanism ◽

Learning Models ◽

High Classification Accuracy

Over the past few decades, text classification problems have been widely utilized in many real time applications. Leveraging the text classification methods by means of developing new applications in the field of text mining and Natural Language Processing (NLP) is very important. In order to accurately classify tasks in many applications, a deeper insight into deep learning methods is required as there is an exponential growth in the number of complex documents. The success of any deep learning algorithm depends on its capacity to understand the nonlinear relationships of the complex models within data. Thus, a huge challenge for researchers lies in the development of suitable techniques, architectures, and models for text classification. In this paper, hybrid deep learning models, with an emphasis on positioning of attention mechanism analysis, are considered and analyzed well for text classification. The first hybrid model proposed is called convolutional Bidirectional Long Short-Term Memory (Bi-LSTM) with attention mechanism and output (CBAO) model, and the second hybrid model is called convolutional attention mechanism with Bi-LSTM and output (CABO) model. In the first hybrid model, the attention mechanism is placed after the Bi-LSTM, and then the output Softmax layer is constructed. In the second hybrid model, the attention mechanism is placed after convolutional layer and followed by Bi-LSTM and the output Softmax layer. The proposed hybrid models are tested on three datasets, and the results show that when the proposed CBAO model is implemented for IMDB dataset, a high classification accuracy of 92.72% is obtained and when the proposed CABO model is implemented on the same dataset, a high classification accuracy of 90.51% is obtained.

Download Full-text

Visual Analysis of Spatiotemporal Data Predictions with Deep Learning Models

Applied Sciences ◽

10.3390/app11135853 ◽

2021 ◽

Vol 11 (13) ◽

pp. 5853

Author(s):

Hyesook Son ◽

Seokyeon Kim ◽

Hanbyul Yeon ◽

Yejin Kim ◽

Yun Jang ◽

...

Keyword(s):

Deep Learning ◽

Visual Analysis ◽

Short Term Memory ◽

Learning Model ◽

Learning Models ◽

Visualization System ◽

Temporal Prediction ◽

Long Short Term Memory ◽

Gated Recurrent Units ◽

Deep Learning Model

The output of a deep-learning model delivers different predictions depending on the input of the deep learning model. In particular, the input characteristics might affect the output of a deep learning model. When predicting data that are measured with sensors in multiple locations, it is necessary to train a deep learning model with spatiotemporal characteristics of the data. Additionally, since not all of the data measured together result in increasing the accuracy of the deep learning model, we need to utilize the correlation characteristics between the data features. However, it is difficult to interpret the deep learning output, depending on the input characteristics. Therefore, it is necessary to analyze how the input characteristics affect prediction results to interpret deep learning models. In this paper, we propose a visualization system to analyze deep learning models with air pollution data. The proposed system visualizes the predictions according to the input characteristics. The input characteristics include space-time and data features, and we apply temporal prediction networks, including gated recurrent units (GRU), long short term memory (LSTM), and spatiotemporal prediction networks (convolutional LSTM) as deep learning models. We interpret the output according to the characteristics of input to show the effectiveness of the system.

Download Full-text

Extracting Family History of Patients From Clinical Narratives: Exploring an End-to-End Solution With Deep Learning Models

JMIR Medical Informatics ◽

10.2196/22982 ◽

2020 ◽

Vol 8 (12) ◽

pp. e22982

Author(s):

Xi Yang ◽

Hansi Zhang ◽

Xing He ◽

Jiang Bian ◽

Yonghui Wu

Keyword(s):

Deep Learning ◽

Family History ◽

Information Extraction ◽

Language Processing ◽

Conditional Random Fields ◽

Short Term Memory ◽

Majority Voting ◽

Learning Models ◽

Concept Extraction ◽

End To End

Background Patients’ family history (FH) is a critical risk factor associated with numerous diseases. However, FH information is not well captured in the structured database but often documented in clinical narratives. Natural language processing (NLP) is the key technology to extract patients’ FH from clinical narratives. In 2019, the National NLP Clinical Challenge (n2c2) organized shared tasks to solicit NLP methods for FH information extraction. Objective This study presents our end-to-end FH extraction system developed during the 2019 n2c2 open shared task as well as the new transformer-based models that we developed after the challenge. We seek to develop a machine learning–based solution for FH information extraction without task-specific rules created by hand. Methods We developed deep learning–based systems for FH concept extraction and relation identification. We explored deep learning models including long short-term memory-conditional random fields and bidirectional encoder representations from transformers (BERT) as well as developed ensemble models using a majority voting strategy. To further optimize performance, we systematically compared 3 different strategies to use BERT output representations for relation identification. Results Our system was among the top-ranked systems (3 out of 21) in the challenge. Our best system achieved micro-averaged F1 scores of 0.7944 and 0.6544 for concept extraction and relation identification, respectively. After challenge, we further explored new transformer-based models and improved the performances of both subtasks to 0.8249 and 0.6775, respectively. For relation identification, our system achieved a performance comparable to the best system (0.6810) reported in the challenge. Conclusions This study demonstrated the feasibility of utilizing deep learning methods to extract FH information from clinical narratives.

Download Full-text

Deep Learning Based Part-of-Speech Tagging for Malayalam Twitter Data (Special Issue: Deep Learning Techniques for Natural Language Processing)

Journal of Intelligent Systems ◽

10.1515/jisys-2017-0520 ◽

2019 ◽

Vol 28 (3) ◽

pp. 423-435 ◽

Cited By ~ 3

Author(s):

S. Kumar ◽

M. Anand Kumar ◽

K.P. Soman

Keyword(s):

Deep Learning ◽

Language Processing ◽

Short Term Memory ◽

Sequential Model ◽

Pos Tagging ◽

Part Of Speech ◽

Word Level ◽

Twitter Data ◽

Gated Recurrent Units ◽

Hidden States

Abstract The paper addresses the problem of part-of-speech (POS) tagging for Malayalam tweets. The conversational style of posts/tweets/text in social media data poses a challenge in using general POS tagset for tagging the text. For the current work, a tagset was designed that contains 17 coarse tags and 9915 tweets were tagged manually for experiment and evaluation. The tagged data were evaluated using sequential deep learning methods like recurrent neural network (RNN), gated recurrent units (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). The training of the model was performed on the tagged tweets, at word level and character level. The experiments were evaluated using measures like precision, recall, f1-measure, and accuracy. During the experiment, it was found that the GRU-based deep learning sequential model at word level gave the highest f1-measure of 0.9254; at character-level, the BLSTM-based deep learning sequential model gave the highest f1-measure of 0.8739. To choose the suitable number of hidden states, we varied it as 4, 16, 32, and 64, and performed training for each. It was observed that the increase in hidden states improved the tagger model. This is an initial work to perform Malayalam Twitter data POS tagging using deep learning sequential models.

Download Full-text

Forecast Based Consensus Control for DC Microgrids Using Distributed Long Short-Term Memory Deep Learning Models

IEEE Transactions on Smart Grid ◽

10.1109/tsg.2021.3070959 ◽

2021 ◽

pp. 1-1

Author(s):

Seyed Amir Alavi ◽

Kamyar Mehran ◽

Vahid Vahidinasab ◽

Joao P. S. Catalao

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Learning Models ◽

Short Term ◽

Consensus Control ◽

Term Memory ◽

Dc Microgrids ◽

Long Short Term Memory

Download Full-text