Where to Prune: Using LSTM to Guide End-to-end Pruning

Author(s):  
Jing Zhong ◽  
Guiguang Ding ◽  
Yuchen Guo ◽  
Jungong Han ◽  
Bin Wang

Recent years have witnessed the great success of convolutional neural networks (CNNs) in many related fields. However, their large model size and computational complexity make CNNs difficult to deploy in some scenarios, such as embedded systems with low computation power. To address this issue, many works have proposed pruning filters in CNNs to reduce computation. However, they mainly focus on identifying which filters are unimportant in a layer and then prune filters layer by layer or globally. In this paper, we argue that the pruning order is also significant for model pruning. We propose a novel approach to determine which layers should be pruned in each step. First, we use a long short-term memory (LSTM) network to learn the hierarchical characteristics of a network and generate a pruning decision for each layer, which is the main difference from previous works. Next, a channel-based method is adopted to evaluate the importance of filters in a to-be-pruned layer, followed by an accelerated recovery step. Experimental results demonstrate that our approach can reduce FLOPs by 70.1% for VGG and 47.5% for ResNet-56 with comparable accuracy. In addition, the learning results seem to reveal the sensitivity of each network layer.

2021 ◽  
Author(s):  
Erdem Doğan

Intelligent transport systems need accurate short-term traffic flow forecasts. However, developing a robust short-term traffic flow forecasting approach is a challenge due to the stochastic character of traffic flow. This study proposes a novel approach to short-term traffic flow prediction, namely Robust Long Short Term Memory (R-LSTM), based on the Robust Empirical Mode Decomposing (REDM) algorithm and Long Short Term Memory (LSTM). Short-term traffic flow data from the Caltrans Performance Measurement System (PeMS) database were used to train and test the model. The dataset was composed of traffic data collected by 25 traffic detectors on the main lanes of different freeways. The time resolution of the dataset was set to 15 minutes, and the Hampel preprocessing algorithm was applied for outlier elimination. The R-LSTM predictions were compared with those of state-of-the-art models, using RMSE, MSE, and MAPE as performance criteria. Performance analyses for various periods show that R-LSTM is remarkably successful in all time periods. Moreover, the developed model's performance is significantly higher during mid-day periods, when traffic flow fluctuations are high. These results show that R-LSTM is a strong candidate for short-term traffic flow prediction and can easily adapt to fluctuations in traffic flow. In addition, robust models for short-term predictions can be developed by applying the signal separation method to traffic flow data.
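The outlier-elimination step mentioned above can be illustrated with a minimal sketch of a Hampel filter. This is not the author's implementation: the function name `hampel_filter`, the window half-width, the threshold of three scaled MADs, and the sample flow values are all assumptions chosen for illustration, since the abstract does not state them.

```python
from statistics import median


def hampel_filter(values, half_window=3, n_sigmas=3.0):
    """Replace outliers with the local median (Hampel identifier).

    half_window and n_sigmas are illustrative defaults, not values
    taken from the abstract.
    """
    # 1.4826 scales the median absolute deviation (MAD) so that it
    # estimates the standard deviation for Gaussian data.
    scale = 1.4826
    cleaned = list(values)
    outlier_indices = []
    for i in range(len(values)):
        # Sliding window, truncated at the ends of the series.
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        local_median = median(window)
        mad = median(abs(v - local_median) for v in window)
        # Flag the point if it lies too far from the local median.
        if abs(values[i] - local_median) > n_sigmas * scale * mad:
            cleaned[i] = local_median
            outlier_indices.append(i)
    return cleaned, outlier_indices


# Hypothetical 15-minute flow counts with one spurious spike.
flows = [100, 102, 101, 103, 500, 104, 102, 101, 103]
cleaned, outliers = hampel_filter(flows)
print(outliers)    # index of the flagged sample
print(cleaned[4])  # spike replaced by the local median
```

Because the filter relies on the median and MAD rather than the mean and standard deviation, a single large spike does not inflate the threshold that is used to detect it.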


10.29007/j35r ◽  
2020 ◽  
Author(s):  
Mostofa Ahsan ◽  
Kendall Nygard

A variety of attacks are regularly attempted against network infrastructure. With the continuing development of artificial intelligence algorithms over more than two decades, preventing network intrusion has become increasingly effective. Deep learning methods can detect network intrusions with high accuracy and a low false alarm rate. This paper introduces a novel approach that uses a hybrid of a Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) to provide improved intrusion detection. This bidirectional algorithm showed the highest known accuracy, 99.70%, on the standard NSL KDD dataset. The performance of the algorithm is measured using precision, false positives, F1 score, and recall, and the results were found promising for deployment on live network infrastructure.
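For reference, the evaluation measures named above can be computed from a binary confusion matrix as in the following sketch. The function name `detection_metrics` and the example counts are hypothetical; they are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute common intrusion-detection metrics from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives, where "positive" means "flagged as an attack".
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate is often reported as the false alarm rate.
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting the false positive rate alongside accuracy matters for intrusion detection because benign traffic usually dominates, so a high accuracy can coexist with an impractical number of false alarms.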


Author(s):  
Victoria Zayats ◽  
Mari Ostendorf

This paper presents a novel approach for modeling threaded discussions on social media using a graph-structured bidirectional LSTM (long short-term memory) that represents both hierarchical and temporal conversation structure. In experiments on the task of predicting the popularity of comments in Reddit discussions, the proposed model outperforms a node-independent architecture for different sets of input features. Analyses show a benefit to the model over the full course of the discussion, improving detection in both early and late stages. Further, the use of language cues with the bidirectional tree state updates helps with identifying controversial comments.


Author(s):  
Tao Gui ◽  
Ruotian Ma ◽  
Qi Zhang ◽  
Lujun Zhao ◽  
Yu-Gang Jiang ◽  
...  

Character-level Chinese named entity recognition (NER) that applies long short-term memory (LSTM) to incorporate lexicons has achieved great success. However, this method fails to fully exploit GPU parallelism and candidate lexicons can conflict. In this work, we propose a faster alternative to Chinese NER: a convolutional neural network (CNN)-based method that incorporates lexicons using a rethinking mechanism. The proposed method can model all the characters and potential words that match the sentence in parallel. In addition, the rethinking mechanism can address the word conflict by feeding back the high-level features to refine the networks. Experimental results on four datasets show that the proposed method can achieve better performance than both word-level and character-level baseline methods. In addition, the proposed method performs up to 3.21 times faster than state-of-the-art methods, while realizing better performance.


Author(s):  
Felicia Lilian J. ◽  
Sundarakantham K ◽  
Mercy Shalinie S.

A Question Answering (QA) system for Reading Comprehension (RC) is a computerized approach to retrieving a relevant response to a query posted by a user. The underlying concept in developing such a system is to build a human-computer interaction. The interactions are in natural language, and people tend to use negation words as part of their expressions. During the pre-processing stage of a Natural Language Processing (NLP) task, these negation words get removed, and hence the semantics change. This remains an unsolved problem in QA systems. In order to maintain the semantics, we propose a novel approach: a hybrid NLP-based Bi-directional Long Short Term Memory (Bi-LSTM) with an attention mechanism. It handles negation words and maintains the semantics of the sentence. We also focus on answering any factoid query (i.e., ’what’, ’when’, ’where’, ’who’) raised by the user. For this purpose, an attention mechanism with a softmax activation function obtains superior results: it matches the question type and processes the context information effectively. The experiments are performed on the SQuAD dataset for reading comprehension, and the Stanford Negation dataset is used to handle negation in the RC sentences. The accuracy of the system is 93.9% on negation and 87% on QA.


Online media has both advantages and drawbacks for news consumption. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from online media. On the other hand, it enables the wide spread of "fake news", i.e., low-quality news with intentionally false information. The broad spread of fake news negatively affects individuals and society. Hence, fake news detection in social media has become an emerging research topic that is drawing attention from various researchers. In the past, many authors proposed the use of text mining and AI techniques to analyze textual data and help predict the credibility of news. With more computational capacity and the ability to handle large datasets, deep learning models deliver better performance than traditional text mining and AI methods. In general, a deep learning model such as an LSTM can identify complex patterns in the data. Long short-term memory is a tree-structured recurrent neural network (RNN) used to analyze variable-length sequential data. In our proposed framework, we build a fake news detection model based on an LSTM neural network. Publicly available unstructured news datasets are used to evaluate the performance of the model. The results show the superiority and accuracy of the LSTM model over traditional techniques, specifically CNN, for fake news detection.


Author(s):  
Zouhaira Noubigh ◽  
Anis Mezghani ◽  
Monji Kherallah

In recent years, deep neural networks (DNNs) have achieved great success in sequence modeling. Several deep models have been used to enhance Handwriting Text Recognition (HTR). Among these models, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks, especially Long Short-Term Memory (LSTM) networks, achieve state-of-the-art recognition accuracy. Recognition methods for Arabic text lines have been widely applied in many specific tasks. However, some challenges remain, such as the lack of large, available Arabic text recognition datasets and the characteristics of Arabic script. To address these challenges, we propose an end-to-end recognition method based on convolutional recurrent neural networks (CRNNs), which adds a feature-reuse network component to a CRNN. The model is trained and tested on two Arabic text recognition datasets, KHATT and AHTID/MW. The experimental results demonstrate that the proposed method achieves better performance than other methods in the literature.


2021 ◽  
Vol 11 (10) ◽  
pp. 4689
Author(s):  
Ngoc-Hoang Nguyen ◽  
Tran-Dac-Thinh Phan ◽  
Soo-Hyung Kim ◽  
Hyung-Jeong Yang ◽  
Guee-Sang Lee

This paper presents a novel approach to continuous dynamic hand gesture recognition. Our approach contains two main modules: gesture spotting and gesture classification. First, the gesture spotting module pre-segments the video sequence with continuous gestures into isolated gestures. Second, the gesture classification module identifies the segmented gestures. In the gesture spotting module, the motions of the palm and fingers are fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network for gesture spotting. In the gesture classification module, three residual 3D Convolutional Neural Networks based on ResNet architectures (3D_ResNet) and one Long Short-Term Memory (LSTM) network are combined to efficiently utilize multiple data channels such as RGB, Optical Flow, Depth, and 3D positions of key joints. The promising performance of our approach is obtained through experiments conducted on three public datasets: the Chalearn LAP ConGD dataset, 20BN-Jester, and the NVIDIA Dynamic Hand Gesture Dataset. Our approach outperforms the state-of-the-art methods on the Chalearn LAP ConGD dataset.

