Efficient and effective training of sparse recurrent neural networks

AbstractRecurrent neural networks (RNNs) have achieved state-of-the-art performances on various applications. However, RNNs are prone to be memory-bandwidth limited in practical applications and need both long periods of training and inference time. The aforementioned problems are at odds with training and deploying RNNs on resource-limited devices where the memory and floating-point operations (FLOPs) budget are strictly constrained. To address this problem, conventional model compression techniques usually focus on reducing inference costs, operating on a costly pre-trained model. Recently, dynamic sparse training has been proposed to accelerate the training process by directly training sparse neural networks from scratch. However, previous sparse training techniques are mainly designed for convolutional neural networks and multi-layer perceptron. In this paper, we introduce a method to train intrinsically sparse RNN models with a fixed number of parameters and floating-point operations (FLOPs) during training. We demonstrate state-of-the-art sparse performance with long short-term memory and recurrent highway networks on widely used tasks, language modeling, and text classification. We simply use the results to advocate that, contrary to the general belief that training a sparse neural network from scratch leads to worse performance than dense networks, sparse training with adaptive connectivity can usually achieve better performance than dense models for RNNs.

Download Full-text

An Efficient Hybrid Mechanism with LSTM Neural Networks in Application to Stock Price Forecasting

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200589 ◽

2020 ◽

Author(s):

Ngoc-An Nguyen-Pham ◽

Trung T. Nguyen

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Stock Price ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Practical Applications ◽

Hybrid Mechanism ◽

Dual Stage ◽

Long Short Term Memory

Recurrent neural networks (RNN) and long short-term memory (LSTM) neural networks have shown some success with many practical applications in recent years such as machine translation, speech recognition, image processing and financial market forecasting. In recent years, a dual-stage attention-based recurrent neural network (DA-RNN) has shown some promising results on stock price prediction. We propose dual attention-dilated long short-term memory (DAD-LSTM) models combining DA-RNN and dilated recurrent neural networks (DRNN) to select the most relevant input features and capture the long-term temporal dependencies of a time series more efficiently. Numerical results from experiments on the NASDAQ 100, S&P 500, HSI and DJIA datasets show that DAD-LSTM models outperform the state-of-the-art and most recent approaches.

Download Full-text

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

<p>SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation” which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state of the art models - transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated an increase performance over non-augmented, and conventionally SMILES randomization augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as <i>attentional gain </i>– an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.</p>

Download Full-text

Evaluation of Mixed Deep Neural Networks for Reverberant Speech Enhancement

Biomimetics ◽

10.3390/biomimetics5010001 ◽

2019 ◽

Vol 5 (1) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Michelle Gutiérrez-Muñoz ◽

Astryd González-Salazar ◽

Marvin Coto-Jiménez

Keyword(s):

Neural Networks ◽

Short Term Memory ◽

Computational Cost ◽

Real Life ◽

Fixed Number ◽

Training Procedure ◽

Statistical Validation ◽

Significant Drop ◽

Training Time ◽

Important Solution

Speech signals are degraded in real-life environments, as a product of background noise or other factors. The processing of such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that make adverse quality difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone in multiple directions. To enhance signals in such adverse conditions, several deep learning-based methods have been proposed and proven to be effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have presented surprising results in tasks related to time-dependent processing of signals, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extended experimentation in several cases. In this work, we present a proposal to evaluate the hybrid models of neural networks to learn different reverberation conditions without any previous information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to those from pure LSTM networks, given a fixed number of layers. The evaluation was made based on quality measurements of the signal’s spectrum, the training time of the networks, and statistical validation of results. In total, 120 artificial neural networks of eight different types were trained and compared. The results help to affirm the fact that hybrid networks represent an important solution for speech signal enhancement, given that reduction in training time is on the order of 30%, in processes that can normally take several days or weeks, depending on the amount of data. The results also present advantages in efficiency, but without a significant drop in quality.

Download Full-text

ECT-LSTM-RNN: An Electrical Capacitance Tomography Model-based Long Short-Term Memory Recurrent Neural Networks for Conductive Materials

IEEE Access ◽

10.1109/access.2021.3079447 ◽

2021 ◽

pp. 1-1

Author(s):

Wael Deabes ◽

Alaa Sheta ◽

Malik Braik

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Electrical Capacitance Tomography ◽

Electrical Capacitance ◽

Short Term ◽

Term Memory ◽

Conductive Materials ◽

Long Short Term Memory ◽

Capacitance Tomography

Download Full-text

Empirical modeling of ethanol production dynamics using long short-term memory recurrent neural networks

Bioresource Technology Reports ◽

10.1016/j.biteb.2021.100724 ◽

2021 ◽

pp. 100724

Author(s):

Felipe Matheus Mota Sousa ◽

Rodolpho Rodrigues Fonseca ◽

Flávio Vasconcelos da Silva

Keyword(s):

Neural Networks ◽

Ethanol Production ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Empirical Modeling ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Production Dynamics

Download Full-text

Sensuator: A Hybrid Sensor–Actuator Approach to Soft Robotic Proprioception Using Recurrent Neural Networks

Actuators ◽

10.3390/act10020030 ◽

2021 ◽

Vol 10 (2) ◽

pp. 30

Author(s):

Pornthep Preechayasomboon ◽

Eric Rombokas

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Linear Models ◽

Open Loop ◽

Proof Of Concept ◽

State Estimator ◽

Loop Control ◽

Practical Applications ◽

Soft Actuator ◽

The Cost

Soft robotic actuators are now being used in practical applications; however, they are often limited to open-loop control that relies on the inherent compliance of the actuator. Achieving human-like manipulation and grasping with soft robotic actuators requires at least some form of sensing, which often comes at the cost of complex fabrication and purposefully built sensor structures. In this paper, we utilize the actuating fluid itself as a sensing medium to achieve high-fidelity proprioception in a soft actuator. As our sensors are somewhat unstructured, their readings are difficult to interpret using linear models. We therefore present a proof of concept of a method for deriving the pose of the soft actuator using recurrent neural networks. We present the experimental setup and our learned state estimator to show that our method is viable for achieving proprioception and is also robust to common sensor failures.

Download Full-text

Fake News Detection Using Recurrent Neural Networks and Distributed Representations for the Portuguese Language

10.21528/cbic2021-163 ◽

2021 ◽

Author(s):

Guilherme Zanini Moreira ◽

Marcelo Romero ◽

Manassés Ribeiro

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Classification Performance ◽

Fake News ◽

Distributed Representations ◽

Memory Network ◽

Long Short Term Memory ◽

Media Channels ◽

Damage Rule

After the advent of Web, the number of people who abandoned traditional media channels and started receiving news only through social media has increased. However, this caused an increase of the spread of fake news due to the ease of sharing information. The consequences are various, with one of the main ones being the possible attempts to manipulate public opinion for elections or promotion of movements that can damage rule of law or the institutions that represent it. The objective of this work is to perform fake news detection using Distributed Representations and Recurrent Neural Networks (RNNs). Although fake news detection using RNNs has been already explored in the literature, there is little research on the processing of texts in Portuguese language, which is the focus of this work. For this purpose, distributed representations from texts are generated with three different algorithms (fastText, GloVe and word2vec) and used as input features for a Long Short-term Memory Network (LSTM). The approach is evaluated using a publicly available labelled news dataset. The proposed approach shows promising results for all the three distributed representation methods for feature extraction, with the combination word2vec+LSTM providing the best results. The results of the proposed approach shows a better classification performance when compared to simple architectures, while similar results are obtained when the approach is compared to deeper architectures or more complex methods.

Download Full-text