Greedy Transition-Based Dependency Parsing with Stack LSTMs

2017 ◽  
Vol 43 (2) ◽  
pp. 311-347 ◽  
Author(s):  
Miguel Ballesteros ◽  
Chris Dyer ◽  
Yoav Goldberg ◽  
Noah A. Smith

We introduce a greedy transition-based parser that learns to represent parser states using recurrent neural networks. Our primary innovation that enables us to do this efficiently is a new control structure for sequential neural networks—the stack long short-term memory unit (LSTM). As with the conventional stack data structures used in transition-based parsers, elements can be pushed to or popped from the top of the stack in constant time; in addition, the LSTM maintains a continuous-space embedding of the stack contents. Our model captures three facets of the parser's state: (i) unbounded look-ahead into the buffer of incoming words, (ii) the complete history of transition actions taken by the parser, and (iii) the complete contents of the stack of partially built tree fragments, including their internal structures. In addition, we compare two different word representations: (i) standard word vectors based on look-up tables and (ii) character-based models of words. Although standard word embedding models work well in all languages, the character-based models improve the handling of out-of-vocabulary words, particularly in morphologically rich languages. Finally, we discuss the use of dynamic oracles in training the parser. During training, dynamic oracles alternate between sampling parser states from the training data and from the model as it is being learned, making the model more robust to the kinds of errors that will be made at test time. Training our model with dynamic oracles yields a linear-time greedy parser with very competitive performance.
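
As a rough illustration of the stack LSTM control structure, the sketch below (assuming PyTorch and its `LSTMCell`; the class name, dimensions, and usage are illustrative, not the authors' implementation) keeps a history of LSTM states so that push advances the recurrence and pop restores the previous state, both in constant time, while the top state serves as the continuous-space embedding of the stack contents.

```python
# Minimal sketch of a stack LSTM, assuming PyTorch (torch.nn.LSTMCell).
# Names and dimensions are illustrative, not the authors' implementation.
import torch
import torch.nn as nn

class StackLSTM(nn.Module):
    """A stack whose contents are also summarized by an LSTM state.

    push() advances the LSTM by one step; pop() restores the previous
    state, so both operations take constant time and the top-of-stack
    embedding always reflects the current stack contents.
    """

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        empty = (torch.zeros(1, hidden_size), torch.zeros(1, hidden_size))
        self.states = [empty]  # history of (h, c) pairs; index 0 = empty stack

    def push(self, x):
        # x: (1, input_size) embedding of the element being pushed
        h, c = self.cell(x, self.states[-1])
        self.states.append((h, c))

    def pop(self):
        # Discard the most recent state; the previous one becomes the top.
        self.states.pop()

    def embedding(self):
        # Continuous-space summary of the current stack contents.
        return self.states[-1][0]

# Usage: push word embeddings as the parser shifts, pop as it reduces.
stack = StackLSTM(input_size=8, hidden_size=16)
stack.push(torch.randn(1, 8))
stack.push(torch.randn(1, 8))
stack.pop()
print(stack.embedding().shape)  # torch.Size([1, 16])
```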

Author(s):  
Xiang Deng ◽  
Zhongfei Zhang

Knowledge distillation (KD) transfers knowledge from a teacher network to a student by enforcing the student to mimic the outputs of the pretrained teacher on training data. However, data samples are not always accessible in many cases due to large data sizes, privacy, or confidentiality. Many efforts have been made to address this problem for convolutional neural networks (CNNs), whose inputs lie in a grid domain within a continuous space, such as images and videos, but they largely overlook graph neural networks (GNNs) that handle non-grid data with different topology structures within a discrete space. The inherent differences between their inputs make these CNN-based approaches inapplicable to GNNs. In this paper, we propose, to the best of our knowledge, the first dedicated approach to distilling knowledge from a GNN without graph data. The proposed graph-free KD (GFKD) learns graph topology structures for knowledge transfer by modeling them with a multinomial distribution. We then introduce a gradient estimator to optimize this framework. Essentially, the gradients w.r.t. graph structures are obtained by using only GNN forward-propagation without back-propagation, which means that GFKD is compatible with modern GNN libraries such as DGL and Geometric. Moreover, we provide strategies for handling different types of prior knowledge in the graph data or the GNNs. Extensive experiments demonstrate that GFKD achieves state-of-the-art performance for distilling knowledge from GNNs without training data.
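
As a loose illustration of learning graph structure from teacher forward passes only, the sketch below uses a generic score-function (REINFORCE-style) estimator over per-edge Bernoulli variables rather than the paper's multinomial formulation; `teacher`, `node_feats`, and all other names are placeholders, and this is not the GFKD estimator itself.

```python
# Illustrative sketch only: learning edge logits by sampling graph structures
# and scoring them with teacher forward passes (a generic score-function
# estimator, not the exact GFKD estimator). Assumes PyTorch; `teacher` is a
# placeholder for a pretrained GNN that takes a dense adjacency matrix and
# node features and returns graph-level class logits of shape (1, n_classes).
import torch

def structure_gradient_step(teacher, node_feats, edge_logits, target_class,
                            num_samples=8, lr=0.1):
    probs = torch.sigmoid(edge_logits)        # per-edge Bernoulli probabilities
    grad = torch.zeros_like(edge_logits)
    for _ in range(num_samples):
        adj = torch.bernoulli(probs)          # sample a candidate graph structure
        with torch.no_grad():                 # forward passes only, no back-prop
            logits = teacher(adj, node_feats)
            reward = logits[0, target_class]  # how "class-like" the sampled graph is
        # Score-function term: d log p(adj) / d edge_logits = adj - probs.
        grad += reward * (adj - probs) / num_samples
    return edge_logits + lr * grad            # ascend the estimated reward gradient
```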


Energies ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 7206
Author(s):  
Sungwoo Jo ◽  
Sunkyu Jung ◽  
Taemoon Roh

Because lithium-ion batteries are widely used for various purposes, it is important to estimate their state of health (SOH) to ensure their efficiency and safety. Despite the usefulness of model-based methods for SOH estimation, the difficulties of battery modeling have resulted in a greater emphasis on machine learning for SOH estimation. Furthermore, data preprocessing has received much attention because it is an important step in determining the efficiency of machine learning methods. In this paper, we propose a new preprocessing method for improving the efficiency of machine learning for SOH estimation. The proposed method combines the relative state of charge (SOC) with data processing that transforms time-domain data into SOC-domain data. According to the correlation analysis, SOC-domain data are more strongly correlated with the usable capacity than time-domain data. Furthermore, we compare the estimation results of SOC-based data and time-based data using feedforward neural networks (FNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks. The results show that the SOC-based preprocessing outperforms conventional time-domain data-based techniques. Furthermore, the accuracy of the simplest FNN model with the proposed method is higher than that of the CNN model and the LSTM model with a conventional method when the training data are small.
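
A minimal sketch of the SOC-domain idea, assuming NumPy and measurements taken over a single charge cycle; the coulomb-counting normalization, signal names, and grid size are assumptions used only to show how time-indexed data can be re-indexed onto a uniform relative-SOC grid.

```python
# Minimal sketch (assumption: samples cover one charge cycle with positive current).
# Re-indexes voltage/current from the time axis onto a uniform relative-SOC axis,
# so curves from cycles of different duration become directly comparable.
import numpy as np

def to_soc_domain(time_s, current_a, voltage_v, n_points=100):
    dt = np.gradient(time_s)                         # per-sample time step (s)
    charge_ah = np.cumsum(current_a * dt) / 3600.0   # coulomb counting
    soc = charge_ah / charge_ah[-1]                  # relative SOC in [0, 1]
    soc_grid = np.linspace(0.0, 1.0, n_points)       # common SOC grid
    return {
        "soc": soc_grid,
        "voltage": np.interp(soc_grid, soc, voltage_v),
        "current": np.interp(soc_grid, soc, current_a),
    }
```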


Author(s):  
Jay Yoon Lee ◽  
Sanket Vaibhav Mehta ◽  
Michael Wick ◽  
Jean-Baptiste Tristan ◽  
Jaime Carbonell

Practitioners apply neural networks to increasingly complex problems in natural language processing, such as syntactic parsing and semantic role labeling, that have rich output structures. Many such structured-prediction problems require deterministic constraints on the output values; for example, in sequence-to-sequence syntactic parsing, we require that the sequential outputs encode valid trees. While hidden units might capture such properties, the network is not always able to learn such constraints from the training data alone, and practitioners must then resort to post-processing. In this paper, we present an inference method for neural networks that enforces deterministic constraints on outputs without performing rule-based post-processing or expensive discrete search. Instead, in the spirit of gradient-based training, we enforce constraints with gradient-based inference (GBI): for each input at test time, we nudge continuous model weights until the network's unconstrained inference procedure generates an output that satisfies the constraints. We study the efficacy of GBI on three tasks with hard constraints: semantic role labeling, syntactic parsing, and sequence transduction. In each case, the algorithm not only satisfies constraints but also improves accuracy, even when the underlying network is state-of-the-art.
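
A minimal sketch of the GBI loop, assuming PyTorch; `decode` and `constraint_loss` are placeholders for the task-specific decoder and a differentiable penalty that is zero when the output satisfies the constraints.

```python
# Minimal sketch of gradient-based inference (GBI), assuming PyTorch.
# `model`, `decode`, and `constraint_loss` are placeholders: constraint_loss
# must be a differentiable surrogate that vanishes when the output is valid.
import copy
import torch

def gbi(model, x, decode, constraint_loss, lr=1e-3, max_steps=50):
    m = copy.deepcopy(model)                  # nudge a per-example copy of the weights
    opt = torch.optim.SGD(m.parameters(), lr=lr)
    for _ in range(max_steps):
        scores = m(x)                         # unconstrained forward pass
        y = decode(scores)                    # e.g. greedy/argmax decoding
        penalty = constraint_loss(scores, y)
        if penalty.item() == 0.0:             # output already satisfies the constraints
            return y
        opt.zero_grad()
        penalty.backward()                    # nudge weights toward a valid output
        opt.step()
    return decode(m(x))                       # best effort after max_steps
```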


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Xiaoming Wang ◽  
Xinbo Zhao ◽  
Jinchang Ren

Traditional eye movement models are based on psychological assumptions and empirical data that are not able to simulate eye movement on previously unseen text data. To address this problem, a new type of eye movement model is presented and tested in this paper. In contrast to conventional psychology-based eye movement models, ours is based on a recurrent neural network (RNN) to generate a gaze point prediction sequence, by using the combination of convolutional neural networks (CNN), bidirectional long short-term memory networks (LSTM), and conditional random fields (CRF). The model uses the eye movement data of a reader reading some texts as training data to predict the eye movements of the same reader reading a previously unseen text. A theoretical analysis of the model is presented to show its excellent convergence performance. Experimental results are then presented to demonstrate that the proposed model can achieve similar prediction accuracy while requiring fewer features than current machine learning models.
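
A schematic sketch of a CNN plus bidirectional LSTM sequence model of the kind described, assuming PyTorch; the CRF output layer is replaced here by a per-step linear projection, and all dimensions are illustrative assumptions.

```python
# Schematic sketch only: CNN feature extractor + bidirectional LSTM over a word
# sequence, with a linear projection standing in for the CRF output layer
# described in the paper. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class GazeSequenceModel(nn.Module):
    def __init__(self, feat_dim=32, conv_channels=64, hidden=128, n_labels=2):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, conv_channels, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(conv_channels, hidden, batch_first=True,
                              bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)   # CRF omitted in this sketch

    def forward(self, x):
        # x: (batch, seq_len, feat_dim) per-word features of the text
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bilstm(h)
        return self.out(h)                           # (batch, seq_len, n_labels)

model = GazeSequenceModel()
print(model(torch.randn(4, 20, 32)).shape)           # torch.Size([4, 20, 2])
```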


Author(s):  
Esmeralda Contessa Djamal ◽  
Hamid Fadhilah ◽  
Asep Najmurrokhman ◽  
Arlisa Wulandari ◽  
Faiza Renaldi

A Brain-Computer Interface (BCI) is an intermediary tool that usually draws on information from EEG signals. This paper proposes a BCI that controls a robot simulator based on three emotions recognized over five-second windows, extracting wavelet features and feeding them to Recurrent Neural Networks (RNN). Emotion is among the brain variables that can be used to drive external devices. A BCI's success depends on the ability to recognize a person's emotions from their EEG signals. One suitable way to represent EEG signals as time-varying inputs is the wavelet transform. The wavelet transform decomposes the EEG signal into theta, alpha, and beta waves, which serve as the input to the RNN; connectivity across the sequence is handled with Long Short-Term Memory (LSTM) units. The study also compared a frequency-based extraction method using the Fast Fourier Transform (FFT). The results showed that by extracting EEG signals with wavelet transformations, we achieved an accuracy of 100% on the training data and 70.54% on new data, whereas the same RNN configuration without preprocessing reached only 39% accuracy, and adding FFT increased it only to 52%. Furthermore, adding frequency-filter features raised the accuracy from 70.54% to 79.3%. These results show the importance of feature selection, since RNNs are sensitive to the sequencing of their inputs. The use of emotion variables remains relevant for instructing BCI-based external devices, with an average computing time of merely 0.235 seconds.
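
A minimal sketch, assuming PyWavelets and a 128 Hz EEG channel, of decomposing a signal into approximate theta, alpha, and beta components that can be stacked as RNN inputs; the wavelet family, decomposition level, and band-to-level mapping are assumptions, not the authors' settings.

```python
# Minimal sketch, assuming PyWavelets (pywt) and a 128 Hz EEG channel.
# Decomposes the signal into approximate theta/alpha/beta components by
# reconstructing from single detail levels; the level-to-band mapping
# (D4~theta, D3~alpha, D2~beta at 128 Hz) is an illustrative assumption.
import numpy as np
import pywt

def eeg_bands(signal, wavelet="db4", level=5):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # coeffs = [A5, D5, D4, D3, D2, D1]

    def only(idx):
        kept = [np.zeros_like(c) for c in coeffs]
        kept[idx] = coeffs[idx]
        return pywt.waverec(kept, wavelet)[: len(signal)]

    return {
        "theta": only(2),   # D4: ~4-8 Hz at 128 Hz sampling
        "alpha": only(3),   # D3: ~8-16 Hz
        "beta":  only(4),   # D2: ~16-32 Hz
    }

# The three band signals can then be stacked as channels of the RNN input.
bands = eeg_bands(np.random.randn(640))   # five seconds at 128 Hz
```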


Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 320
Author(s):  
Claudia Marzi

The paper focuses on what two different types of Recurrent Neural Networks, namely a recurrent Long Short-Term Memory network and a recurrent variant of self-organizing memories, a Temporal Self-Organizing Map, can tell us about how speakers learn and process a set of fully inflected verb forms selected from the top-frequency paradigms of Italian and German. Both architectures, thanks to a re-entrant layer of temporal connectivity, can develop a strong sensitivity to sequential patterns that are highly attested in the training data. The main goal is to evaluate the learning and processing dynamics of verb inflection data in the two neural networks by focusing on the effects of morphological structure on word production and word recognition, as well as on generalization to untrained verb forms. For both models, results show that production and recognition, as well as generalization, are facilitated for verb forms in regular paradigms. However, the two models are differently influenced by structural effects, with the Temporal Self-Organizing Map more prone to adaptively finding a balance between processing issues of learnability and generalization on the one hand, and discriminability on the other.


1992 ◽  
Vol 26 (9-11) ◽  
pp. 2461-2464 ◽  
Author(s):  
R. D. Tyagi ◽  
Y. G. Du

A steady-state mathematical model of an activated sludge process with a secondary settler was developed. With a limited number of training data samples obtained from the simulation at steady state, a feedforward neural network was established that exhibits excellent capability for operational prediction and determination.
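
A minimal sketch, assuming scikit-learn, of fitting a small feedforward network to a handful of steady-state samples; the input and output variables are placeholders standing in for the simulated process conditions and the predicted operational quantities.

```python
# Minimal sketch (assumption: scikit-learn available; all variables are placeholders).
# Fits a small feedforward network to a limited set of steady-state samples,
# e.g. mapping operating conditions to a simulated process output.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))        # e.g. influent flow, substrate load, recycle ratio
y = X @ np.array([0.5, 1.2, -0.3])   # stand-in for the steady-state simulator output

model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))          # predictions for the first three operating points
```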


Biomimetics ◽  
2019 ◽  
Vol 5 (1) ◽  
pp. 1 ◽  
Author(s):  
Michelle Gutiérrez-Muñoz ◽  
Astryd González-Salazar ◽  
Marvin Coto-Jiménez

Speech signals are degraded in real-life environments as a product of background noise or other factors. The processing of such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that makes degraded quality difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone along multiple paths. To enhance signals in such adverse conditions, several deep learning-based methods have been proposed and proven to be effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have shown surprising results in tasks related to time-dependent processing of signals, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extended experimentation in several cases. In this work, we present a proposal to evaluate hybrid neural network models that learn different reverberation conditions without any previous information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to those from pure LSTM networks, given a fixed number of layers. The evaluation was based on quality measurements of the signal's spectrum, the training time of the networks, and statistical validation of the results. In total, 120 artificial neural networks of eight different types were trained and compared. The results support the conclusion that hybrid networks represent an important solution for speech signal enhancement, given that the reduction in training time is on the order of 30% in processes that can normally take several days or weeks, depending on the amount of data. The results also show advantages in efficiency without a significant drop in quality.
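
A schematic sketch, assuming PyTorch, of a hybrid network that combines an LSTM layer with fully connected (perceptron) layers to map reverberant spectral frames to enhanced ones; layer sizes and the spectral-frame representation are assumptions.

```python
# Schematic sketch, assuming PyTorch: a hybrid network that stacks an LSTM
# layer with fully connected (perceptron) layers to map reverberant spectral
# frames to clean ones. Layer sizes and the frame dimension are assumptions.
import torch
import torch.nn as nn

class HybridEnhancer(nn.Module):
    def __init__(self, n_freq_bins=257, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_freq_bins, hidden, batch_first=True)
        self.dense = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq_bins),          # enhanced spectral frame
        )

    def forward(self, noisy_frames):
        # noisy_frames: (batch, time, n_freq_bins) reverberant magnitude spectra
        h, _ = self.lstm(noisy_frames)
        return self.dense(h)

model = HybridEnhancer()
print(model(torch.randn(2, 100, 257)).shape)          # torch.Size([2, 100, 257])
```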

