Attention-Based Joint Entity Linking with Entity Embedding

Entity linking (also called entity disambiguation) aims to map the mentions in a given document to their corresponding entities in a target knowledge base. In order to build a high-quality entity linking system, efforts are made in three parts: Encoding of the entity, encoding of the mention context, and modeling the coherence among mentions. For the encoding of entity, we use long short term memory (LSTM) and a convolutional neural network (CNN) to encode the entity context and entity description, respectively. Then, we design a function to combine all the different entity information aspects, in order to generate unified, dense entity embeddings. For the encoding of mention context, unlike standard attention mechanisms which can only capture important individual words, we introduce a novel, attention mechanism-based LSTM model, which can effectively capture the important text spans around a given mention with a conditional random field (CRF) layer. In addition, we take the coherence among mentions into consideration with a Forward-Backward Algorithm, which is less time-consuming than previous methods. Our experimental results show that our model obtains a competitive, or even better, performance than state-of-the-art models across different datasets.

Download Full-text

A Two-Layer Long Short-Term Memory Network for Bottleneck Prediction in Multi-Job Manufacturing Systems

Volume 3: Manufacturing Equipment and Systems ◽

10.1115/msec2018-6678 ◽

2018 ◽

Cited By ~ 1

Author(s):

Xingjian Lai ◽

Huanyi Shui ◽

Jun Ni

Keyword(s):

Manufacturing Systems ◽

Short Term Memory ◽

Complex Dynamics ◽

State Of The Art ◽

Short Term ◽

Term Memory ◽

Future Production ◽

Effective Manner ◽

Long Short Term Memory ◽

Factory Floor

Throughput bottlenecks define and constrain the productivity of a production line. Prediction of future bottlenecks provides a great support for decision-making on the factory floor, which can help to foresee and formulate appropriate actions before production to improve the system throughput in a cost-effective manner. Bottleneck prediction remains a challenging task in literature. The difficulty lies in the complex dynamics of manufacturing systems. There are multiple factors collaboratively affecting bottleneck conditions, such as machine performance, machine degradation, line structure, operator skill level, and product release schedules. These factors impact on one another in a nonlinear manner and exhibit long-term temporal dependencies. State-of-the-art research utilizes various assumptions to simplify the modeling by reducing the input dimensionality. As a result, those models cannot accurately reflect complex dynamics of the bottleneck in a manufacturing system. To tackle this problem, this paper will propose a systematic framework to design a two-layer Long Short-Term Memory (LSTM) network tailored to the dynamic bottleneck prediction problem in multi-job manufacturing systems. This neural network based approach takes advantage of historical high dimensional factory floor data to predict system bottlenecks dynamically considering the future production planning inputs. The model is demonstrated with data from an automotive underbody assembly line. The result shows that the proposed method can achieve higher prediction accuracy compared with current state-of-the-art approaches.

Download Full-text

Arabic dialect sentiment analysis with ZERO effort. \\ Case study: Algerian dialect

INTELIGENCIA ARTIFICIAL ◽

10.4114/intartif.vol23iss65pp124-135 ◽

2020 ◽

Vol 23 (65) ◽

pp. 124-135

Author(s):

Imane Guellil ◽

Marcelo Mendoza ◽

Faical Azouaou

Keyword(s):

Sentiment Analysis ◽

Short Term Memory ◽

State Of The Art ◽

Short Term ◽

Term Memory ◽

Ongoing Work ◽

Long Short Term Memory ◽

Large Corpus ◽

Unique Condition

This paper presents an analytic study showing that it is entirely possible to analyze the sentiment of an Arabic dialect without constructing any resources. The idea of this work is to use the resources dedicated to a given dialect \textit{X} for analyzing the sentiment of another dialect \textit{Y}. The unique condition is to have \textit{X} and \textit{Y} in the same category of dialects. We apply this idea on Algerian dialect, which is a Maghrebi Arabic dialect that suffers from limited available tools and other handling resources required for automatic sentiment analysis. To do this analysis, we rely on Maghrebi dialect resources and two manually annotated sentiment corpus for respectively Tunisian and Moroccan dialect. We also use a large corpus for Maghrebi dialect. We use a state-of-the-art system and propose a new deep learning architecture for automatically classify the sentiment of Arabic dialect (Algerian dialect). Experimental results show that F1-score is up to 83% and it is achieved by Multilayer Perceptron (MLP) with Tunisian corpus and with Long short-term memory (LSTM) with the combination of Tunisian and Moroccan. An improvement of 15% compared to its closest competitor was observed through this study. Ongoing work is aimed at manually constructing an annotated sentiment corpus for Algerian dialect and comparing the results

Download Full-text

An Optimized Abstractive Text Summarization Model Using Peephole Convolutional LSTM

Symmetry ◽

10.3390/sym11101290 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1290 ◽

Cited By ~ 2

Author(s):

Rahman ◽

Siddiqui

Keyword(s):

Language Processing ◽

Short Term Memory ◽

State Of The Art ◽

Text Summarization ◽

Short Term ◽

Term Memory ◽

Semantic Coherence ◽

Long Short Term Memory ◽

Central Composite ◽

Convolutional Lstm

Abstractive text summarization that generates a summary by paraphrasing a long text remains an open significant problem for natural language processing. In this paper, we present an abstractive text summarization model, multi-layered attentional peephole convolutional LSTM (long short-term memory) (MAPCoL) that automatically generates a summary from a long text. We optimize parameters of MAPCoL using central composite design (CCD) in combination with the response surface methodology (RSM), which gives the highest accuracy in terms of summary generation. We record the accuracy of our model (MAPCoL) on a CNN/DailyMail dataset. We perform a comparative analysis of the accuracy of MAPCoL with that of the state-of-the-art models in different experimental settings. The MAPCoL also outperforms the traditional LSTM-based models in respect of semantic coherence in the output summary.

Download Full-text

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/132 ◽

2019 ◽

Cited By ~ 7

Author(s):

Jing Wang ◽

Yingwei Pan ◽

Ting Yao ◽

Jinhui Tang ◽

Tao Mei

Keyword(s):

Coherent Structure ◽

Topic Modeling ◽

Short Term Memory ◽

State Of The Art ◽

Attention Mechanism ◽

Visual Content ◽

Short Term ◽

Term Memory ◽

Sentence Level ◽

Long Short Term Memory

Image paragraph generation is the task of producing a coherent story (usually a paragraph) that describes the visual content of an image. The problem nevertheless is not trivial especially when there are multiple descriptive and diverse gists to be considered for paragraph generation, which often happens in real images. A valid question is how to encapsulate such gists/topics that are worthy of mention from an image, and then describe the image from one topic to another but holistically with a coherent structure. In this paper, we present a new design --- Convolutional Auto-Encoding (CAE) that purely employs convolutional and deconvolutional auto-encoding framework for topic modeling on the region-level features of an image. Furthermore, we propose an architecture, namely CAE plus Long Short-Term Memory (dubbed as CAE-LSTM), that novelly integrates the learnt topics in support of paragraph generation. Technically, CAE-LSTM capitalizes on a two-level LSTM-based paragraph generation framework with attention mechanism. The paragraph-level LSTM captures the inter-sentence dependency in a paragraph, while sentence-level LSTM is to generate one sentence which is conditioned on each learnt topic. Extensive experiments are conducted on Stanford image paragraph dataset, and superior results are reported when comparing to state-of-the-art approaches. More remarkably, CAE-LSTM increases CIDEr performance from 20.93% to 25.15%.

Download Full-text

Pengenalan Entitas Bernama Otomatis untuk Bahasa Indonesia dengan Pendekatan Pembelajaran Mesin

10.31227/osf.io/vud2p ◽

2018 ◽

Cited By ~ 2

Author(s):

Yudi Wibisono ◽

Masayu Leylia Khodra

Keyword(s):

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Entity Recognition ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory ◽

F Measure ◽

Bahasa Indonesia

Pengenalan entitas bernama (named-entity recognition atau NER) adalah proses otomatis mengekstraksi entitas bernama yang dianggap penting di dalam sebuah teks dan menentukan kategorinya ke dalam kategori terdefinisi. Sebagai contoh, untuk teks berita, NER dapat mengekstraksi nama orang, nama organisasi, dan nama lokasi. NER bermanfaat dalam berbagai aplikasi analisis teks, misalnya pencarian, sistem tanya jawab, peringkasan teks dan mesin penerjemah. Tantangan utama NER adalah penanganan ambiguitas makna karena konteks kata pada kalimat, misalnya kata “Cendana” dapat merupakan nama lokasi (Jalan Cendana), atau nama organisasi (Keluarga Cendana), atau nama tanaman. Tantangan lainnya adalah penentuan batas entitas, misalnya “[Istora Senayan] [Jakarta]”. Berbagai kakas NER telah dikembangkan untuk berbagai bahasa terutama Bahasa Inggris dengan kinerja yang baik, tetapi kakas NER bahasa Indonesia masih memiliki kinerja yang belum baik. Makalah ini membahas pendekatan berbasis pembelajaran mesin untuk menghasilkan model NER bahasa Indonesia. Pendekatan ini sangat bergantung pada korpus yang menjadi sumber belajar, dan teknik pembelajaran mesin yang digunakan. Teknik yang akan digunakan adalah LSTM - CRF (Long Short Term Memory – Conditional Random Field). Hasil terbaik (F-measure = 0.72) didapatkan dengan menggunakan word embedding GloVe Wikipedia Bahasa Indonesia.

Download Full-text

Groundwater level forecasting with artificial neural networks: a comparison of long short-term memory (LSTM), convolutional neural networks (CNNs), and non-linear autoregressive networks with exogenous input (NARX)

Hydrology and Earth System Sciences ◽

10.5194/hess-25-1671-2021 ◽

2021 ◽

Vol 25 (3) ◽

pp. 1671-1687

Author(s):

Andreas Wunsch ◽

Tanja Liesch ◽

Stefan Broda

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Convolutional Neural Networks ◽

Groundwater Level ◽

Short Term Memory ◽

State Of The Art ◽

Short Term ◽

Term Memory ◽

Non Linear ◽

Long Short Term Memory

Abstract. It is now well established to use shallow artificial neural networks (ANNs) to obtain accurate and reliable groundwater level forecasts, which are an important tool for sustainable groundwater management. However, we observe an increasing shift from conventional shallow ANNs to state-of-the-art deep-learning (DL) techniques, but a direct comparison of the performance is often lacking. Although they have already clearly proven their suitability, shallow recurrent networks frequently seem to be excluded from the study design due to the euphoria about new DL techniques and its successes in various disciplines. Therefore, we aim to provide an overview on the predictive ability in terms of groundwater levels of shallow conventional recurrent ANNs, namely non-linear autoregressive networks with exogenous input (NARX) and popular state-of-the-art DL techniques such as long short-term memory (LSTM) and convolutional neural networks (CNNs). We compare the performance on both sequence-to-value (seq2val) and sequence-to-sequence (seq2seq) forecasting on a 4-year period while using only few, widely available and easy to measure meteorological input parameters, which makes our approach widely applicable. Further, we also investigate the data dependency in terms of time series length of the different ANN architectures. For seq2val forecasts, NARX models on average perform best; however, CNNs are much faster and only slightly worse in terms of accuracy. For seq2seq forecasts, mostly NARX outperform both DL models and even almost reach the speed of CNNs. However, NARX are the least robust against initialization effects, which nevertheless can be handled easily using ensemble forecasting. We showed that shallow neural networks, such as NARX, should not be neglected in comparison to DL techniques especially when only small amounts of training data are available, where they can clearly outperform LSTMs and CNNs; however, LSTMs and CNNs might perform substantially better with a larger dataset, where DL really can demonstrate its strengths, which is rarely available in the groundwater domain though.

Download Full-text

An Approach to NMT Re-Ranking Using Sequence-Labeling for Grammatical Error Correction

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2020.p0557 ◽

2020 ◽

Vol 24 (4) ◽

pp. 557-567

Author(s):

Bo Wang ◽

◽

Kaoru Hirota ◽

Chang Liu ◽

Yaping Dai ◽

...

Keyword(s):

Error Correction ◽

Short Term Memory ◽

Conditional Random Field ◽

Short Term ◽

Neural Machine Translation ◽

Term Memory ◽

Sequence Labeling ◽

Grammatical Error ◽

Small Set ◽

Long Short Term Memory

An approach to N-best hypotheses re-ranking using a sequence-labeling model is applied to resolve the data deficiency problem in Grammatical Error Correction (GEC). Multiple candidate sentences are generated using a Neural Machine Translation (NMT) model; thereafter, these sentences are re-ranked via a stacked Transformer following a Bidirectional Long Short-Term Memory (BiLSTM) with Conditional Random Field (CRF). Correlations within the sentences are extracted using the sequence-labeling model based on the Transformer, which is particularly suitable for long sentences. Meanwhile, the knowledge from a large amount of unlabeled data is acquired through the pre-trained structure. Thus, completely revised sentences are adopted instead of partially modified sentences. Compared with conventional NMT, experiments on the NUCLE and FCE datasets demonstrate that the model improves the F0.5 score by 8.22% and 2.09%, respectively. As an advantage, the proposed re-ranking method has the advantage of only requires a small set of easily computed features that do not need linguistic inputs.

Download Full-text

Post Text Processing of Chinese Speech Recognition Based on Bidirectional LSTM Networks and CRF

Electronics ◽

10.3390/electronics8111248 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1248 ◽

Cited By ~ 3

Author(s):

Li Yang ◽

Ying Li ◽

Jin Wang ◽

Zhuo Tang

Keyword(s):

Speech Recognition ◽

Error Detection ◽

Short Term Memory ◽

Text Processing ◽

Conditional Random Field ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Lstm Network ◽

Two Stages

With the rapid development of Internet of Things Technology, speech recognition has been applied more and more widely. Chinese Speech Recognition is a complex process. In the process of speech-to-text conversion, due to the influence of dialect, environmental noise, and context, the accuracy of speech-to-text in multi-round dialogues and specific contexts is still not high. After the general speech recognition technology, the text after speech recognition can be detected and corrected in the specific context, which is helpful to improve the robustness of text comprehension and is a beneficial supplement to the speech recognition technology. In this paper, a text processing model after Chinese Speech Recognition is proposed, which combines a bidirectional long short-term memory (LSTM) network with a conditional random field (CRF) model. The task is divided into two stages: text error detection and text error correction. In this paper, a bidirectional long short-term memory (Bi-LSTM) network and conditional random field are used in two stages of text error detection and text error correction respectively. Through verification and system test on the SIGHAN 2013 Chinese Spelling Check (CSC) dataset, the experimental results show that the model can effectively improve the accuracy of text after speech recognition.

Download Full-text

Text segmentation for patent claim simplification via Bidirectional Long‐Short Term Memory and Conditional Random Field

Computational Intelligence ◽

10.1111/coin.12455 ◽

2021 ◽

Author(s):

Boting Geng

Keyword(s):

Random Field ◽

Short Term Memory ◽

Conditional Random Field ◽

Text Segmentation ◽

Patent Claim ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

Download Full-text

Malicious Traffic classification using Long Short-Term Memory (LSTM) model

10.21203/rs.3.rs-159180/v1 ◽

2021 ◽

Author(s):

Naresh Kumar Thapa K ◽

N. Duraipandian

Keyword(s):

Short Term Memory ◽

State Of The Art ◽

Detection System ◽

Classification Systems ◽

Traffic Classification ◽

Short Term ◽

Fixed Sequence ◽

Term Memory ◽

Proposed Model ◽

Long Short Term Memory

Abstract Malicious traffic classification is the initial and primary step for any network-based security systems. This traffic classification systems include behavior-based anomaly detection system and Intrusion Detection System. Existing methods always relies on the conventional techniques and process the data in the fixed sequence, which may leads to performance issues. Furthermore, conventional techniques require proper annotation to process the volumetric data. Relying on the data annotation for efficient traffic classification may leads to network loops and bandwidth issues within the network. To address the above-mentioned issues, this paper presents a novel solution based on artificial intelligence perspective. The key idea of this paper is to propose a novel malicious classification system using Long Short-Term Memory (LSTM) model. To validate the efficiency of the proposed model, an experimental setup along with experimental validation is carried out. From the experimental results, it is proven that the proposed model is better in terms of accuracy, throughput when compared to the state-of-the-art models. Further, the accuracy of the proposed model outperforms the existing state of the art models with increase in 5% and overall 99.5% in accuracy.

Download Full-text