On Learning Interpreted Languages with Recurrent Models

2022
pp. 1-13
Author(s):  
Denis Paperno

Abstract: Can recurrent neural nets, inspired by human sequential data processing, learn to understand language? We construct simplified datasets reflecting core properties of natural language as modeled in formal syntax and semantics: recursive syntactic structure and compositionality. We find that LSTM and GRU networks generalise well to compositional interpretation, but only in the most favorable learning settings: with a well-paced curriculum, extensive training data, and left-to-right (but not right-to-left) composition.
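As a rough illustration of the kind of model the paper evaluates, here is a minimal recurrent-interpreter sketch, assuming PyTorch; the vocabulary size, hidden size, number of interpretations, and dataset are illustrative stand-ins, not the authors' setup.

```python
# Minimal sketch, assuming PyTorch; all sizes and the random data are illustrative.
import torch
import torch.nn as nn

class RecurrentInterpreter(nn.Module):
    """Reads a token sequence left to right and predicts its interpretation."""
    def __init__(self, vocab_size, hidden_size, num_interpretations, cell="lstm"):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
        self.rnn = rnn_cls(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, num_interpretations)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        embedded = self.embed(tokens)
        outputs, _ = self.rnn(embedded)
        return self.out(outputs[:, -1, :])      # interpretation from the final state

# Hypothetical usage: 20-symbol language, 64-dim states, 10 possible meanings.
model = RecurrentInterpreter(vocab_size=20, hidden_size=64, num_interpretations=10)
logits = model(torch.randint(0, 20, (32, 12)))  # batch of 32 strings, length 12
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (32,)))
```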

1990
Vol 6 (1-2)
pp. 1-2
Author(s):  
Ben Zion Barta,
Lauri Fontell,
Patrick Raymont

2007
Vol 33 (1)
pp. 105-133
Author(s):  
Catalina Hallett,
Donia Scott,
Richard Power

This article describes a method for composing fluent and complex natural language questions, while avoiding the standard pitfalls of free text queries. The method, based on Conceptual Authoring, is targeted at question-answering systems where reliability and transparency are critical, and where users cannot be expected to undergo extensive training in question composition. This scenario is found in most corporate domains, especially in applications that are risk-averse. We present a proof-of-concept system we have developed: a question-answering interface to a large repository of medical histories in the area of cancer. We show that the method allows users to successfully and reliably compose complex queries with minimal training.
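As a purely hypothetical illustration of guided query composition (not the authors' Conceptual Authoring system), the toy sketch below lets a user build a query only from system-offered options, so the result is always well-formed; all slot names and domain values are invented.

```python
# Toy sketch: compose a query by choosing from offered options instead of free text.
# Slots and values are invented for illustration.
OFFERED_OPTIONS = {
    "diagnosis": ["breast cancer", "lung cancer"],
    "treatment": ["chemotherapy", "radiotherapy", "surgery"],
    "time_span": ["last 6 months", "last 5 years"],
}

def compose_query(choices: dict) -> str:
    """Render the user's menu choices as an unambiguous structured query."""
    for slot, value in choices.items():
        if value not in OFFERED_OPTIONS[slot]:
            raise ValueError(f"{value!r} is not an offered option for {slot!r}")
    return ("SELECT patients WHERE diagnosis = '{diagnosis}' "
            "AND treatment = '{treatment}' AND period = '{time_span}'").format(**choices)

print(compose_query({"diagnosis": "breast cancer",
                     "treatment": "chemotherapy",
                     "time_span": "last 5 years"}))
```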


2020
Vol 34 (05)
pp. 7554-7561
Author(s):  
Pengxiang Cheng,
Katrin Erk

Recent progress in NLP has witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and such models have achieved state-of-the-art results on a range of end tasks, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning, where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting a new state of the art while containing only a tiny fraction of GPT-2's parameters. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions for applying similar techniques to other problems.
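A hedged sketch of what coreference-supervised self-attention could look like as an auxiliary loss, assuming PyTorch; the tensor layout and the way the supervised head is exposed are assumptions, not the authors' implementation.

```python
# Sketch only (PyTorch), not the authors' code: train one self-attention head so
# that each mention token attends to its coreferent antecedent, as an auxiliary
# loss added to the main objective.
import torch

def coref_attention_loss(attn_probs, antecedent_ids, mention_mask):
    """
    attn_probs:     (batch, seq_len, seq_len) attention of the supervised head
    antecedent_ids: (batch, seq_len) index of each token's antecedent (0 if none)
    mention_mask:   (batch, seq_len) float, 1.0 where the token has a gold antecedent
    """
    log_probs = torch.log(attn_probs + 1e-9)                                  # (B, T, T)
    picked = log_probs.gather(-1, antecedent_ids.unsqueeze(-1)).squeeze(-1)   # (B, T)
    return -(picked * mention_mask).sum() / mention_mask.sum().clamp(min=1)

# Hypothetical training step: total loss = task loss + weighted coreference loss.
# task_loss = ...; attn = model.supervised_head_attention(batch)          # assumed hook
# loss = task_loss + 0.5 * coref_attention_loss(attn, batch["antecedents"],
#                                               batch["mention_mask"])
```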


2020
Vol 34 (05)
pp. 8504-8511
Author(s):  
Arindam Mitra,
Ishan Shrivastava,
Chitta Baral

Natural Language Inference (NLI) plays an important role in many natural language processing tasks such as question answering. However, existing NLI modules trained on existing NLI datasets have several drawbacks. For example, they do not capture the notions of entity and role well, and often make mistakes such as inferring “Peter signed a deal” from “John signed a deal”. As part of this work, we have developed two datasets that help mitigate such issues and make systems better at understanding the notions of “entities” and “roles”. After training the existing models on the new data, we observe that they still do not perform well on one of the new benchmarks. We then propose a modification to the “word-to-word” attention function that has been uniformly reused across several popular NLI architectures. The resulting models perform as well as their unmodified counterparts on the existing benchmarks and perform significantly better on the new benchmarks that emphasize “roles” and “entities”.
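For context, here is a sketch (assuming PyTorch) of the standard “word-to-word” soft-alignment attention reused across popular NLI architectures such as decomposable attention and ESIM; the paper's proposed modification to this function is not reproduced here.

```python
# Baseline word-to-word (cross) attention between premise and hypothesis tokens.
# This is the unmodified function many NLI models share, not the paper's variant.
import torch
import torch.nn.functional as F

def word_to_word_attention(premise, hypothesis):
    """
    premise:    (batch, m, d) contextual vectors for premise tokens
    hypothesis: (batch, n, d) contextual vectors for hypothesis tokens
    Returns each sentence re-represented as a soft alignment over the other.
    """
    scores = torch.bmm(premise, hypothesis.transpose(1, 2))                   # (batch, m, n)
    premise_aligned = torch.bmm(F.softmax(scores, dim=2), hypothesis)         # (batch, m, d)
    hypothesis_aligned = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2),
                                   premise)                                   # (batch, n, d)
    return premise_aligned, hypothesis_aligned
```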


2021
Author(s):  
Wilson Wongso,
Henry Lucky,
Derwin Suhartono

Abstract: The Sundanese language has over 32 million speakers worldwide, but it has benefited little from recent advances in natural language understanding. As with other low-resource languages, the only alternative has been to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, most of our monolingual models outperformed larger multilingual models despite the smaller overall pre-training data. In subsequent analyses, our models benefited strongly from the size of the Sundanese pre-training corpus and did not exhibit socially biased behavior. We have released our models for other researchers and practitioners to use.
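A sketch of the usual downstream fine-tuning recipe with the Hugging Face transformers API; the checkpoint identifier below is a placeholder, not the authors' released model name.

```python
# Sketch of fine-tuning a pre-trained model on a text classification task.
# The checkpoint name is a placeholder; substitute a released Sundanese model.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "path-or-hub-id-of-sundanese-model"   # placeholder, not a real ID
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# train_dataset / eval_dataset are assumed to be datasets.Dataset objects with
# "text" and "label" columns, already mapped through `tokenize`.
args = TrainingArguments(output_dir="sundanese-clf", num_train_epochs=3,
                         per_device_train_batch_size=16)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```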


2021
Author(s):  
Lianteng Song,
Zhonghua Liu,
Chaoliu Li,
Congqian Ning,
...  

Geomechanical properties are essential for safe drilling, successful completion, and exploration of both conventional and unconventional reservoirs, e.g. deep shale gas and shale oil. Typically, these properties can be calculated from sonic logs. However, in shale reservoirs it is time-consuming and challenging to obtain reliable logging data because of borehole complexity and a lack of information, which often results in log deficiency and a high recovery cost for incomplete datasets. In this work, we propose the bidirectional long short-term memory (BiLSTM) network, a supervised neural architecture widely used for prediction on sequential data, to estimate geomechanical parameters. Prediction from log data is conducted in two settings: 1) single-well prediction, in which the log data from a single well are divided into training and testing data for cross-validation; and 2) cross-well prediction, in which a group of wells from the same geographical region is likewise divided into training and testing sets. The logs used in this work were collected from 11 wells in the Jimusaer Shale and include gamma ray, bulk density, and resistivity, among others. We compared five machine learning algorithms, among which BiLSTM showed the best performance, with an R-squared of more than 90% and an RMSE of less than 10. The predicted results can be used directly to calculate geomechanical properties, whose accuracy is also improved in contrast to conventional methods.
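A minimal sketch (assuming PyTorch) of a bidirectional LSTM regressor over depth-ordered log windows; the input logs, window length, and layer sizes are illustrative, not the configuration used in the study.

```python
# Sketch only (PyTorch): a BiLSTM mapping a depth-ordered window of input logs
# (e.g., gamma ray, bulk density, resistivity) to a target value at each depth.
import torch
import torch.nn as nn

class BiLSTMLogRegressor(nn.Module):
    def __init__(self, n_input_logs, hidden_size, n_targets):
        super().__init__()
        self.bilstm = nn.LSTM(n_input_logs, hidden_size,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_size, n_targets)

    def forward(self, x):              # x: (batch, depth_steps, n_input_logs)
        features, _ = self.bilstm(x)   # (batch, depth_steps, 2 * hidden_size)
        return self.head(features)     # per-depth predictions

model = BiLSTMLogRegressor(n_input_logs=4, hidden_size=64, n_targets=1)
window = torch.randn(8, 100, 4)            # 8 windows of 100 depth samples each
prediction = model(window)                 # (8, 100, 1)
loss = nn.MSELoss()(prediction, torch.randn(8, 100, 1))
```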


2019
Vol 5 (1)
Author(s):  
Jens Nevens,
Paul Van Eecke,
Katrien Beuls

Abstract: In order to answer a natural language question, a computational system needs three main capabilities. First, it needs to be able to analyze the question into a structured query, revealing its component parts and how they are combined. Second, it needs access to relevant knowledge sources, such as databases, texts or images. Third, it needs to be able to execute the query on these knowledge sources. This paper focuses on the first capability, presenting a novel approach to semantically parsing questions expressed in natural language. The method makes use of a computational construction grammar model for mapping questions onto their executable semantic representations. We demonstrate and evaluate the methodology on the CLEVR visual question answering benchmark task. Our system achieves 100% accuracy, effectively solving the language understanding part of the benchmark task. Additionally, we demonstrate how this solution can be embedded in a full visual question answering system, in which a question is answered by executing its semantic representation on an image. The main advantages of the approach include (i) its transparent and interpretable properties, (ii) its extensibility, and (iii) the fact that the method does not rely on any annotated training data.
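As a toy illustration of the target representation (not the construction grammar itself), the sketch below shows the kind of executable program a CLEVR-style question might map to, evaluated against a hand-written scene.

```python
# Toy illustration only: an executable semantic program for a CLEVR-style
# question, with a hand-written scene. The real system derives such programs
# with a computational construction grammar, not by hand.
scene = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "cube", "color": "blue", "size": "small"},
    {"shape": "sphere", "color": "red", "size": "small"},
]

def filter_attr(objects, attribute, value):
    return [o for o in objects if o[attribute] == value]

def count(objects):
    return len(objects)

# "How many red cubes are there?" -> count(filter(filter(scene, shape=cube), color=red))
answer = count(filter_attr(filter_attr(scene, "shape", "cube"), "color", "red"))
print(answer)   # 1
```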


2015
Vol 3
pp. 461-473
Author(s):  
Daniel Beck,
Trevor Cohn,
Christian Hardmeier,
Lucia Specia

Structural kernels are a flexible learning paradigm that has been widely used in Natural Language Processing. However, the problem of model selection in kernel-based methods is usually overlooked. Previous approaches mostly rely on setting default values for kernel hyperparameters or using grid search, which is slow and coarse-grained. In contrast, Bayesian methods allow efficient model selection by maximizing the evidence on the training data through gradient-based methods. In this paper we show how to perform this in the context of structural kernels by using Gaussian Processes. Experimental results on tree kernels show that this procedure results in better prediction performance compared to hyperparameter optimization via grid search. The framework proposed in this paper can be adapted to other structures besides trees, e.g., strings and graphs, thereby extending the utility of kernel-based methods.
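A small sketch of evidence maximization for kernel hyperparameters with a Gaussian Process, using scikit-learn and an RBF kernel as a stand-in for a structural (tree) kernel; the data is synthetic.

```python
# Sketch: select kernel hyperparameters by maximizing the GP log marginal
# likelihood (the evidence). An RBF kernel stands in for a tree kernel here.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(40)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
gp.fit(X, y)   # hyperparameters tuned by gradient-based evidence maximization

print(gp.kernel_)                                    # optimized hyperparameters
print(gp.log_marginal_likelihood(gp.kernel_.theta))  # value of the evidence
```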


1979
Vol 15 (1)
pp. 39-47
Author(s):  
Geoffrey Sampson

Many contemporary linguists hold that an adequate description of a natural language must represent many of its vocabulary items as syntactically and/or semantically complex. A sentence containing the word kill, for instance, will on this view be assigned a ‘deep syntactic structure’ or ‘semantic representation’ in which kill is represented by a portion or portions of tree-structure, the lowest nodes of which are labelled with ‘semantic primitives’ such as CAUSE and DIE, or CAUSE, BECOME, NOT and ALIVE. In the case of words such as cats or walked, which are formed in accordance with productive rules of ‘inflexional’ rather than ‘derivational’ morphology, there is little dispute that their composite status will be reflected at most or all levels of linguistic representation. (That is why I refer, above, to ‘vocabulary items’: cat and cats may be called different ‘words’, but not different elements of the English vocabulary.) When morphologically simple words such as kill are treated as composite at a ‘deeper’ level, I, for one, find my credulity strained to breaking point. (The case of words formed in accordance with productive or non-productive rules of derivational morphology, such as killer or kingly, is an intermediate one and I shall briefly return to it below.)

