Parsing as Pretraining

2020 ◽  
Vol 34 (05) ◽  
pp. 9114-9121
Author(s):  
David Vilares ◽  
Michalina Strzyz ◽  
Anders Søgaard ◽  
Carlos Gómez-Rodríguez

Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and performs full parsing (on English) relying only on pretraining architectures, with no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can reach on syntax modeling with just pretrained encoders, and (ii) shed some light on the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in-depth differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%).
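As a rough illustration of the setup described above, the sketch below maps frozen contextual word vectors to tree-encoding tags with a single linear layer. It assumes a HuggingFace-style pretrained encoder; the label inventory, subword handling, and training loop are placeholders, not the authors' released code.

```python
# Minimal sketch of a sequence-tagging parsing probe: a single feed-forward layer
# over a frozen pretrained encoder. Encoder, label set, and data are assumptions.
import torch
import torch.nn as nn

class TaggingProbe(nn.Module):
    def __init__(self, encoder, num_labels, hidden_size=768, freeze=True):
        super().__init__()
        self.encoder = encoder              # e.g. a pretrained BERT-style encoder
        if freeze:                          # freezing isolates the encoder's own syntax-sensitivity
            for p in self.encoder.parameters():
                p.requires_grad = False
        self.classifier = nn.Linear(hidden_size, num_labels)  # the only trained layer

    def forward(self, input_ids, attention_mask):
        # One contextual vector per token; labels encode a linearized tree
        # (e.g. relative-depth constituent tags or head-offset dependency tags).
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden)      # [batch, seq_len, num_labels] logits

# Training would minimize token-level cross-entropy against the tree-encoding tags:
# loss = nn.CrossEntropyLoss(ignore_index=-100)(logits.view(-1, num_labels), tags.view(-1))
```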

Author(s):  
Tal Linzen ◽  
Emmanuel Dupoux ◽  
Yoav Goldberg

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture’s grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
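The number-prediction objective in the strongly supervised setting can be pictured as a small classifier over an LSTM's final state: the network reads the sentence prefix up to the verb and predicts whether the verb should be singular or plural. The sketch below is a hypothetical reconstruction, not the paper's implementation; vocabulary size, dimensions, and the data pipeline are placeholders.

```python
# Minimal sketch of the number-prediction objective (hypothetical reconstruction).
import torch
import torch.nn as nn

class NumberPredictor(nn.Module):
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)    # binary: singular vs. plural

    def forward(self, prefix_ids):
        # prefix_ids: [batch, prefix_len], the words preceding the target verb
        _, (h_n, _) = self.lstm(self.embed(prefix_ids))
        return self.out(h_n[-1])               # logits over {singular, plural}

# For "the keys to the cabinet ___", the gold label is plural ("are"), even though
# the linearly closest noun ("cabinet") is singular -- the attractor configuration
# in which the paper reports errors increase.
```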


Author(s):  
Van Tung Pham ◽  
Haihua Xu ◽  
Yerbolat Khassanov ◽  
Zhiping Zeng ◽  
Eng Siong Chng ◽  
...  

2020 ◽  
Author(s):  
Wenyu Du ◽  
Zhouhan Lin ◽  
Yikang Shen ◽  
Timothy J. O’Donnell ◽  
Yoshua Bengio ◽  
...  

2021 ◽  
Author(s):  
Abdul Wahab ◽  
Rafet Sifa

In this paper, we propose a new model named DIBERT, which stands for Dependency Injected Bidirectional Encoder Representations from Transformers. DIBERT is a variation of BERT with an additional third objective, Parent Prediction (PP), alongside Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). PP injects the syntactic structure of a dependency tree while pre-training DIBERT, which yields syntax-aware generic representations. We use the WikiText-103 benchmark dataset to pre-train both BERT-Base and DIBERT. After fine-tuning, we observe that DIBERT performs better than BERT-Base on various downstream tasks, including Semantic Similarity, Natural Language Inference and Sentiment Analysis.
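A hypothetical sketch of how the three pre-training losses could be combined is shown below. The parent-prediction head scores, for each token, which sequence position is its dependency parent; the head design, loss weighting, and source of gold parents are assumptions, not the released DIBERT code.

```python
# Hypothetical sketch of combining MLM, NSP, and a parent-prediction (PP) loss.
# Gold parent positions would come from a dependency parse of the pre-training corpus.
import torch
import torch.nn as nn

class ParentPredictionHead(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden):                  # hidden: [batch, seq_len, hidden]
        scores = self.query(hidden) @ self.key(hidden).transpose(1, 2)
        return scores                           # [batch, seq_len, seq_len]: token-to-parent logits

def pretraining_loss(mlm_loss, nsp_loss, parent_logits, parent_ids, pp_weight=1.0):
    # parent_ids: gold head position per token (-100 marks tokens to ignore)
    pp_loss = nn.CrossEntropyLoss(ignore_index=-100)(
        parent_logits.view(-1, parent_logits.size(-1)), parent_ids.view(-1))
    return mlm_loss + nsp_loss + pp_weight * pp_loss
```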


2019 ◽  
Author(s):  
Alessandro Lopopolo ◽  
Stefan L. Frank ◽  
Antal van den Bosch ◽  
Roel M. Willems

Backward saccades during reading have been hypothesized to be involved in structural reanalysis, or to be related to the level of text difficulty. We test the hypothesis that backward saccades are involved in online syntactic analysis. If this is the case, we expect that saccades will coincide, at least partially, with the edges of the relations computed by a dependency parser. To test this, we analyzed a large eye-tracking dataset collected while 102 participants read three short narrative texts. Our results show a relation between backward saccades and the syntactic structure of sentences.
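The core comparison, whether a regression's source and target words are linked by a dependency edge, can be illustrated with a short check like the one below. This is an illustrative sketch, not the study's analysis code; spaCy stands in for the dependency parser, and the eye-tracking data handling is omitted.

```python
# Illustrative check: does a backward saccade from word index `src` to word index `dst`
# coincide with a head-dependent pair in the dependency parse? (spaCy is a stand-in parser;
# token indices depend on its tokenization, and the en_core_web_sm model must be installed.)
import spacy

nlp = spacy.load("en_core_web_sm")

def saccade_matches_dependency(sentence, src, dst):
    """True if the (src -> dst) regression connects a head and its dependent."""
    doc = nlp(sentence)
    src_tok, dst_tok = doc[src], doc[dst]
    return src_tok.head.i == dst_tok.i or dst_tok.head.i == src_tok.i

# Example: a regression from "are" (index 5) back to "keys" (index 1) lands on the
# verb's subject, i.e. a dependency edge.
print(saccade_matches_dependency(
    "The keys to the cabinet are on the table", src=5, dst=1))
```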

