Parsing as Pretraining

2020 ◽  
Vol 34 (05) ◽  
pp. 9114-9121
Author(s):  
David Vilares ◽  
Michalina Strzyz ◽  
Anders Søgaard ◽  
Carlos Gómez-Rodríguez

Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and performs full parsing (on English) relying only on pretraining architectures, with no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can reach on syntax modeling with just pretrained encoders, and (ii) shed some light on the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in-depth differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%).
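As a rough illustration of the setup described above, the sketch below maps frozen contextual word vectors to tree-encoding tags with a single linear layer. It assumes a HuggingFace-style pretrained encoder; the label inventory, subword handling, and training loop are placeholders, not the authors' released code.

```python
# Minimal sketch of a sequence-tagging parsing probe: a single feed-forward layer
# over a frozen pretrained encoder. Encoder, label set, and data are assumptions.
import torch
import torch.nn as nn

class TaggingProbe(nn.Module):
    def __init__(self, encoder, num_labels, hidden_size=768, freeze=True):
        super().__init__()
        self.encoder = encoder              # e.g. a pretrained BERT-style encoder
        if freeze:                          # freezing isolates the encoder's own syntax-sensitivity
            for p in self.encoder.parameters():
                p.requires_grad = False
        self.classifier = nn.Linear(hidden_size, num_labels)  # the only trained layer

    def forward(self, input_ids, attention_mask):
        # One contextual vector per token; labels encode a linearized tree
        # (e.g. relative-depth constituent tags or head-offset dependency tags).
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden)      # [batch, seq_len, num_labels] logits

# Training would minimize token-level cross-entropy against the tree-encoding tags:
# loss = nn.CrossEntropyLoss(ignore_index=-100)(logits.view(-1, num_labels), tags.view(-1))
```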

Author(s):  
Tal Linzen ◽  
Emmanuel Dupoux ◽  
Yoav Goldberg

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture’s grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
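The number-prediction objective in the strongly supervised setting can be pictured as a small classifier over an LSTM's final state: the network reads the sentence prefix up to the verb and predicts whether the verb should be singular or plural. The sketch below is a hypothetical reconstruction, not the paper's implementation; vocabulary size, dimensions, and the data pipeline are placeholders.

```python
# Minimal sketch of the number-prediction objective (hypothetical reconstruction).
import torch
import torch.nn as nn

class NumberPredictor(nn.Module):
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)    # binary: singular vs. plural

    def forward(self, prefix_ids):
        # prefix_ids: [batch, prefix_len], the words preceding the target verb
        _, (h_n, _) = self.lstm(self.embed(prefix_ids))
        return self.out(h_n[-1])               # logits over {singular, plural}

# For "the keys to the cabinet ___", the gold label is plural ("are"), even though
# the linearly closest noun ("cabinet") is singular -- the attractor configuration
# in which the paper reports errors increase.
```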


Author(s):  
Van Tung Pham ◽  
Haihua Xu ◽  
Yerbolat Khassanov ◽  
Zhiping Zeng ◽  
Eng Siong Chng ◽  
...  

2020 ◽  
Author(s):  
Wenyu Du ◽  
Zhouhan Lin ◽  
Yikang Shen ◽  
Timothy J. O’Donnell ◽  
Yoshua Bengio ◽  
...  

2021 ◽  
Author(s):  
Abdul Wahab ◽  
Rafet Sifa

In this paper, we propose a new model named DIBERT, which stands for Dependency Injected Bidirectional Encoder Representations from Transformers. DIBERT is a variation of BERT with an additional third objective, Parent Prediction (PP), alongside Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). PP injects the syntactic structure of a dependency tree while pre-training DIBERT, which yields syntax-aware generic representations. We use the WikiText-103 benchmark dataset to pre-train both BERT-Base and DIBERT. After fine-tuning, we observe that DIBERT performs better than BERT-Base on various downstream tasks, including Semantic Similarity, Natural Language Inference and Sentiment Analysis.
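A hypothetical sketch of how the three pre-training losses could be combined is shown below. The parent-prediction head scores, for each token, which sequence position is its dependency parent; the head design, loss weighting, and source of gold parents are assumptions, not the released DIBERT code.

```python
# Hypothetical sketch of combining MLM, NSP, and a parent-prediction (PP) loss.
# Gold parent positions would come from a dependency parse of the pre-training corpus.
import torch
import torch.nn as nn

class ParentPredictionHead(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden):                  # hidden: [batch, seq_len, hidden]
        scores = self.query(hidden) @ self.key(hidden).transpose(1, 2)
        return scores                           # [batch, seq_len, seq_len]: token-to-parent logits

def pretraining_loss(mlm_loss, nsp_loss, parent_logits, parent_ids, pp_weight=1.0):
    # parent_ids: gold head position per token (-100 marks tokens to ignore)
    pp_loss = nn.CrossEntropyLoss(ignore_index=-100)(
        parent_logits.view(-1, parent_logits.size(-1)), parent_ids.view(-1))
    return mlm_loss + nsp_loss + pp_weight * pp_loss
```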


2019 ◽  
Author(s):  
Alessandro Lopopolo ◽  
Stefan L. Frank ◽  
Antal van den Bosch ◽  
Roel M. Willems

Backward saccades during reading have been hypothesized to be involved in structural reanalysis, or to be related to the level of text difficulty. We test the hypothesis that backward saccades are involved in online syntactic analysis. If this is the case, we expect that saccades will coincide, at least partially, with the edges of the relations computed by a dependency parser. To test this, we analyzed a large eye-tracking dataset collected while 102 participants read three short narrative texts. Our results show a relation between backward saccades and the syntactic structure of sentences.
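The core comparison, whether a regression's source and target words are linked by a dependency edge, can be illustrated with a short check like the one below. This is an illustrative sketch, not the study's analysis code; spaCy stands in for the dependency parser, and the eye-tracking data handling is omitted.

```python
# Illustrative check: does a backward saccade from word index `src` to word index `dst`
# coincide with a head-dependent pair in the dependency parse? (spaCy is a stand-in parser;
# token indices depend on its tokenization, and the en_core_web_sm model must be installed.)
import spacy

nlp = spacy.load("en_core_web_sm")

def saccade_matches_dependency(sentence, src, dst):
    """True if the (src -> dst) regression connects a head and its dependent."""
    doc = nlp(sentence)
    src_tok, dst_tok = doc[src], doc[dst]
    return src_tok.head.i == dst_tok.i or dst_tok.head.i == src_tok.i

# Example: a regression from "are" (index 5) back to "keys" (index 1) lands on the
# verb's subject, i.e. a dependency edge.
print(saccade_matches_dependency(
    "The keys to the cabinet are on the table", src=5, dst=1))
```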

