Split-Based Algorithm for Weighted Context-Free Grammar Induction

2021 · Vol 11 (3) · pp. 1030
Author(s): Mateusz Gabor, Wojciech Wieczorek, Olgierd Unold

The split-based method for weighted context-free grammar (WCFG) induction was formalised and verified on a comprehensive set of context-free languages. The WCFG is learned using a novel grammatical inference method that works from both positive and negative samples, while the rule weights are estimated with a novel Inside–Outside Contrastive Estimation algorithm. The results showed that our approach outperforms other state-of-the-art methods in terms of F1 score.
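
As background for the Inside–Outside-style weight estimation mentioned above, the following is a minimal sketch (not the authors' implementation) of the inside pass for a weighted CFG in Chomsky normal form; the dictionary-based rule format (lexical and binary) is an assumption made purely for illustration.

from collections import defaultdict

def inside(words, lexical, binary, start="S"):
    """lexical: {(A, word): weight}, binary: {(A, B, C): weight} -- assumed toy format."""
    n = len(words)
    alpha = defaultdict(float)              # alpha[(i, j, A)] = inside weight of A over words[i:j]
    for i, w in enumerate(words):           # width-1 spans from lexical rules A -> w
        for (A, word), p in lexical.items():
            if word == w:
                alpha[(i, i + 1, A)] += p
    for width in range(2, n + 1):           # wider spans from binary rules A -> B C
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    alpha[(i, j, A)] += p * alpha[(i, k, B)] * alpha[(k, j, C)]
    return alpha[(0, n, start)]             # total inside weight of the whole sentence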

2013 · Vol 39 (1) · pp. 57-85
Author(s): Alexander Fraser, Helmut Schmid, Richárd Farkas, Renjing Wang, Hinrich Schütze

We study constituent parsing of German, a morphologically rich and less-configurational language. We use a probabilistic context-free treebank grammar that has been adapted to the morphologically rich properties of German through markovization and special features added to its productions. We evaluate the impact of adding lexical knowledge and then examine both monolingual and bilingual approaches to parse reranking. Our reranking parser sets a new state of the art in constituency parsing of the TIGER Treebank. We conclude with an analysis and lessons learned that also apply to parsing other morphologically rich, less-configurational languages.
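
For illustration, here is a small sketch of horizontal (order-1) markovization, which binarizes a flat treebank rule into binary rules whose intermediate symbols remember only the parent and the previous sibling; the @-prefixed symbols and the markovize helper are hypothetical, not taken from the paper's grammar.

def markovize(lhs, rhs):
    """Binarize lhs -> rhs (a list of symbols) with order-1 horizontal markovization."""
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules = []
    current_lhs = lhs
    for sym in rhs[:-2]:
        new_sym = "@%s|%s" % (lhs, sym)     # intermediate symbol: parent plus previous sibling only
        rules.append((current_lhs, (sym, new_sym)))
        current_lhs = new_sym
    rules.append((current_lhs, (rhs[-2], rhs[-1])))
    return rules

# e.g. markovize("NP", ["DET", "ADJ", "ADJ", "N"]) yields
#   NP -> DET @NP|DET,   @NP|DET -> ADJ @NP|ADJ,   @NP|ADJ -> ADJ N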


2016 · Vol 41 (4) · pp. 297-315
Author(s): Wojciech Wieczorek

A cover-grammar of a finite language is a context-free grammar that accepts all words in the language and possibly other words that are longer than any word in the language. In this paper, we describe an efficient algorithm aided by an Ant Colony System that, for a given finite language, synthesizes (constructs) a small cover-grammar of the language. We also check its ability to solve a grammatical inference task through a series of experiments.
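
As a sketch of the cover-grammar property defined above: a grammar covers a finite language L if it accepts every word of L and any extra word it accepts is longer than the longest word in L. The check below assumes a recognizer accepts(grammar, word) supplied by the caller (a hypothetical helper) and simply enumerates candidate words up to the maximum length.

from itertools import product

def is_cover_grammar(accepts, grammar, language, alphabet):
    """Brute-force check of the cover-grammar property over words up to the maximum length."""
    max_len = max(len(w) for w in language)
    if not all(accepts(grammar, w) for w in language):
        return False                        # must accept every word of the language
    for n in range(max_len + 1):            # any other accepted word must be longer than max_len
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if w not in language and accepts(grammar, w):
                return False
    return True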


2020 · Vol 8 · pp. 647-661
Author(s): Hao Zhu, Yonatan Bisk, Graham Neubig

In this paper we demonstrate that context-free grammar (CFG) based methods for grammar induction benefit from modeling lexical dependencies. This contrasts with the most popular current methods for grammar induction, which focus on discovering either constituents or dependencies. Previous approaches to marrying these two disparate syntactic formalisms (e.g., lexicalized PCFGs) have been plagued by sparsity, making them unsuitable for unsupervised grammar induction. In this work, we present novel neural models of lexicalized PCFGs that overcome these sparsity problems and effectively induce both constituents and dependencies within a single model. Experiments demonstrate that this unified framework yields stronger results on both representations than modeling either formalism alone.
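
As an illustrative sketch only (not the paper's model): a lexicalized PCFG rule such as A[h] -> B[h] C[m] scores both the constituent structure and the head-dependent pair (h, m), and parameterizing that score with shared word embeddings rather than a count table is one way such models can sidestep the sparsity of classic lexicalized PCFGs. The embeddings, dimensions, and rule parameters below are toy assumptions.

import numpy as np

rng = np.random.default_rng(0)
EMB = {w: rng.normal(size=16) for w in ["the", "dog", "barks"]}   # toy word embeddings (assumption)
RULE_W = rng.normal(size=(16, 16))                                # shared bilinear rule parameters (assumption)

def lexicalized_rule_score(head, dependent):
    """Unnormalized score for attaching `dependent` to `head` under a lexicalized rule A[h] -> B C."""
    return float(EMB[head] @ RULE_W @ EMB[dependent])

# e.g. lexicalized_rule_score("barks", "dog") scores the head-dependent pair (barks, dog)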


2007 · Vol 33 (2) · pp. 201-228
Author(s): David Chiang

We present a statistical machine translation model that uses hierarchical phrases—phrases that contain subphrases. The model is formally a synchronous context-free grammar but is learned from a parallel text without any syntactic annotations. Thus it can be seen as combining fundamental ideas from both syntax-based translation and phrase-based translation. We describe our system's training and decoding methods in detail, and evaluate it for translation speed and translation accuracy. Using BLEU as a metric of translation accuracy, we find that our system performs significantly better than the Alignment Template System, a state-of-the-art phrase-based system.
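
To make the idea of hierarchical phrases concrete, the sketch below applies one synchronous rule whose gaps (X1, X2) are filled with already-translated subphrases and possibly reordered; the rule is in the style of the paper's examples, but the code itself is a toy illustration, not the system's decoder.

def apply_rule(rule, fillers):
    """fillers maps gap labels (X1, X2, ...) to (source_phrase, target_phrase) pairs."""
    src_tpl, tgt_tpl = rule
    src = " ".join(fillers[t][0] if t in fillers else t for t in src_tpl.split())
    tgt = " ".join(fillers[t][1] if t in fillers else t for t in tgt_tpl.split())
    return src, tgt

rule = ("yu X1 you X2", "have X2 with X1")            # reordering rule with two gaps
print(apply_rule(rule, {"X1": ("Aozhou", "Australia"),
                        "X2": ("bangjiao", "diplomatic relations")}))
# -> ('yu Aozhou you bangjiao', 'have diplomatic relations with Australia')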


2015 · Vol 22 (5) · pp. 775-810
Author(s): Qaiser Abbas

This work presents the development and evaluation of an extended Urdu parser. It further focuses on issues related to this parser and describes the changes made to the Earley algorithm to obtain accurate and relevant results from the Urdu parser. The parser makes use of a morphologically rich context-free grammar extracted from a linguistically rich Urdu treebank. The grammar encodes sufficient information to meet state-of-the-art parsing requirements for the morphologically rich Urdu language. Together, the extended parsing model and the linguistically rich extracted grammar yield better evaluation results in the Urdu/Hindi parsing domain. The parser achieves an F-score of 87%, outperforming existing treebank-based parsing work for Urdu/Hindi.
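
For reference, here is a minimal sketch of a generic Earley recognizer with its three classic operations (predict, scan, complete), the algorithm the extended parser builds on; the grammar format and this implementation are illustrative assumptions, not the Urdu parser itself.

def earley_recognize(grammar, start, words):
    """grammar: {lhs: [rhs_tuple, ...]}; symbols with no entry are terminals; no epsilon rules."""
    # A chart state is (lhs, rhs, dot, origin).
    chart = [set() for _ in range(len(words) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:                            # predict: expand the next nonterminal
                    for prod in grammar[nxt]:
                        state = (nxt, prod, 0, i)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
                elif i < len(words) and words[i] == nxt:      # scan: consume a matching terminal
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                             # complete: advance waiting states
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        state = (l2, r2, d2 + 1, o2)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
    return any((start, rhs, len(rhs), 0) in chart[len(words)] for rhs in grammar[start])

# e.g. earley_recognize({"S": [("NP", "VP")], "NP": [("dog",)], "VP": [("barks",)]},
#                       "S", ["dog", "barks"]) -> True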

