A derivative-based parser generator for visibly pushdown grammars

In this paper, we present a derivative-based, functional recognizer and parser generator for visibly pushdown grammars. The generated parser accepts ambiguous grammars and produces, in linear time, a parse forest containing all valid parse trees for an input string. Each parse tree in the forest can then also be extracted in linear time. In addition to the parser generator, to allow more flexible forms of visibly pushdown grammars, we present a translator that soundly converts a tagged CFG to a visibly pushdown grammar; the parse trees of the tagged CFG are then produced by running the semantic actions embedded in the parse trees of the translated visibly pushdown grammar. The performance of the parser is compared with that of the popular parsing tool ANTLR and of other popular hand-crafted parsers. The correctness of the core parsing algorithm is formally verified in the proof assistant Coq.
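As a rough illustration of what "derivative-based" recognition means, the sketch below implements the classic Brzozowski derivative for plain regular expressions. This is a deliberately simpler setting than the visibly pushdown grammars of the paper and is not the authors' algorithm; all class and function names (`Regex`, `derive`, `nullable`, `matches`) are assumptions made for the example.

```python
from dataclasses import dataclass


class Regex:
    """Base class for a tiny regular-expression AST (illustrative only)."""


@dataclass(frozen=True)
class Empty(Regex):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Regex):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Regex):
    c: str


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Seq(Regex):
    first: Regex
    second: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r: Regex) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    raise TypeError(f"unknown node: {r!r}")


def derive(r: Regex, ch: str) -> Regex:
    """Derivative of r with respect to ch: what remains to be matched after ch."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, ch), derive(r.right, ch))
    if isinstance(r, Seq):
        head = Seq(derive(r.first, ch), r.second)
        # If the first part can be empty, ch may also start the second part.
        return Alt(head, derive(r.second, ch)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(derive(r.inner, ch), r)
    raise TypeError(f"unknown node: {r!r}")


def matches(r: Regex, text: str) -> bool:
    """Consume text one symbol at a time, then check for an accepting residue."""
    for ch in text:
        r = derive(r, ch)
    return nullable(r)


# Usage: the language (ab)*
ab_star = Star(Seq(Chr("a"), Chr("b")))
print(matches(ab_star, "abab"))  # True
print(matches(ab_star, "aba"))   # False
```

The sketch does not simplify or memoize derivatives, so it makes no claim about running time; it only shows the consume-one-symbol-and-rewrite shape that derivative-based recognizers share.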

Global Greedy Dependency Parsing

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6348 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8319-8326

Author(s):

Zuchao Li ◽

Hai Zhao ◽

Kevin Parnow

Keyword(s):

Feature Extraction ◽

Polynomial Time ◽

Time Complexity ◽

Linear Time ◽

Parse Tree ◽

Dependency Parsing ◽

Graph Models ◽

Parse Trees ◽

Left And Right ◽

Syntactic Dependency

Most syntactic dependency parsing models fall into one of two categories: transition-based and graph-based models. The former enjoy high inference efficiency with linear time complexity, but they rely on the stacking or re-ranking of partially built parse trees to build a complete parse tree and suffer from slower training because dynamic oracle training is necessary. The latter, graph-based models, may boast better performance but are unfortunately marred by polynomial time inference. In this paper, we propose a novel parsing order objective, resulting in a novel dependency parsing model capable of both global (in sentence scope) feature extraction, as in graph-based models, and linear time inference, as in transition-based models. The proposed global greedy parser uses only two arc-building actions, left and right arcs, for projective parsing. When equipped with two extra non-projective arc-building actions, the proposed parser may also smoothly support non-projective parsing. Using multiple benchmark treebanks, including the Penn Treebank (PTB), the CoNLL-X treebanks, and the Universal Dependency Treebanks, we evaluate our parser and demonstrate that it achieves good performance with faster training and decoding.

Parallel Hardware Stochastic Context-Free Parsers

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001416500087 ◽

2016 ◽

Vol 30 (04) ◽

pp. 1650008 ◽

Cited By ~ 3

Author(s):

Christos Pavlatos ◽

Alexandros C. Dimopoulos ◽

George Papakonstantinou

Keyword(s):

Performance Comparison ◽

Input String ◽

Parse Tree ◽

Gate Arrays ◽

Tree Traversal ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Hardware Description ◽

Parse Trees ◽

Context Free

In this paper, a platform is presented that, given a stochastic context-free grammar (SCFG), automatically outputs a description of the parser in a synthesizable hardware description language (HDL), which can be downloaded to a Field Programmable Gate Array (FPGA) board. Initially, according to our methodology, the SCFG is augmented with attributes that store the probability values and can be evaluated through corresponding stack actions. The architecture of the produced system is based on a proposed extension of Earley’s parallel algorithm, which, given an input string, generates the parse trees in the form of an AND-OR parse tree. This AND-OR parse tree is then traversed using a proposed tree traversal technique in order to execute the corresponding actions in the correct order, so as to compute the necessary probabilities. The platform is suitable for embedded systems applications where a natural language interface is required, or for pattern recognition tasks. The parser generated by the presented platform has been tested on various SCFGs and compared to software approaches. The presented hardware outperforms previous software approaches by one to two orders of magnitude, depending on the application, the input string length, and the number of produced trees.

Analyzing Holistic Parsers: Implications for Robust Parsing and Systematicity

Neural Computation ◽

10.1162/08997660151134361 ◽

2001 ◽

Vol 13 (5) ◽

pp. 1137-1170 ◽

Cited By ~ 1

Author(s):

Edward Kei Shiu Ho ◽

Lai Wan Chan

Keyword(s):

Parse Tree ◽

State Transitions ◽

Generalization Performance ◽

Original Sentence ◽

Representational Space ◽

Grammatical Errors ◽

Robust Parsing ◽

Input Sentence ◽

Parse Trees ◽

Tree Representations

Holistic parsers offer a viable alternative to traditional algorithmic parsers. They have good generalization performance and are inherently robust. In a holistic parser, parsing is achieved by mapping the connectionist representation of the input sentence directly to the connectionist representation of the target parse tree. Little prior knowledge of the underlying parsing mechanism thus needs to be assumed. However, this also makes holistic parsing difficult to understand. In this article, an analysis is presented for studying the operations of the confluent pre-order parser (CPP). In the analysis, the CPP is viewed as a dynamical system, and holistic parsing is perceived as a sequence of state transitions through its state-space. The seemingly one-shot parsing mechanism can thus be elucidated as a step-by-step inference process, with the intermediate parsing decisions being reflected by the states visited during parsing. The study serves two purposes. First, it improves our understanding of how grammatical errors are corrected by the CPP. The occurrence of an error in a sentence will cause the CPP to deviate from the normal track that is followed when the original sentence is parsed. But as the remaining terminals are read, the two trajectories will gradually converge until finally the correct parse tree is produced. Second, it reveals that having systematic parse tree representations alone cannot guarantee good generalization performance in holistic parsing. More importantly, the representations need to be distributed in certain useful locations of the representational space. Sentences with similar trailing terminals should have their corresponding parse tree representations mapped to nearby locations in the representational space. The study provides concrete evidence that encoding the linearized parse trees as obtained via preorder traversal can satisfy such a requirement.

Tunnel Parsing with counted repetitions

Computer Science ◽

10.7494/csci.2020.21.4.3753 ◽

2020 ◽

Vol 21 (4) ◽

Author(s):

Nikolay Handzhiyski ◽

Elena Somova

Keyword(s):

Efficient Algorithm ◽

Time Complexity ◽

Linear Time ◽

Domain Specific Languages ◽

Context Free Grammar ◽

Syntax Tree ◽

Domain Specific ◽

Concrete Syntax ◽

Parsing Algorithm ◽

Context Free

The article describes a new and efficient parsing algorithm, called Tunnel Parsing, that parses from left to right on the basis of a context-free grammar without left recursion and without rules that recognize empty words. The algorithm is applicable mostly to domain-specific languages. In the article, particular attention is paid to the parsing of grammar element repetitions. As a result of the parsing, a statically typed concrete syntax tree that accurately reflects the grammar is built from top to bottom. The parsing is done through iteration rather than recursion. The Tunnel Parsing algorithm uses the grammars directly, without prior refactoring, and has linear time complexity for deterministic context-free grammars.

A Theory of Distributed Markov Chains

Fundamenta Informaticae ◽

10.3233/fi-2020-1958 ◽

2020 ◽

Vol 175 (1-4) ◽

pp. 301-325

Author(s):

P. S. Thiagarajan ◽

Shaofa Yang

Keyword(s):

Markov Chains ◽

Linear Time ◽

New Techniques ◽

Statistical Model Checking ◽

Bounded Linear ◽

The Core ◽

Core Theory ◽

Disjoint Sets ◽

Linear Time Temporal Logic ◽

Checking Procedure

We present the theory of distributed Markov chains (DMCs). A DMC consists of a collection of communicating probabilistic agents in which the synchronizations determine the probability distribution for the next moves of the participating agents. The key feature of a DMC is that the synchronizations are deterministic, in the sense that any two simultaneously enabled synchronizations involve disjoint sets of agents. Using our theory of DMCs, we show how one can analyze the behavior using the interleaved semantics of the model. A key point is that the transition system which defines the interleaved semantics is, except in degenerate cases, not a Markov chain. Hence one must develop new techniques to analyze these behaviors, which exhibit both concurrency and stochasticity. After establishing the core theory, we develop a statistical model checking procedure which verifies the dynamical properties of the trajectories generated by the model. The specifications consist of Boolean combinations of component-wise bounded linear time temporal logic formulas. We also provide a probabilistic Petri net representation of DMCs and use it to derive a probabilistic event structure semantics.

Founded semantics and constraint semantics of logic rules

Journal of Logic and Computation ◽

10.1093/logcom/exaa056 ◽

2020 ◽

Vol 30 (8) ◽

pp. 1609-1668 ◽

Cited By ~ 1

Author(s):

Yanhong A Liu ◽

Scott D Stoller

Keyword(s):

Computer Science ◽

Linear Time ◽

The Core ◽

Binary Choices ◽

Russell's Paradox ◽

Straightforward Extension

Logic rules and inference are fundamental in computer science and have been studied extensively. However, prior semantics of logic languages can have subtle implications and can disagree significantly, even on very simple programs, including in attempting to solve the well-known Russell’s paradox. These semantics are often non-intuitive and hard to understand when unrestricted negation is used in recursion. This paper describes a simple new semantics for logic rules, founded semantics, and its straightforward extension to another simple new semantics, constraint semantics, that unify the core of different prior semantics. The new semantics support unrestricted negation, as well as unrestricted existential and universal quantifications. They are uniquely expressive and intuitive by allowing assumptions about the predicates, rules and reasoning to be specified explicitly, as simple and precise binary choices. They are completely declarative and relate cleanly to prior semantics. In addition, founded semantics can be computed in linear time in the size of the ground program.

A general context-free parsing algorithm running in linear time on every LR(k) grammar without using lookahead

Theoretical Computer Science ◽

10.1016/0304-3975(91)90180-a ◽

1991 ◽

Vol 82 (1) ◽

pp. 165-176 ◽

Cited By ~ 4

Author(s):

Joop M.I.M. Leo

Keyword(s):

Linear Time ◽

General Context ◽

Parsing Algorithm ◽

Context Free

Efficiently extracting full parse trees using regular expressions with capture groups

10.7287/peerj.preprints.1248v1 ◽

2015 ◽

Author(s):

Niko Schwarz ◽

Aaron Karper ◽

Oscar Nierstrasz

Keyword(s):

Streaming Data ◽

Parse Tree ◽

Finite State Automata ◽

Regular Expressions ◽

Complete Control ◽

Single Pass ◽

Finite State ◽

Parse Trees ◽

Natural Way ◽

Do So

Regular expressions with capture groups offer a concise and natural way to define parse trees over the text that they are parsing; however, classical algorithms return only a single match for each capture group, not the full parse tree. We describe an algorithm based on finite-state automata that extracts full parse trees from text in Θ(nm) time and Θ(dn + m) space (where n is the size of the text, m the size of the pattern, and d the number of groups in the pattern). It is the first to do so in a single pass with complete control over greediness. This allows the algorithm to process streaming data using all constructs familiar to users of regular expressions.
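The limitation mentioned in the abstract, that a classical matcher keeps only one match per capture group, can be observed with Python's standard `re` module, used here purely as an example of a conventional engine. The pattern and input are assumptions chosen for illustration, and the snippet does not implement the authors' algorithm.

```python
import re

# A repeated capture group: one comma-separated item per iteration.
pattern = re.compile(r"(?:(\w+),?)+")
m = pattern.fullmatch("a,b,c")

# The group is matched three times, but only the final iteration is kept.
last_only = m.group(1)
print(last_only)  # c

# A common workaround is to scan repeatedly with a simpler pattern, which
# yields a flat list and loses the nesting a full parse tree would record.
all_items = re.findall(r"(\w+),?", "a,b,c")
print(all_items)  # ['a', 'b', 'c']
```

A full parse tree would instead record every iteration of the group together with its position in the enclosing structure, which is the information the described algorithm is said to recover.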

How to Design a Connectionist Holistic Parser

Neural Computation ◽

10.1162/089976699300016061 ◽

1999 ◽

Vol 11 (8) ◽

pp. 1995-2016 ◽

Cited By ~ 10

Author(s):

Edward Kei Shiu Ho ◽

Lai Wan Chan

Keyword(s):

Parse Tree ◽

Attractive Alternative ◽

Generalization Capability ◽

Training Set ◽

The Past ◽

Parse Trees ◽

Training Examples ◽

Design Factors ◽

Design Techniques

Connectionist holistic parsing offers a viable and attractive alternative to traditional algorithmic parsers. With exposure to a limited subset of grammatical sentences and their corresponding parse trees only, a holistic parser is capable of learning inductively the grammatical regularity underlying the training examples that affects the parsing process. In the past, various connectionist parsers have been proposed. Each approach had its own unique characteristics, and yet some techniques were shared in common. In this article, various dimensions underlying the design of a holistic parser are explored, including the methods to encode sentences and parse trees, whether a sentence and its corresponding parse tree share the same representation, the use of confluent inference, and the inclusion of phrases in the training set. Different combinations of these design factors give rise to different holistic parsers. In succeeding discussions, we scrutinize these design techniques and compare the performances of a few parsers on language parsing, including the confluent preorder parser, the backpropagation parsing network, the XERIC parser of Berg (1992), the modular connectionist parser of Sharkey and Sharkey (1992), Reilly's (1992) model, and their derivatives. Experiments are performed to evaluate their generalization capability and robustness. The results reveal a number of issues essential for building an effective holistic parser.

Nonminimal Derivations in Unification-based Parsing

Computational Linguistics ◽

10.1162/089120101750300535 ◽

2001 ◽

Vol 27 (2) ◽

pp. 277-285

Author(s):

Noriko Tomuro ◽

Steven L. Lytinen

Keyword(s):

Computational Cost ◽

Precise Definition ◽

Parse Tree ◽

Feature Structures ◽

Unification Grammars ◽

Parsing Algorithm ◽

Definition Of ◽

Context Free ◽

Context Free Grammars

Shieber's abstract parsing algorithm (Shieber 1992) for unification grammars is an extension of Earley's algorithm (Earley 1970) for context-free grammars to feature structures. In this paper, we show that, under certain conditions, Shieber's algorithm produces what we call a nonminimal derivation: a parse tree which contains additional features that are not in the licensing productions. While Shieber's definition of parse tree allows for such nonminimal derivations, we claim that they should be viewed as invalid. We describe the sources of the nonminimal derivation problem, and propose a precise definition of minimal parse tree, as well as a modification to Shieber's algorithm which ensures minimality, although at some computational cost.
