Joint Incremental Disfluency Detection and Dependency Parsing

Author(s):  
Matthew Honnibal ◽  
Mark Johnson

We present an incremental dependency parsing model that jointly performs disfluency detection. The model handles speech repairs using a novel non-monotonic transition system, and includes several novel classes of features. For comparison, we evaluated two pipeline systems, using state-of-the-art disfluency detectors. The joint model performed better on both tasks, with a parse accuracy of 90.5% and 84.0% accuracy at disfluency detection. The model runs in expected linear time, and processes over 550 tokens a second.
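To make the idea of a non-monotonic transition system concrete, here is a minimal Python sketch (not the authors' implementation) of a parser whose transition inventory includes a disfluency action; the `edit` action below, which marks the stack top as disfluent and retracts its arcs, is a hypothetical simplification of the paper's speech-repair handling.

```python
# Minimal sketch of a transition system that interleaves parsing and
# disfluency marking. Not the paper's exact system; EDIT is a simplified
# stand-in for its non-monotonic speech-repair handling.

class State:
    def __init__(self, tokens):
        self.tokens = tokens
        self.stack = []                       # indices of partially processed tokens
        self.buffer = list(range(len(tokens)))
        self.heads = {}                       # dependent index -> head index
        self.disfluent = set()                # indices marked as disfluent

def shift(state):
    state.stack.append(state.buffer.pop(0))

def left_arc(state):                          # buffer front governs stack top
    dep = state.stack.pop()
    state.heads[dep] = state.buffer[0]

def right_arc(state):                         # stack top governs buffer front
    dep = state.buffer.pop(0)
    state.heads[dep] = state.stack[-1]
    state.stack.append(dep)

def edit(state):
    # Non-monotonic repair: mark the stack top as disfluent and retract any
    # arcs already built for it, so later transitions can re-attach words.
    idx = state.stack.pop()
    state.disfluent.add(idx)
    state.heads.pop(idx, None)
    state.heads = {d: h for d, h in state.heads.items() if h != idx}

state = State("I want a flight uh a ticket".split())
shift(state); shift(state)                    # stack: [I, want]
right_arc(state)                              # toy derivation: "want" governs "a"
edit(state)                                   # mark "a" as part of a repair
print(state.disfluent, state.heads)           # {2} {}
```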

2014 ◽  
Vol 40 (2) ◽  
pp. 249-527 ◽  
Author(s):  
Joakim Nivre ◽  
Yoav Goldberg ◽  
Ryan McDonald

Arc-eager dependency parsers process sentences in a single left-to-right pass over the input and have linear time complexity with greedy decoding or beam search. We show how such parsers can be constrained to respect two different types of conditions on the output dependency graph: span constraints, which require certain spans to correspond to subtrees of the graph, and arc constraints, which require certain arcs to be present in the graph. The constraints are incorporated into the arc-eager transition system as a set of preconditions for each transition and preserve the linear time complexity of the parser.
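As an illustration of how arc constraints can be expressed as transition preconditions, the following Python sketch checks whether an arc-eager transition would make a required arc unreachable; the function name and the simplified arc-eager assumptions are ours, not the paper's.

```python
# Hypothetical sketch of arc constraints enforced as transition
# preconditions in an arc-eager parser: a transition is legal only if it
# cannot make a required arc (head, dependent) unreachable.

def violates_arc_constraints(action, stack_top, buffer_front, required_arcs):
    """Return True if taking `action` now would rule out a required arc."""
    if action == "LEFT-ARC":
        # LEFT-ARC attaches stack_top to buffer_front and pops it, so any
        # required arc giving stack_top a different head becomes impossible.
        return any(dep == stack_top and head != buffer_front
                   for head, dep in required_arcs)
    if action == "RIGHT-ARC":
        # RIGHT-ARC fixes buffer_front's head to stack_top.
        return any(dep == buffer_front and head != stack_top
                   for head, dep in required_arcs)
    if action == "REDUCE":
        # Popping stack_top means it can no longer govern words still in
        # the buffer.
        return any(head == stack_top and dep >= buffer_front
                   for head, dep in required_arcs)
    return False   # SHIFT needs analogous checks in the full system

required = {(2, 1)}            # token 2 must be the head of token 1
print(violates_arc_constraints("LEFT-ARC", stack_top=1, buffer_front=3,
                               required_arcs=required))      # True
```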


2021 ◽  
Vol 12 (5) ◽  
pp. 1-21
Author(s):  
Changsen Yuan ◽  
Heyan Huang ◽  
Chong Feng

The Graph Convolutional Network (GCN) is a general relation extraction method that predicts relations between entity pairs by capturing sentences' syntactic features. However, existing GCN methods typically use dependency parsing to generate the graph matrices from which syntactic features are learned, so the quality of the dependency parse directly affects the accuracy of the graph matrix and, in turn, the performance of the whole GCN. Because distantly supervised datasets contain noisy words and long sentences, dependency parsing often introduces errors and yields unreliable information, making it difficult to obtain credible graph matrices and relational features for some sentences. In this article, we present a Multi-Graph Cooperative Learning model (MGCL), which extracts reliable syntactic features of relations from different graphs and harnesses them to improve sentence representations. We conduct experiments on a widely used real-world dataset, and the results show that our model achieves state-of-the-art relation extraction performance.
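The following numpy sketch (an illustration under our own assumptions, not the MGCL architecture) shows the underlying multi-graph idea: run the same GCN layer over more than one graph matrix, e.g. a dependency-parse graph and a parse-free fallback graph, and pool the results.

```python
import numpy as np

# Minimal sketch of multi-graph GCN relation extraction: apply one GCN
# layer to several graph matrices and pool the resulting token
# representations into a sentence-level vector.

def gcn_layer(adj, h, w):
    """One GCN layer: normalised-adjacency x H x W with a ReLU."""
    adj = adj + np.eye(adj.shape[0])              # add self-loops
    deg = adj.sum(axis=1, keepdims=True)
    return np.maximum(0, (adj / deg) @ h @ w)

n_tokens, d_in, d_out = 5, 8, 8
rng = np.random.default_rng(0)
h = rng.normal(size=(n_tokens, d_in))             # token embeddings
w = rng.normal(size=(d_in, d_out)) * 0.1

dep_graph = np.zeros((n_tokens, n_tokens))        # from a dependency parse
dep_graph[1, 0] = dep_graph[0, 1] = 1             # e.g. an arc between tokens 0 and 1
full_graph = np.ones((n_tokens, n_tokens))        # parse-free fallback graph

# Cooperate across graphs by averaging their GCN outputs, then pool
# over tokens to get a sentence-level relation representation.
reps = [gcn_layer(g, h, w) for g in (dep_graph, full_graph)]
sentence_rep = np.mean(reps, axis=0).max(axis=0)
print(sentence_rep.shape)                         # (8,)
```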


2021 ◽  
Author(s):  
Hadi Qovaizi

Modern state-of-the-art planners operate by generating a grounded transition system before searching for a solution to a given planning task. Some tasks involve a large number of objects, or predicates and action schemas with many arguments; the instantiation procedure can then exhaust all available memory and prevent a planner from ever searching for a solution. This thesis explores this limitation by presenting a benchmark set of problems based on Organic Chemistry Synthesis that was submitted to the latest International Planning Competition (IPC-2018). The benchmark was constructed to gauge the performance of the competing planners when instantiation is an issue. Furthermore, a novel algorithm, the Regression-Based Heuristic Planner (RBHP), is developed with the aim of averting this issue. RBHP was inspired by the retro-synthetic approach commonly used to solve organic synthesis problems efficiently. It solves planning tasks by applying domain-independent heuristics, computed by regression, and performing best-first search. In contrast to most modern planners, RBHP computes heuristics backwards by applying the goal-directed regression operator; the best-first search itself proceeds forward, as in other planners. The proposed planner is evaluated on a set of planning tasks from previous International Planning Competitions (IPC) against a subset of the top-scoring state-of-the-art planners submitted to IPC-2018.
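A toy Python sketch of the two ingredients mentioned above, goal regression and forward best-first search, under simplified STRIPS assumptions; the action names and the way regression depth is turned into a heuristic are illustrative guesses rather than RBHP's actual design.

```python
import heapq
from itertools import count

# Toy STRIPS setting: regress the goal set backwards to estimate
# distance-to-goal, then run forward best-first search guided by it.

class Action:
    def __init__(self, name, pre, add, dele):
        self.name, self.pre, self.add, self.dele = name, set(pre), set(add), set(dele)

def regress(goal, a):
    """Regress a goal set through an action, or None if inapplicable."""
    if not (a.add & goal) or (a.dele & goal):
        return None
    return (goal - a.add) | a.pre

def regression_heuristic(state, goal, actions, max_depth=6):
    """Estimated cost: fewest regression steps to a subgoal satisfied by state."""
    frontier, seen = [frozenset(goal)], set()
    for depth in range(max_depth + 1):
        nxt = []
        for g in frontier:
            if g <= state:
                return depth
            for a in actions:
                r = regress(g, a)
                if r is not None and frozenset(r) not in seen:
                    seen.add(frozenset(r))
                    nxt.append(frozenset(r))
        frontier = nxt
    return max_depth + 1

def best_first(init, goal, actions):
    goal, tie = set(goal), count()
    frontier = [(regression_heuristic(set(init), goal, actions), next(tie),
                 frozenset(init), [])]
    visited = set()
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if goal <= state:
            return plan
        if state in visited:
            continue
        visited.add(state)
        for a in actions:
            if a.pre <= state:
                nstate = frozenset((state - a.dele) | a.add)
                h = regression_heuristic(nstate, goal, actions)
                heapq.heappush(frontier, (h, next(tie), nstate, plan + [a.name]))
    return None

acts = [Action("mix", {"reagents"}, {"intermediate"}, set()),
        Action("heat", {"intermediate"}, {"product"}, {"intermediate"})]
print(best_first({"reagents"}, {"product"}, acts))   # ['mix', 'heat']
```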


2020 ◽  
Vol 34 (05) ◽  
pp. 8319-8326
Author(s):  
Zuchao Li ◽  
Hai Zhao ◽  
Kevin Parnow

Most syntactic dependency parsing models fall into one of two categories: transition-based and graph-based models. The former enjoy high inference efficiency with linear time complexity, but they rely on stacking or re-ranking partially built parse trees to produce a complete parse, and suffer from slower training because dynamic oracle training is required. The latter, graph-based models, boast better performance but are marred by polynomial-time inference. In this paper, we propose a novel parsing-order objective, resulting in a dependency parsing model capable of both global (sentence-scope) feature extraction, as in graph-based models, and linear-time inference, as in transition-based models. The proposed global greedy parser uses only two arc-building actions, left and right arcs, for projective parsing. When equipped with two extra non-projective arc-building actions, it also smoothly supports non-projective parsing. Using multiple benchmark treebanks, including the Penn Treebank (PTB), the CoNLL-X treebanks, and the Universal Dependency Treebanks, we evaluate our parser and demonstrate that it achieves good performance with faster training and decoding.
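To illustrate the flavour of global greedy arc building, here is a small Python sketch that scores every candidate head-dependent pair over the whole sentence and greedily commits the best left or right arc; it is a deliberately simplified stand-in, not the paper's neural scorer or training objective.

```python
import numpy as np

# Toy sketch of global greedy arc building: score every remaining
# (head, dependent) pair and greedily commit the best left or right arc,
# keeping the tree projective by only linking adjacent pending words.

def global_greedy_parse(scores):
    """scores[h, d]: arc score for head h -> dependent d; index 0 is the root."""
    n = scores.shape[0]
    pending = list(range(n))                  # words not yet attached
    heads = {}
    while len(pending) > 1:
        best, best_score = None, -np.inf
        for i in range(len(pending) - 1):
            l, r = pending[i], pending[i + 1]
            for h, d in ((l, r), (r, l)):     # right-arc vs. left-arc
                if d != 0 and scores[h, d] > best_score:
                    best, best_score = (h, d), scores[h, d]
        h, d = best
        heads[d] = h                          # commit the globally best arc
        pending.remove(d)                     # d is now attached
    return heads

rng = np.random.default_rng(1)
scores = rng.normal(size=(4, 4))              # root plus a 3-word toy sentence
print(global_greedy_parse(scores))            # {dependent: head, ...}
```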


2013 ◽  
Vol 1 ◽  
pp. 301-314 ◽  
Author(s):  
Weiwei Sun ◽  
Xiaojun Wan

We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs to improving Chinese dependency parsing accuracy, especially when combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies for acquiring pseudo CFGs from dependency annotations alone. Compared to linguistic grammars learned from rich phrase-structure treebanks, well-designed pseudo grammars achieve similar parsing accuracy and contribute equally to parser ensembles. Moreover, pseudo grammars increase the diversity of the base models and, together with all other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, a significant improvement over the state of the art.
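One plausible way to read pseudo constituents off a dependency tree (every head projects a bracket over its yield) is sketched below in Python; it illustrates the general idea of acquiring phrase structure from dependency annotations only and is not necessarily one of the strategies the paper evaluates.

```python
# Sketch of reading pseudo phrase-structure constituents off a dependency
# tree: every head projects a bracket over its yield. The paper studies
# several such strategies; this is only the simplest conceivable one.

def yield_span(head, children):
    """All token indices dominated by `head`, including the head itself."""
    span = {head}
    for c in children.get(head, []):
        span |= yield_span(c, children)
    return span

def pseudo_constituents(heads):
    """heads[i] = index of token i's head, or -1 for the root."""
    children = {}
    for dep, head in enumerate(heads):
        if head >= 0:
            children.setdefault(head, []).append(dep)
    brackets = []
    for h in range(len(heads)):
        span = yield_span(h, children)
        if len(span) > 1:                     # one bracket per non-leaf head
            brackets.append((min(span), max(span)))
    return sorted(brackets)

# "economic news had little effect", with "had" (index 2) as the root
heads = [1, 2, -1, 4, 2]
print(pseudo_constituents(heads))             # [(0, 1), (0, 4), (3, 4)]
```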


2020 ◽  
Vol 34 (04) ◽  
pp. 4412-4419 ◽  
Author(s):  
Zhao Kang ◽  
Wangtao Zhou ◽  
Zhitong Zhao ◽  
Junming Shao ◽  
Meng Han ◽  
...  

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years, with researchers seeking to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms have quadratic or even cubic complexity, which makes them inefficient and inherently difficult to apply at large scale. In the era of big data, this computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear-order complexity. Inspired by the idea of the anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate these graphs so that spectral clustering can be implemented on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario. Extensive experiments on various large-scale benchmark data sets validate the effectiveness and efficiency of our approach relative to state-of-the-art clustering methods.
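A rough Python sketch of the anchor-graph idea behind linear-complexity multi-view clustering: build a small n-by-m similarity graph per view against m anchors, stack the graphs, and take the spectral step on the small factor. The Gaussian similarities, random anchor selection, and SVD-based spectral step are assumptions for illustration, not the LMVSC optimisation itself.

```python
import numpy as np
from sklearn.cluster import KMeans

# Anchor-graph sketch: per-view similarities against m << n anchors,
# stacked across views, with the spectral step taken via an SVD of the
# small factor and k-means on the resulting embedding.

def anchor_graph(x, anchors, sigma=1.0):
    """Gaussian similarities between n samples and m anchors, row-normalised."""
    d2 = ((x[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    z = np.exp(-d2 / (2 * sigma ** 2))
    return z / z.sum(axis=1, keepdims=True)

def lmvsc_sketch(views, n_clusters, n_anchors=10, seed=0):
    rng = np.random.default_rng(seed)
    zs = []
    for x in views:
        anchors = x[rng.choice(len(x), n_anchors, replace=False)]
        zs.append(anchor_graph(x, anchors))
    z = np.hstack(zs)                         # n x (n_anchors * n_views)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    embedding = u[:, :n_clusters]             # spectral embedding from the small factor
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embedding)

rng = np.random.default_rng(1)
view1 = np.vstack([rng.normal(0, 0.3, (50, 5)), rng.normal(3, 0.3, (50, 5))])
view2 = view1 @ rng.normal(size=(5, 4))       # a second, correlated view
labels = lmvsc_sketch([view1, view2], n_clusters=2)
print(np.bincount(labels))                    # roughly two balanced clusters
```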


Author(s):  
Mohammad Izadi ◽  
Ali Movaghar

A component-based computing system consists of two main parts: a set of components and a coordination subsystem. Reo is an exogenous coordination language for the compositional construction of the coordination subsystem, and constraint automata have been defined as its operational semantics. The main goal of this paper is to provide a model checking method for verifying linear-time temporal properties of component-based systems whose coordinating subsystems are modeled by Reo and whose components are modeled by labeled transition systems. For this purpose, we introduce modified definitions of constraint automata and their composition operators by which every constraint automaton can be considered a labeled transition system and every labeled transition system can be translated into a constraint automaton. We show that the failure-based equivalences CFFD and NDFD are congruences with respect to the composition operators of constraint automata. We also present a method for compositional model checking of component-based systems that uses these equivalences to reduce the sizes of constraint automata models.
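As a toy illustration of the direction from labeled transition systems to constraint automata, the Python sketch below treats each action label as a port and attaches a trivial data constraint to every transition; the data structures are hypothetical and much simpler than the paper's formal construction.

```python
from dataclasses import dataclass, field

# Toy view of an LTS as a constraint automaton: each action label becomes
# a port, and each transition fires that single port with the trivial
# ("true") data constraint.

@dataclass(frozen=True)
class CATransition:
    source: str
    target: str
    ports: frozenset            # ports that fire synchronously on this transition
    constraint: str             # data constraint over the firing ports

@dataclass
class ConstraintAutomaton:
    states: set
    ports: set
    transitions: list = field(default_factory=list)
    initial: str = "q0"

def lts_to_constraint_automaton(lts_transitions, initial="q0"):
    """lts_transitions: iterable of (source_state, action_label, target_state)."""
    states, ports, cats = set(), set(), []
    for src, act, tgt in lts_transitions:
        states |= {src, tgt}
        ports.add(act)
        cats.append(CATransition(src, tgt, frozenset({act}), "true"))
    return ConstraintAutomaton(states, ports, cats, initial)

lts = [("q0", "send", "q1"), ("q1", "ack", "q0")]
ca = lts_to_constraint_automaton(lts)
print(ca.ports, len(ca.transitions))          # {'send', 'ack'} 2
```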


2008 ◽  
Vol 34 (2) ◽  
pp. 161-191 ◽  
Author(s):  
Kristina Toutanova ◽  
Aria Haghighi ◽  
Christopher D. Manning

We present a model for semantic role labeling that effectively captures the linguistic intuition that a semantic argument frame is a joint structure, with strong dependencies among the arguments. We show how to incorporate these strong dependencies into a statistical joint model with a rich set of features over multiple argument phrases. The proposed model substantially outperforms a similar state-of-the-art local model that does not include dependencies among different arguments. We evaluate the gains from incorporating this joint information on the Propbank corpus, both when using correct syntactic parse trees as input and when using automatically derived parse trees. The gains amount to 24.1% error reduction on all arguments and 36.8% on core arguments for gold-standard parse trees on Propbank. For automatic parse trees, the error reductions are 8.3% and 10.3% on all and core arguments, respectively. We also present results on the CoNLL 2005 shared task data set. Additionally, we explore the use of multiple syntactic analyses to cope with parser noise and uncertainty.
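The contrast between a local model and a joint model over the whole argument frame can be illustrated with a toy Python example: the joint score adds a feature over the full label assignment, here a penalty for repeated core roles. The scores, features, and exhaustive search are purely illustrative, not the paper's model or inference procedure.

```python
from itertools import product

# Toy contrast between local per-argument scoring and joint scoring of a
# full argument frame: the joint model adds a feature on the whole label
# assignment, e.g. a penalty for repeated core roles.

local_scores = {                       # log-scores per candidate phrase
    "phrase1": {"A0": 1.2, "A1": 0.9, "NONE": 0.1},
    "phrase2": {"A0": 1.1, "A1": 1.0, "NONE": 0.2},
}
DUPLICATE_CORE_PENALTY = 5.0

def joint_score(assignment):
    score = sum(local_scores[p][role] for p, role in assignment.items())
    core = [r for r in assignment.values() if r.startswith("A")]
    if len(core) != len(set(core)):    # joint feature over the whole frame
        score -= DUPLICATE_CORE_PENALTY
    return score

phrases = list(local_scores)
best = max((dict(zip(phrases, roles))
            for roles in product(["A0", "A1", "NONE"], repeat=len(phrases))),
           key=joint_score)
print(best)    # {'phrase1': 'A0', 'phrase2': 'A1'}: the joint model avoids two A0s
```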


Author(s):  
Yan Zhou ◽  
Longtao Huang ◽  
Tao Guo ◽  
Jizhong Han ◽  
Songlin Hu

Target-Based Sentiment Analysis aims to extract opinion targets and classify the sentiment polarity expressed towards each target. Recently, token-based sequence tagging methods, which predict a tag for each token, have been successfully applied to solve the two tasks jointly. Since these methods do not treat a target containing several words as a whole, it can be difficult for them to use global information to identify such opinion targets, leading to incorrect extraction. Independently predicting the sentiment for each token may also lead to sentiment inconsistencies among the words of an opinion target. In this paper, inspired by span-based methods in NLP, we propose a simple and effective joint model that conducts extraction and classification at the span level rather than the token level. Our model first enumerates spans of one or more tokens and learns their representations from the tokens inside them. A span-aware attention mechanism is then designed to compute the sentiment information for each span. Extensive experiments on three benchmark datasets show that our model consistently outperforms state-of-the-art methods.
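A minimal numpy sketch of the span-level idea: enumerate candidate spans up to a maximum width, build each span's representation with an attention-weighted sum over the tokens inside it, and score sentiment classes per span. The shapes, the single attention vector, and the classifier are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

# Span-level sketch: enumerate spans, represent each span by an
# attention-weighted sum of its token vectors, then score sentiment
# classes per span.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def span_representations(token_vecs, max_width, attn_w):
    n, _ = token_vecs.shape
    spans, reps = [], []
    for i in range(n):
        for j in range(i, min(i + max_width, n)):
            inside = token_vecs[i:j + 1]                  # tokens inside the span
            attn = softmax(inside @ attn_w)               # span-aware attention weights
            spans.append((i, j))
            reps.append(attn @ inside)
    return spans, np.stack(reps)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 16))                         # 6 token embeddings
attn_w = rng.normal(size=16)
cls_w = rng.normal(size=(16, 3))                          # {positive, negative, neutral}

spans, reps = span_representations(tokens, max_width=3, attn_w=attn_w)
sentiments = (reps @ cls_w).argmax(axis=1)                # one prediction per span
print(len(spans), sentiments[:5])
```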

