probabilistic grammar
Recently Published Documents

TOTAL DOCUMENTS: 40 (FIVE YEARS: 8)
H-INDEX: 8 (FIVE YEARS: 1)

2021
Author(s): Apurva Apurva, Samar Husain

The surprisal metric (Hale, 2001; Levy, 2008) successfully predicts syntactic complexity in a large number of online studies (e.g., Demberg and Keller, 2009; Levy and Keller, 2013). Surprisal assumes a probabilistic grammar that drives the expectation of upcoming linguistic material. Consequently, wrong predictions lead to a processing cost, presumably due to reranking-related computations (Levy, 2013). Critically, surprisal assumes that the predicted parses generated by the probabilistic grammar are grammatical. However, it has been found that syntactic predictions can be ungrammatical (e.g., Apurva & Husain, 2018). Consequently, just as reranking costs arise from incorrect (but grammatical) predictions, a cost should also appear for ungrammatical predictions. Evidence for such a cost during comprehension would not be explained by the surprisal metric. To test the ecological validity of the surprisal metric, it therefore becomes critical to investigate whether ungrammatical predictions incur a cost. In this study, we investigate this issue in Hindi (a verb-final language) using a cloze task followed by a self-paced reading (SPR) study. All analyses were carried out in R using linear mixed models; log RTs (reading times) were used for the RT analyses.

In the cloze study (N = 30), participants were asked to complete sentences such as (1a) and (1b) meaningfully, presented using the SPR paradigm. The two conditions differed in the case markers on the three nouns. 12 sets of experimental items along with 64 fillers were used. Participants' responses were coded for the predicted verb class and for the overall grammaticality of the completion (grammatical vs ungrammatical prediction).

(1a) hari-ne geeta-se umesh-ko ...
     Hari-ERG Geeta-ABL Umesh-ACC ...

(1b) hari-ko geeta-ne umesh-ko ...
     Hari-ACC Geeta-ERG Umesh-ACC ...

Grammaticality analysis of the completion data showed that participants made more ungrammatical completions in condition (b) than in condition (a) (z = 5.25); grammatical completions amounted to 96% in condition (a) and 60% in condition (b). In addition, the verb-class analysis showed that in both conditions participants most frequently completed the sentences with a transitive non-finite verb followed by a ditransitive matrix verb (hereafter T.NF-DT.M). T.NF-DT.M was predicted in 33% of instances in condition (a) and 34% in condition (b) (z = 0.18). Given these similar cloze probabilities, the surprisal metric predicts no RT difference at T.NF-DT.M between the two conditions during online processing (cloze probabilities can be used to compute surprisal; see Levy and Keller, 2013). If the RTs at T.NF-DT.M in condition (a) are lower than in condition (b), that is better explained by a higher cost for the ungrammatical prediction. To test this, we conducted an SPR study (N = 50) using items similar to those of the previous experiment (see 2a and 2b), with T.NF-DT.M as the critical region. 24 sets of items along with 72 fillers were constructed.

(2a) hari-ne geeta-se umesh-ko milne ko kaha ...
     Hari-ERG Geeta-ABL Umesh-ACC meet-INF(T.NF) told(DT.M) ...

(2b) hari-ko geeta-ne umesh-ko milne ko kaha ...
     Hari-ACC Geeta-ERG Umesh-ACC meet-INF(T.NF) told(DT.M) ...

While the prediction of T.NF-DT.M is the same in the two conditions, the proportion of ungrammatical predictions is higher in (b) than in (a). Results show that RTs at the critical region were lower in (a) than in (b) (t = 2.32). This goes against the surprisal metric and shows the cost incurred by ungrammatical predictions.
Our work establishes that the cost of ungrammatical predictions indeed appears during online processing. This processing cost is not predicted by a metric like surprisal, highlighting its limitations. The study also provides evidence against the assumption of robustly grammatical prediction in head-final languages: it suggests that the prediction mechanism in such languages is more nuanced, and it points to the need to study the nature of ungrammatical predictions during processing.
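To make the link between the cloze and SPR analyses concrete, here is a minimal R sketch of the two computations the abstract mentions: deriving surprisal from the reported cloze probabilities, and fitting a linear mixed model on log RTs. This is not the authors' code; the lme4 call, the data frame spr_data, and its column names are assumptions for illustration, since the abstract does not report the exact model specification.

    library(lme4)

    # Cloze probabilities of the T.NF-DT.M completion reported above:
    cloze <- c(cond_a = 0.33, cond_b = 0.34)

    # Surprisal in bits: -log2 P(completion | context). The two values are
    # nearly identical, so surprisal predicts no RT difference at the
    # critical region.
    surprisal <- -log2(cloze)
    round(surprisal, 2)  # cond_a 1.60, cond_b 1.56

    # Log-RT mixed model at the critical region. Random intercepts for
    # participants and items are an assumed (standard) random-effects
    # structure; spr_data with columns rt, condition, subj, item is
    # hypothetical.
    m <- lmer(log(rt) ~ condition + (1 | subj) + (1 | item), data = spr_data)
    summary(m)  # the abstract reports t = 2.32 for the condition effect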


Corpora, 2020, Vol. 15 (1), pp. 77-106
Author(s): Marianne Hundt, Paula Rautionaho, Carolin Strobl

Previous corpus-based research on the progressive (be + V-ing) investigated it from a diachronic point of view or from the angle of World Englishes (WEs). However, factors such as its propensity to occur with animate subjects or its preference for dynamic verbs have not been studied in relation to the choice between progressive and simple aspect. As the progressive has been extended to stative verbs, we argue that a variationist study of the construction in WEs needs to take simple VPs into account systematically, too, and to investigate whether there is interaction between the predictor variables underlying the progressive:simple choice. We use a probabilistic grammar approach to study progressives in newspaper writing across a broad range of WEs. We apply a tree and forest analysis to gauge the relative strength of the predictor variables variety, animacy, tense/modality, verb type and voice. Our results show that the core grammar for the progressive:simple choice is shared across all Englishes. The extension of progressives to stative verbs, in particular, does not result in statistically detectable effects. We argue that these extensions nevertheless serve to give a very 'local' flavour to contact varieties, as they are salient against the backdrop of the shared core grammar.
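A "tree and forest" analysis of this kind can be run in R with the partykit package. The sketch below is an assumed reconstruction, not the authors' code (the abstract does not name its software), with a hypothetical data frame wes holding one row per VP, the binary outcome, and the five predictors listed above.

    library(partykit)

    # wes (hypothetical): aspect (progressive vs simple) plus the predictors
    # variety, animacy, tense_modality, verb_type, voice, coded as factors.
    tree <- ctree(aspect ~ variety + animacy + tense_modality +
                    verb_type + voice, data = wes)
    plot(tree)  # conditional inference tree: splits reveal interactions

    forest <- cforest(aspect ~ variety + animacy + tense_modality +
                        verb_type + voice, data = wes, ntree = 500)
    varimp(forest)  # forest variable importance: relative predictor strength

The tree surfaces interactions between predictors, while the forest's variable importance ranks their relative strength, which is how the abstract frames its comparison across Englishes.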


2020
Author(s): Xiaoying Pu, Matthew Kay

Visualizations depicting probabilities and uncertainty are used everywhere from medical risk communication to machine learning, yet these probabilistic visualizations are difficult to specify, prone to error, and their designs are cumbersome to explore. We propose a Probabilistic Grammar of Graphics (PGoG), an extension to Wilkinson's original framework. Inspired by the success of probabilistic programming languages, PGoG makes probability expressions, such as P(A|B), a first-class citizen in the language. PGoG abstractions also reflect the distinction between probability and frequency framing, a concept from the uncertainty communication literature. It is expressive, encompassing product plots, density plots, icon arrays, and dotplots, among other visualizations. Its coherent syntax ensures correctness (that the proportions of visual elements and their spatial placement reflect the underlying probability distribution) and reduces edit distance between probabilistic visualization specifications, potentially supporting more design exploration. We provide a proof-of-concept implementation of PGoG in R.
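PGoG's own syntax is not reproduced here. As a loose plain-ggplot2 approximation of one visualization type the abstract lists, the sketch below draws a frequency-framed icon array for a conditional probability; the scenario and numbers are invented for illustration.

    library(ggplot2)

    # Icon array: frequency framing shows P(A|B) = 0.3 as 30 of 100 icons
    # rather than as a single probability value. Numbers are made up.
    grid <- expand.grid(x = 1:10, y = 1:10)
    grid$event <- ifelse(seq_len(nrow(grid)) <= 30, "A", "not A")

    ggplot(grid, aes(x, y, colour = event)) +
      geom_point(size = 4) +
      coord_fixed() +
      theme_void() +
      labs(title = "P(A | B) = 0.3, frequency-framed icon array")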


2018, Vol. 77 (21), pp. 28417-28440
Author(s): Dan Li, Disheng Hu, Yuke Sun, Yingsong Hu

Author(s): Mark E. Whiting, Jonathan Cagan, Philip LeDuc

The use of grammars in design and analysis has been set back by the lack of automated ways to induce them from arbitrarily structured datasets. Machine translation methods provide a construct for inducing grammars from coded data, and these have been extended to design through pre-coded design data. This work introduces a four-step process for inducing grammars from un-coded structured datasets spanning a wide variety of data types, including many used in design. The method includes: (1) extracting objects from the data, (2) forming structures from objects, (3) expanding structures into rules based on frequency, and (4) finding rule similarities that lead to consolidation or abstraction. To evaluate this method, grammars are induced from generated data, architectural layouts and three-dimensional design models, demonstrating that the method automatically yields usable grammars that are functionally similar to grammars produced by hand.
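As a toy illustration of the four-step process, the R sketch below induces simple rules from character strings. The corpus, the object granularity, and the consolidation heuristic are all assumptions made for this sketch; the authors' method operates on far richer design data.

    # Step 1: extract objects (here, single characters) from un-coded data.
    corpus  <- c("abab", "ababab", "abcab")   # invented toy dataset
    objects <- strsplit(corpus, "")

    # Step 2: form structures from objects (here, adjacent pairs).
    bigrams <- unlist(lapply(objects, function(o)
      paste0(head(o, -1), tail(o, -1))))

    # Step 3: expand structures into rules based on frequency.
    freq  <- sort(table(bigrams), decreasing = TRUE)
    rules <- names(freq[freq >= 2])   # keep structures attested twice or more
    rules                             # e.g. "ab" and "ba" become rules

    # Step 4: consolidate rules by similarity (here, a crude abstraction
    # over symbols shared by all surviving rules).
    shared <- Reduce(intersect, strsplit(rules, ""))
    if (length(shared) > 0)
      cat("Abstract rule over shared symbols:", shared, "\n")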

