Hierarchically Classified Probabilistic Grammar Parsing

2011 ◽  
Vol 22 (2) ◽  
pp. 245-257 ◽  
Author(s):  
Yin-Tang DAI ◽  
Cheng-Rong WU ◽  
Sheng-Xiang MA ◽  
Yi-Ping ZHONG

2014 ◽  
Vol 33 (6) ◽  
pp. 1-12 ◽  
Author(s):  
Tianqiang Liu ◽  
Siddhartha Chaudhuri ◽  
Vladimir G. Kim ◽  
Qixing Huang ◽  
Niloy J. Mitra ◽  
...  

2020 ◽  
Author(s):  
Xiaoying Pu ◽  
Matthew Kay

Visualizations depicting probabilities and uncertainty are used everywhere from medical risk communication to machine learning, yet these probabilistic visualizations are difficult to specify, prone to error, and their designs are cumbersome to explore. We propose a Probabilistic Grammar of Graphics (PGoG), an extension to Wilkinson’s original framework. Inspired by the success of probabilistic programming languages, PGoG makes probability expressions, such as P(A|B), first-class citizens in the language. PGoG abstractions also reflect the distinction between probability and frequency framing, a concept from the uncertainty communication literature. It is expressive, encompassing product plots, density plots, icon arrays, and dotplots, among other visualizations. Its coherent syntax ensures correctness (that the proportions of visual elements and their spatial placement reflect the underlying probability distribution) and reduces edit distance between probabilistic visualization specifications, potentially supporting more design exploration. We provide a proof-of-concept implementation of PGoG in R.
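For a sense of the design space PGoG covers, the sketch below builds one such visualization, an icon array with frequency framing, in plain ggplot2; this is not PGoG's actual syntax, and the disease/test numbers are invented for illustration. Hand-computing icon positions and counts, as done here, is exactly the error-prone specification work that PGoG's probability-aware abstractions are meant to remove.

    library(ggplot2)

    # Invented example: P(disease | positive test) = 0.2, frequency-framed
    # as "20 out of 100 people" and drawn as a 10 x 10 icon array.
    df <- data.frame(id = 0:99)
    df$x <- df$id %% 10    # column of each icon
    df$y <- df$id %/% 10   # row of each icon
    df$outcome <- ifelse(df$id < 20, "disease", "no disease")

    ggplot(df, aes(x, y, fill = outcome)) +
      geom_tile(color = "white") +
      coord_equal() +
      theme_void()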


2018 ◽  
Vol 77 (21) ◽  
pp. 28417-28440 ◽  
Author(s):  
Dan Li ◽  
Disheng Hu ◽  
Yuke Sun ◽  
Yingsong Hu

2021 ◽  
Author(s):  
Apurva Apurva ◽  
Samar Husain

The surprisal metric (Hale, 2001; Levy, 2008) successfully predicts syntactic complexity in a large number of online studies (e.g., Demberg and Keller, 2009; Levy and Keller, 2013). Surprisal assumes a probabilistic grammar that drives the expectation of upcoming linguistic material. Consequently, wrong predictions lead to a processing cost, presumably due to reranking-related computations (Levy, 2013). Critically, surprisal assumes that the predicted parses generated by the probabilistic grammar are grammatical. However, it has been found that syntactic predictions can be ungrammatical (e.g., Apurva & Husain, 2018). Consequently, just as reranking costs are incurred for incorrect (grammatical) predictions, a cost should also appear for ungrammatical predictions. Evidence for such a cost during comprehension would not be explained by the surprisal metric. To test the ecological validity of the surprisal metric, it therefore becomes critical to investigate whether ungrammatical predictions incur a cost.

In this study, we investigate this issue in Hindi (a verb-final language) using a cloze task followed by a self-paced reading (SPR) study. All analyses were carried out in R using linear mixed models; log RTs (reading times) were used for the RT analyses.

In the cloze study (N=30), participants were asked to complete sentences such as (1a) and (1b) meaningfully using the SPR paradigm. The two conditions differed in the case markers on the three nouns. 12 sets of experimental items along with 64 fillers were used. Participants' responses were coded for the predicted verb class and the overall grammaticality of the completion (grammatical prediction vs. ungrammatical prediction).

(1a) hari-ne geeta-se umesh-ko ...
     Hari-ERG Geeta-ABL Umesh-ACC

(1b) hari-ko geeta-ne umesh-ko ...
     Hari-ACC Geeta-ERG Umesh-ACC

Grammaticality analysis of the completion data showed that participants made more ungrammatical completions in condition (b) than in condition (a) (z = 5.25); overall grammatical completions were 96% in condition (a) and 60% in condition (b). In addition, the verb-class analysis showed that in both conditions participants most frequently completed the sentences with a transitive non-finite verb followed by a ditransitive matrix verb (hereafter T.NF-DT.M). T.NF-DT.M was predicted in 33% of instances in condition (a) and 34% in condition (b) (z = 0.18). Given these similar cloze probabilities, the surprisal metric predicts no RT difference at T.NF-DT.M between the two conditions during online processing (cloze probabilities can be used to compute surprisal; see Levy and Keller, 2013). If RTs at T.NF-DT.M in condition (a) are lower than in (b), that would be better explained by the higher cost of the ungrammatical prediction.

To test this, we conducted an SPR study (N=50) using items similar to those in the previous experiment (see 2a and 2b). The critical region was T.NF-DT.M. 24 sets of items along with 72 fillers were constructed.

(2a) hari-ne geeta-se umesh-ko milne ko kaha ...
     Hari-ERG Geeta-ABL Umesh-ACC meet-INF(T.NF) told(DT.M)

(2b) hari-ko geeta-ne umesh-ko milne ko kaha ...
     Hari-ACC Geeta-ERG Umesh-ACC meet-INF(T.NF) told(DT.M)

While the prediction of T.NF-DT.M is the same in the two conditions, the proportion of ungrammatical predictions is higher in (b) than in (a). Results show that RTs at the critical region were lower in (a) than in (b) (t = 2.32). This goes against the surprisal metric and shows the cost incurred due to ungrammatical predictions.
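As a back-of-the-envelope illustration of why surprisal predicts no difference at the critical region, the cloze probabilities reported above can be converted to surprisal directly (a sketch only; the base-2 logarithm is a convention, not taken from the study):

    # Surprisal from cloze probabilities: surprisal(w) = -log2 P(w | context)
    cloze_a <- 0.33  # P(T.NF-DT.M | condition a), from the completion data
    cloze_b <- 0.34  # P(T.NF-DT.M | condition b)

    surprisal <- function(p) -log2(p)

    surprisal(cloze_a)  # ~1.60 bits
    surprisal(cloze_b)  # ~1.56 bits
    # Near-identical values, so surprisal predicts essentially equal RTs
    # at T.NF-DT.M in the two conditions.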
Our work establishes that the cost of ungrammatical predictions indeed appears during online processing. This processing cost is not predicted by a metric like surprisal, which highlights that metric's limitations. This study also provides evidence against the claim that predictions in head-final languages are robust. It suggests that the prediction mechanism in such languages is more nuanced and points to the need to study the nature of ungrammatical predictions during processing.
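The analyses are described as linear mixed models over log RTs in R; a minimal sketch of what such a model could look like with lme4 (the data frame `spr` and its column names are illustrative assumptions, not the authors' code):

    library(lme4)

    # Hypothetical columns: rt (ms at the critical region T.NF-DT.M),
    # condition (a vs. b), subject and item identifiers.
    m <- lmer(log(rt) ~ condition + (1 | subject) + (1 | item), data = spr)
    summary(m)  # the abstract reports t = 2.32 for the condition effect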

