Multimodal language processing: How preceding discourse constrains gesture interpretation and affects gesture integration when gestures do not synchronise with semantic affiliates

2021
Vol 117
pp. 104191
Author(s):
Isabella Fritz
Sotaro Kita
Jeannette Littlemore
Andrea Krott
2021
Author(s):
Wim Pouw
Jan de Wit
Sara Bögels
Marlou Rasenberg
Branka Milivojevic
...

Most manual communicative gestures that humans produce cannot be looked up in a dictionary: these gestures derive their meaning in large part from the communicative context and are not conventionalized. However, it remains understudied to what extent the communicative signal itself (bodily postures in movement, or kinematics) can inform us about gesture semantics. Can we construct, in principle, a distribution-based semantics of gesture kinematics, similar to how word vectorization methods in Natural Language Processing (NLP) are now widely used to study semantic properties of text and speech? For such a project to get off the ground, we need to know the extent to which semantically similar gestures are also more likely to be kinematically similar. In Study 1 we assess whether the word2vec semantic distances between concepts that participants were explicitly instructed to convey in silent gestures relate to the kinematic distances of those gestures, as obtained from Dynamic Time Warping (DTW). In a second, director-matcher dyadic study we assess kinematic similarity between spontaneous co-speech gestures produced by interacting participants. Participants were asked, before and after they interacted, how they would name the objects. The semantic distances between the resulting names were related to the kinematic distances of the gestures made while conveying those objects in the interaction. We find that the gestures' semantic relatedness reliably predicts their kinematic relatedness across these highly divergent studies, which suggests that deriving semantic relatedness from kinematics with NLP-style methods is a promising avenue for automated multimodal recognition. Deeper implications for statistical learning processes in multimodal language are discussed.
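The core analysis described here can be sketched in a few lines. The following is a minimal illustration, not the study's actual pipeline: it assumes precomputed word embeddings for the conveyed concepts and 2-D wrist trajectories for the corresponding gestures (the names `embeddings` and `trajectories`, and the random data, are placeholders), computes pairwise cosine distances between embeddings and DTW distances between trajectories, and then correlates the two sets of pairwise distances.

```python
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

def dtw_distance(a, b):
    """Plain dynamic-time-warping cost between two (time x features) trajectories."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # frame-to-frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Placeholder inputs: word vectors for the conveyed concepts and
# wrist-position trajectories (time x 2) for the corresponding gestures.
rng = np.random.default_rng(0)
embeddings = {c: rng.random(300) for c in ["hammer", "saw", "bird"]}
trajectories = {c: rng.random((rng.integers(40, 80), 2)) for c in embeddings}

concepts = list(embeddings)
semantic, kinematic = [], []
for i in range(len(concepts)):
    for j in range(i + 1, len(concepts)):
        a, b = concepts[i], concepts[j]
        semantic.append(cosine(embeddings[a], embeddings[b]))      # semantic distance
        kinematic.append(dtw_distance(trajectories[a], trajectories[b]))  # kinematic distance

# Does semantic distance predict kinematic distance across concept pairs?
rho, p = spearmanr(semantic, kinematic)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```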


2019
Vol 23 (8)
pp. 639-652
Author(s):
Judith Holler
Stephen C. Levinson

2004
Vol 16 (5)
pp. 715-726
Author(s):
Tamara Y. Swaab
C. Christine Camblin
Peter C. Gordon

Effects of word repetition are extremely robust, but can these effects be modulated by discourse context? We examined this in an ERP experiment that tested coreferential processing (when two expressions refer to the same person) with repeated names. ERPs were measured to repeated names and pronoun controls in two conditions: (1) In the prominent condition the repeated name or pronoun coreferred with the subject of the preceding sentence and was therefore prominent in the preceding discourse (e.g., “John went to the store after John/he …”); (2) in the nonprominent condition the repeated name or pronoun coreferred with a name that was embedded in a conjoined noun phrase, and was therefore nonprominent (e.g., “John and Mary went to the store after John/he …”). Relative to the prominent condition, the nonprominent condition always contained two extra words (e.g., “and Mary”), and the repetition lag was therefore smaller in the prominent condition. Typically, effects of repetition are larger with smaller lags. Nevertheless, the amplitude of the N400 was reduced to a coreferentially repeated name when the antecedent was nonprominent as compared to when it was prominent. No such difference was observed for the pronoun controls. Because the N400 effect reflects difficulties in lexical integration, this shows that the difficulty of achieving coreference with a name increased with the prominence of the referent. This finding is the reverse of repetition lag effects on N400 previously found with word lists, and shows that language context can override general memory mechanisms.


2009
Vol 35 (3)
pp. 345-397
Author(s):
Srinivas Bangalore
Michael Johnston

Multimodal grammars provide an effective mechanism for quickly creating integration and understanding capabilities for interactive systems supporting simultaneous use of multiple input modalities. However, like other approaches based on hand-crafted grammars, multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent input. In this article, we show how the finite-state approach to multimodal language processing can be extended to support multimodal applications combining speech with complex freehand pen input, and evaluate the approach in the context of a multimodal conversational system (MATCH). We explore a range of different techniques for improving the robustness of multimodal integration and understanding. These include techniques for building effective language models for speech recognition when little or no multimodal training data is available, and techniques for robust multimodal understanding that draw on classification, machine translation, and sequence edit methods. We also explore the use of edit-based methods to overcome mismatches between the gesture stream and the speech stream.
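The edit-based idea can be illustrated with a minimal sketch; this is not the MATCH system's actual implementation. A noisy recognized token stream is aligned against the token sequence a hand-crafted multimodal grammar expects, so that insertions, deletions, and substitutions in the speech or gesture stream do not break integration. The token names and unit costs below are illustrative assumptions.

```python
def edit_align(observed, expected, sub_cost=1, ins_cost=1, del_cost=1):
    """Minimum edit cost between an observed token sequence and the
    sequence a grammar rule expects (standard dynamic programming)."""
    n, m = len(observed), len(expected)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * del_cost
    for j in range(1, m + 1):
        D[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if observed[i - 1] == expected[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j - 1] + match,   # substitute / match
                          D[i - 1][j] + del_cost,    # drop an observed token
                          D[i][j - 1] + ins_cost)    # insert a missing token
    return D[n][m]

# Illustrative use: noisy speech recognition output versus the pattern a
# grammar rule expects, where the gesture stream supplies the area references.
observed = ["show", "uh", "cheap", "restaurants", "here"]
expected = ["show", "cheap", "restaurants", "AREA"]
print(edit_align(observed, expected))  # low cost despite the disfluency and mismatch
```

In a robust-understanding setup of this kind, the observed stream would be scored against several candidate grammar patterns and the lowest-cost alignment would drive integration.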


2016
Vol 39
Author(s):
Giosuè Baggio
Carmelo M. Vicario

We agree with Christiansen & Chater (C&C) that language processing and acquisition are tightly constrained by the limits of sensory and memory systems. However, the human brain supports a range of cognitive functions that mitigate the effects of information processing bottlenecks. The language system is partly organised around these moderating factors, not just around restrictions on storage and computation.


Author(s):
Jennifer M. Roche
Arkady Zgonnikov
Laura M. Morett

Purpose: The purpose of the current study was to evaluate the social and cognitive underpinnings of miscommunication during an interactive listening task. Method: An eye- and computer mouse-tracking visual-world paradigm was used to investigate how a listener's cognitive effort (local and global) and decision-making processes were affected by a speaker's use of ambiguity that led to a miscommunication. Results: Experiments 1 and 2 found that an environmental cue that made a miscommunication more or less salient impacted listener language processing effort (eye-tracking). Experiment 2 also indicated that listeners may develop different processing heuristics dependent upon the speaker's use of ambiguity that led to a miscommunication, exerting a significant impact on cognition and decision making. We also found that perspective-taking effort and decision-making complexity metrics (computer mouse tracking) predict language processing effort, indicating that instances of miscommunication produced cognitive consequences of indecision, thinking, and cognitive pull. Conclusion: Together, these results indicate that listeners behave both reciprocally and adaptively when miscommunications occur, but the way they respond is largely dependent upon the type of ambiguity and how often it is produced by the speaker.

