multimodal representations
Recently Published Documents


TOTAL DOCUMENTS: 85 (last five years: 33)

H-INDEX: 14 (last five years: 2)

2021, Vol 11 (9), pp. 514
Author(s): Josephine Convertini, Francesco Arcidiacono

In kindergarten, children are usually engaged in both verbal and non-verbal activities, the latter often requiring the manipulation of physical objects. During technical tasks (e.g., problem solving), children can use argumentation as one of the languages of science that mediates how they interact with the surrounding world. In this paper, we focused on technical tasks in kindergarten in order to understand to what extent activities requiring the manipulation of physical objects also leave space for argumentation. The study involved 25 children engaged in three problem-solving activities requiring the manipulation of Lego® bricks and some recycled materials. To analyze the non-verbal (embodied) side of the argumentative activities, we first identified the argumentative structure of each exchange involving the participants. Then, we focused on segments of “incomplete” argumentative dialogues (i.e., those presenting only some of the elements typical of children’s argumentation) by appealing to multimodal representations (speech, gestures, and physical objects). The findings of the study show that even apparently incomplete exchanges can have an argumentative function generated by the non-verbal elements of the interactions. Investigating the role of embodied argumentation during technical tasks in kindergarten can help teachers recognize and further develop children’s argumentative resources.


Author(s): Tengfei Lyu, Jianliang Gao, Ling Tian, Zhao Li, Peng Zhang, ...

The interaction of multiple drugs can lead to serious events, causing injuries and huge medical costs. Accurate prediction of drug-drug interaction (DDI) events can help clinicians make effective decisions and establish appropriate therapy programs. Recently, many AI-based techniques have been proposed for predicting DDI-associated events. However, most existing methods pay little attention to the potential correlations between DDI events and other multimodal data such as targets and enzymes. To address this problem, we propose a Multimodal Deep Neural Network (MDNN) for DDI event prediction. In MDNN, we design a two-pathway framework, comprising a drug knowledge graph (DKG)-based pathway and a heterogeneous feature (HF)-based pathway, to obtain drug multimodal representations. Finally, a multimodal fusion neural layer is designed to explore the complementarity among the drug multimodal representations. We conduct extensive experiments on a real-world dataset. The results show that MDNN accurately predicts DDI events and outperforms state-of-the-art models.
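
The abstract describes the two-pathway design only at a high level, so below is a minimal PyTorch sketch of what such a fusion network might look like; the class name, layer sizes, number of DDI event types, and the use of a plain embedding table in place of the paper's drug-knowledge-graph encoder are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MDNNSketch(nn.Module):
    """Two-pathway sketch: a knowledge-graph-based drug embedding pathway and a
    heterogeneous-feature pathway, fused to classify DDI event types."""
    def __init__(self, num_drugs, kg_dim=128, hf_dim=1024, hidden=256, num_events=65):
        super().__init__()
        # DKG pathway: stand-in embedding table (the paper describes a pathway
        # built on a drug knowledge graph; a graph encoder would go here).
        self.kg_embed = nn.Embedding(num_drugs, kg_dim)
        # HF pathway: MLP over heterogeneous features such as target/enzyme
        # indicator vectors concatenated into one input vector.
        self.hf_mlp = nn.Sequential(nn.Linear(hf_dim, hidden), nn.ReLU())
        # Fusion layer over the multimodal representations of a drug pair.
        self.fusion = nn.Sequential(
            nn.Linear(2 * (kg_dim + hidden), hidden), nn.ReLU(),
            nn.Linear(hidden, num_events),
        )

    def forward(self, drug_a, drug_b, hf_a, hf_b):
        rep_a = torch.cat([self.kg_embed(drug_a), self.hf_mlp(hf_a)], dim=-1)
        rep_b = torch.cat([self.kg_embed(drug_b), self.hf_mlp(hf_b)], dim=-1)
        return self.fusion(torch.cat([rep_a, rep_b], dim=-1))  # logits over event types

# Toy usage with random features for a single drug pair.
model = MDNNSketch(num_drugs=100)
logits = model(torch.tensor([0]), torch.tensor([1]), torch.rand(1, 1024), torch.rand(1, 1024))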


Author(s): Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, ...

To improve the accessibility of smart devices and to simplify their usage, it is critical to build models that understand user interfaces (UIs) and assist users in completing their tasks. However, UI-specific characteristics pose unique challenges, such as how to effectively leverage multimodal UI features involving image, text, and structural metadata, and how to achieve good performance when high-quality labeled data is unavailable. To address these challenges, we introduce UIBert, a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data to learn generic feature representations for a UI and its components. Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other. We propose five pre-training tasks that exploit this self-alignment among the different features of a UI component and across the various components in the same UI. We evaluate our method on nine real-world downstream UI tasks, where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
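
The self-alignment intuition (image and text features of the same UI component should be predictive of each other) lends itself to a contrastive formulation. The snippet below is a hedged sketch of that idea in PyTorch, not one of UIBert's five actual pre-training tasks (which the abstract does not specify); the symmetric InfoNCE-style loss, the feature dimensions, and the temperature value are assumptions.

import torch
import torch.nn.functional as F

def self_alignment_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric contrastive loss: row i of image_feats and text_feats is
    assumed to describe the same UI component, so matching pairs should score
    higher than mismatched ones in both directions."""
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / temperature        # pairwise cosine similarities
    targets = torch.arange(img.size(0))         # matching pairs sit on the diagonal
    # image-to-text and text-to-image cross-entropy, averaged
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random features for eight UI components.
loss = self_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))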


Author(s): Kok-Sing Tang, Joonhyeong Park, Jina Chang

This paper argues that meaning-making with multimodal representations in science learning is always contextualized within a genre and, conversely, that what constitutes an ongoing genre also depends on a multimodal coordination of speech, gesture, diagrams, symbols, and material objects. In social semiotics, a genre is a culturally evolved way of doing things with language (including non-verbal representations). Genre provides a useful lens for understanding how a community’s cultural norms and practices shape the use of language in various human activities. Despite this understanding, researchers have seldom considered the role of scientific genres (e.g., experimental account, information report, explanation) in understanding how students in science classrooms make meaning as they use and construct multimodal representations. This study is based on an enactment of a drawing-to-learn approach in a primary school classroom in Australia, with data generated from classroom videos and students’ artifacts. Using multimodal discourse analysis informed by social semiotics, we analyze how the semantic variations in students’ representations correspond to the recurring genres they were enacting. We found a general pattern in the use and creation of representations across different scientific genres that supports the theory of a mutual contextualization between genre and representation construction.


2021, Vol 11 (7), pp. 3009
Author(s): Sungjin Park, Taesun Whang, Yeochan Yoon, Heuiseok Lim

Visual dialog is a challenging vision-language task in which a series of questions, visually grounded in a given image, must be answered. Resolving the visual dialog task requires a high-level understanding of various multimodal inputs (e.g., the question, the dialog history, and the image). Specifically, an agent must (1) determine the semantic intent of the question and (2) align question-relevant textual and visual content across heterogeneous modality inputs. In this paper, we propose the Multi-View Attention Network (MVAN), which leverages multiple views of the heterogeneous inputs based on attention mechanisms. MVAN effectively captures question-relevant information from the dialog history with two complementary modules (i.e., Topic Aggregation and Context Matching), and builds multimodal representations through sequential alignment processes (i.e., Modality Alignment). Experimental results on the VisDial v1.0 dataset show the effectiveness of our proposed model, which outperforms previous state-of-the-art methods in both single-model and ensemble settings.
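
As a rough illustration of the sequential alignment idea (not the authors' MVAN implementation), the sketch below lets a question representation attend to image region features and fuses the attended visual context back with the question; the module name, dimensions, and the single attention layer are assumptions.

import torch
import torch.nn as nn

class ModalityAlignmentSketch(nn.Module):
    """Question-guided attention over image regions followed by a simple fusion
    of the question vector and the attended visual context."""
    def __init__(self, q_dim=512, v_dim=512, hidden=512, heads=8):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, hidden)
        self.v_proj = nn.Linear(v_dim, hidden)
        self.attn = nn.MultiheadAttention(embed_dim=hidden, num_heads=heads, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, question_vec, region_feats):
        # question_vec: (batch, q_dim); region_feats: (batch, num_regions, v_dim)
        q = self.q_proj(question_vec).unsqueeze(1)   # (batch, 1, hidden)
        v = self.v_proj(region_feats)                # (batch, num_regions, hidden)
        attended, _ = self.attn(q, v, v)             # question attends to the regions
        return self.fuse(torch.cat([q, attended], dim=-1)).squeeze(1)

# Toy usage: a batch of two questions over 36 region features each.
align = ModalityAlignmentSketch()
fused = align(torch.randn(2, 512), torch.randn(2, 36, 512))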


2021, Vol 11 (1), pp. 9
Author(s): Nahla Nadeem

The present study examines the rhetorical devices used by Brené Brown in a 99U conference talk (2013) to engage the audience and persuade them that vulnerability is the seed of creativity and should therefore be embraced as a stepping-stone to success. The study mainly explores the role that conceptual blending theory plays in the exploitation of multimodal rhetorical devices, which include an inspirational quote, analogies, and metaphors (both verbal and visual), and how they form a ‘mega-blend’ and a complex network of conceptual integration. The study also applies the conceptual blending model and the discursive process of framing in the analysis as crucial to the meaning construal of these multimodal rhetorical blends. The blending-framing analysis showed that these diverse rhetorical devices often require a complex multi-frame analysis and a larger mental-space network of mappings to derive the intended message and achieve the intended rhetorical effect on the audience. The analysis also showed that the blending-framing model provides a unified theoretical framework that can examine the discursive function and multimodal representations of diverse rhetorical devices in edutainment events.

