multimodal systems
Recently Published Documents


TOTAL DOCUMENTS: 159 (five years: 37)

H-INDEX: 12 (five years: 2)

2021 ◽  
Vol 3 ◽  
Author(s):  
Jingyao Wu ◽  
Ting Dang ◽  
Vidhyasaharan Sethu ◽  
Eliathamby Ambikairajah

People perceive emotions via multiple cues, predominantly speech and visual cues, and a number of emotion recognition systems utilize both audio and visual cues. Moreover, the static aspects of emotion (e.g., whether a speaker's arousal level is high or low) and the dynamic aspects (e.g., whether the speaker is becoming more aroused) may be conveyed via different expressive cues, and the two aspects are integrated to provide a unified sense of emotional state. However, existing multimodal systems focus on only a single aspect of emotion perception, and the contributions of the different modalities to modeling the static and dynamic aspects are not well explored. In this paper, we investigate the relative salience of the audio and video modalities for emotion state prediction and emotion change prediction using a Multimodal Markovian affect model. Experiments on the RECOLA database show that the audio modality is better at modeling the arousal state and the video modality at modeling the valence state, whereas audio clearly outperforms video in modeling emotion changes for both arousal and valence.
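
The abstract does not spell out the Multimodal Markovian affect model itself, but the core idea of weighing audio and video evidence inside a Markov model over emotion states can be sketched as below. The state discretization, transition matrix, fusion weight, and all variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy sketch (assumed, not the authors' model): fuse per-modality likelihoods
# inside a first-order Markov chain over discretized arousal levels.
STATES = ["low", "mid", "high"]            # assumed arousal discretization
TRANS = np.array([[0.80, 0.15, 0.05],      # assumed transition matrix P(s_t | s_{t-1})
                  [0.10, 0.80, 0.10],
                  [0.05, 0.15, 0.80]])

def fused_emission(p_audio, p_video, w_audio=0.7):
    """Log-linear fusion of per-modality state likelihoods (w_audio is assumed)."""
    fused = (p_audio ** w_audio) * (p_video ** (1.0 - w_audio))
    return fused / fused.sum()

def step(prev_belief, p_audio, p_video, w_audio=0.7):
    """One filtering step: predict with the Markov prior, update with fused evidence."""
    predicted = TRANS.T @ prev_belief
    posterior = predicted * fused_emission(p_audio, p_video, w_audio)
    return posterior / posterior.sum()

belief = np.array([1 / 3, 1 / 3, 1 / 3])    # uniform initial belief over arousal levels
audio_obs = np.array([0.2, 0.3, 0.5])       # made-up per-frame likelihoods
video_obs = np.array([0.3, 0.4, 0.3])
belief = step(belief, audio_obs, video_obs)
current_state = STATES[int(np.argmax(belief))]
print(belief, current_state)
```

Emotion change prediction could then be read off by comparing the most likely state across successive filtering steps.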


2021 ◽  
Vol 28 (3) ◽  
pp. 87-95
Author(s):  
Veaceslav Perju ◽  

Target recognition is of great importance in military and civil applications: object detection, security and surveillance, access and border control, etc. This article presents the general structure and main components of a target recognition system (TRS). Characteristics that influence the reliability of a TRS, such as availability, distinctiveness, robustness, and accessibility, are described. Graph representations and mathematical descriptions of unimodal and multimodal TRSs are given, together with mathematical models of the probability of correct target recognition in these systems. To increase TRS reliability, a new approach is proposed: using a set of classification algorithms within the system. This approach permits the development of new kinds of systems, Multiple Classification Algorithms Unimodal and Multimodal Systems (MAUMS and MAMMS), whose graph representations and mathematical descriptions are also provided. The probability of correct target recognition is evaluated for the different systems, and the conditions for their effectiveness are established. A method is proposed for determining the maximal value of an algorithm's recognition probability for an established threshold level of the system's recognition probability, which defines the quality requirements and, accordingly, the costs of the recognition algorithms. The proposed theory permits designing a system for a predetermined recognition probability.
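
The article's probability models are not reproduced in this abstract. As a hedged illustration of how combining several recognition algorithms can raise system reliability, the snippet below uses the textbook independence assumption that the system succeeds if at least one recognizer does; this is only an assumption for illustration, not the paper's actual MAUMS/MAMMS model.

```python
from math import prod

def fused_recognition_probability(p_individual):
    """Probability that at least one of several independent recognizers
    (modalities or classification algorithms) identifies the target correctly.
    Standard independence assumption, not the article's exact formulation."""
    return 1.0 - prod(1.0 - p for p in p_individual)

# Example: a system with three algorithms of modest individual quality.
print(fused_recognition_probability([0.90, 0.85, 0.80]))  # -> 0.997
```

Under this kind of model, even moderately reliable individual algorithms can push the system-level recognition probability above a demanding threshold, which is the trade-off between algorithm quality and cost that the article formalizes.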


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Xiaomei Zhang ◽  
Pengming Zhang ◽  
Haomin Hu

Behavior-based continuous authentication is an increasingly popular methodology that utilizes behavior modeling and sensing for authentication and account access authorization. As an emerging behavioral biometric, user interaction patterns with mobile devices can verify a user's identity from the features or operating styles exhibited while interacting with the device. However, unimodal continuous authentication schemes, which rely on a single source of interaction information, can only handle a particular action or scenario. Hence, multimodal systems are needed to cope with varying environmental conditions, especially under attack. In this paper, we propose a multimodal continuous authentication method based on both static and dynamic interaction patterns with mobile devices. The behavioral biometric feature set HMHP, which combines hand motion (HM) and hold posture (HP), is built on the touchscreen and accelerometer and captures the variation in micro hand motions and hold patterns generated in both dynamic and static scenes. By combining the HM and HP features, the fused HMHP feature achieves 97% accuracy with a 3.49% equal error rate.
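
The abstract does not detail how HMHP is computed, but the reported 3.49% equal error rate can be put in context with a standard EER calculation over genuine and impostor match scores. The routine and the synthetic scores below are generic assumptions, not the authors' evaluation pipeline.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep a decision threshold and return the operating point where the
    false-accept rate and false-reject rate are (approximately) equal."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, best_eer = 1.0, None
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # legitimate user rejected
        far = np.mean(impostor_scores >= t)  # impostor accepted
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer

# Made-up score distributions for illustration only.
rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, 500)    # higher scores for the legitimate user
impostor = rng.normal(0.5, 0.1, 500)
print(f"EER ≈ {equal_error_rate(genuine, impostor):.2%}")
```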


2021 ◽  
Author(s):  
Hans Rutger Bosker ◽  
Marieke Hoetjes ◽  
Wim Pouw ◽  
Lieke van Maastricht

The prosody of a second language (L2) is notoriously difficult to acquire. It requires the mastery of a range of nested multimodal systems, including not only articulatory but also gestural signals, as hand gestures are produced in close synchrony with spoken prosody. It remains unclear how easily the articulatory and gestural systems acquire new prosodic patterns in the L2 and how the two systems interact, especially when L1 patterns interfere. This interdisciplinary pre-registered study investigates how Dutch learners of Spanish produce multimodal lexical stress in Spanish-Dutch cognates (e.g., Spanish profeSOR vs. Dutch proFESsor). Acoustic analyses assess whether gesturing helps L2 speakers place stress on the correct syllable, and whether gesturing boosts the acoustic correlates of stress through biomechanical coupling. Moreover, motion-tracking and time-series analyses test whether gesture-prosody synchrony is enhanced for stress-matching vs. stress-mismatching cognate pairs, perhaps revealing that gestural timing is biased in the L1 (or L2) direction (e.g., Spanish profeSOR with the gesture biased towards the Dutch stressed syllable -fes). Thus, we will uncover how speakers deal with the manual, articulatory, and cognitive constraints that need to be brought into harmony for efficient speech production, bearing implications for theories of gesture-speech interaction and multimodal L2 acquisition.
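
The pre-registered time-series analyses are described only at a high level; as one hedged illustration of what a gesture-prosody synchrony measure can look like, the sketch below cross-correlates a hypothetical hand-velocity trace with a speech amplitude envelope to estimate their lag. The signals, sampling rate, and lag window are assumptions, not the study's actual pipeline.

```python
import numpy as np

def synchrony_lag(gesture_velocity, amplitude_envelope, fs=100, max_lag_s=0.5):
    """Estimate the lag (seconds) at which a gesture-velocity trace best aligns
    with the speech amplitude envelope, via normalized cross-correlation."""
    g = (gesture_velocity - gesture_velocity.mean()) / gesture_velocity.std()
    a = (amplitude_envelope - amplitude_envelope.mean()) / amplitude_envelope.std()
    max_lag = int(max_lag_s * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = [np.corrcoef(np.roll(g, lag), a)[0, 1] for lag in lags]
    best = lags[int(np.argmax(corrs))]
    return best / fs, max(corrs)

# Synthetic example: the "gesture" peak leads the "speech" peak by 100 ms.
t = np.linspace(0, 2, 200)                       # 2 s sampled at 100 Hz
speech = np.exp(-((t - 1.0) ** 2) / 0.01)
gesture = np.exp(-((t - 0.9) ** 2) / 0.01)
print(synchrony_lag(gesture, speech))            # lag ≈ +0.10 s
```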


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1336
Author(s):  
Jorge Iranzo Bartolome ◽  
Gilsang Cho ◽  
Jun-Dong Cho

For years, HCI research has focused on the senses of hearing and sight. In recent times, however, there has been increased interest in using other senses, such as smell or touch, accompanied by growing research on sensory substitution techniques and multi-sensory systems. Contemporary art has also been influenced by this trend, and the number of artists interested in creating novel multi-sensory works of art has increased substantially. As a result, the opportunities for visually impaired people to experience artworks in different ways are also expanding. In spite of all this, research on multimodal systems for experiencing the visual arts remains limited, and user tests comparing different modalities and senses, particularly in the field of art, are scarce. This paper designs a multi-sensory mapping that conveys color to visually impaired people using musical sounds and temperature cues. Through user tests and surveys with a total of 18 participants, we show that this multi-sensory system is properly designed to allow the user to distinguish and experience a total of 24 colors. The tests consist of several semantic correlational adjective-based surveys comparing the different modalities to determine the best way to express colors through musical sounds and temperature cues, based on previously well-established sound-color and temperature-color coding algorithms. In addition, the resulting final algorithm is tested with 12 more users.
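
The paper's sound-color and temperature-color coding algorithms are not given in the abstract; the sketch below only illustrates the general shape of such a mapping, assigning each hue a pitch and a warm/cool temperature cue. The note scale, thresholds, and function names are assumptions for illustration only.

```python
import colorsys

NOTES = ["C", "D", "E", "F", "G", "A", "B", "C5"]  # assumed 8-step pitch scale

def color_to_cues(r, g, b):
    """Map an RGB color to (note, temperature) cues: hue selects the pitch,
    warm hues (reds/yellows) map to a warm cue, cool hues (greens/blues) to a
    cool one; lightness or saturation could additionally drive loudness."""
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    note = NOTES[min(int(h * len(NOTES)), len(NOTES) - 1)]
    temperature = "warm" if (h < 0.17 or h > 0.83) else "cool"
    return note, temperature

print(color_to_cues(220, 40, 30))   # a red  -> low note, warm cue
print(color_to_cues(40, 90, 200))   # a blue -> higher note, cool cue
```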


2021 ◽  
Vol 9 (60) ◽  

Kinetic typography, first used in film title sequences through the metaphors it created, has become widespread in social, economic, and cultural fields such as the internet, advertising, and television, driven by the growing need for communication under technological advance and the influence of globalization. The most important factors in the widespread use of this communication tool are that the intended message can be presented interactively and that the design elements can be manipulated directly to create a particular mood in the audience. In kinetic typography, the elements of sound, motion, and time render abstract concepts concrete, and each element is treated as a mode (method, tool). Using these elements, the sign system is revealed through the visualization of language, and our experiences of the world are represented by creating a visual, spatial, and aural environment. Kinetic typography is therefore not composed merely of letterforms; it is also a means of multimodal semiotic expression integrating abstract concepts such as sound, motion, and color. Since multimodal systems employ two or more semiotic tools to accomplish communication, the process of meaning-making needs to be analyzed from different angles. This study adapts the concept of social semiotics, which deals with how people communicate in society, to the field of visual communication. How designers portray real-world physical movement in the digital environment is one of the main points of this research. Therefore, applications created with kinetic typography are analyzed to explain why and how each design was created, examining the methods used. The aim of this study is to demonstrate the meaning-making potential of kinetic typography, which has a multimodal (multi-method) structure, through case analyses in terms of the transitivity system, a part of social semiotics.
Keywords: Typography, kinetic typography, social semiotics, transitivity system


Author(s):  
Alena Velichko ◽  
Alexey Karpov

In recent years, interest in automatic depression detection has grown within the medical and scientific-technical communities. Depression is one of the most widespread mental illnesses affecting human life. In this review we present and analyze the latest research on depression detection. Basic notions related to the definition of depression are specified, and the review covers both unimodal and multimodal corpora containing recordings of informants diagnosed with depression and of non-depressed control groups. Theoretical and practical studies presenting automated systems for depression detection are reviewed; these include both unimodal and multimodal systems. Some of the reviewed systems address regression-style classification, predicting the degree of depression severity (non-depressed, mild, moderate, severe), while others solve a binary classification problem, predicting the presence or absence of depression. An original classification of methods for computing informative features for three communicative modalities (audio, video, and text) is presented. New methods for depression detection within each modality and across all modalities are identified. Neural networks are the most popular methods for depression detection in the reviewed studies. The survey shows that the main markers of depression are psychomotor retardation, which affects all communicative modalities, and a strong correlation with the affective dimensions of valence, activation, and dominance; an inverse correlation between depression and aggression has also been observed. The discovered correlations confirm the interrelation of affective disorders and human emotional states. The trend observed in many of the reviewed papers is that combining modalities improves the results of depression detection systems.


Author(s):  
Leidy Esperanza Pamplona-Beron ◽  
Carlos Alberto Henao Baena ◽  
Andrés Felipe Calvo-Salcedo

Human activity detection has evolved with the advances and development of machine learning techniques, which have enabled solutions to new challenges, although persistent difficulties still need to be addressed. One of these challenges is the learning model's sensitivity to unbalanced, atypical, and overlapping information, which directly affects model performance. This article evaluates a methodology for the classification of human activities that penalizes defective information. The methodology is carried out through two redundant classifiers: a penalized support vector machine that detects the sub-movements (micro-movements) and a hidden Markov model that predicts the activity given the micro-movement sequence. The method's performance was compared with state-of-the-art techniques, and the findings suggest a significant advance in the detection of micro-movements compared with results obtained with non-penalized paradigms. In this research, adequate performance is found in the classification of primitive movements, with hit rates of 95.15% for the Kinect One®, 96.86% for the IMU sensor network, and 67.51% for the EMG sensor network.
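
The two-stage structure described above (a class-weight-penalized SVM emitting micro-movement labels that a hidden Markov model then decodes into an activity) can be sketched as follows. All matrices, labels, and the synthetic data are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.svm import SVC

# Stage 1 (assumed setup): a penalized SVM; class_weight="balanced" penalizes
# errors on rare micro-movement classes when mapping sensor windows to labels.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 6))                 # synthetic 6-D sensor features
y_train = rng.integers(0, 3, size=300)              # 3 assumed micro-movement classes
svm = SVC(class_weight="balanced").fit(X_train, y_train)

# Stage 2 (assumed setup): a small HMM over activities; each activity emits
# micro-movements with some probability, and Viterbi decodes the activity path.
ACTIVITIES = ["walk", "sit"]
start = np.log(np.array([0.5, 0.5]))
trans = np.log(np.array([[0.9, 0.1],
                         [0.1, 0.9]]))              # activities tend to persist
emit = np.log(np.array([[0.6, 0.3, 0.1],            # P(micro-movement | walk)
                        [0.1, 0.3, 0.6]]))          # P(micro-movement | sit)

def viterbi(obs):
    """Most likely activity sequence given a micro-movement label sequence."""
    v = start + emit[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = v[:, None] + trans                 # scores[i, j]: end in j via i
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) + emit[:, o]
    path = [int(v.argmax())]
    for bp in reversed(back):                       # backtrack best predecessors
        path.append(int(bp[path[-1]]))
    return [ACTIVITIES[s] for s in reversed(path)]

micro = svm.predict(rng.normal(size=(10, 6)))       # micro-movements for 10 windows
print(viterbi(micro))
```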

