scholarly journals Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

Author(s):  
Thao Le Minh ◽  
Nobuyuki Shimizu ◽  
Takashi Miyazaki ◽  
Koichi Shinoda

With the widespread use of intelligent systems, such as smart speakers, addressee recognition has become a concern in human-computer interaction, as more and more people expect such systems to understand complicated social scenes, including those outdoors, in cafeterias, and hospitals. Because previous studies typically focused only on pre-specified tasks with limited conversational situations such as controlling smart homes, we created a mock dataset called Addressee Recognition in Visual Scenes with Utterances (ARVSU) that contains a vast body of image variations in visual scenes with an annotated utterance and a corresponding addressee for each scenario. We also propose a multi-modal deep-learning-based model that takes different human cues, specifically eye gazes and transcripts of an utterance corpus, into account to predict the conversational addressee from a specific speaker's view in various real-life conversational scenarios. To the best of our knowledge, we are the first to introduce an end-to-end deep learning model that combines vision and transcripts of utterance for addressee recognition. As a result, our study suggests that future addressee recognition can reach the ability to understand human intention in many social situations previously unexplored, and our modality dataset is a first step in promoting research in this field.

Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2972
Author(s):  
Qinghua Gao ◽  
Shuo Jiang ◽  
Peter B. Shull

Hand gesture classification and finger angle estimation are both critical for intuitive human–computer interaction. However, most approaches study them in isolation. We thus propose a dual-output deep learning model to enable simultaneous hand gesture classification and finger angle estimation. Data augmentation and deep learning were used to detect spatial-temporal features via a wristband with ten modified barometric sensors. Ten subjects performed experimental testing by flexing/extending each finger at the metacarpophalangeal joint while the proposed model was used to classify each hand gesture and estimate continuous finger angles simultaneously. A data glove was worn to record ground-truth finger angles. Overall hand gesture classification accuracy was 97.5% and finger angle estimation R 2 was 0.922, both of which were significantly higher than shallow existing learning approaches used in isolation. The proposed method could be used in applications related to the human–computer interaction and in control environments with both discrete and continuous variables.


2020 ◽  
Vol 13 (4) ◽  
pp. 627-640 ◽  
Author(s):  
Avinash Chandra Pandey ◽  
Dharmveer Singh Rajpoot

Background: Sentiment analysis is a contextual mining of text which determines viewpoint of users with respect to some sentimental topics commonly present at social networking websites. Twitter is one of the social sites where people express their opinion about any topic in the form of tweets. These tweets can be examined using various sentiment classification methods to find the opinion of users. Traditional sentiment analysis methods use manually extracted features for opinion classification. The manual feature extraction process is a complicated task since it requires predefined sentiment lexicons. On the other hand, deep learning methods automatically extract relevant features from data hence; they provide better performance and richer representation competency than the traditional methods. Objective: The main aim of this paper is to enhance the sentiment classification accuracy and to reduce the computational cost. Method: To achieve the objective, a hybrid deep learning model, based on convolution neural network and bi-directional long-short term memory neural network has been introduced. Results: The proposed sentiment classification method achieves the highest accuracy for the most of the datasets. Further, from the statistical analysis efficacy of the proposed method has been validated. Conclusion: Sentiment classification accuracy can be improved by creating veracious hybrid models. Moreover, performance can also be enhanced by tuning the hyper parameters of deep leaning models.


2020 ◽  
Vol 197 ◽  
pp. 105674
Author(s):  
Dingding Yu ◽  
Kaijie Zhang ◽  
Lingyan Huang ◽  
Bonan Zhao ◽  
Xiaoshan Zhang ◽  
...  

2021 ◽  
Vol 296 ◽  
pp. 126564
Author(s):  
Md Alamgir Hossain ◽  
Ripon K. Chakrabortty ◽  
Sondoss Elsawah ◽  
Michael J. Ryan

Sign in / Sign up

Export Citation Format

Share Document