Depression Detection on Reddit With an Emotion-Based Attention Network: Algorithm Development and Validation


10.2196/28754 ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. e28754
Author(s):  
Lu Ren ◽  
Hongfei Lin ◽  
Bo Xu ◽  
Shaowu Zhang ◽  
Liang Yang ◽  
...  

Background As a common mental disease, depression seriously affects people’s physical and mental health. According to World Health Organization statistics, depression is one of the leading causes of suicide and self-harm worldwide. Strengthening depression detection can therefore reduce the occurrence of suicide and self-harm and save lives. With the development of computer technology, researchers have begun applying natural language processing techniques to detect depression automatically. Many existing feature engineering methods for depression detection are based on emotional characteristics, but they do not consider high-level emotional semantic information, and current deep learning methods cannot accurately extract effective emotional semantic information. Objective In this paper, we propose an emotion-based attention network, comprising a semantic understanding network and an emotion understanding network, that effectively captures high-level emotional semantic information to improve depression detection. Methods The semantic understanding network module captures contextual semantic information, and the emotion understanding network module captures emotional semantic information. The emotion understanding network module contains two units, a positive emotion understanding unit and a negative emotion understanding unit, which capture positive and negative emotional information, respectively. We further propose a dynamic fusion strategy in the emotion understanding network module to fuse the positive and negative emotional information. Results We evaluated our method on the Reddit data set. The proposed emotion-based attention network achieved an accuracy, precision, recall, and F-measure of 91.30%, 91.91%, 96.15%, and 93.98%, respectively, which is competitive with state-of-the-art methods. Conclusions The experimental results show that our model is competitive with state-of-the-art models. The semantic understanding network module, the emotion understanding network module, and the dynamic fusion strategy are all effective for depression detection. In addition, the results verify that emotional semantic information is useful for depression detection.
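The abstract does not spell out the dynamic fusion strategy. As a rough illustration of how positive and negative emotion representations could be gated together, a minimal PyTorch sketch is given below; the module name, gating form, and dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DynamicEmotionFusion(nn.Module):
    """Illustrative gated fusion of positive and negative emotion vectors.

    Hypothetical sketch only: the paper's actual dynamic fusion strategy
    is not specified in the abstract.
    """
    def __init__(self, hidden_dim: int):
        super().__init__()
        # The gate decides, per dimension, how much positive vs. negative
        # emotional information to keep.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, pos_emotion: torch.Tensor, neg_emotion: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([pos_emotion, neg_emotion], dim=-1)))
        return g * pos_emotion + (1.0 - g) * neg_emotion

# Usage: fuse a batch of 300-dimensional emotion representations.
fusion = DynamicEmotionFusion(hidden_dim=300)
pos, neg = torch.randn(8, 300), torch.randn(8, 300)
fused = fusion(pos, neg)  # shape (8, 300)
```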


2021 ◽  
Vol 11 (7) ◽  
pp. 3009
Author(s):  
Sungjin Park ◽  
Taesun Whang ◽  
Yeochan Yoon ◽  
Heuiseok Lim

Visual dialog is a challenging vision-language task in which a series of questions grounded in a given image must be answered. To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e.g., the question, the dialog history, and the image) is required. Specifically, an agent must (1) determine the semantic intent of the question and (2) align question-relevant textual and visual content across heterogeneous modality inputs. In this paper, we propose the Multi-View Attention Network (MVAN), which leverages multiple views of the heterogeneous inputs based on attention mechanisms. MVAN effectively captures question-relevant information from the dialog history with two complementary modules (i.e., Topic Aggregation and Context Matching) and builds multimodal representations through sequential alignment processes (i.e., Modality Alignment). Experimental results on the VisDial v1.0 dataset show the effectiveness of our proposed model, which outperforms previous state-of-the-art methods in both single-model and ensemble settings.
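As a loose illustration of question-guided aggregation over the dialog history, the PyTorch sketch below attends to history turns using the question as the query; the projection layers and scaling are assumptions, not the published MVAN code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedHistoryAttention(nn.Module):
    """Hypothetical sketch of attending over dialog-history turns with the
    question as the query, in the spirit of question-relevant aggregation."""
    def __init__(self, dim: int):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)
        self.key_proj = nn.Linear(dim, dim)

    def forward(self, question: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # question: (batch, dim); history: (batch, turns, dim)
        q = self.query_proj(question).unsqueeze(1)            # (batch, 1, dim)
        k = self.key_proj(history)                             # (batch, turns, dim)
        scores = torch.bmm(q, k.transpose(1, 2)) / k.size(-1) ** 0.5
        weights = F.softmax(scores, dim=-1)                    # (batch, 1, turns)
        return torch.bmm(weights, history).squeeze(1)          # (batch, dim)

# Usage: summarize 10 history turns of 512-d features for a batch of 4 questions.
attn = QuestionGuidedHistoryAttention(dim=512)
summary = attn(torch.randn(4, 512), torch.randn(4, 10, 512))
```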


Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, so exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While such signals hold the promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration into reinforcement learning. Our curiosity signal is driven by a fast reward that handles local exploration and a slow reward that incentivizes long-horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
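A toy sketch of the two-timescale idea follows: curiosity is the error of reconstructing an observation from its context, with a quickly adapting model (local novelty) and a slowly adapting model (long-horizon novelty). The linear reconstructor and the reward weights are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

class Reconstructor:
    """Toy linear model that predicts an observation from its context."""
    def __init__(self, ctx_dim: int, obs_dim: int, lr: float):
        self.W = np.zeros((obs_dim, ctx_dim))
        self.lr = lr  # fast model: large lr; slow model: small lr

    def reconstruct(self, context: np.ndarray) -> np.ndarray:
        return self.W @ context

    def update(self, context: np.ndarray, obs: np.ndarray) -> float:
        error = obs - self.reconstruct(context)
        self.W += self.lr * np.outer(error, context)  # simple gradient step
        return float(np.mean(error ** 2))

def intrinsic_reward(context, obs, fast, slow, w_fast=1.0, w_slow=0.5):
    # Curiosity = how poorly each model reconstructs the observation from
    # its context; the fast model forgets quickly, the slow model retains
    # long-horizon novelty.
    return w_fast * fast.update(context, obs) + w_slow * slow.update(context, obs)

# Usage on random data.
fast = Reconstructor(ctx_dim=16, obs_dim=8, lr=0.1)
slow = Reconstructor(ctx_dim=16, obs_dim=8, lr=0.001)
r_int = intrinsic_reward(np.random.randn(16), np.random.randn(8), fast, slow)
```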


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2038
Author(s):  
Xi Shao ◽  
Xuan Zhang ◽  
Guijin Tang ◽  
Bingkun Bao

We propose a new end-to-end scene recognition framework, called the Recurrent Memorized Attention Network (RMAN) model, which performs object-based scene classification by recurrently locating and memorizing objects in the image. Based on the proposed framework, we introduce a multi-task mechanism that sequentially attends to the different essential objects in a scene image and recurrently fuses the features of the objects selected by the attention model into memory to improve scene recognition accuracy. The experimental results show that the RMAN model achieves better classification performance on the constructed dataset and two public scene datasets, surpassing state-of-the-art image scene recognition approaches.
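A minimal sketch of recurrently attending to object features and fusing them into a memory state, in the spirit of the description above, is given below; the layer shapes, GRU memory, and step count are assumptions, not the published RMAN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentObjectMemory(nn.Module):
    """Illustrative sketch: attend to one object summary per step and fuse
    it into a recurrent memory used for scene classification."""
    def __init__(self, feat_dim: int, mem_dim: int, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.attn = nn.Linear(feat_dim + mem_dim, 1)
        self.rnn = nn.GRUCell(feat_dim, mem_dim)

    def forward(self, object_feats: torch.Tensor) -> torch.Tensor:
        # object_feats: (batch, num_objects, feat_dim)
        batch, n, _ = object_feats.shape
        memory = object_feats.new_zeros(batch, self.rnn.hidden_size)
        for _ in range(self.steps):
            # Score each object against the current memory.
            mem_exp = memory.unsqueeze(1).expand(-1, n, -1)
            scores = self.attn(torch.cat([object_feats, mem_exp], dim=-1)).squeeze(-1)
            weights = F.softmax(scores, dim=-1).unsqueeze(-1)
            attended = (weights * object_feats).sum(dim=1)   # (batch, feat_dim)
            memory = self.rnn(attended, memory)              # fuse into memory
        return memory  # fed to a scene classifier head

# Usage: 36 object features of dimension 2048 per image.
mem = RecurrentObjectMemory(feat_dim=2048, mem_dim=512)(torch.randn(2, 36, 2048))
```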


2019 ◽  
Vol 128 (5) ◽  
pp. 1286-1310 ◽  
Author(s):  
Oscar Mendez ◽  
Simon Hadfield ◽  
Nicolas Pugeault ◽  
Richard Bowden

Abstract The use of human-level semantic information to aid robotic tasks has recently become an important area for both Computer Vision and Robotics. This has been enabled by advances in Deep Learning that allow consistent and robust semantic understanding. Leveraging this semantic vision of the world has allowed human-level understanding to naturally emerge from many different approaches. Particularly, the use of semantic information to aid in localisation and reconstruction has been at the forefront of both fields. Like robots, humans also require the ability to localise within a structure. To aid this, humans have designed high-level semantic maps of our structures called floorplans. We are extremely good at localising in them, even with limited access to the depth information used by robots. This is because we focus on the distribution of semantic elements, rather than geometric ones. Evidence of this is that humans are normally able to localise in a floorplan that has not been scaled properly. In order to grant this ability to robots, it is necessary to use localisation approaches that leverage the same semantic information humans use. In this paper, we present a novel method for semantically enabled global localisation. Our approach relies on the semantic labels present in the floorplan. Deep Learning is leveraged to extract semantic labels from RGB images, which are compared to the floorplan for localisation. While our approach is able to use range measurements if available, we demonstrate that they are unnecessary as we can achieve results comparable to state-of-the-art without them.
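As a rough sketch of the matching step described above, candidate poses can be scored by how well the semantic labels predicted from the floorplan agree with the labels observed in the camera image. The raycast helper and the simple agreement score below are assumptions for illustration only, not the paper's method.

```python
import numpy as np

def best_pose(observed_labels, floorplan, candidate_poses, raycast):
    """Hypothetical semantic global-localisation scoring.

    observed_labels: per-ray semantic labels extracted from the RGB image.
    raycast(floorplan, pose): assumed helper returning the label the
    floorplan predicts for each viewing ray at the given pose.
    """
    scores = []
    for pose in candidate_poses:
        predicted = np.asarray(raycast(floorplan, pose))
        # Fraction of rays whose observed label matches the floorplan prediction.
        scores.append(np.mean(predicted == np.asarray(observed_labels)))
    return candidate_poses[int(np.argmax(scores))]
```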


2020 ◽  
Vol 34 (05) ◽  
pp. 8496-8503 ◽  
Author(s):  
Chuan Meng ◽  
Pengjie Ren ◽  
Zhumin Chen ◽  
Christof Monz ◽  
Jun Ma ◽  
...  

Existing conversational systems tend to generate generic responses. Recently, Background Based Conversations (BBCs) have been introduced to address this issue; here, the generated responses are grounded in some background information. The methods proposed for BBCs are able to generate more informative responses; however, they either cannot generate natural responses or have difficulty locating the right background information. In this paper, we propose a Reference-aware Network (RefNet) to address both issues. Unlike existing methods that generate responses token by token, RefNet incorporates a novel reference decoder that provides an alternative way to learn to directly select a semantic unit (e.g., a span containing complete semantic information) from the background. Experimental results show that RefNet significantly outperforms state-of-the-art methods in both automatic and human evaluations, indicating that RefNet can generate more appropriate and human-like responses.
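A minimal pointer-style sketch of selecting a span from encoded background tokens, in the spirit of the reference decoder, might look like this; the start/end scoring is an assumption, not RefNet's actual decoder.

```python
import torch
import torch.nn as nn

class SpanReferenceDecoder(nn.Module):
    """Illustrative sketch: score each background position as a span start
    or end and return the selected span indices."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.start_scorer = nn.Linear(hidden_dim, 1)
        self.end_scorer = nn.Linear(hidden_dim, 1)

    def forward(self, background: torch.Tensor):
        # background: (batch, length, hidden_dim) encoded background tokens
        start_logits = self.start_scorer(background).squeeze(-1)
        end_logits = self.end_scorer(background).squeeze(-1)
        # A real decoder would also constrain end >= start.
        return start_logits.argmax(dim=-1), end_logits.argmax(dim=-1)

# Usage: pick a span from a 200-token background for a batch of 4 dialogs.
start, end = SpanReferenceDecoder(hidden_dim=256)(torch.randn(4, 200, 256))
```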


Author(s):  
Chengyuan Zhang ◽  
Jiayu Song ◽  
Xiaofeng Zhu ◽  
Lei Zhu ◽  
Shichao Zhang

The purpose of cross-modal retrieval is to find the relationships between samples of different modalities and, given a sample in one modality, to retrieve samples of other modalities with similar semantics. Because data of different modalities present heterogeneous low-level features but semantically related high-level features, the main problem of cross-modal retrieval is how to measure the similarity between different modalities. In this article, we present a novel cross-modal retrieval method, named the Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and from intra-modal pairs with the same classification label. Specifically, coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. A weight-sharing strategy is utilized between the two network branches to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over state-of-the-art cross-modal retrieval techniques.
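A minimal sketch of the coupled projection with a shared top layer is shown below; the dimensions and the placement of the shared weights are assumptions, not the HCMSL configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoupledProjection(nn.Module):
    """Illustrative sketch: project image and text features into a common
    subspace, sharing the top layer across modalities to reduce heterogeneity."""
    def __init__(self, img_dim: int, txt_dim: int, common_dim: int):
        super().__init__()
        self.img_branch = nn.Linear(img_dim, common_dim)
        self.txt_branch = nn.Linear(txt_dim, common_dim)
        self.shared = nn.Linear(common_dim, common_dim)  # weights shared by both branches

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor):
        img_common = self.shared(F.relu(self.img_branch(img_feat)))
        txt_common = self.shared(F.relu(self.txt_branch(txt_feat)))
        # Cosine similarity in the common subspace drives retrieval ranking.
        sim = F.cosine_similarity(img_common, txt_common, dim=-1)
        return img_common, txt_common, sim

# Usage: 4096-d image features vs. 300-d text features, 512-d common space.
img_c, txt_c, sim = CoupledProjection(4096, 300, 512)(torch.randn(8, 4096), torch.randn(8, 300))
```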


2021 ◽  
Vol 13 (15) ◽  
pp. 2864
Author(s):  
Shitong Du ◽  
Yifan Li ◽  
Xuyou Li ◽  
Menghao Wu

Simultaneous Localization and Mapping (SLAM) in an unknown environment is a crucial capability for intelligent mobile robots performing high-level navigation and interaction tasks. As one of the typical LiDAR-based SLAM algorithms, the Lidar Odometry and Mapping in Real-time (LOAM) algorithm has shown impressive results. However, LOAM only uses low-level geometric features without considering semantic information. Moreover, the lack of a dynamic object removal strategy prevents the algorithm from achieving higher accuracy. To this end, this paper extends the LOAM pipeline by integrating semantic information into the original framework. Specifically, we first propose a two-step dynamic object filtering strategy. Point-wise semantic labels are then used to improve feature extraction and the search for corresponding points. We evaluate the performance of the proposed method in many challenging scenarios, including highway, country, and urban scenes from the KITTI dataset. The results demonstrate that the proposed SLAM system outperforms state-of-the-art SLAM methods in terms of accuracy and robustness.
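As a toy illustration of semantic dynamic-object filtering, the snippet below drops LiDAR points whose semantic class is potentially dynamic and which are also flagged as moving; the class list and the motion check are assumptions standing in for the paper's two-step strategy.

```python
import numpy as np

# Hypothetical set of semantic classes treated as potentially dynamic.
DYNAMIC_CLASSES = {"car", "pedestrian", "cyclist"}

def filter_dynamic_points(points: np.ndarray, labels: np.ndarray, moving_mask: np.ndarray):
    """Keep points that are not both of a dynamic class and flagged as moving."""
    keep = np.array([
        not (label in DYNAMIC_CLASSES and moving)
        for label, moving in zip(labels, moving_mask)
    ])
    return points[keep], keep

# Usage on a toy scan of four points.
pts = np.random.rand(4, 3)
lbls = np.array(["road", "car", "building", "car"])
moving = np.array([False, True, False, False])
static_pts, mask = filter_dynamic_points(pts, lbls, moving)  # drops the moving car point
```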


Author(s):  
Zhihao Fan ◽  
Zhongyu Wei ◽  
Siyuan Wang ◽  
Ruize Wang ◽  
Zejun Li ◽  
...  

Existing research for image captioning usually represents an image using a scene graph with low-level facts (objects and relations) and fails to capture high-level semantics. In this paper, we propose a Theme Concepts extended Image Captioning (TCIC) framework that incorporates theme concepts to represent high-level cross-modality semantics. In practice, we model theme concepts as memory vectors and propose a Transformer with Theme Nodes (TTN) to incorporate those vectors for image captioning. Considering that theme concepts can be learned from both images and captions, we propose two settings for representation learning based on TTN. On the vision side, TTN takes both scene-graph-based features and theme concepts as input for visual representation learning. On the language side, TTN takes both captions and theme concepts as input for text representation reconstruction. Both settings aim to generate target captions with the same Transformer-based decoder. During training, we further align the representations of theme concepts learned from images and their corresponding captions to enforce cross-modality learning. Experimental results on MS COCO show the effectiveness of our approach compared with state-of-the-art models.
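A rough sketch of attaching learned theme-concept vectors to the visual token sequence before Transformer encoding follows; the sizes, layer counts, and initialization are assumptions, not the TTN configuration.

```python
import torch
import torch.nn as nn

class ThemeNodeEncoder(nn.Module):
    """Illustrative sketch: learned theme-concept vectors are appended to the
    scene-graph-based visual tokens and encoded jointly by a Transformer."""
    def __init__(self, dim: int = 256, num_themes: int = 20, layers: int = 2):
        super().__init__()
        self.theme_nodes = nn.Parameter(torch.randn(num_themes, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (batch, num_regions, dim)
        batch = visual_tokens.size(0)
        themes = self.theme_nodes.unsqueeze(0).expand(batch, -1, -1)
        tokens = torch.cat([visual_tokens, themes], dim=1)  # attach theme nodes
        return self.encoder(tokens)  # passed on to the caption decoder

# Usage: 36 region features of dimension 256 per image.
encoded = ThemeNodeEncoder()(torch.randn(2, 36, 256))
```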


Author(s):  
Weixuan Wang ◽  
Zhihong Chen ◽  
Haifeng Hu

Recently, attention mechanisms have been successfully applied to image captioning, but existing attention methods are established only on low-level spatial features or high-level text features, which limits the richness of the captions. In this paper, we propose a Hierarchical Attention Network (HAN) that enables attention to be calculated on a pyramidal hierarchy of features synchronously. The pyramidal hierarchy consists of features at diverse semantic levels, which allows different words to be predicted from different features. On the other hand, because the features come from different modalities, a Multivariate Residual Module (MRM) is proposed to learn joint representations from them. The MRM is able to model projections and extract relevant relations among different features. Furthermore, we introduce a context gate to balance the contributions of the different features. Compared with existing methods, our approach applies hierarchical features and exploits several multimodal integration strategies, which significantly improves performance. The HAN is verified on the benchmark MSCOCO dataset, and the experimental results indicate that our model outperforms state-of-the-art methods, achieving a BLEU1 score of 80.9 and a CIDEr score of 121.7 on the Karpathy test split.
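As a loose illustration of fusing projections of two modalities with a residual joint term, in the spirit of the MRM, one could write the following; the exact projections and nonlinearities are assumptions, not the published equations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultivariateResidual(nn.Module):
    """Illustrative sketch: a joint interaction term plus a residual sum of
    per-modality projections, so cross-feature relations are modeled without
    washing out each modality's own contribution."""
    def __init__(self, dim: int):
        super().__init__()
        self.vis_proj = nn.Linear(dim, dim)
        self.txt_proj = nn.Linear(dim, dim)
        self.joint = nn.Linear(dim, dim)

    def forward(self, visual: torch.Tensor, textual: torch.Tensor) -> torch.Tensor:
        joint = self.joint(torch.tanh(self.vis_proj(visual)) * torch.tanh(self.txt_proj(textual)))
        residual = self.vis_proj(visual) + self.txt_proj(textual)
        return F.relu(joint + residual)

# Usage: fuse 512-d visual and textual features for a batch of 8 samples.
fused = MultivariateResidual(dim=512)(torch.randn(8, 512), torch.randn(8, 512))
```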

