visual scene understanding
Recently Published Documents


TOTAL DOCUMENTS: 16 (FIVE YEARS 5)
H-INDEX: 3 (FIVE YEARS 1)

Complexity ◽ 2021 ◽ Vol 2021 ◽ pp. 1-14
Author(s): Yuezhong Wu ◽ Xuehao Shen ◽ Qiang Liu ◽ Falong Xiao ◽ Changyun Li

Garbage classification is a social issue that bears on people’s livelihoods and sustainable development, so enabling service robots to classify garbage autonomously is of considerable research significance. To address the transmission delays and slow responses that arise between data sources and the cloud service center in complex systems, and to support the perception, storage, and analysis of massive multisource heterogeneous data, a garbage detection and classification method based on visual scene understanding is proposed. The method uses knowledge graphs to store and model the items in a scene in multimodal form, including images, videos, and text. An ESA attention mechanism is added to the backbone of the YOLOv5 network to improve its feature extraction ability. The network is then combined with the constructed multimodal knowledge graph to form the YOLOv5-Attention-KG model, which is deployed on the service robot to perceive the items in the scene in real time. Finally, collaborative training is carried out on the cloud server, and the model is deployed to the edge device to perform inference and analysis on the data in real time. Test results show that the proposed model achieves higher detection and classification accuracy than the original YOLOv5 model and that its real-time performance meets practical requirements. The proposed model can support intelligent garbage classification decisions over large volumes of scene data in a complex system and has some of the prerequisites for wider adoption and practical deployment.
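The abstract does not describe the knowledge graph schema or how detector output is combined with it, so the following is only an illustrative sketch of the general idea: a detected item label is resolved to a garbage category by following relations in a small graph. The triples, the relation names (`made_of`, `belongs_to`), the `Detection` type, and the confidence threshold are all hypothetical, and the detector itself is stubbed out rather than being a YOLOv5 model.

```python
from dataclasses import dataclass

# Hypothetical knowledge graph as (subject, relation, object) triples.
# The real schema used in the paper is not given in the abstract.
TRIPLES = [
    ("plastic bottle", "made_of", "plastic"),
    ("banana peel", "made_of", "organic matter"),
    ("battery", "made_of", "heavy metals"),
    ("plastic", "belongs_to", "recyclable"),
    ("organic matter", "belongs_to", "kitchen waste"),
    ("heavy metals", "belongs_to", "hazardous"),
]


@dataclass
class Detection:
    """Stand-in for one detector output (label plus confidence score)."""
    label: str
    confidence: float


def lookup(subject, relation, triples=TRIPLES):
    """Return the first object linked to `subject` by `relation`, or None."""
    for s, r, o in triples:
        if s == subject and r == relation:
            return o
    return None


def classify(detection, min_confidence=0.5):
    """Map a detection to a garbage category via two graph hops."""
    if detection.confidence < min_confidence:
        return "unknown"
    material = lookup(detection.label, "made_of")
    if material is None:
        return "unknown"
    return lookup(material, "belongs_to") or "unknown"


if __name__ == "__main__":
    for det in [Detection("battery", 0.91), Detection("banana peel", 0.30)]:
        print(det.label, "->", classify(det))
```

Keeping the category decision in the graph rather than in the detector's class list means that, in a design like this, new items can be handled by adding triples instead of retraining; whether the paper's model works this way is not stated in the abstract.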


2018 ◽ Vol 45 (12) ◽ pp. 1279-1286
Author(s): Donghyeop Shin ◽ Incheol Kim

2018 ◽ Vol 24 (3) ◽ pp. 325-362
Author(s): A. Belz ◽ T.L. Berg ◽ L. Yu

Work in computer vision and natural language processing involving images and text has been experiencing explosive growth over the past decade, with a particular boost coming from the neural network revolution. The present volume brings together five research articles from several different corners of the area: multilingual multimodal image description (Frank et al.), multimodal machine translation (Madhyastha et al., Frank et al.), image caption generation (Madhyastha et al., Tanti et al.), visual scene understanding (Silberer et al.), and multimodal learning of high-level attributes (Sorodoc et al.). In this article, we touch upon all of these topics as we review work involving images and text under the three main headings of image description (Section 2), visually grounded referring expression generation (REG) and comprehension (Section 3), and visual question answering (VQA) (Section 4).

