Image Captioning with multi-level similarity-guided semantic matching

Author(s):  
Jiesi Li ◽  
Ning Xu ◽  
Weizhi Nie ◽  
Shenyuan Zhang
2020 ◽  
Vol 22 (5) ◽  
pp. 1372-1383 ◽  
Author(s):  
Ning Xu ◽  
Hanwang Zhang ◽  
An-An Liu ◽  
Weizhi Nie ◽  
Yuting Su ◽  
...  

Symmetry ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1184
Author(s):  
Peng Tian ◽  
Hongwei Mo ◽  
Laihao Jiang

Object detection, visual relationship detection, and image captioning, the three main visual tasks in scene understanding, are highly correlated and correspond to different semantic levels of a scene image. However, existing captioning methods convert the extracted image features into description text, and the results are not satisfactory. In this work, we propose a Multi-level Semantic Context Information (MSCI) network with an overall symmetrical structure that leverages the mutual connections across the three semantic layers and extracts the context information between them, solving the three vision tasks jointly to achieve an accurate and comprehensive description of the scene image. The model uses a feature refining structure to establish mutual connections and iteratively update the different semantic features of the image. A context information extraction network then extracts the context information between the three semantic layers, and an attention mechanism is introduced to improve the accuracy of image captioning, while the context information between the semantic layers is also used to improve the accuracy of object detection and relationship detection. Experiments on the VRD and COCO datasets demonstrate that the proposed model can leverage the context information between semantic layers to improve accuracy on these visual tasks.


Author(s):  
Anan Liu ◽  
Ning Xu ◽  
Hanwang Zhang ◽  
Weizhi Nie ◽  
Yuting Su ◽  
...  

Image captioning is one of the most challenging hallmarks of AI, due to its complexity in visual and natural language understanding. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and a single reward function, which do not fit well the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) a Multi-Level Policy Network that adaptively fuses the word-level policy and the sentence-level policy for word generation; and 2) a Multi-Level Reward Function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework achieves competitive performance with respect to different evaluation metrics.
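The abstract does not say how the two policies are fused or how the two rewards are combined. As a minimal sketch of one possible reading, the Python below mixes a word-level and a sentence-level next-word distribution with a sigmoid gate and blends the two rewards linearly. The names `fuse_policies`, `combined_reward`, `gate_logit`, and `alpha` are hypothetical and do not come from the paper; in a real model the gate would presumably be predicted from the decoder state.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_policies(word_logits, sent_logits, gate_logit):
    """Convex combination of two next-word distributions.

    Assumption: a scalar gate g = sigmoid(gate_logit) weights the
    word-level policy, and (1 - g) weights the sentence-level policy.
    """
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    p_word = softmax(word_logits)
    p_sent = softmax(sent_logits)
    return [g * pw + (1.0 - g) * ps for pw, ps in zip(p_word, p_sent)]


def combined_reward(vision_language_reward, language_language_reward, alpha=0.5):
    """Assumed linear blend of the two reward signals."""
    return alpha * vision_language_reward + (1.0 - alpha) * language_language_reward


if __name__ == "__main__":
    fused = fuse_policies([2.0, 0.0, -1.0], [0.0, 1.0, 0.0], gate_logit=0.0)
    print(fused, sum(fused))
```

Because the fusion is a convex combination of two probability distributions, the result is itself a valid distribution for any gate value.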


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 2608-2620
Author(s):  
Zhenghang Yuan ◽  
Xuelong Li ◽  
Qi Wang

2020 ◽  
Vol 12 (6) ◽  
pp. 939 ◽  
Author(s):  
Yangyang Li ◽  
Shuangkang Fang ◽  
Licheng Jiao ◽  
Ruijiao Liu ◽  
Ronghua Shang

The task of image captioning involves generating a sentence that appropriately describes an image, and it lies at the intersection of computer vision and natural language processing. Although research on remote sensing image captioning has only just started, it is of great significance. The attention mechanism, which is inspired by the way humans think, is widely used in remote sensing image captioning tasks. However, the attention mechanism currently used in this task is aimed mainly at images, which is too simple to express such a complex task well. Therefore, in this paper, we propose a multi-level attention model, which imitates human attention mechanisms more closely. The model contains three attention structures, which represent attention to different areas of the image, attention to different words, and attention to vision and semantics. Experiments show that our model achieves better results than previous methods and is currently state-of-the-art. In addition, the existing datasets for remote sensing image captioning contain a large number of errors. Therefore, in this paper, considerable work has also been done to correct the existing datasets in order to promote research on remote sensing image captioning.
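The three attention structures described above can be illustrated with a small Python sketch. It is not the authors' architecture: it assumes plain dot-product soft attention, applied first over image regions, then over words, and finally over the two resulting context vectors to balance vision against semantics. The function names and the toy feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Dot-product soft attention; returns (weights, context vector)."""
    scores = [sum(q * x for q, x in zip(query, item)) for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    context = [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]
    return weights, context


def multi_level_context(query, region_feats, word_feats):
    """Assumed three-step scheme: regions, then words, then vision vs. semantics."""
    _, visual = attend(query, region_feats)    # attention over image areas
    _, semantic = attend(query, word_feats)    # attention over words
    mix, fused = attend(query, [visual, semantic])  # attention over the two modalities
    return mix, fused


if __name__ == "__main__":
    query = [1.0, 0.0]
    regions = [[1.0, 0.0], [0.0, 1.0]]
    words = [[0.5, 0.5], [0.5, 0.5]]
    print(multi_level_context(query, regions, words))
```

In a trained model the query would come from the decoder state and the scores from learned projections; the sketch keeps only the weighting and summation that define soft attention.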


2011 ◽  
Vol 130-134 ◽  
pp. 313-316
Author(s):  
Fan Zhang ◽  
Zu De Zhou ◽  
Xiao Jie Liu

CBR (case-based reasoning) theory is applied to the field of automobile quality fault diagnosis. Case description, case retrieval, and case reuse are the main elements of CBR. They are realized through fault tree construction, information-entropy weight calculation for discrete cases, discrete and multi-level semantic matching for case retrieval, and knowledge-difference-driven case reuse. Solving actual cases verifies the validity and efficiency of the CBR method in the quality fault diagnosis system.
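The abstract does not give the entropy-weight formula, so the following Python is only a sketch under a stated assumption: each discrete attribute is weighted in proportion to the Shannon entropy of its value distribution in the case base, so an attribute that takes the same value in every case gets weight zero. Retrieval then returns the stored case with the highest weighted exact-match score. The attribute names and the case records are made up for illustration, and the multi-level semantic matching of the paper is not reproduced.

```python
import math
from collections import Counter


def entropy_weights(cases, attributes):
    """Weight each discrete attribute by the entropy of its values (assumed scheme)."""
    n = len(cases)
    entropies = {}
    for attr in attributes:
        counts = Counter(case[attr] for case in cases)
        entropies[attr] = -sum((c / n) * math.log(c / n) for c in counts.values())
    total = sum(entropies.values())
    if total == 0:
        # No attribute discriminates between cases; fall back to equal weights.
        return {attr: 1.0 / len(attributes) for attr in attributes}
    return {attr: h / total for attr, h in entropies.items()}


def retrieve(query, cases, attributes, weights):
    """Return the stored case with the highest weighted exact-match similarity."""
    def similarity(case):
        return sum(weights[a] for a in attributes if case[a] == query[a])
    return max(cases, key=similarity)


if __name__ == "__main__":
    # Hypothetical case base; field names are illustrative only.
    case_base = [
        {"symptom": "noise", "part": "engine", "fix": "replace belt"},
        {"symptom": "leak", "part": "engine", "fix": "replace gasket"},
    ]
    attrs = ["symptom", "part"]
    w = entropy_weights(case_base, attrs)
    best = retrieve({"symptom": "leak", "part": "engine"}, case_base, attrs, w)
    print(w, best["fix"])
```

The retrieved case's solution would then be the starting point for case reuse, adapted according to how the new case differs from the stored one.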


Author(s):  
Dongming Zhou ◽  
Canlong Zhang ◽  
Zhixin Li ◽  
Zhiwen Wang

Author(s):  
Ferdinand Keller ◽  
Tatjana Stadnitski ◽  
Jakob Nützel ◽  
Renate Schepker

Abstract. Objective: Little is known about changes in the emotional state of adolescents during addiction therapy. Method: The adolescents completed a corresponding questionnaire weekly, and their assigned caregivers completed a parallel short version. A total of 853 questionnaires from 42 adolescents and 708 questionnaires from the caregivers are available. The questionnaires were first evaluated by factor analysis with regard to their dimensionality; subsequently, group-level trajectory analyses (multi-level models) and dependency analyses at the single-case level (time series analyses) were carried out. Results: Four factors emerged in the adolescent questionnaire: negative emotional state, appreciation of therapy/care, motivation, and addiction dynamics. Agreement between the adolescents' ratings and the (single-factor) caregiver rating was low to moderate overall but yielded more differentiated results at the single-case level. Over the course of treatment, scores on all four adolescent scales decreased. Only the trajectory of appreciation during the settling-in phase was predictive of later dropout from the program: among dropouts, appreciation decreased, whereas among completers it initially increased. Conclusions: The most significant factor with regard to therapy completion in adolescents with addiction appears to be appreciation of therapy/care, whereas motivation shows fluctuations typical of adolescence. Addiction dynamics played a considerably less important role than generally assumed. Long-term therapy programs should in future focus more on appreciation of therapy/care than on addiction dynamics.

