Modelling Salient Object-Object Interactions to Generate Textual Descriptions for Natural Images

Author(s):  
Hossein Adeli ◽  
Babak Yadranjiaghdam ◽  
Nathan Pool ◽  
Nasseh Tabrizi
2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
Shangbing Gao ◽  
Yunyang Yan ◽  
Youdong Zhang ◽  
Jingbo Zhou ◽  
Suqun Cao ◽  
...  

Natural image segmentation is often a crucial first step for high-level image understanding, significantly reducing the complexity of content analysis of images. However, localizing region-based active contours (LRAC) have some disadvantages: (1) segmentation results depend heavily on the initial contour selection, which requires considerable skill; (2) in some situations, manual interaction is infeasible. To overcome these shortcomings, we propose a novel model for unsupervised segmentation of the viewer’s attention object from natural images based on LRAC. With the aid of the color boosting Harris detector and the core saliency map, we obtain the salient object edge points. These points are then employed as the seeds of an initial convex hull. Finally, this convex hull is refined by an edge-preserving filter to generate the initial contour for our automatic object segmentation system. In contrast with LRAC, which requires considerable user interaction, the proposed method requires none; that is, the segmentation task is fulfilled in a fully automatic manner. Extensive experimental results on a large variety of natural images demonstrate that our algorithm consistently outperforms popular existing salient object segmentation methods, yielding higher precision and better recall rates. Our framework can reliably and automatically extract the object contour from a complex background.
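
The convex-hull step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the salient edge points are already available as (x, y) pairs, and it uses Andrew's monotone chain algorithm as a stand-in, since the abstract does not say how the hull is built. The names `convex_hull` and `seeds` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (monotone chain).

    Assumption: `points` stands in for the salient edge points that the
    abstract obtains from the detector and the saliency map.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower: List[Point] = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical seed points: four corners plus two interior points.
seeds = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
print(convex_hull(seeds))  # [(0, 0), (4, 0), (4, 3), (0, 3)]
```

In the pipeline the abstract describes, a hull like this would only be the starting shape; the edge-preserving filtering and the active-contour evolution that follow are not shown here.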


2020 ◽  
Vol 14 (10) ◽  
pp. 2249-2262
Author(s):  
Gökhan Yildirim ◽  
Debashis Sen ◽  
Mohan Kankanhalli ◽  
Sabine Süsstrunk

Author(s):  
Yuki HAYAMI ◽  
Daiki TAKASU ◽  
Hisakazu AOYANAGI ◽  
Hiroaki TAKAMATSU ◽  
Yoshifumi SHIMODAIRA ◽  
...  

Author(s):  
M. N. Favorskaya ◽  
L. C. Jain

Introduction: Saliency detection is a fundamental task in computer vision. Its ultimate aim is to localize the objects of interest that grab human visual attention with respect to the rest of the image. A great variety of saliency models based on different approaches have been developed since the 1990s. In recent years, saliency detection has become one of the actively studied topics in the theory of Convolutional Neural Networks (CNNs). Many original solutions using CNNs have been proposed for salient object detection and even event detection. Purpose: A detailed survey of saliency detection methods in the deep learning era makes it possible to understand the current capabilities of the CNN approach for visual analysis based on human eye tracking and digital image processing. Results: The survey reflects recent advances in saliency detection using CNNs. Different models available in the literature, such as static and dynamic 2D CNNs for salient object detection and 3D CNNs for salient event detection, are discussed in chronological order. It is worth noting that automatic salient event detection in long videos has become possible using recently introduced 3D CNNs combined with 2D CNNs for salient audio detection. The article also gives a short description of public image and video datasets with annotated salient objects or events, as well as the metrics commonly used to evaluate results. Practical relevance: This survey contributes to the study of rapidly developing deep learning methods for saliency detection in images and videos.


2021 ◽  
Vol 40 (3) ◽  
pp. 1-12
Author(s):  
Hao Zhang ◽  
Yuxiao Zhou ◽  
Yifei Tian ◽  
Jun-Hai Yong ◽  
Feng Xu

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image and to predict the 3D keypoints of the hand. Because the two tasks share most layers, computation cost is reduced, which supports real-time performance. A hybrid dataset is constructed to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depth of the two targets and the keypoints are used in a unified optimization to reconstruct the interacting motions. Benefiting from a novel tangential contact constraint, the system not only resolves the remaining ambiguities but also maintains real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras.

