Automatic RGBD Object Segmentation Based on MSRM Framework Integrating Depth Value

2020
Vol 29 (07n08)
pp. 2040009
Author(s):
Guoqing Li
Guoping Zhang
Chanchan Qin
Anqin Lu

In this paper, an automatic RGBD object segmentation method is described. The method integrates depth features with cues from RGB images and then uses maximal similarity based region merging (MSRM) to obtain the segmentation results. Firstly, the depth information is fused into the simple linear iterative clustering (SLIC) method so as to produce superpixels whose boundaries adhere well to the edges of the natural image. Meanwhile, the depth prior is also incorporated into the saliency estimation, which enables more accurate localization of representative object and background seeds. By introducing the depth cue into the region merging rule, a maximal geometry weighted similarity (MGWS) is considered, and the resulting segmentation framework is able to handle complex images in which object and background have similar colour appearance. Extensive experiments on public RGBD image datasets show that the proposed approach can reliably and automatically provide very promising segmentation results.
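The core change to SLIC described above lies in the pixel-to-cluster assignment distance. Below is a minimal sketch of one plausible form of such a measure, combining CIELAB colour distance, spatial distance normalised by the grid interval, and a depth term; the function name and the weights m and m_d are illustrative assumptions rather than the paper's exact fusion rule.

```python
import numpy as np

def slic_depth_distance(pixel_lab, pixel_xy, pixel_d,
                        center_lab, center_xy, center_d,
                        S, m=10.0, m_d=10.0):
    """Assignment distance for a depth-augmented SLIC step.

    Combines CIELAB colour distance, spatial distance normalised by the
    grid interval S, and a depth-difference term. The weights m and m_d
    are illustrative; the paper's exact fusion rule may differ.
    """
    d_lab = np.linalg.norm(pixel_lab - center_lab)    # colour proximity
    d_xy = np.linalg.norm(pixel_xy - center_xy)       # spatial proximity
    d_depth = abs(pixel_d - center_d)                 # depth proximity
    return np.sqrt(d_lab**2 + (m * d_xy / S)**2 + (m_d * d_depth)**2)
```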

Author(s):
Uday Pratap Singh
Sanjeev Jain

Efficient and effective object recognition from multimedia data is very complex. Automatic object segmentation is usually very hard for natural images, so interactive schemes with a few simple markers provide feasible solutions. In this chapter, we propose topological model based region merging, focusing on topological models such as the Relative Neighbourhood Graph (RNG) and the Gabriel Graph (GG). From the initial segmented image, we construct a neighbourhood graph in which the different regions are represented as nodes and the edge weights are given by a dissimilarity measure between the regions' colour histogram vectors. A similarity-based region merging mechanism (supervised and unsupervised) is proposed to guide the merging process with the help of markers. The region merging process is adaptive to the image content and does not require the similarity threshold to be set in advance. To validate the proposed method, extensive experiments are performed, and the results show that the proposed method extracts the object contour from the complex background.
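A minimal sketch of the neighbourhood-graph representation described above, assuming normalised colour histograms per region and the Bhattacharyya coefficient as the similarity measure (a common choice in MSRM-style merging); the chapter's exact dissimilarity function, region labelling, and adjacency extraction may differ and are treated here as given inputs.

```python
import numpy as np
import networkx as nx

def bhattacharyya(h1, h2):
    """Similarity between two normalised colour histograms."""
    return np.sum(np.sqrt(h1 * h2))

def build_region_graph(adjacency, histograms):
    """Build a weighted neighbourhood graph: nodes are regions of the
    initial segmentation, edge weights are histogram dissimilarities."""
    g = nx.Graph()
    for r1, r2 in adjacency:                       # pairs of adjacent region ids
        w = 1.0 - bhattacharyya(histograms[r1], histograms[r2])
        g.add_edge(r1, r2, weight=w)
    return g

def most_similar_neighbour(g, region):
    """Candidate for merging: the neighbour with the smallest dissimilarity."""
    return min(g.neighbors(region), key=lambda n: g[region][n]["weight"])
```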


Author(s):
Chensheng Wang
Xiaochun Wang
Joris S. M. Vergeest
Tjamme Wiegers

Wide baseline cameras are broadly utilized in binocular vision systems, delivering depth information and stereoscopic images of the scene that are crucial both in virtual reality and in computer vision applications. However, due to the large distance between the two cameras, the stereoscopic composition of wide baseline stereo pairs can hardly fit the human eye parallax. In this paper, techniques and algorithms for the stereoscopic composition of wide baseline stereo pairs in binocular vision are investigated. By incorporating the human parallax limitation, a novel algorithm capable of adjusting wide baseline stereo pairs to compose a high quality stereoscopic image is formulated. The main idea behind the proposed algorithm is, by simulating eyeball rotation, to shift the wide baseline stereo pairs closer to each other so that they fit the human parallax limit. This makes it possible for wide baseline stereo pairs to be composed into a recognizable stereoscopic image in terms of human parallax, at a minor cost of variation in the depth cue. In addition, the depth variations before and after the shifting of the stereo pairs are evaluated by conducting an error estimation. Examples are provided for the evaluation of the proposed algorithm, and the quality of the composed stereoscopic images shows that the proposed algorithm is both valid and effective.
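As a rough illustration of the shifting idea, the sketch below trims opposite margins of the two views so that the largest screen disparity falls within a chosen parallax budget; the disparity estimate, the budget, and the cropping strategy are all assumptions for illustration, not the paper's formulation of the eyeball-rotation simulation.

```python
import numpy as np

def shift_stereo_pair(left, right, max_disparity_px, parallax_limit_px):
    """Shift a wide-baseline stereo pair horizontally towards each other so
    that the largest screen disparity stays within a given parallax budget.

    max_disparity_px: largest measured disparity in the pair (pixels).
    parallax_limit_px: disparity budget derived from the viewing geometry.
    Both values are illustrative inputs; estimating them is not shown.
    """
    excess = max(0, int(max_disparity_px) - int(parallax_limit_px))
    s = excess // 2                  # each view absorbs half of the excess
    h, w = left.shape[:2]
    # Crop opposite margins so corresponding points move closer together;
    # a real implementation would also handle the lost border content.
    shifted_left = left[:, s:w]
    shifted_right = right[:, 0:w - s] if s > 0 else right
    return shifted_left, shifted_right
```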


Sensors
2020
Vol 20 (19)
pp. 5670
Author(s):
Hanwen Kang
Hongyu Zhou
Xing Wang
Chao Chen

Robotic harvesting shows great promise for the future development of the agricultural industry. However, many challenges remain in the development of a fully functional robotic harvesting system, and vision is one of the most important among them. Traditional vision methods often suffer from shortcomings in accuracy, robustness, and efficiency in real implementation environments. In this work, a fully deep learning-based vision method for autonomous apple harvesting is developed and evaluated. The developed method includes a light-weight one-stage detection and segmentation network for fruit recognition and a PointNet that processes point clouds and estimates a proper approach pose for each fruit before grasping. The fruit recognition network takes raw inputs from an RGB-D camera and performs fruit detection and instance segmentation on the RGB images. The PointNet grasping network combines the depth information with the fruit recognition results as input and outputs the approach pose of each fruit for robotic arm execution. The developed vision method is evaluated on RGB-D image data collected from both laboratory and orchard environments. Robotic harvesting experiments in both indoor and outdoor conditions are also included to validate the performance of the developed harvesting system. Experimental results show that the developed vision method can guide robotic harvesting with high efficiency and accuracy. Overall, the developed robotic harvesting system achieves a harvesting success rate of 0.8 with a cycle time of 6.5 s.
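The handover between the recognition network and the PointNet grasping network essentially requires turning the depth pixels inside each fruit's instance mask into a point cloud. A minimal sketch of that step is shown below, assuming a pinhole camera model with intrinsics fx, fy, cx, cy; the paper's actual preprocessing (point sampling, normalisation, pose regression) is not reproduced.

```python
import numpy as np

def mask_to_point_cloud(depth, mask, fx, fy, cx, cy):
    """Deproject the depth pixels inside an instance mask into an Nx3
    point cloud (camera frame), suitable as input to a PointNet-style network.

    depth: HxW depth map in metres; mask: HxW boolean instance mask.
    Intrinsics fx, fy, cx, cy come from the RGB-D camera calibration.
    """
    v, u = np.nonzero(mask)              # pixel coordinates inside the mask
    z = depth[v, u]
    valid = z > 0                        # discard missing depth readings
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)   # N x 3 points in the camera frame
```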


2019
Vol 128 (5)
pp. 1286-1310
Author(s):
Oscar Mendez
Simon Hadfield
Nicolas Pugeault
Richard Bowden

Abstract The use of human-level semantic information to aid robotic tasks has recently become an important area for both Computer Vision and Robotics. This has been enabled by advances in Deep Learning that allow consistent and robust semantic understanding. Leveraging this semantic vision of the world has allowed human-level understanding to naturally emerge from many different approaches. Particularly, the use of semantic information to aid in localisation and reconstruction has been at the forefront of both fields. Like robots, humans also require the ability to localise within a structure. To aid this, humans have designed high-level semantic maps of our structures called floorplans. We are extremely good at localising in them, even with limited access to the depth information used by robots. This is because we focus on the distribution of semantic elements, rather than geometric ones. Evidence of this is that humans are normally able to localise in a floorplan that has not been scaled properly. In order to grant this ability to robots, it is necessary to use localisation approaches that leverage the same semantic information humans use. In this paper, we present a novel method for semantically enabled global localisation. Our approach relies on the semantic labels present in the floorplan. Deep Learning is leveraged to extract semantic labels from RGB images, which are compared to the floorplan for localisation. While our approach is able to use range measurements if available, we demonstrate that they are unnecessary as we can achieve results comparable to state-of-the-art without them.
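As a loose illustration of matching observed semantic labels against a floorplan, the sketch below scores a candidate pose by casting rays through a rasterised semantic floorplan and counting label agreements with the classes predicted from the RGB images; the grid representation, ray casting, and scoring rule are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def score_pose(floorplan, pose, observed_labels, ray_dirs, max_range=50):
    """Score a candidate (row, col, heading) pose by how well the semantic
    classes hit by ray casting in the floorplan agree with the labels
    observed along the corresponding viewing directions.

    floorplan: 2D array of semantic class ids (0 = free space);
    observed_labels: one class id per ray direction.
    """
    r, c, theta = pose
    score = 0
    for ang, obs in zip(ray_dirs, observed_labels):
        dr, dc = np.sin(theta + ang), np.cos(theta + ang)
        for step in range(1, max_range):
            rr, cc = int(r + dr * step), int(c + dc * step)
            if not (0 <= rr < floorplan.shape[0] and 0 <= cc < floorplan.shape[1]):
                break
            if floorplan[rr, cc] != 0:             # ray hit a semantic element
                score += int(floorplan[rr, cc] == obs)
                break
    return score
```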


Sensors
2020
Vol 20 (14)
pp. 3816
Author(s):
Tao Wang
Yuanzheng Cai
Lingyu Liang
Dongyi Ye

We address the problem of localizing waste objects from a color image and an optional depth image, which is a key perception component for robotic interaction with such objects. Specifically, our method integrates the intensity and depth information at multiple levels of spatial granularity. Firstly, a scene-level deep network produces an initial coarse segmentation, based on which we select a few potential object regions to zoom in and perform fine segmentation. The results of the above steps are further integrated into a densely connected conditional random field that learns to respect the appearance, depth, and spatial affinities with pixel-level accuracy. In addition, we create a new RGBD waste object segmentation dataset, MJU-Waste, that is made public to facilitate future research in this area. The efficacy of our method is validated on both MJU-Waste and the Trash Annotation in Context (TACO) dataset.
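A minimal sketch of the coarse-to-fine structure described above; `coarse_net` and `fine_net` are hypothetical placeholders for the scene-level and region-level networks, the region-selection heuristic is simplified to a single bounding box, and the densely connected CRF refinement stage is omitted.

```python
import numpy as np

def coarse_to_fine_segment(image, coarse_net, fine_net, pad=16):
    """Run a scene-level network, select a candidate object region from its
    coarse output, zoom in, and refine the region with a region-level network.

    coarse_net(image) -> HxW foreground probability map.
    fine_net(crop)    -> probability map for the cropped region.
    Both callables are placeholders for the paper's networks.
    """
    prob = coarse_net(image)
    refined = prob.copy()
    ys, xs = np.nonzero(prob > 0.5)
    if len(ys) == 0:
        return refined                      # nothing detected at the coarse level
    # One padded bounding box around the coarse foreground; a real system
    # would zoom in on several connected components separately.
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, image.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, image.shape[1])
    refined[y0:y1, x0:x1] = fine_net(image[y0:y1, x0:x1])
    return refined
```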


1997
Vol 6 (5)
pp. 513-531
Author(s):
R. Troy Surdick
Elizabeth T. Davis
Robert A. King
Larry F. Hodges

The ability to effectively and accurately simulate distance in virtual and augmented reality systems is a challenge currently facing R&D. To examine this issue, we separately tested each of seven visual depth cues (relative brightness, relative size, relative height, linear perspective, foreshortening, texture gradient, and stereopsis) as well as the condition in which all seven of these cues were present and simultaneously providing distance information in a simulated display. The viewing distances were 1 and 2 m. In developing simulated displays to convey distance and depth, three questions arise. First, which cues provide effective depth information (so that only a small change in the depth cue results in a perceived change in depth)? Second, which cues provide accurate depth information (so that the perceived distances of two equidistant objects perceptually match)? Finally, how do the effectiveness and accuracy of these depth cues change as a function of the viewing distance? Ten college-aged subjects were tested with each depth-cue condition at both viewing distances, using a method of constant stimuli procedure and a modified Wheatstone stereoscopic display. The perspective cues (linear perspective, foreshortening, and texture gradient) were found to be more effective than the other depth cues, while the effectiveness of relative brightness was vastly inferior. Moreover, relative brightness, relative height, and relative size all significantly decreased in effectiveness with an increase in viewing distance. The depth cues did not differ in terms of accuracy at either viewing distance. Finally, some subjects experienced difficulty in rapidly perceiving distance information provided by stereopsis, but no subjects had difficulty in effectively and accurately perceiving distance with the perspective information used in our experiment. A second experiment demonstrated that a previously stereo-anomalous subject could be trained to perceive stereoscopic depth in a binocular display. We conclude that the use of perspective cues in simulated displays may be more important than that of the other depth cues tested because these cues are the most effective and accurate cues at both viewing distances, can be easily perceived by all subjects, and can be readily incorporated into simpler, less complex displays (e.g., biocular HMDs) as well as more complex ones (e.g., binocular or see-through HMDs).
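Effectiveness in a constant-stimuli design of this kind is typically summarised by the slope of a fitted psychometric function (a shallower slope implies a larger just-noticeable difference), and accuracy by its point of subjective equality. The sketch below fits a cumulative Gaussian with SciPy; it illustrates the standard analysis, not necessarily the exact procedure used in this study, and the example data are purely illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def fit_psychometric(depth_offsets, prop_farther):
    """Fit a cumulative-Gaussian psychometric function to the proportion of
    'farther' responses at each simulated depth offset. The fitted sigma
    estimates the just-noticeable difference (smaller sigma = a more
    effective cue); mu away from 0 indicates a bias (reduced accuracy)."""
    def cum_gauss(x, mu, sigma):
        return norm.cdf(x, loc=mu, scale=sigma)
    (mu, sigma), _ = curve_fit(cum_gauss, depth_offsets, prop_farther,
                               p0=[0.0, np.std(depth_offsets)])
    return mu, sigma

# Purely illustrative data, not results from the study:
offsets = np.array([-4, -2, -1, 0, 1, 2, 4], dtype=float)   # simulated offsets (cm)
props = np.array([0.05, 0.2, 0.35, 0.5, 0.7, 0.85, 0.97])
print(fit_psychometric(offsets, props))
```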

