From Single 2D Depth Image to Gripper 6D Pose Estimation: A Fast and Robust Algorithm for Grabbing Objects in Cluttered Scenes

Robotics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 63 ◽  
Author(s):  
Amirhossein Jabalameli ◽  
Aman Behal

In this paper, we investigate the problem of grasping previously unseen objects in unstructured environments cluttered with multiple objects. Object geometry, reachability, and force-closure analysis are considered to address this problem. A framework is proposed for grasping unknown objects by localizing contact regions on the contours formed by a set of depth edges generated from a single-view 2D depth image. Specifically, contact regions are determined based on edge geometric features derived from analysis of the depth map data. Finally, the performance of the approach is successfully validated by applying it to scenes with both single and multiple objects, in both simulation and experiments. Using sequential processing in MATLAB running on a 4th-generation Intel Core desktop, simulation results with the benchmark Object Segmentation Database show that the algorithm takes 281 ms on average to generate the 6D robot pose needed to align the gripper with a pair of viable grasping edges that satisfy reachability and force-closure conditions. Experimental results in the Assistive Robotics Laboratory at UCF using a Kinect One sensor and a Baxter manipulator outfitted with a standard parallel gripper demonstrate the feasibility of the approach in grasping previously unseen objects from uncontrived multi-object settings.
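As a minimal sketch of the force-closure condition mentioned above for a parallel gripper: two contacts admit force closure only if the line joining them lies inside both friction cones. The contact points, normals, and friction coefficient below are illustrative assumptions, not the paper's actual contact-region localization.

```python
import numpy as np

def antipodal_force_closure(p1, n1, p2, n2, mu=0.5):
    """Check a necessary force-closure condition for a parallel-jaw grasp:
    the line connecting the two contact points must lie inside both
    friction cones (half-angle atan(mu)) around the inward surface normals."""
    half_angle = np.arctan(mu)
    d = p2 - p1
    d = d / np.linalg.norm(d)
    # Angle between the grasp axis and each inward-pointing normal
    a1 = np.arccos(np.clip(np.dot(d, n1), -1.0, 1.0))
    a2 = np.arccos(np.clip(np.dot(-d, n2), -1.0, 1.0))
    return bool(a1 <= half_angle and a2 <= half_angle)

# Two opposing edges of a box-like object, normals pointing inward (toy values)
p1, n1 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
p2, n2 = np.array([0.04, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0])
print(antipodal_force_closure(p1, n1, p2, n2))  # perfectly opposing contacts pass
```

A full pipeline would run such a test over candidate edge pairs extracted from the depth image, keeping only pairs that also satisfy reachability.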

Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 546 ◽
Author(s):  
Zhenni Li ◽  
Haoyi Sun ◽  
Yuliang Gao ◽  
Jiao Wang

Depth maps obtained from sensors are often unsatisfactory because of their low resolution and noise. In this paper, we propose a real-time depth map enhancement system based on a residual network that uses dual channels to process depth maps and intensity maps respectively and eliminates the need for preprocessing; the proposed algorithm achieves real-time processing at more than 30 fps. Furthermore, an FPGA design and implementation for depth sensing is also introduced. In this FPGA design, the intensity image and depth image are captured by a dual-camera synchronous acquisition system and serve as the input to the neural network. Experiments on various depth map restoration tasks show that our algorithm outperforms the existing LRMC, DE-CNN, and DDTF algorithms on standard datasets and achieves better depth map super-resolution. System tests on the FPGA confirmed that the data throughput of the acquisition system's USB 3.0 interface is stable at 226 Mbps and supports dual-camera operation at full speed, i.e., 54 fps @ (1280 × 960 + 328 × 248 × 3).
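A minimal sketch of the dual-channel residual idea, assuming single-channel maps and placeholder (untrained, zero-initialized in the test) convolution weights rather than the paper's actual trained network:

```python
import numpy as np

def conv3x3(x, w):
    """Naive 'same' 3x3 single-channel convolution with zero padding."""
    h, wd = x.shape
    xp = np.pad(x.astype(float), 1)
    out = np.zeros((h, wd))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def dual_channel_residual(depth, intensity, w_d, w_i, w_fuse):
    """Process depth and intensity in separate branches, fuse them, and add
    the predicted residual back onto the input depth (residual learning)."""
    f_d = np.maximum(conv3x3(depth, w_d), 0)      # depth branch + ReLU
    f_i = np.maximum(conv3x3(intensity, w_i), 0)  # intensity branch + ReLU
    residual = conv3x3(f_d + f_i, w_fuse)         # fusion conv predicts a residual
    return depth + residual                       # skip connection
```

The skip connection means the network only has to learn a correction to the noisy input, which is what makes residual formulations attractive for restoration tasks.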


2019 ◽  
Vol 11 (10) ◽  
pp. 204 ◽  
Author(s):  
Dogan ◽  
Haddad ◽  
Ekmekcioglu ◽  
Kondoz

When it comes to evaluating perceptual quality of digital media for overall quality of experience assessment in immersive video applications, typically two main approaches stand out: Subjective and objective quality evaluation. On one hand, subjective quality evaluation offers the best representation of perceived video quality assessed by the real viewers. On the other hand, it consumes a significant amount of time and effort, due to the involvement of real users with lengthy and laborious assessment procedures. Thus, it is essential that an objective quality evaluation model is developed. The speed-up advantage offered by an objective quality evaluation model, which can predict the quality of rendered virtual views based on the depth maps used in the rendering process, allows for faster quality assessments for immersive video applications. This is particularly important given the lack of a suitable reference or ground truth for comparing the available depth maps, especially when live content services are offered in those applications. This paper presents a no-reference depth map quality evaluation model based on a proposed depth map edge confidence measurement technique to assist with accurately estimating the quality of rendered (virtual) views in immersive multi-view video content. The model is applied for depth image-based rendering in multi-view video format, providing comparable evaluation results to those existing in the literature, and often exceeding their performance.
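As an illustrative stand-in for an edge-confidence measurement of this kind (the paper defines its own technique; the Sobel operator, threshold, and texture-agreement criterion below are assumptions), one might score a depth map by how well its strong edges coincide with edges in the colocated texture view, since misaligned depth edges are a common source of rendering artifacts:

```python
import numpy as np

def sobel_mag(img):
    """Gradient magnitude via naive 3x3 Sobel filtering with zero padding."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    p = np.pad(img.astype(float), 1)
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = p[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)

def edge_confidence(depth, texture, t_edge=50.0):
    """Illustrative no-reference score: fraction of strong depth edges that
    coincide with texture edges at the same pixel locations."""
    de = sobel_mag(depth) > t_edge
    te = sobel_mag(texture) > t_edge
    if de.sum() == 0:
        return 1.0  # no depth edges, nothing to penalize
    return (de & te).sum() / de.sum()
```

A score near 1 suggests depth discontinuities that agree with the scene content; low scores flag depth maps likely to produce artifacts in the rendered virtual views.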


2001 ◽  
Vol 41 (1) ◽  
pp. 429 ◽
Author(s):  
R.J.W. Bunt ◽  
W.D. Powell ◽  
T. Scholefield

Difficulties in defining the structural character of the reservoir horizons at the Tubridgi Gas Field arise from gas charging of thin, often laterally discontinuous, silts and sands within the overburden. The gas charging of these shallow, low permeability units results in a seismic representation of the field as a time low. Historically, conversion from time to a reliable depth image has been problematic due to the variable nature of the gas charging, the relatively sparse, multi-vintage 2D seismic coverage and the corresponding difficulties in defining an accurate velocity field. After the unsuccessful drilling program in 1997, when three out of the five wells were plugged and abandoned, a revised interpretation methodology was developed, incorporating all available geophysical data, but placing a much greater emphasis on geological information from each of the wells in the area. The new depth map and geological model were tested by the drilling of Tubridgi–16 to –18 in August 1999. These three wells intersected the Birdrong Sandstone within one metre of prognosis, with two wells located structurally up-dip of the previous 17 wells drilled on the field. This accuracy resulted in a 97% increase in remaining reserves and a much higher level of confidence in the structural configuration of the Tubridgi field. A core of the Lower Gearle Sandstone in the Tubridgi 18 well highlighted the potential of this zone, which has subsequently been evaluated in greater detail and potentially represents an additional productive horizon for the field.


2019 ◽  
Vol 5 (9) ◽  
pp. 73 ◽  
Author(s):  
Wen-Nung Lie ◽  
Chia-Che Ho

In this paper, a multi-focus image stack captured by varying positions of the imaging plane is processed to synthesize an all-in-focus (AIF) image and estimate its corresponding depth map. Compared with traditional methods (e.g., pixel- and block-based techniques), our focus-based measures are calculated based on irregularly shaped regions that have been refined or split in an iterative manner, to adapt to different image contents. An initial all-focus image is first computed, which is then segmented to get a region map. Spatial-focal property for each region is then analyzed to determine whether a region should be iteratively split into sub-regions. After iterative splitting, the final region map is used to perform regionally best focusing, based on the Winner-take-all (WTA) strategy, i.e., choosing the best focused pixels from image stack. The depth image can be easily converted from the resulting label image, where the label for each pixel represents the image index from which the pixel with the best focus is chosen. Regions whose focus profiles are not confident in getting a winner of the best focus will resort to spatial propagation from neighboring confident regions. Our experiments show that the adaptive region-splitting algorithm outperforms other state-of-the-art methods or commercial software in synthesis quality (in terms of a well-known Q metric), depth maps (in terms of subjective quality), and processing speed (with a gain of 17.81~40.43%).
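The winner-take-all selection step can be sketched as follows, using the absolute Laplacian response as a stand-in focus measure (the paper's region-adaptive, iteratively split measures are more elaborate than this per-pixel version):

```python
import numpy as np

def laplacian(img):
    """4-neighbor Laplacian with replicated border values."""
    p = np.pad(img.astype(float), 1, mode='edge')
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4 * p[1:-1, 1:-1])

def winner_take_all(stack):
    """Pick, per pixel, the stack index with the strongest focus response.
    Returns the all-in-focus image and the label (depth index) map."""
    focus = np.stack([np.abs(laplacian(im)) for im in stack])  # (N, H, W)
    labels = np.argmax(focus, axis=0)                          # winning image index
    aif = np.take_along_axis(np.stack(stack), labels[None], axis=0)[0]
    return aif, labels
```

The label map doubles as the depth image: each label is the index of the imaging-plane position at which that pixel was best focused, which is exactly the conversion described above.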


2020 ◽  
Vol 43 (1) ◽  
pp. 59-78 ◽  
Author(s):  
David Johnson ◽  
Daniela Damian ◽  
George Tzanetakis

We present research for automatic assessment of pianist hand posture that is intended to help beginning piano students improve their piano-playing technique during practice sessions. To automatically assess a student's hand posture, we propose a system that is able to recognize three categories of postures from a single depth map containing a pianist's hands during performance. This is achieved through a computer vision pipeline that uses machine learning on the depth maps for both hand segmentation and detection of hand posture. First, we segment the left and right hands from the scene captured in the depth map using per-pixel classification. To train the hand-segmentation models, we experiment with two feature descriptors, depth image features and depth context features, that describe the context of individual pixels' neighborhoods. After the hands have been segmented from the depth map, a posture-detection model classifies each hand as one of three possible posture categories: correct posture, low wrists, or flat hands. Two methods are tested for extracting descriptors from the segmented hands, histograms of oriented gradients and histograms of normal vectors. To account for variation in hand size and practice space, detection models are individually built for each student using support vector machines with the extracted descriptors. We validate this approach using a data set that was collected by recording four beginning piano students while performing standard practice exercises. The results presented in this article show the effectiveness of this approach, with depth context features and histograms of normal vectors performing the best.
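A minimal sketch of a histogram-of-normal-vectors descriptor of the kind described (the bin count and the azimuth-only projection are simplifying assumptions; the full pipeline also involves per-pixel hand segmentation and per-student SVM training):

```python
import numpy as np

def normal_histogram(depth, bins=8):
    """Estimate per-pixel surface normals from depth gradients, then
    histogram the azimuth angle of the normals into orientation bins,
    yielding a fixed-length descriptor for a segmented hand region."""
    gy, gx = np.gradient(depth.astype(float))
    # The normal of the surface z = d(x, y) is proportional to (-gx, -gy, 1)
    az = np.arctan2(-gy, -gx)                  # azimuth in [-pi, pi]
    hist, _ = np.histogram(az, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)           # normalize to a distribution
```

In the described system, descriptors like this one, extracted from the segmented left and right hands, would be fed to a per-student support vector machine that outputs one of the three posture categories.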


2014 ◽  
Vol 1006-1007 ◽  
pp. 797-801 ◽
Author(s):  
Ying Sun ◽  
Guang Lin Gao

The depth map is a basic intrinsic representation of a scene in which each pixel value encodes the elevation of the corresponding object point. This paper analyzes methods for target classification based on elevation maps. An elevation map is a depth image made visible: the value of each image pixel is the distance from the corresponding scene point to the image capture device.
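A minimal sketch of this depth-to-elevation conversion, assuming a top-down capture at a known sensor height (both assumptions for illustration):

```python
import numpy as np

def depth_to_elevation_image(depth, sensor_height):
    """For a top-down capture, convert per-pixel sensor distances to
    elevations (height of each object point above the ground plane) and
    scale the result into an 8-bit grayscale elevation image."""
    elevation = sensor_height - depth
    lo, hi = elevation.min(), elevation.max()
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return ((elevation - lo) * scale).astype(np.uint8)
```

Taller objects (smaller sensor distances) map to brighter pixels, which is the visible-elevation-map form the text refers to.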


2015 ◽  
Vol 35 (8) ◽  
pp. 959-976 ◽  
Author(s):  
Marek Kopicki ◽  
Renaud Detry ◽  
Maxime Adjigble ◽  
Rustam Stolkin ◽  
Ales Leonardis ◽  
...  

This paper presents a method for one-shot learning of dexterous grasps and grasp generation for novel objects. A model of each grasp type is learned from a single kinesthetic demonstration and several types are taught. These models are used to select and generate grasps for unfamiliar objects. Both the learning and generation stages use an incomplete point cloud from a depth camera, so no prior model of an object shape is used. The learned model is a product of experts, in which experts are of two types. The first type is a contact model and is a density over the pose of a single hand link relative to the local object surface. The second type is the hand-configuration model and is a density over the whole-hand configuration. Grasp generation for an unfamiliar object optimizes the product of these two model types, generating thousands of grasp candidates in under 30 seconds. The method is robust to incomplete data at both training and testing stages. When several grasp types are considered the method selects the highest-likelihood grasp across all the types. In an experiment, the training set consisted of five different grasps and the test set of 45 previously unseen objects. The success rate of the first-choice grasp is 84.4% or 77.7% if seven views or a single view of the test object are taken, respectively.
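The product-of-experts scoring can be sketched as follows; the Gaussian densities, dimensionalities, and variances here are toy assumptions standing in for the learned kernel density models over link pose and whole-hand configuration:

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log density of an isotropic Gaussian, evaluated per row of x."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var), axis=-1)

def best_grasp(poses, configs, contact_mean, config_mean):
    """Score each candidate as a product of experts: a contact-model density
    over the hand-link pose times a density over the hand configuration.
    In log space the product becomes a sum; return the best candidate index."""
    log_contact = gaussian_logpdf(poses, contact_mean, 0.1)
    log_config = gaussian_logpdf(configs, config_mean, 0.2)
    return int(np.argmax(log_contact + log_config))

# Thousands of random candidates, mirroring the paper's generation stage
rng = np.random.default_rng(0)
poses = rng.normal(size=(2000, 6))    # toy 6D link-pose parameterization
configs = rng.normal(size=(2000, 7))  # toy 7-DoF hand joint configurations
idx = best_grasp(poses, configs, np.zeros(6), np.zeros(7))
```

Because both experts must assign high density for the product to be large, a candidate that matches the demonstrated contacts but contorts the hand (or vice versa) is automatically penalized.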


2003 ◽  
Vol 125 (2) ◽  
pp. 325-332 ◽  
Author(s):  
Michael Yu Wang ◽  
Diana M. Pelinescu

Analysis and characterization of workpiece-fixture contact forces are important in fixture design, since these forces define fixture stability during clamping and strongly influence workpiece accuracy during manufacturing. This paper presents a method for predicting and analyzing the normal and frictional contact forces at workpiece-fixture contacts. The fixture and workpiece are considered to be rigid bodies, and the model is solved as a constrained quadratic optimization by applying the minimum norm principle. The model reveals some intricate properties of the passive contact forces, including the potential for a locator release and the history dependency during a sequence of clamping and/or external force loading. Further, a notion of passive force closure is introduced to characterize the passive nature of the fixture forces. Geometric conditions for two types of passive force closure (concordant and discordant closures) are provided, showing that a locator released under clamping plays only a limited role in force closure. Model predictions are shown to be in good agreement with known results of an elastic-contact model and with experimental measurements. The passive force closure conditions are illustrated with examples. The presented method is conceptually simple and computationally efficient, and is particularly useful in the early stages of fixture design and process planning.
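A minimal sketch of the minimum-norm solution for passive contact forces, using a toy one-dimensional wrench model (the grasp matrix and external load below are illustrative assumptions; the paper's formulation is a full constrained quadratic program):

```python
import numpy as np

def min_norm_contact_forces(G, w_ext):
    """Among all contact force vectors f satisfying static equilibrium
    G f + w_ext = 0, the Moore-Penrose pseudoinverse yields the one of
    minimum Euclidean norm, per the minimum norm principle. A negative
    normal component then flags a candidate locator release, since a
    passive contact cannot pull on the workpiece."""
    f = np.linalg.pinv(G) @ (-w_ext)
    released = f < 0
    return f, released

# A part held between two opposing locators along x (1-D wrench model):
# each column of G maps that contact's normal force to the net x-force.
G = np.array([[1.0, -1.0]])
w_ext = np.array([0.5])            # external load pushing in +x
f, released = min_norm_contact_forces(G, w_ext)
```

With this load, the minimum-norm solution assigns a negative force to the first locator, reproducing in miniature the locator-release behavior the model reveals: the workpiece would lift off that contact unless clamping preloads it.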

