Human Point Cloud Inpainting for Industrial Human-Robot Collaboration Using Deep Generative Model

Author(s):  
Le Wang ◽  
Shengquan Xie ◽  
Wenjun Xu ◽  
Bitao Yao ◽  
Jia Cui ◽  
...  

Abstract In complex industrial human-robot collaboration (HRC) environments, obstacles in the shared workspace can occlude the operator, and the industrial robot may threaten the operator's safety if it cannot obtain the complete human spatial point cloud. This paper proposes a real-time human point cloud inpainting method based on a deep generative model. The method recovers the human point cloud occluded by obstacles in the shared workspace to ensure the operator's safety. It consists of three main parts: (i) real-time obstacle detection, which detects obstacle locations in real time and generates an image of the obstacles; (ii) application of the deep generative model, a fully convolutional neural network (CNN) trained with an adversarial loss, which generates the missing depth data of operators at arbitrary positions in the human depth image; and (iii) spatial mapping of the depth image, which maps the depth image to a point cloud by coordinate-system conversion. The effectiveness of the method is verified by filling holes in human point clouds occluded by obstacles in an industrial HRC environment. The experimental results show that the proposed method can accurately generate the occluded human point cloud in real time and ensure the safety of the operator.
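
As an illustration of step (iii), the sketch below maps a depth image to a 3D point cloud with the standard pinhole camera model; the intrinsics (fx, fy, cx, cy) are hypothetical placeholder values, since the abstract does not list the camera calibration used.

```python
# Minimal sketch of depth-image-to-point-cloud mapping via the pinhole model.
# The intrinsics are illustrative placeholders; real values come from the
# depth camera's calibration.
import numpy as np

def depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Convert a depth image (meters, shape HxW) to an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no depth reading

# Example: a synthetic 480x640 depth frame at 1.5 m
depth = np.full((480, 640), 1.5)
cloud = depth_to_point_cloud(depth)
print(cloud.shape)
```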

Author(s):  
Haodong Chen ◽  
Ming C. Leu ◽  
Wenjin Tao ◽  
Zhaozheng Yin

Abstract With the development of industrial automation and artificial intelligence, robotic systems are becoming an essential part of factory production, and human-robot collaboration (HRC) is emerging as a new trend in the industrial field. In our previous work, ten dynamic gestures were designed for communication between a human worker and a robot in manufacturing scenarios, and a dynamic gesture recognition model based on convolutional neural networks (CNNs) was developed. Building on that model, this study designs and develops a new real-time HRC system based on a multi-threading method and the CNN. The system enables real-time interaction between a human worker and a robotic arm through dynamic gestures. First, a multi-threading architecture is constructed for high-speed operation and fast response while scheduling multiple tasks simultaneously. Next, a real-time dynamic gesture recognition algorithm is developed, in which a human worker's behavior and motion are continuously monitored and captured, and motion history images (MHIs) are generated in real time. The generation of the MHIs and their identification by the classification model are accomplished synchronously. If a designated dynamic gesture is detected, it is immediately transmitted to the robotic arm to trigger a real-time response. A graphical user interface (GUI) integrating the proposed HRC system is developed to visualize the real-time motion history and the classification results of the gesture identification. A series of actual collaboration experiments between a human worker and a six-degree-of-freedom (6-DOF) Comau industrial robot shows the feasibility and robustness of the proposed system.
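
A minimal sketch of how motion history images can be generated by frame differencing, assuming a simple linear decay; the threshold and decay values here are illustrative assumptions, not the authors' settings.

```python
# Sketch of MHI generation: decay the history by one step per frame and
# stamp freshly moving pixels at full intensity tau.
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=30, diff_thresh=25):
    """Update a motion history image from two consecutive grayscale frames."""
    motion = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)) > diff_thresh
    mhi = np.maximum(mhi - 1, 0)          # fade old motion
    mhi[motion] = tau                     # stamp new motion at full intensity
    return mhi

# Example with two random grayscale frames
h, w = 240, 320
mhi = np.zeros((h, w), dtype=np.int16)
prev = np.random.randint(0, 256, (h, w), dtype=np.uint8)
cur = np.random.randint(0, 256, (h, w), dtype=np.uint8)
mhi = update_mhi(mhi, prev, cur)
```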


2013 ◽  
Vol 765-767 ◽  
pp. 2826-2829 ◽  
Author(s):  
Song Lin ◽  
Rui Min Hu ◽  
Yu Lian Xiao ◽  
Li Yu Gong

In this paper, we propose a novel real-time 3D hand gesture recognition algorithm based on depth information. We segment the hand region out of the depth image and convert it to a point cloud. Then, 3D moment invariant features are computed on the point cloud. Finally, a support vector machine (SVM) is employed to classify the hand shape into different categories. We collect a benchmark dataset using a Microsoft Kinect for Xbox and test the proposed algorithm on it. Experimental results demonstrate the robustness of the proposed algorithm.
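
The exact 3D moment invariants used in the paper are not spelled out in the abstract; the sketch below illustrates the same pipeline with a simple stand-in, using the eigenvalues of the point cloud's covariance (second-order central moments, which are rotation invariant) as shape features fed to an SVM.

```python
# Hedged sketch of the moments-plus-SVM pipeline with covariance eigenvalues
# as stand-in shape features; synthetic clouds replace real hand data.
import numpy as np
from sklearn.svm import SVC

def moment_features(points):
    """points: (N, 3) array. Returns sorted covariance eigenvalues."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    return np.sort(np.linalg.eigvalsh(cov))

# Two synthetic "gesture classes" differing in spread, 20 clouds each
rng = np.random.default_rng(0)
X = np.array([moment_features(rng.normal(scale=s, size=(500, 3)))
              for s in (1.0, 2.0) for _ in range(20)])
y = np.repeat([0, 1], 20)
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:2]))
```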


Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 836 ◽  
Author(s):  
Young-Hoon Jin ◽  
In-Tae Hwang ◽  
Won-Hyung Lee

Augmented reality (AR) is a useful visualization technology that displays information by adding virtual images to the real world. In AR systems that require three-dimensional information, point cloud data is easy to use after real-time acquisition; however, measuring and visualizing objects in real time is difficult because of the large amount of data and the matching process involved. In this paper, we explore a method of estimating pipes from point cloud data and visualizing them in real time through augmented reality devices. In general, pipe estimation in a point cloud uses a Hough transform preceded by preprocessing steps such as noise filtering, normal estimation, or segmentation, but its execution time is slow because of the large amount of computation. Therefore, real-time visualization on augmented reality devices requires a fast cylinder matching method based on random sample consensus (RANSAC). In this paper, we propose parallel processing, multiple frames, an adjustable scale, and error correction for real-time visualization. The real-time visualization method obtains a depth image from the sensor and constructs a uniform point cloud using a voxel grid algorithm. The constructed data is then analyzed with the fast RANSAC-based cylinder matching method. With the spread of various AR devices, real-time visualization through augmented reality devices is expected to be used to identify problems, such as the sagging of pipes, through real-time measurements at plant sites.
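
The voxel grid step the abstract mentions can be sketched as follows: quantize coordinates to voxel indices and average the points in each voxel to obtain a uniform cloud. The voxel size is an illustrative assumption.

```python
# Minimal numpy sketch of voxel-grid downsampling prior to RANSAC matching.
import numpy as np

def voxel_downsample(points, voxel_size=0.02):
    """Average all points falling into the same voxel (points: (N, 3))."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel key and average each group
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    sums = np.zeros((inverse.max() + 1, 3))
    counts = np.zeros(inverse.max() + 1)
    np.add.at(sums, inverse, points)
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]

cloud = np.random.rand(10000, 3)
uniform = voxel_downsample(cloud)
print(uniform.shape)
```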


Author(s):  
Sara Greenberg ◽  
John McPhee ◽  
Alexander Wong

Fitting a kinematic model of the human body to an image without the use of markers is a method of pose estimation that is useful for tracking and posture evaluation. This model-fitting is challenging due to the variation in human physique and the large number of possible poses. One type of modeling is to represent the human body as a set of rigid body volumes. These volumes can be registered to a target point cloud acquired from a depth camera using the Iterative Closest Point (ICP) algorithm. The speed of ICP registration is inversely proportional to the number of points in the model and the target point clouds, and using the entire target point cloud in this registration is too slow for real-time applications. This work proposes the use of data-driven Monte Carlo methods to select a subset of points from the target point cloud that maintains or improves the accuracy of the point cloud registration for joint localization in real time. For this application, we investigate curvature of the depth image as the driving variable to guide the sampling, and compare it with benchmark random sampling techniques.
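
A minimal sketch of the sampling idea, under the assumption that the image Laplacian of the depth map serves as the curvature proxy: pixels are drawn without replacement with probability proportional to local curvature, so high-detail regions contribute more points to the ICP target.

```python
# Hedged sketch of curvature-weighted Monte Carlo subsampling of a depth image.
import numpy as np

def curvature_weighted_sample(depth, n_samples=1000, eps=1e-6):
    """Sample pixel indices with probability proportional to |Laplacian(depth)|."""
    lap = np.abs(np.roll(depth, 1, 0) + np.roll(depth, -1, 0) +
                 np.roll(depth, 1, 1) + np.roll(depth, -1, 1) - 4 * depth)
    weights = lap.ravel() + eps               # eps keeps flat regions samplable
    weights /= weights.sum()
    idx = np.random.choice(weights.size, size=n_samples, replace=False, p=weights)
    return np.unravel_index(idx, depth.shape)

depth = np.random.rand(480, 640)
rows, cols = curvature_weighted_sample(depth)
print(rows.shape, cols.shape)
```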


Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3460 ◽  
Author(s):  
Yuanxing Dai ◽  
Yanming Fu ◽  
Baichun Li ◽  
Xuewei Zhang ◽  
Tianbiao Yu ◽  
...  

Using consumer depth cameras at close range yields a higher surface resolution of the object, but it also introduces more severe noise. This noise tends to appear at or near the edges of the real surface over large areas, which is an obstacle for real-time applications that cannot rely on point cloud post-processing. To fill this gap, by analyzing the noise regions based on position and shape, we propose a composite filtering system for consumer depth cameras used at close range. The system consists of three main modules that eliminate different types of noise areas. Taking the human hand depth image as an example, the proposed filtering system eliminates most of the noise areas. None of the algorithms in the system is based on window smoothing, and all are accelerated on the GPU. Extensive comparative experiments with the Kinect v2 and SR300 show that the system achieves good results with extremely high real-time performance, so it can serve as a preprocessing step for real-time human-computer interaction, real-time 3D reconstruction, and further filtering.
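
One plausible module of such a system, sketched here as an assumption rather than the authors' actual GPU kernels: invalidating "flying pixels" whose depth jumps sharply relative to any 4-neighbor, which removes noise along surface edges without any window smoothing.

```python
# Hedged sketch of edge-noise removal by depth-discontinuity masking.
# The jump threshold is an illustrative assumption.
import numpy as np

def remove_edge_noise(depth, jump_thresh=0.03):
    """Zero out depth pixels lying on strong discontinuities (depth in meters)."""
    d = depth.astype(np.float32)
    jump = np.zeros_like(d)
    for axis, shift in ((0, 1), (0, -1), (1, 1), (1, -1)):
        jump = np.maximum(jump, np.abs(d - np.roll(d, shift, axis=axis)))
    cleaned = d.copy()
    cleaned[jump > jump_thresh] = 0.0     # 0 marks invalid depth
    return cleaned

depth = np.random.rand(424, 512).astype(np.float32)
print(remove_edge_noise(depth).shape)
```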


Author(s):  
Eunchong Ha et al.

Recently, media content that interacts with viewers in real time has been increasing. In this paper, we introduce the real-time color extraction system built around the Kinect camera used in the media artwork 'COLOR'. The Kinect camera detects and tracks the joints of visitors entering the exhibition space. The detected Kinect data is mapped through color calibration in a Unity environment to generate a point cloud video. The system samples the pixel color at the coordinates of each visitor's spine-shoulder joint in the point cloud image. The sampled color is displayed on the screen as a color circle that moves along with the visitor. The color circle shrinks as the distance between the visitor and the Kinect increases. When visitors enter and their color circles overlap, the color of the overlapping region takes an intermediate value between the two circles' colors. The work expresses a person's social movement through the colors that each person carries and the mixing of those colors. The technology used in this work differs from other media art in that it extracts the calibrated image colors separately, advancing interactive media art. In future work, we will improve the accuracy of the point cloud registration between the color image and the depth image, and thereby improve the accuracy of the color extraction for visitors.
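
Two of the behaviors described can be sketched with assumed constants: the circle radius shrinking with the visitor's distance from the Kinect, and overlapping circles blending to an intermediate color.

```python
# Illustrative sketch; base_radius, min_radius, and the averaging rule are
# assumptions, not the artwork's actual parameters.
def circle_radius(distance_m, base_radius=1.0, min_radius=0.1):
    """Radius decreases as the visitor moves away from the sensor."""
    return max(min_radius, base_radius / max(distance_m, 0.5))

def blend(color_a, color_b):
    """Intermediate value between two RGB colors where circles overlap."""
    return tuple((a + b) / 2 for a, b in zip(color_a, color_b))

print(circle_radius(2.0), blend((255, 0, 0), (0, 0, 255)))
```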


2021 ◽  
Vol 40 (3) ◽  
pp. 1-12
Author(s):  
Hao Zhang ◽  
Yuxiao Zhou ◽  
Yifei Tian ◽  
Jun-Hai Yong ◽  
Feng Xu

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image and to predict the 3D keypoints of the hand. With most layers shared between the two tasks, computation cost is reduced, which supports the real-time performance. A hybrid dataset is constructed to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depths of the two targets and the keypoints are used in a unified optimization to reconstruct the interacting motions. Benefiting from a novel tangential contact constraint, the system not only resolves the remaining ambiguities but also retains real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras.
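
A hedged PyTorch sketch of the joint-learning idea: one shared encoder with two light heads, one for hand/object segmentation and one for 3D hand keypoints. All layer sizes and the keypoint count (21) are assumptions, not the paper's architecture.

```python
# Sketch of a shared-backbone two-task network on a single depth input.
import torch
import torch.nn as nn

class JointNet(nn.Module):
    def __init__(self, n_keypoints=21):
        super().__init__()
        self.n_keypoints = n_keypoints
        self.backbone = nn.Sequential(             # shared layers save compute
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, 3, 1)        # background / hand / object
        self.kp_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_keypoints * 3),        # (x, y, z) per keypoint
        )

    def forward(self, depth):
        feat = self.backbone(depth)
        return self.seg_head(feat), self.kp_head(feat).view(-1, self.n_keypoints, 3)

net = JointNet()
seg, kp = net(torch.randn(1, 1, 128, 128))
print(seg.shape, kp.shape)
```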


2021 ◽  
Vol 1910 (1) ◽  
pp. 012002
Author(s):  
Chao He ◽  
Jiayuan Gong ◽  
Yahui Yang ◽  
Dong Bi ◽  
Jianpin Lan ◽  
...  

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Seyed Muhammad Hossein Mousavi ◽  
S. Younes Mirinezhad

Abstract This study presents a new color-depth face database gathered from Iranian subjects of different genders and age ranges. With suitable databases, it is possible to validate and assess available methods in different research fields. This database has applications in fields such as face recognition, age estimation, facial expression recognition, and facial micro-expression recognition. Because of image size and resolution, image databases are typically large. Color images usually consist of three channels, namely red, green, and blue. In the last decade, another type of image has emerged, the "depth image". Depth images capture the range or distance between objects and the sensor. Depending on the depth sensor technology, range data can be acquired in different ways; the Kinect sensor version 2 can acquire color and depth data simultaneously. Facial expression recognition is an important field in image processing, with applications ranging from animation to psychology. Currently, only a few color-depth (RGB-D) facial micro-expression recognition databases exist. Adding depth data to color data increases the final recognition accuracy. Owing to the shortage of color-depth facial expression databases and weaknesses in the available ones, a new and almost complete RGB-D face database covering the Middle-Eastern face type is presented in this paper. In the validation section, the database is compared with several well-known benchmark face databases. For evaluation, histogram of oriented gradients (HOG) features are extracted, and classification algorithms such as support vector machine, multi-layer neural network, and a deep learning method, the convolutional neural network, are employed. The results are promising.
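
A sketch of the stated evaluation pipeline under explicit assumptions: HOG features classified with an SVM (one of the three classifiers mentioned); the data here is random stand-in imagery, not the database itself, and all parameters are illustrative.

```python
# Hedged sketch: HOG feature extraction plus SVM classification.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(image):
    """Extract HOG descriptors from a grayscale face crop."""
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

rng = np.random.default_rng(0)
X = np.array([hog_features(rng.random((64, 64))) for _ in range(40)])
y = np.repeat([0, 1], 20)   # two expression classes for illustration
clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))
```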


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 546
Author(s):  
Zhenni Li ◽  
Haoyi Sun ◽  
Yuliang Gao ◽  
Jiao Wang

Depth maps obtained from sensors are often unsatisfactory because of their low resolution and noise interference. In this paper, we propose a real-time depth map enhancement system based on a residual network that uses dual channels to process depth maps and intensity maps respectively and eliminates the preprocessing step; the proposed algorithm achieves real-time processing speeds of more than 30 fps. Furthermore, an FPGA design and implementation for depth sensing is also introduced. In this FPGA design, the intensity image and depth image are captured by a dual-camera synchronous acquisition system and fed to the neural network as input. Experiments on various depth map restoration tasks show that our algorithm outperforms the existing LRMC, DE-CNN, and DDTF algorithms on standard datasets and achieves better depth map super-resolution. Our FPGA system test confirms that the data throughput of the acquisition system's USB 3.0 interface is stable at 226 Mbps and supports both cameras working at full speed, i.e., 54 fps @ (1280 × 960 + 328 × 248 × 3).
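
A hedged PyTorch sketch of the dual-channel residual idea: separate branches for the depth and intensity maps, feature fusion, and a residual connection that adds a predicted correction back onto the input depth. Layer sizes are assumptions, not the paper's architecture.

```python
# Sketch of a dual-branch residual enhancer for depth + intensity input.
import torch
import torch.nn as nn

class DualChannelEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.intensity_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, depth, intensity):
        feat = torch.cat([self.depth_branch(depth),
                          self.intensity_branch(intensity)], dim=1)
        return depth + self.fuse(feat)   # residual: predict a correction only

net = DualChannelEnhancer()
out = net(torch.randn(1, 1, 240, 320), torch.randn(1, 1, 240, 320))
print(out.shape)
```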

