Boundary-enhanced attention-aware network for detecting salient objects in RGB-depth images

2021 ◽  
Vol 30 (06) ◽  
Author(s):  
Junwei Wu ◽  
Wujie Zhou


Author(s):  
Sukhendra Singh ◽  
G. N. Rathna ◽  
Vivek Singhal

Introduction: Sign language is the only way for speech-impaired people to communicate, but because most hearing people do not know it, a communication barrier arises. In this paper, we present a solution that captures hand gestures with a Kinect camera and classifies each gesture into its correct symbol. Method: We used a Kinect camera rather than an ordinary web camera because an ordinary camera does not capture the 3D orientation or depth of a scene from the camera, whereas the Kinect captures 3D images, which makes classification more accurate. Result: The Kinect camera produces distinct depth images for the hand gestures ‘2’ and ‘V’, and similarly for ‘1’ and ‘I’, which a normal web camera cannot distinguish. We used hand gestures from Indian Sign Language; our dataset contained 46,339 RGB images and 46,339 depth images. 80% of the images were used for training and the remaining 20% for testing. In total, 36 hand gestures were considered: 26 for the alphabets A–Z and 10 for the digits 0–9. Conclusion: Along with a real-time implementation, we compare the performance of various machine learning models and find that a CNN trained on depth images gives the most accurate performance. All results were obtained on a PYNQ-Z2 board.
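As a minimal sketch of the 80/20 split described above (the rounding rule is an assumption; the abstract only states the image count and the percentages):

```python
# Sketch: 80/20 train/test split sizes for the paired RGB/depth dataset.
# The 46,339 image count comes from the abstract; truncating the training
# count with int() is an assumption about the rounding rule.
def split_counts(total, train_frac=0.8):
    train = int(total * train_frac)
    return train, total - train

train_n, test_n = split_counts(46339)
print(train_n, test_n)  # 37071 37268... actually prints 37071 9268
```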


2021 ◽  
Vol 13 (5) ◽  
pp. 935
Author(s):  
Matthew Varnam ◽  
Mike Burton ◽  
Ben Esse ◽  
Giuseppe Salerno ◽  
Ryunosuke Kazahaya ◽  
...  

SO2 cameras can measure rapid changes in volcanic emission rate but require accurate calibrations and corrections to convert optical depth images into slant column densities. We conducted a test at Masaya volcano of two SO2 camera calibration approaches, calibration cells and a co-located spectrometer, and corrected both calibrations for light dilution, a process caused by light scattering between the plume and the camera. We demonstrate an advancement on the image-based correction that allows the retrieval of the scattering efficiency across a 2D area of an SO2 camera image. When appropriately corrected for dilution, our two calibration approaches produce final calculated emission rates that agree with simultaneously measured traverse flux data and with each other, although the observed distribution of gas within the image differs. We demonstrate that traverses and SO2 camera techniques, when used together, generate better plume-speed estimates for traverses and improved knowledge of wind direction for the camera, producing more reliable emission rates. We suggest that combining traverses with the SO2 camera should be adopted where possible.
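As a hedged sketch of the co-located-spectrometer calibration step mentioned above (the function names, the linear model, and the synthetic numbers are assumptions; the light-dilution correction that the study applies beforehand is omitted here):

```python
import numpy as np

# Sketch: fit a linear relation between camera optical depths (tau) and
# spectrometer slant column densities (SCDs), then apply it image-wide.
def calibrate(optical_depths, spectrometer_scds):
    # least-squares slope/intercept: SCD ≈ a * tau + b
    a, b = np.polyfit(optical_depths, spectrometer_scds, 1)
    return a, b

def to_scd(tau_image, a, b):
    return a * tau_image + b

# Synthetic check with made-up values: data generated from a known line
# should be recovered by the fit (units: molecules/cm^2).
tau = np.array([0.0, 0.1, 0.2, 0.3])
scd = 2.5e18 * tau + 1.0e16
a, b = calibrate(tau, scd)
```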


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1299
Author(s):  
Honglin Yuan ◽  
Tim Hoogenkamp ◽  
Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, when compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset consisting of commonly used objects for benchmarking in 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups by the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested by the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and photo-realistic images are helpful in increasing the performance of pose estimation algorithms.
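As an illustration of how pose-estimation accuracy is commonly scored on such benchmarks, the following sketches the ADD metric (average distance between model points under the ground-truth and estimated poses); whether the Shape Retrieval Challenge uses exactly this metric is an assumption:

```python
import numpy as np

# Sketch of the ADD metric for 6D pose evaluation: transform the object's
# model points by both poses and average the point-to-point distances.
def add_metric(points, R_gt, t_gt, R_est, t_est):
    p_gt = points @ R_gt.T + t_gt     # points under ground-truth pose
    p_est = points @ R_est.T + t_est  # points under estimated pose
    return np.linalg.norm(p_gt - p_est, axis=1).mean()

# Toy example: identical rotations, 1 cm translation error along z.
pts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
I = np.eye(3)
err = add_metric(pts, I, np.zeros(3), I, np.array([0.0, 0.0, 0.01]))
```

A pose is typically counted correct when `err` falls below a threshold such as 10% of the model diameter, though thresholds vary between benchmarks.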


2021 ◽  
Vol 183 ◽  
pp. 106082
Author(s):  
Yuzhen Wei ◽  
Yong He ◽  
Xiaoli Li

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1356
Author(s):  
Linda Christin Büker ◽  
Finnja Zuber ◽  
Andreas Hein ◽  
Sebastian Fudickar

Although approaches for detecting joint positions in color images, such as HRNet and OpenPose, are available, corresponding approaches for depth images have received limited consideration, even though depth images have several advantages over color images, such as robustness to light variation and invariance to color and texture. Correspondingly, we introduce High-Resolution Depth Net (HRDepthNet), a machine-learning-driven approach to detect human joints (body, head, and upper and lower extremities) purely in depth images. HRDepthNet retrains the original HRNet for depth images. To this end, a dataset was created holding depth (and RGB) images recorded with subjects conducting the timed up and go test, an established geriatric assessment. Joint positions were manually annotated on the RGB images, and training and evaluation were conducted with this dataset. For accuracy evaluation, detection of body joints was assessed via COCO’s evaluation metrics, indicating that the resulting depth-image-based model achieved better results than HRNet trained and applied on the corresponding RGB images. An additional evaluation of the position errors showed a median deviation of 1.619 cm (x-axis), 2.342 cm (y-axis) and 2.4 cm (z-axis).
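The per-axis median deviations reported above can be computed as sketched below (the array shapes, variable names, and toy values are assumptions):

```python
import numpy as np

# Sketch: median absolute difference between predicted and annotated
# joint positions along each axis, as a (x, y, z) triple in cm.
def median_axis_error(pred, gt):
    # pred, gt: (N, 3) arrays of joint positions in cm
    return np.median(np.abs(pred - gt), axis=0)

# Toy example with three joints (made-up coordinates).
pred = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 0.0, 0.0]])
gt   = np.array([[0.0, 2.0, 2.0], [2.0, 3.0, 6.0], [1.0, 0.0, 2.0]])
print(median_axis_error(pred, gt))
```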


Author(s):  
Yi Liu ◽  
Ming Cong ◽  
Hang Dong ◽  
Dong Liu

Purpose The purpose of this paper is to propose a new method based on three-dimensional (3D) vision technologies and human-skill-integrated deep learning to solve assembly positioning tasks such as peg-in-hole. Design/methodology/approach A hybrid camera configuration was used to provide global and local views. An eye-in-hand mode guided the peg into contact with the hole plate using 3D vision in the global view. When the peg was in contact with the workpiece surface, an eye-to-hand mode provided the local view to accomplish peg-hole positioning based on a trained CNN. Findings The results of assembly positioning experiments proved that the proposed method successfully distinguished the target hole from other holes of the same size according to the CNN. The robot planned its motion according to the depth images and human-skill guidelines. The final positioning precision was good enough for the robot to carry out force-controlled assembly. Practical implications The developed framework can have an important impact on the robotic assembly positioning process and can be combined with existing force-guided assembly technology to build a complete autonomous assembly pipeline. Originality/value This paper proposes a new approach to robotic assembly positioning based on 3D vision technologies and human-skill-integrated deep learning. A dual-camera swapping mode was used to provide visual feedback for the entire assembly motion planning process. The proposed workpiece positioning method provided effective disturbance rejection, autonomous motion planning and increased overall performance with depth-image feedback. The proposed peg-hole positioning method with integrated human skill provided the capability to avoid perceptual aliasing of the target and to make successive motion decisions for the robotic assembly manipulation.


2012 ◽  
Author(s):  
Daniel B. Kubacki ◽  
Huy Q. Bui ◽  
S. Derin Babacan ◽  
Minh N. Do
