Automated Sustainable Multi-Object Segmentation and Recognition via Modified Sampling Consensus and Kernel Sliding Perceptron

Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1928
Author(s):  
Adnan Ahmed Rafique ◽  
Ahmad Jalal ◽  
Kibum Kim

Object recognition in depth images is a challenging and persistent task in machine vision, robotics, and sustainable automation. Object recognition is a challenging part of various multimedia technologies for video surveillance, human–computer interaction, robotic navigation, drone targeting, tourist guidance, and medical diagnostics. The symmetry that exists in real-world objects plays a significant role in the perception and recognition of objects by both humans and machines. With advances in depth sensor technology, numerous researchers have recently proposed RGB-D object recognition techniques. In this paper, we introduce a sustainable object recognition framework that remains consistent despite changes in the environment and can recognize and analyze RGB-D objects in complex indoor scenarios. First, after acquiring a depth image, the point cloud and depth maps are extracted to obtain planes. The plane fitting model and the proposed modified maximum likelihood estimation sampling consensus (MMLESAC) are then applied as a segmentation process. Next, depth kernel descriptors (DKDES) are computed over the segmented objects for single- and multiple-object scenarios separately. These DKDES are carried forward to isometric mapping (IsoMap) for feature space reduction. Finally, the reduced feature vector is forwarded to a kernel sliding perceptron (KSP) for object recognition. Three datasets are used to evaluate four different experiments, employing a cross-validation scheme to validate the proposed model. The experimental results over the RGB-D Object, RGB-D Scenes, and NYUDv1 datasets demonstrate overall accuracies of 92.2%, 88.5%, and 90.5%, respectively. These results outperform existing state-of-the-art methods and confirm the suitability of the proposed method.
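The MMLESAC stage builds on MLESAC-style hypothesis scoring, in which candidate planes are ranked by a truncated negative log-likelihood of residuals rather than by a raw inlier count. The NumPy sketch below illustrates that scoring idea on a toy point cloud; it is a simplified stand-in under assumed parameters, not the paper's modified estimator, and the depth kernel descriptors, IsoMap reduction (available, e.g., as sklearn.manifold.Isomap), and kernel sliding perceptron are not reproduced here.

```python
# Illustrative MLESAC-style plane fitting; parameters are assumptions.
import numpy as np

def fit_plane(p1, p2, p3):
    """Plane (n, d) through three points, with unit normal n."""
    n = np.cross(p2 - p1, p3 - p1)
    norm = np.linalg.norm(n)
    if norm < 1e-9:
        return None                       # degenerate (collinear) sample
    n /= norm
    return n, -np.dot(n, p1)

def mlesac_plane(points, iters=500, sigma=0.01, rng=None):
    """Pick the plane minimizing a truncated negative log-likelihood of
    point-to-plane residuals (MLESAC scoring, not inlier counting)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    best, best_cost = None, np.inf
    penalty = (3.0 * sigma) ** 2          # cost cap assigned to outliers
    for _ in range(iters):
        idx = rng.choice(len(points), 3, replace=False)
        model = fit_plane(*points[idx])
        if model is None:
            continue
        n, d = model
        r2 = (points @ n + d) ** 2        # squared point-to-plane residuals
        cost = np.minimum(r2, penalty).sum()
        if cost < best_cost:
            best, best_cost = (n, d), cost
    return best

# Toy usage: a noisy plane plus uniform outliers.
rng = np.random.default_rng(1)
xy = rng.uniform(-1, 1, (300, 2))
plane_pts = np.column_stack([xy, 0.02 * rng.standard_normal(300)])
cloud = np.vstack([plane_pts, rng.uniform(-1, 1, (60, 3))])
n, d = mlesac_plane(cloud)
print("estimated normal:", np.round(n, 2))
```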

Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2815
Author(s):  
Shih-Hung Yang ◽  
Yao-Mao Cheng ◽  
Jyun-We Huang ◽  
Yon-Ping Chen

Automatic fingerspelling recognition tackles the communication barrier between deaf and hearing individuals. However, the accuracy of fingerspelling recognition is reduced by high intra-class variability and low inter-class variability. Existing methods learn features with regular convolutional kernels, which have limited receptive fields (RFs) and often cannot detect subtle discriminative details. In this study, we propose a receptive field-aware network with finger attention (RFaNet) that highlights the finger regions and builds inter-finger relations. To highlight the discriminative details of the fingers, RFaNet reweights the low-level features of the hand depth image with those of the non-forearm image and improves finger localization, even when the wrist is occluded. RFaNet captures neighboring and inter-region dependencies between fingers in high-level features. An atrous convolution procedure enlarges the RFs at multiple scales, and a non-local operation computes the interactions between multi-scale feature maps, thereby facilitating the building of inter-finger relations. Thus, the representation of a sign is invariant to viewpoint changes, which are primarily responsible for intra-class variability. On an American Sign Language fingerspelling dataset, RFaNet achieved 1.77% higher classification accuracy than state-of-the-art methods. RFaNet also achieved effective transfer learning when the number of labeled depth images was insufficient: the fingerspelling representation of a depth image can be transferred from large- to small-scale datasets by highlighting the finger regions and building inter-finger relations, thereby reducing the need for expensive fingerspelling annotations.
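Two ingredients the abstract names, multi-scale atrous convolution and a non-local operation, can be sketched generically in PyTorch. The module sizes and wiring below are illustrative assumptions, not the published RFaNet architecture.

```python
# Sketch of multi-scale atrous convolutions plus a non-local block.
import torch
import torch.nn as nn

class MultiScaleAtrous(nn.Module):
    """Parallel 3x3 convolutions with growing dilation rates, then a 1x1 fuse."""
    def __init__(self, ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local operation (Wang et al., 2018 style)."""
    def __init__(self, ch):
        super().__init__()
        self.theta = nn.Conv2d(ch, ch // 2, 1)
        self.phi = nn.Conv2d(ch, ch // 2, 1)
        self.g = nn.Conv2d(ch, ch // 2, 1)
        self.out = nn.Conv2d(ch // 2, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c/2)
        k = self.phi(x).flatten(2)                     # (b, c/2, hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c/2)
        attn = torch.softmax(q @ k, dim=-1)            # pairwise affinities
        y = (attn @ v).transpose(1, 2).reshape(b, c // 2, h, w)
        return x + self.out(y)                         # residual connection

feats = torch.randn(1, 64, 28, 28)                     # dummy hand features
feats = MultiScaleAtrous(64)(feats)
feats = NonLocalBlock(64)(feats)
print(feats.shape)                                     # torch.Size([1, 64, 28, 28])
```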


2019 ◽  
Vol 1 (3) ◽  
pp. 883-903 ◽  
Author(s):  
Daulet Baimukashev ◽  
Alikhan Zhilisbayev ◽  
Askat Kuzdeuov ◽  
Artemiy Oleinikov ◽  
Denis Fadeyev ◽  
...  

Recognizing objects and estimating their poses have a wide range of applications in robotics. For instance, to grasp objects, robots need their position and orientation in 3D. The task becomes challenging in a cluttered environment with different types of objects. A popular approach to this problem is to use a deep neural network for object recognition. However, deep learning-based object detection in cluttered environments requires a substantial amount of data, and collecting these data demands time and extensive human labor for manual labeling. In this study, our objective was the development and validation of a deep object recognition framework using a synthetic depth image dataset. We synthetically generated a depth image dataset of 22 objects randomly placed in a 0.5 m × 0.5 m × 0.1 m box and automatically labeled all objects with an occlusion rate below 70%. The Faster Region-based Convolutional Neural Network (Faster R-CNN) architecture was adopted for training on 800,000 synthetic depth images, and its performance was tested on a real-world depth image dataset of 2000 samples. The deep object recognizer achieved 40.96% detection accuracy on the real depth images and 93.5% on the synthetic depth images. Training the model with noise-added synthetic images improved the recognition accuracy on real images to 46.3%. The object detection framework can thus be trained on synthetically generated depth data and then employed for object recognition on real depth data in a cluttered environment. Synthetic depth data-based deep object detection has the potential to substantially decrease the time and human effort required for extensive data collection and labeling.
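The reported sim-to-real gain comes from perturbing clean synthetic depth maps before training. A hedged sketch of such an augmentation, using an assumed Gaussian-jitter-plus-dropout noise model rather than the paper's exact procedure, combined with the stock torchvision Faster R-CNN, might look as follows.

```python
# Illustrative depth-noise augmentation; noise model and parameters are assumptions.
import numpy as np
import torch
import torchvision

def add_sensor_noise(depth, sigma=0.005, dropout_p=0.02, rng=None):
    """Gaussian depth jitter plus random missing-pixel dropout (zeros)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noisy = depth + rng.normal(0.0, sigma, depth.shape)
    mask = rng.random(depth.shape) < dropout_p
    noisy[mask] = 0.0                      # simulate invalid depth readings
    return noisy.astype(np.float32)

depth = np.full((480, 640), 0.8, dtype=np.float32)    # toy synthetic depth (m)
noisy = add_sensor_noise(depth)

# Faster R-CNN from torchvision; depth is replicated to three channels because
# the backbone expects RGB-shaped input. 23 classes = 22 objects + background.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=23
)
x = torch.from_numpy(noisy).unsqueeze(0).repeat(3, 1, 1)
model.eval()
with torch.no_grad():
    pred = model([x])                      # list of dicts: boxes/labels/scores
print(pred[0]["boxes"].shape)
```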


Author(s):  
Yan Wu ◽  
Jiqian Li ◽  
Jing Bai

RGB-D-based object recognition has been investigated enthusiastically in the past few years. RGB and depth images provide useful and complementary information, and fusing RGB and depth features can significantly increase the accuracy of object recognition. However, previous works simply take the depth image as the fourth channel of the RGB image and concatenate the RGB and depth features, ignoring the different discriminative power of RGB and depth information for different objects. In this paper, a new method containing three different classifiers is proposed to fuse features extracted from RGB and depth images for RGB-D-based object recognition. First, an RGB classifier and a depth classifier are trained by cross-validation to obtain the accuracy difference between RGB and depth features for each object class. Then a variant RGB-D classifier is trained with different initialization parameters for each class according to this accuracy difference, which yields more robust classification performance. The proposed method is evaluated on two benchmark RGB-D datasets and achieves performance comparable with the state of the art.
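The first step, estimating per-class accuracy differences between an RGB and a depth classifier via cross-validation, can be sketched with scikit-learn. The toy descriptors and the final weighting heuristic below are illustrative assumptions; the paper's variant-classifier initialization scheme is not reproduced.

```python
# Cross-validated per-class accuracy gap between RGB and depth features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Stand-in descriptors: "depth" is a noisier view of the same 64-dim features.
X_rgb, y = make_classification(n_samples=600, n_features=64, n_classes=5,
                               n_informative=20, random_state=0)
rng = np.random.default_rng(1)
X_depth = X_rgb + rng.normal(0.0, 2.0, X_rgb.shape)

def per_class_accuracy(X, y, n_classes):
    pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
    return np.array([np.mean(pred[y == c] == c) for c in range(n_classes)])

acc_rgb = per_class_accuracy(X_rgb, y, 5)
acc_depth = per_class_accuracy(X_depth, y, 5)
diff = acc_rgb - acc_depth          # >0: RGB is more reliable for that class
w_rgb = 0.5 + 0.5 * diff            # simple per-class weighting heuristic
print(np.round(diff, 2), np.round(w_rgb, 2))
```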


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Seyed Muhammad Hossein Mousavi ◽  
S. Younes Mirinezhad

This study presents a new color-depth face database gathered from Iranian subjects of different genders and age ranges. Suitable databases make it possible to validate and assess available methods in different research fields, and this database has applications in face recognition, age estimation, facial expression recognition, and facial micro-expression recognition. Image databases are mostly large, depending on their size and resolution. Color images usually consist of three channels, namely red, green, and blue. In the last decade, however, another image type has emerged: the depth image. Depth images capture the range and distance between objects and the sensor, and depending on the depth sensor technology, range data can be acquired in different ways. The Kinect sensor version 2 can acquire color and depth data simultaneously. Facial expression recognition is an important field in image processing with multiple uses, from animation to psychology. Currently, only a few color-depth (RGB-D) facial micro-expression recognition databases exist; adding depth data to color data increases the final recognition accuracy. Owing to the shortage of color-depth-based facial expression databases and weaknesses in the available ones, a new RGB-D face database covering the Middle-Eastern face type is presented in this paper. In the validation section, the database is compared with well-known benchmark face databases. For evaluation, Histogram of Oriented Gradients (HOG) features are extracted, and classification algorithms such as a Support Vector Machine, a Multi-Layer Neural Network, and a deep learning method, the Convolutional Neural Network, are employed. The results are promising.
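The HOG-plus-SVM evaluation baseline is standard and can be sketched with scikit-image and scikit-learn. Random arrays stand in for the database's face images here, so the printed accuracy is meaningless beyond demonstrating the pipeline.

```python
# HOG features + SVM classifier on stand-in "face" images.
import numpy as np
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two fake expression classes with slightly different intensity statistics.
images = np.concatenate([
    rng.random((50, 64, 64)),
    rng.random((50, 64, 64)) ** 2,
])
labels = np.array([0] * 50 + [1] * 50)

feats = np.array([
    hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for img in images
])
X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("toy accuracy:", clf.score(X_te, y_te))
```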


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1299
Author(s):  
Honglin Yuan ◽  
Tim Hoogenkamp ◽  
Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation. In this paper, we propose the RobotP dataset, consisting of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Based on the generated data, we then automatically produce object segmentation masks and two-dimensional (2D) bounding boxes. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups via the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images help increase the performance of pose estimation algorithms.
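One of the automated steps, deriving a 2D bounding box from an object segmentation mask, is simple enough to sketch directly. The snippet below is a generic NumPy illustration, not the dataset's own tooling.

```python
# Axis-aligned bounding box from a boolean segmentation mask.
import numpy as np

def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) around the True pixels."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None                      # empty mask: no box
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((480, 640), dtype=bool)
mask[100:220, 300:420] = True            # toy object region
print(mask_to_bbox(mask))                # (300, 100, 419, 219)
```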


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1356
Author(s):  
Linda Christin Büker ◽  
Finnja Zuber ◽  
Andreas Hein ◽  
Sebastian Fudickar

While approaches for detecting joint positions in color images, such as HRNet and OpenPose, are readily available, corresponding approaches for depth images have received limited consideration, even though depth images offer several advantages over color images, such as robustness to light variation and invariance to color and texture. Correspondingly, we introduce High-Resolution Depth Net (HRDepthNet), a machine learning-driven approach to detect human joints (body, head, and upper and lower extremities) purely in depth images. HRDepthNet retrains the original HRNet for depth images. To this end, a dataset was created holding depth (and RGB) images recorded of subjects conducting the Timed Up and Go test, an established geriatric assessment; annotation was performed manually on the RGB images. Training and evaluation were conducted with this dataset. For accuracy evaluation, the detection of body joints was assessed via COCO's evaluation metrics, indicating that the resulting depth image-based model achieved better results than HRNet trained and applied on the corresponding RGB images. An additional evaluation of the position errors showed median deviations of 1.619 cm (x-axis), 2.342 cm (y-axis), and 2.4 cm (z-axis).
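Retraining an RGB-pretrained network such as HRNet on single-channel depth input requires adapting its first convolution. A common trick, shown here on a stand-in stem rather than the actual HRNet code, is to average the pretrained kernels over the color dimension; whether HRDepthNet uses exactly this scheme is not stated in the abstract.

```python
# Adapt an RGB first-layer convolution to single-channel depth input.
import torch
import torch.nn as nn

def adapt_first_conv(conv: nn.Conv2d) -> nn.Conv2d:
    """Return a 1-input-channel copy whose kernels average the RGB kernels."""
    new = nn.Conv2d(1, conv.out_channels, conv.kernel_size,
                    stride=conv.stride, padding=conv.padding,
                    bias=conv.bias is not None)
    with torch.no_grad():
        new.weight.copy_(conv.weight.mean(dim=1, keepdim=True))
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new

stem = nn.Conv2d(3, 64, 3, stride=2, padding=1)   # stand-in for HRNet's stem
depth_stem = adapt_first_conv(stem)
depth_map = torch.randn(1, 1, 256, 256)           # normalized depth frame
print(depth_stem(depth_map).shape)                # torch.Size([1, 64, 128, 128])
```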


2021 ◽  
Author(s):  
Thomas Stranick ◽  
Christian Lopez

This work introduces a Virtual Reality (VR) exergame application to prevent work-related musculoskeletal disorders (WMSDs). WMSDs are an important issue with a direct economic impact, since they injure workers who are then forced to take time off. Exercise and stretching are one way to strengthen workers' muscles and help prevent WMSDs. While several applications have been developed to prevent WMSDs, most of them lack immersivity or focus only on education rather than helping workers warm up or stretch. Hence, this work presents an exergame application that leverages VR and depth-sensor technology to provide users with an immersive first-person experience. The objective of the VR exergame is to encourage and motivate users to perform full-body movements in order to pass through a series of obstacles. The application implements a variety of game elements to motivate users to play the game and stretch. While in the game, users can visualize their motions by controlling a virtual avatar with their body movements, and this immersivity is expected to motivate and encourage them. Initial findings show the positive effects of the base exergame on individuals' motivation and physical activity level: the application engaged individuals in low-intensity exercises that produced significant and consistent increases in their heart rate. Lastly, this work explores the development and the benefits this VR exergame could bring by motivating workers and preventing WMSDs.


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 393 ◽  
Author(s):  
Jonha Lee ◽  
Dong-Wook Kim ◽  
Chee Won ◽  
Seung-Won Jung

Segmentation of human bodies in images is useful for a variety of applications, including background substitution, human activity recognition, security, and video surveillance. However, human body segmentation remains challenging due to the complicated shape and motion of a non-rigid human body. Meanwhile, depth sensors with advanced pattern recognition algorithms provide human body skeletons in real time with reasonable accuracy. In this study, we propose an algorithm that projects the human body skeleton from a depth image to a color image, where the human body region is segmented by using the projected skeleton as a segmentation cue. Experimental results using the Kinect sensor demonstrate that the proposed method provides high-quality segmentation results and outperforms conventional methods.
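The core projection step, mapping a skeleton joint detected in the depth camera into the color image, follows the standard pinhole model: back-project with the depth intrinsics, transform by the depth-to-color extrinsics, and reproject with the color intrinsics. The matrices below are made-up stand-ins for a calibrated Kinect pair.

```python
# Project a depth-camera skeleton joint into the color image (pinhole model).
import numpy as np

K_depth = np.array([[365.0,   0.0, 256.0],
                    [  0.0, 365.0, 212.0],
                    [  0.0,   0.0,   1.0]])
K_color = np.array([[1050.0,    0.0, 960.0],
                    [   0.0, 1050.0, 540.0],
                    [   0.0,    0.0,   1.0]])
R = np.eye(3)                            # depth-to-color rotation (assumed)
t = np.array([0.052, 0.0, 0.0])          # baseline in meters (assumed)

def depth_joint_to_color(u, v, z):
    """Back-project pixel (u, v) at depth z, then project into the color image."""
    p_depth = z * np.linalg.inv(K_depth) @ np.array([u, v, 1.0])  # 3D point
    p_color = R @ p_depth + t                                      # color frame
    uv = K_color @ p_color
    return uv[:2] / uv[2]                                          # pixel coords

print(np.round(depth_joint_to_color(300.0, 200.0, 1.5), 1))
```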

