Deep Learning Based Object Recognition Using Physically-Realistic Synthetic Depth Scenes

2019 ◽  
Vol 1 (3) ◽  
pp. 883-903 ◽  
Author(s):  
Daulet Baimukashev ◽  
Alikhan Zhilisbayev ◽  
Askat Kuzdeuov ◽  
Artemiy Oleinikov ◽  
Denis Fadeyev ◽  
...  

Recognizing objects and estimating their poses has a wide range of applications in robotics. For instance, to grasp objects, robots need the position and orientation of objects in 3D. The task becomes challenging in a cluttered environment with different types of objects. A popular approach to this problem is to use a deep neural network for object recognition. However, deep learning-based object detection in cluttered environments requires a substantial amount of data, and collecting these data requires time and extensive human labor for manual labeling. In this study, our objective was to develop and validate a deep object recognition framework using a synthetic depth image dataset. We synthetically generated a depth image dataset of 22 objects randomly placed in a 0.5 m × 0.5 m × 0.1 m box and automatically labeled all objects with an occlusion rate below 70%. The Faster Region-based Convolutional Neural Network (Faster R-CNN) architecture was trained on a dataset of 800,000 synthetic depth images, and its performance was tested on a real-world depth image dataset of 2000 samples. The deep object recognizer achieves a detection accuracy of 40.96% on the real depth images and 93.5% on the synthetic depth images. Training the model with noise-added synthetic images improves the recognition accuracy on real images to 46.3%. The object detection framework can therefore be trained on synthetically generated depth data and then used for object recognition on real depth data in a cluttered environment. Synthetic depth data-based deep object detection has the potential to substantially decrease the time and human effort required for extensive data collection and labeling.
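The automatic labeling rule (keep only objects occluded less than 70%) can be pictured with a short sketch. It assumes, hypothetically, that the renderer provides for each object both a full silhouette mask (the object rendered alone) and a visible mask (after occlusion by the other objects); the abstract does not say how occlusion was computed, and only the 70% threshold comes from the text. The `add_depth_noise` function is likewise a hypothetical noise model (Gaussian jitter plus randomly missing pixels), not the noise the authors actually added.

```python
import numpy as np

OCCLUSION_LIMIT = 0.70  # from the abstract: only objects occluded below 70% are labeled


def occlusion_rate(full_mask: np.ndarray, visible_mask: np.ndarray) -> float:
    """Fraction of an object's silhouette hidden by other objects."""
    full = int(full_mask.sum())
    if full == 0:
        return 1.0  # object not in view at all: treat as fully occluded
    return 1.0 - float(visible_mask.sum()) / full


def auto_label(full_masks, visible_masks, class_ids, limit=OCCLUSION_LIMIT):
    """Return (class_id, (x_min, y_min, x_max, y_max)) for sufficiently visible objects.

    Hypothetical helper: boxes are taken from the visible pixels only.
    """
    labels = []
    for full, vis, cls in zip(full_masks, visible_masks, class_ids):
        if occlusion_rate(full, vis) >= limit:
            continue  # too occluded: left unlabeled
        ys, xs = np.nonzero(vis)
        labels.append((cls, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))))
    return labels


def add_depth_noise(depth, sigma=0.005, dropout=0.01, rng=None):
    """Hypothetical noise model: Gaussian jitter (in metres) plus random zero-depth holes."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = depth + rng.normal(0.0, sigma, size=depth.shape)
    noisy[rng.random(depth.shape) < dropout] = 0.0  # simulate missing sensor returns
    return np.clip(noisy, 0.0, None)
```

Deriving the box from the visible mask rather than the full silhouette is a design choice made here so that labels match what a real depth sensor would see.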

2021 ◽  
Author(s):  
Abhinav Sundar

The objective of this thesis was to evaluate the viability of implementing a deep learning-driven object recognition algorithm for aerospace manufacturing, maintenance and assembly tasks. Comparative research has found that current computer vision methods, such as spatial mapping, are limited to macro-object recognition because they rely on nodal wireframe analysis. An optical object recognition algorithm was trained to learn complex geometric and chromatic characteristics, thereby allowing micro-object recognition of items such as cables and other critical components. This thesis investigated the use of a convolutional neural network with object recognition algorithms. The viability of two categories of object recognition algorithms was analyzed: image prediction and object detection. Due to a viral epidemic, the analysis in this thesis was limited in consistency because resources were not readily available. The prediction-class algorithm was analyzed using a custom dataset of 15 552 images of the MaxFlight V2002 Full Motion Simulator’s inverter system, and a model was created by transfer learning on that dataset with the InceptionV3 convolutional neural network (CNN). The detection-class algorithm was analyzed using a custom dataset of 100 images of two SUVs of different brands and styles, and a model was created by transfer learning on that dataset with the YOLOv3 deep learning architecture. The tests showed that the object recognition algorithms identified the components with good accuracy: 99.97% mAP for the prediction class and 89.54% mAP for the detection class. Together with the literature review, these accuracies indicate that object detection algorithms are accurate, are designed for live-feed analysis, and are suitable for the significant applications of AVI and aircraft assembly.
In the future, a larger dataset needs to be compiled to increase reliability, and a custom convolutional neural network and deep learning algorithm need to be developed specifically for aerospace assembly, maintenance and manufacturing applications.


2021 ◽  
Author(s):  
Armin Masoumian ◽  
David G.F. Marei ◽  
Saddam Abdulwahab ◽  
Julián Cristiano ◽  
Domenec Puig ◽  
...  

Determining the distance between objects in a scene and the camera sensor from 2D images is feasible when depth images are estimated using stereo cameras or 3D cameras. Depth estimation yields relative distances, which must be converted into absolute distances to be usable in practice. However, distance estimation is very challenging with 2D monocular cameras. This paper presents a deep learning framework consisting of two deep networks, one for depth estimation and one for object detection, that operate on a single image. First, objects in the scene are detected and localized using the You Only Look Once (YOLOv5) network. In parallel, a depth image is estimated using a deep autoencoder network to obtain the relative distances. The YOLO-based object detector was trained with supervised learning, whereas the depth estimation network was trained with self-supervision. The distance estimation framework was evaluated on real images of outdoor scenes. The results show that the framework is promising, yielding an accuracy of 96% and an RMSE of 0.203 with respect to the correct absolute distance.
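A minimal sketch of how a detection box and an estimated depth map might be combined into a per-object distance. The median-inside-the-box rule, the single global scale factor fitted by least squares on objects at known distances, and all function names are assumptions for illustration; the abstract does not describe how the relative depth was converted to absolute distance. The `rmse` helper only shows how an error figure of the kind reported is computed in general.

```python
import numpy as np


def box_depth(depth_map, box):
    """Median depth inside an inclusive pixel box (x1, y1, x2, y2), ignoring zeros.

    The median is used so that background pixels inside the box have limited influence.
    """
    x1, y1, x2, y2 = box
    patch = depth_map[y1:y2 + 1, x1:x2 + 1]
    valid = patch[patch > 0]  # in this sketch, zero marks missing depth
    return float(np.median(valid)) if valid.size else float("nan")


def fit_scale(relative, absolute):
    """Hypothetical calibration: least-squares s minimizing sum((s * r - a) ** 2).

    The closed form is s = <r, a> / <r, r>.
    """
    r = np.asarray(relative, dtype=float)
    a = np.asarray(absolute, dtype=float)
    return float(r @ a / (r @ r))


def rmse(pred, truth):
    """Root-mean-square error between predicted and true distances."""
    diff = np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In use, `s = fit_scale(relative_depths, known_distances)` would be computed once on a calibration set, after which each detected object's distance is `s * box_depth(depth_map, box)`.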


2021 ◽  
Vol 11 (11) ◽  
pp. 4758
Author(s):  
Ana Malta ◽  
Mateus Mendes ◽  
Torres Farinha

Maintenance professionals and other technical staff regularly need to learn to identify new parts in car engines and other equipment. The present work proposes a model of a task assistant based on a deep learning neural network. A YOLOv5 network is used to recognize some of the constituent parts of an automobile. A dataset of car engine images was created, and eight car parts were annotated in the images. The neural network was then trained to detect each part. The results show that YOLOv5s can detect the parts in real-time video streams with high accuracy, making it useful as an augmented reality aid for training professionals who must learn to work with new equipment. An architecture for an object recognition system using augmented reality glasses is also designed.
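Detectors in the YOLO family output many overlapping candidate boxes per object, and these are usually pruned with non-maximum suppression (NMS) before being shown, for example in each frame of a video stream. The sketch below is a generic greedy, per-class NMS in plain Python; the 0.45 IoU threshold and the function names are illustrative assumptions, not YOLOv5's own implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, classes, iou_threshold=0.45):
    """Greedy per-class non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i unless a higher-scoring kept box of the same class overlaps it too much.
        if all(classes[j] != classes[i] or iou(boxes[i], boxes[j]) < iou_threshold
               for j in keep):
            keep.append(i)
    return keep
```

Running suppression separately per class means that two different engine parts with overlapping boxes are both kept, while duplicate boxes for the same part are collapsed to the highest-scoring one.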


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Seyed Muhammad Hossein Mousavi ◽  
S. Younes Mirinezhad

This study presents a new color-depth face database collected from Iranian subjects of different genders and age ranges. Suitable databases make it possible to validate and assess available methods in different research fields. This database has applications in fields such as face recognition, age estimation, Facial Expression Recognition and Facial Micro Expressions Recognition. Image databases are mostly large in terms of size and resolution. Color images usually consist of three channels, namely Red, Green and Blue. In the last decade, however, another image type has emerged, called the “depth image”. Depth images are used to calculate the range or distance between objects and the sensor. Depending on the depth sensor technology, range data can be acquired in different ways. The Kinect sensor version 2 can acquire color and depth data simultaneously. Facial expression recognition is an important field in image processing, with uses ranging from animation to psychology. Currently, only a few color-depth (RGB-D) facial micro expression recognition databases exist. Adding depth data to color data can increase the accuracy of the final recognition. Due to the shortage of color-depth facial expression databases and some weaknesses in the available ones, a new and almost perfect RGB-D face database covering the Middle-Eastern face type is presented in this paper. In the validation section, the database is compared with some well-known benchmark face databases. For evaluation, Histogram of Oriented Gradients (HOG) features are extracted, and classification algorithms such as Support Vector Machine, Multi-Layer Neural Network and a deep learning method called Convolutional Neural Network (CNN) are employed. The results are very promising.
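The HOG feature extraction used for evaluation can be sketched in NumPy. This is a simplified, hypothetical version: hard orientation binning, no Gaussian weighting, 9 unsigned orientation bins per 8 × 8-pixel cell, and L2-normalized 2 × 2-cell blocks. These parameter choices are assumptions, not the settings used in the study. The resulting vector would then be passed to a classifier such as an SVM; applying the same function to the depth channel as well as to grayscale color images is likewise an assumption.

```python
import numpy as np


def hog_descriptor(img, cell=8, bins=9, block=2, eps=1e-6):
    """Simplified HOG: hard-binned unsigned gradient histograms per cell,
    concatenated over overlapping L2-normalized blocks of block x block cells."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]  # centered [-1, 0, 1] derivative
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in degrees
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # Each pixel votes its gradient magnitude into its orientation bin
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=mag[rows, cols].ravel(),
                                       minlength=bins)

    feats = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))  # L2 block normalization
    return np.concatenate(feats)
```

For a 64 × 64 face crop, these settings give 7 × 7 blocks of 36 values each, i.e. a 1764-dimensional feature vector.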


Nutrients ◽  
2018 ◽  
Vol 10 (12) ◽  
pp. 2005 ◽  
Author(s):  
Frank Lo ◽  
Yingnan Sun ◽  
Jianing Qiu ◽  
Benny Lo

An objective dietary assessment system can help users to understand their dietary behavior and enable targeted interventions to address underlying health problems. To accurately quantify dietary intake, the portion size or food volume must be measured. For volume estimation, previous studies mostly focused on model-based or stereo-based approaches, which rely on manual intervention or require users to capture multiple frames from different viewing angles, a tedious process. In this paper, a deep learning-based view synthesis approach is proposed to reconstruct 3D point clouds of food items and estimate their volume from a single depth image. A distinct neural network is designed to take a depth image from one viewing angle and predict the depth image that would be captured from the opposite viewing angle. The complete 3D point cloud is then reconstructed by fusing the initial data points with the synthesized points of the object items through the proposed point cloud completion and Iterative Closest Point (ICP) algorithms. Furthermore, a database of depth images of food items captured from different viewing angles is constructed using image rendering and used to validate the proposed neural network. The methodology is then evaluated by comparing the volume estimated from the synthesized 3D point cloud with the ground truth volume of the object items.
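The ICP step used to fuse the point clouds can be illustrated with the textbook point-to-point variant: match each source point to its nearest target point, solve for the best rigid transform with an SVD (the Kabsch method), apply it, and repeat until the error stops changing. This sketch uses brute-force nearest neighbors and is not the authors' implementation; the abstract does not specify their ICP variant or the point cloud completion step.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Kabsch: rotation R and translation t minimizing sum ||R @ s_i + t - d_i||^2."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)  # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cd - R @ cs
    return R, t


def icp(src, dst, iters=50, tol=1e-10):
    """Point-to-point ICP aligning src (N x 3) to dst (M x 3).

    Returns the accumulated rotation, translation and the aligned source points.
    """
    cur = src.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iters):
        # Brute-force nearest neighbour in dst for every current source point
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(cur)), nn].mean())  # RMS error before this update
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
        # Compose: new = R (R_total src + t_total) + t
        R_total, t_total = R @ R_total, R @ t_total + t
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total, cur
```

Brute-force matching needs O(N·M) memory, which is fine for small clouds; a k-d tree is the usual choice for larger ones.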


2020 ◽  
Vol 12 (22) ◽  
pp. 9785
Author(s):  
Kisu Lee ◽  
Goopyo Hong ◽  
Lee Sael ◽  
Sanghyo Lee ◽  
Ha Young Kim

Defects in residential building façades affect the structural integrity of buildings and degrade their external appearance. Defects in a building façade are typically managed manually during maintenance. This approach is time-consuming, yields subjective results, and can lead to accidents or casualties. To address this, we propose a building façade monitoring system that uses a deep learning-based object detection method to manage defects efficiently while minimizing manual involvement. The dataset used to train the deep learning network contains actual residential building façade images. The variety of building designs in these raw images makes defects difficult to detect because of the many defect types and complex backgrounds. We employed the faster regions with convolutional neural network (Faster R-CNN) structure for more accurate defect detection in such environments, achieving an average precision (at intersection over union (IoU) = 0.5) of 62.7% for all types of trained defects. Because defects are difficult to detect in such a training environment, the network's performance still needs to be improved. Nevertheless, the object detection network employed in this study performs well on complex real-world images, indicating the possibility of developing a system that detects defects in more types of building façades.
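The reported metric, average precision at IoU = 0.5, can be computed for one defect class as follows. This sketch uses PASCAL VOC-style greedy matching and all-point interpolation; whether the study used exactly this AP variant is an assumption, and the data layout (detections as `(image_id, score, box)` tuples, ground truth as a dict of box lists) is hypothetical. Averaging per-class values or pooling all defect types yields a single summary figure; which of the two the study did is not stated.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """All-point interpolated AP for one class.

    detections: list of (image_id, score, box); ground_truths: {image_id: [box, ...]}.
    """
    n_gt = sum(len(v) for v in ground_truths.values())
    matched = {k: [False] * len(v) for k, v in ground_truths.items()}
    tp = []
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        # Find the ground-truth box with the highest IoU in the same image
        best, best_j = 0.0, -1
        for j, g in enumerate(ground_truths.get(img, [])):
            o = iou(box, g)
            if o > best:
                best, best_j = o, j
        if best >= iou_threshold and not matched[img][best_j]:
            matched[img][best_j] = True
            tp.append(1)  # true positive
        else:
            tp.append(0)  # false positive: low overlap or duplicate detection

    precisions, recalls, cum_tp = [], [], 0
    for k, t in enumerate(tp, start=1):
        cum_tp += t
        precisions.append(cum_tp / k)
        recalls.append(cum_tp / n_gt)

    # Area under the precision envelope (max precision at recall >= r)
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_r) * max(precisions[i:])
        prev_r = r
    return ap
```

Counting a second detection of an already-matched defect as a false positive is what penalizes duplicate boxes in this metric.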


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Ahmed Jawad A. AlBdairi ◽  
Zhu Xiao ◽  
Mohammed Alghaili

Interest in face recognition studies has grown rapidly in the last decade. One of the most important problems in face recognition is identifying people's ethnicity. In this study, a new deep convolutional neural network is designed to recognize people's ethnicity from their facial features. The new ethnicity dataset consists of 3141 images collected from three different nationalities. To the best of our knowledge, this is the first image dataset collected for people's ethnicity, and it will be made available to the research community. The new model was compared with two state-of-the-art models, VGG and Inception V3, and the validation accuracy was calculated for each convolutional neural network. The generated models were tested on several images of people, and the results show that our model achieved the best performance, with a verification accuracy of 96.9%.


Sensors ◽  
2019 ◽  
Vol 19 (3) ◽  
pp. 529 ◽  
Author(s):  
Hui Zeng ◽  
Bin Yang ◽  
Xiuqing Wang ◽  
Jiwei Liu ◽  
Dongmei Fu

With the development of low-cost RGB-D (Red Green Blue-Depth) sensors, RGB-D object recognition has attracted increasing research attention in recent years. Deep learning has become popular in image analysis and has achieved competitive results. To make full use of the identification information in both the RGB and depth images, we propose an RGB-D object recognition method based on a multi-modal deep neural network and DS (Dempster-Shafer) evidence theory. First, the RGB and depth images are preprocessed, and a separate convolutional neural network is trained for each. Next, we perform multi-modal feature learning using the proposed quadruplet-sample-based objective function to fine-tune the network parameters. Then, two probabilistic classification results are obtained using two sigmoid SVMs (Support Vector Machines) on the learned RGB and depth features. Finally, a DS evidence theory-based decision fusion method integrates the two classification results. Compared with other RGB-D object recognition methods, our method adopts two fusion strategies: multi-modal feature learning and DS decision fusion. Both the discriminative information of each modality and the correlation information between the two modalities are exploited. Extensive experimental results validate the effectiveness of the proposed method.
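Dempster's rule of combination, which underlies the DS decision fusion step, can be sketched for the case where each classifier assigns mass to single classes plus an optional mass on the whole frame Θ (ignorance). How the paper converts the sigmoid SVM outputs into mass functions is not stated in the abstract; treating class probabilities directly as singleton masses is an assumption, and the function name is hypothetical. With zero mass on Θ, the rule reduces to a normalized product of the two probability vectors.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over n singleton classes plus the full frame Theta.

    m1, m2: lists of length n + 1; the last entry is the mass on Theta (ignorance).
    Returns the combined masses in the same layout.
    """
    n = len(m1) - 1
    theta1, theta2 = m1[-1], m2[-1]
    # Conflict K: mass the two sources assign to different (disjoint) classes
    conflict = sum(m1[i] * m2[j] for i in range(n) for j in range(n) if i != j)
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    # {i} arises from {i}&{i}, {i}&Theta and Theta&{i}
    combined = [(m1[i] * m2[i] + m1[i] * theta2 + theta1 * m2[i]) / norm for i in range(n)]
    combined.append(theta1 * theta2 / norm)  # Theta & Theta = Theta
    return combined


# Hypothetical RGB and depth classifier outputs over three classes, no ignorance mass
rgb = [0.6, 0.3, 0.1, 0.0]
depth = [0.5, 0.4, 0.1, 0.0]
fused = dempster_combine(rgb, depth)
predicted_class = max(range(len(fused) - 1), key=lambda i: fused[i])
```

Here both sources favor class 0, and the fused mass on it (0.3 / 0.43, about 0.70) is higher than either input alone, which is the reinforcing behavior that motivates evidence-based fusion.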

