A Deep Learning Method for 3D Object Classification and Retrieval Using the Global Point Signature Plus and Deep Wide Residual Network

Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2644
Author(s):  
Long Hoang ◽  
Suk-Hwan Lee ◽  
Ki-Ryong Kwon

A vital and challenging task in computer vision is 3D object classification and retrieval, with many practical applications such as intelligent robots, autonomous driving, multimedia content processing and retrieval, and augmented/mixed reality. Various deep learning methods have been introduced for solving classification and retrieval problems of 3D objects. View-based methods perform best among the current techniques (view-based, voxelization, and point cloud methods), but almost all of them use many views to compensate for the spatial information lost in projection, which complicates the network structure because of the parallel Convolutional Neural Networks (CNNs) required. In this paper, we propose a novel method that combines a Global Point Signature Plus with a Deep Wide Residual Network, namely GPSP-DWRN. Global Point Signature Plus (GPSPlus) is a novel descriptor that captures more shape information of the 3D object from a single view. First, an original 3D model is converted into a colored one by applying GPSPlus. Then, the 2D projection of this colored 3D model is stored in a 32 × 32 × 3 matrix, which serves as the input to a Deep Wide Residual Network with a single CNN structure. We evaluated GPSP-DWRN on a retrieval task using the ShapeNetCore55 dataset, and on a classification task using two well-known datasets, ModelNet10 and ModelNet40. Our experimental results show that the framework performs better than state-of-the-art methods.
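The projection stage described above (a colored 3D model rendered from a single view into a 32 × 32 × 3 input matrix) can be sketched as follows. This is a minimal illustration, not the paper's renderer: GPSPlus coloring is replaced by arbitrary per-point RGB values, and the "view" is a simple orthographic projection onto the x–y plane.

```python
import numpy as np

def project_colored_points(points, colors, size=32):
    """Orthographically project colored 3D points onto a size x size x 3
    image -- a stand-in for the paper's single-view 2D projection of the
    GPSPlus-colored model (the real pipeline renders a colored mesh)."""
    xy = points[:, :2]
    # Normalize x,y coordinates into integer pixel indices.
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    idx = ((xy - lo) / (hi - lo + 1e-9) * (size - 1)).astype(int)
    img = np.zeros((size, size, 3))
    count = np.zeros((size, size, 1))
    for (i, j), c in zip(idx, colors):
        img[j, i] += c
        count[j, i] += 1
    return img / np.maximum(count, 1)  # average color per occupied pixel

# Toy example: 1000 random points with random RGB "GPSPlus" colors.
rng = np.random.default_rng(0)
pts = rng.normal(size=(1000, 3))
cols = rng.uniform(size=(1000, 3))
x = project_colored_points(pts, cols)
print(x.shape)  # (32, 32, 3) -- the shape of the network's input matrix
```

The resulting 32 × 32 × 3 array has exactly the shape the abstract names as the CNN input; in the paper this matrix would then be fed to the Deep Wide Residual Network.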

Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1196
Author(s):  
Long Hoang ◽  
Suk-Hwan Lee ◽  
Oh-Heum Kwon ◽  
Ki-Ryong Kwon

Computer vision has many recent applications such as smart cars, robot navigation, and computer-aided manufacturing. Object classification, in particular 3D classification, is a major part of computer vision. In this paper, we propose a novel method combining the wave kernel signature (WKS) with a center point (CP) method, which extracts color and distance features from a 3D model to tackle 3D object classification. The idea is motivated by the nature of human vision: we tend to classify an object based on its color and size. Firstly, we find the center point of the mesh to define the distance feature. Secondly, we calculate eigenvalues of the 3D mesh and the corresponding WKS values to capture the color feature. These features form the input of a 2D convolutional neural network (CNN) architecture. We use two large-scale 3D model datasets, ModelNet10 and ModelNet40, to evaluate the proposed method. Our experimental results show higher accuracy and efficiency than other methods. The proposed method could be applied to real-world problems such as autonomous driving and augmented/virtual reality.
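The two feature types named in the abstract can be sketched in a few lines. This is an illustrative approximation only: the center-point distance is taken from the vertex centroid, and the WKS is computed from given Laplace–Beltrami eigenpairs using the standard log-normal energy weighting (here fed random eigenpairs; the paper derives them from the actual mesh).

```python
import numpy as np

def center_point_distances(vertices):
    """CP feature: distance of each vertex to the mesh's center point."""
    center = vertices.mean(axis=0)
    return np.linalg.norm(vertices - center, axis=1)

def wave_kernel_signature(eigvals, eigvecs, n_scales=10, sigma=1.0):
    """WKS from Laplace-Beltrami eigenpairs: per vertex, a band of
    responses over log-eigenvalue energy scales."""
    log_e = np.log(np.maximum(eigvals, 1e-8))
    scales = np.linspace(log_e.min(), log_e.max(), n_scales)
    # Gaussian weight of each eigenvalue around each energy scale.
    w = np.exp(-((scales[:, None] - log_e[None, :]) ** 2) / (2 * sigma**2))
    wks = (eigvecs**2) @ w.T            # (n_vertices, n_scales)
    return wks / w.sum(axis=1)          # normalize per scale

rng = np.random.default_rng(1)
V = rng.normal(size=(100, 3))            # 100 toy vertices
d = center_point_distances(V)
evals = np.sort(rng.uniform(0.1, 10, size=20))   # stand-in eigenpairs
evecs = rng.normal(size=(100, 20))
sig = wave_kernel_signature(evals, evecs)
print(d.shape, sig.shape)  # (100,) (100, 10)
```

In the paper, per-vertex signatures like these would be rasterized into an image-like array before being passed to the 2D CNN.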


2020 ◽  
Vol 10 (19) ◽  
pp. 6735 ◽  
Author(s):  
Zishu Liu ◽  
Wei Song ◽  
Yifei Tian ◽  
Sumi Ji ◽  
Yunsick Sung ◽  
...  

Point clouds have been widely used in three-dimensional (3D) object classification tasks, e.g., people recognition in unmanned ground vehicles. However, the irregular data format of point clouds and the large number of parameters in deep learning networks affect the performance of object classification. This paper develops a 3D object classification system using a broad learning system (BLS) with a feature extractor called VB-Net. First, raw point clouds are voxelized: irregular point clouds are converted into regular voxels that are easily processed by the feature extractor. Then, a pre-trained VoxNet is employed as a feature extractor to extract features from the voxels. Finally, those features are used for object classification by the applied BLS. The proposed system is tested on the ModelNet40 and ModelNet10 datasets, achieving average recognition accuracies of 83.99% and 90.08%, respectively. Compared with deep learning networks, the time consumption of the proposed system is significantly lower.
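The voxelization step that turns an irregular point cloud into a regular grid can be sketched as below. This is a minimal binary-occupancy version under assumed parameters (a 32³ grid); the paper's exact grid resolution and occupancy model for the VoxNet input are not stated here.

```python
import numpy as np

def voxelize(points, grid=32):
    """Convert an irregular (N, 3) point cloud into a regular binary
    occupancy grid -- the kind of regular input a VoxNet-style feature
    extractor can process."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    # Scale each point into integer voxel indices within the grid.
    idx = ((points - lo) / (hi - lo + 1e-9) * (grid - 1)).astype(int)
    vox = np.zeros((grid, grid, grid), dtype=np.uint8)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1   # mark occupied cells
    return vox

pts = np.random.default_rng(2).normal(size=(500, 3))  # toy point cloud
v = voxelize(pts)
print(v.shape, v.sum())  # (32, 32, 32); at most 500 occupied cells
```

The occupancy grid is then what the pre-trained VoxNet would consume to produce the feature vector handed to the BLS classifier.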


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 517
Author(s):  
Seong-heum Kim ◽  
Youngbae Hwang

Owing to recent advancements in deep learning methods and relevant databases, it is becoming increasingly easier to recognize 3D objects using only RGB images from single viewpoints. This study investigates the major breakthroughs and current progress in deep learning-based monocular 3D object detection. For relatively low-cost data acquisition systems without depth sensors or cameras at multiple viewpoints, we first consider existing databases with 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, deep learning-based monocular 3D object detection methods that overcome significant research challenges are categorized and summarized. We present the key concepts and detailed descriptions of representative single-stage and multiple-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3702 ◽  
Author(s):  
Chiman Kwan ◽  
Bryan Chou ◽  
Jonathan Yang ◽  
Akshay Rangamani ◽  
Trac Tran ◽  
...  

Compressive sensing has seen many applications in recent years. One type of compressive sensing device is the Pixel-wise Code Exposure (PCE) camera, which has low power consumption and individual control of each pixel's exposure time. To use PCE cameras for practical applications, a time-consuming and lossy process is normally needed to reconstruct the original frames. In this paper, we present a deep learning approach that performs target tracking and classification directly in the compressive measurement domain, without any frame reconstruction. In particular, we apply You Only Look Once (YOLO) to detect and track targets in the frames and a Residual Network (ResNet) for classification. Extensive simulations using low-quality optical and mid-wave infrared (MWIR) videos from the SENSIAC database demonstrated the efficacy of the proposed approach.
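A PCE-style measurement, in which each pixel integrates the scene over its own coded exposure window so that many frames compress into one coded image, can be simulated roughly as follows. The window length and random start times here are illustrative assumptions, not the paper's actual exposure codes.

```python
import numpy as np

def pce_measure(frames, exposure=4, seed=0):
    """Simulate a Pixel-wise Code Exposure measurement: each pixel
    integrates over its own randomly-timed exposure window, compressing
    T frames into a single coded image."""
    T, H, W = frames.shape
    rng = np.random.default_rng(seed)
    start = rng.integers(0, T - exposure + 1, size=(H, W))
    t = np.arange(T)[:, None, None]
    mask = (t >= start) & (t < start + exposure)   # per-pixel exposure code
    return (frames * mask).sum(axis=0) / exposure

video = np.random.default_rng(3).uniform(size=(16, 8, 8))  # 16 toy frames
coded = pce_measure(video)
print(coded.shape)  # (8, 8): one coded measurement for the whole clip
```

The paper's contribution is that detection (YOLO) and classification (ResNet) operate on coded measurements like this one directly, skipping the costly frame-reconstruction step.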


Author(s):  
Majid Mohamed Himmi ◽  
El Houssine Bouyakhf ◽  
Fatima Zahra Ouadiay ◽  
Mohamed Hannat ◽  
Nabila Zrira
