Angular Triplet-Center Loss for Multi-View 3D Shape Retrieval

Author(s): Zhaoqun Li, Cheng Xu, Biao Leng

How to obtain a desirable representation of a 3D shape, one that is discriminative across categories and compact within classes, is a significant challenge in 3D shape retrieval. Most existing 3D shape retrieval methods focus on capturing strongly discriminative shape representations with a softmax loss for the classification task, while shape feature learning with a metric loss is neglected. In this paper, we address this problem based on the intuition that the cosine distances between shape embeddings should be small within the same class and large across categories. Since most 3D shape retrieval tasks measure shape similarity by the cosine distance of shape features, we propose a novel metric loss, named the angular triplet-center loss, which directly optimizes the cosine distances between features. It inherits from the triplet-center loss the property of simultaneously enlarging inter-class distances and reducing intra-class distances. Unlike previous metric losses used in 3D shape retrieval, which adopt Euclidean distance and whose margins are difficult to design, the proposed loss makes training feature embeddings more convenient and is more suitable for 3D shape retrieval. Moreover, an angle margin is adopted in place of the cosine margin to provide more explicit discriminative constraints on the embedding space. Extensive experimental results on two popular 3D object retrieval benchmarks, ModelNet40 and ShapeNet Core55, demonstrate the effectiveness of the proposed loss, and our method achieves state-of-the-art results on various 3D shape datasets.
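The abstract describes the loss only at a high level. As a rough illustration, below is a minimal PyTorch sketch of an angular triplet-center loss under stated assumptions: class centers are learnable parameters (as in triplet-center loss), the margin is an angle in radians, and all names and hyperparameter values are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularTripletCenterLoss(nn.Module):
    """Sketch of an angular triplet-center loss: each embedding is
    pulled toward its own class center and pushed away from the
    nearest other-class center, with the hinge margin applied to
    angles (arccos of cosine similarity) rather than to Euclidean
    distances. Centers are learnable, as in triplet-center loss."""

    def __init__(self, num_classes, feat_dim, angle_margin=0.5):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.angle_margin = angle_margin  # margin in radians (assumed value)

    def forward(self, features, labels):
        # Cosine similarity between each embedding and every class center.
        f = F.normalize(features, dim=1)       # (B, D)
        c = F.normalize(self.centers, dim=1)   # (K, D)
        cos_sim = f @ c.t()                    # (B, K)
        # Angular distance; clamp keeps arccos numerically safe.
        angles = torch.acos(cos_sim.clamp(-1 + 1e-7, 1 - 1e-7))

        # Angle to the ground-truth center.
        pos = angles.gather(1, labels.view(-1, 1)).squeeze(1)

        # Smallest angle to any other-class center.
        mask = F.one_hot(labels, angles.size(1)).bool()
        neg = angles.masked_fill(mask, float('inf')).min(dim=1).values

        # Hinge: positive angle plus margin should not exceed negative angle.
        return F.relu(pos + self.angle_margin - neg).mean()
```

Because the hinge operates directly on angles, the margin has a uniform geometric meaning everywhere on the hypersphere, which matches the paper's stated motivation for preferring an angle margin over a cosine margin.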

Author(s): Jianwen Jiang, Di Bao, Ziqiang Chen, Xibin Zhao, Yue Gao

3D shape retrieval has attracted much attention and has recently become a hot topic in computer vision. With the development of deep learning, 3D shape retrieval has made great progress, and many view-based methods have been introduced in recent years. However, how to represent 3D shapes better remains a challenging problem, and the intrinsic hierarchical associations among views have not yet been well exploited. To tackle these problems, in this paper we propose a multi-loop-view convolutional neural network (MLVCNN) framework for 3D shape retrieval. In this method, multiple groups of views are first extracted from different loop directions. Given these multiple loop views, the proposed MLVCNN framework introduces a hierarchical view-loop-shape architecture, i.e., the view level, the loop level, and the shape level, to represent 3D shapes at different scales. At the view level, a convolutional neural network is first trained to extract view features. Then, the proposed Loop Normalization and an LSTM are applied to each loop of views to generate loop-level features, which take into account the intrinsic associations among the different views in the same loop. Finally, all loop-level descriptors are combined into a shape-level descriptor for 3D shape representation, which is used for 3D shape retrieval. The proposed method has been evaluated on the public 3D shape benchmark ModelNet40. Experiments and comparisons with state-of-the-art methods show that MLVCNN achieves significant performance improvements on the 3D shape retrieval task, outperforming the state-of-the-art methods by 4.84% in mAP. We have also evaluated the proposed method on the 3D shape classification task, where MLVCNN likewise achieves superior performance compared with recent methods.
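For a concrete picture of the view-loop-shape hierarchy, here is a minimal PyTorch sketch. The backbone CNN is left abstract, LayerNorm stands in for the paper's Loop Normalization, and max-pooling is assumed as the loop-to-shape fusion; the authors' exact components may differ.

```python
import torch
import torch.nn as nn

class MLVCNNSketch(nn.Module):
    """Sketch of a hierarchical view-loop-shape pipeline: a shared CNN
    extracts view-level features, a per-loop normalization plus LSTM
    aggregates the ordered views of each loop into a loop-level
    descriptor, and the loop descriptors are fused into one
    shape-level descriptor used for retrieval."""

    def __init__(self, view_cnn, view_dim=512, loop_dim=256):
        super().__init__()
        self.view_cnn = view_cnn                 # shared view-level CNN, outputs (N, view_dim)
        self.loop_norm = nn.LayerNorm(view_dim)  # stand-in for Loop Normalization (assumption)
        self.lstm = nn.LSTM(view_dim, loop_dim, batch_first=True)

    def forward(self, views):
        # views: (batch, num_loops, views_per_loop, C, H, W)
        b, n_loops, n_views = views.shape[:3]
        x = views.flatten(0, 2)                  # fold loops and views into the batch dim
        feats = self.view_cnn(x)                 # (b * n_loops * n_views, view_dim)
        feats = feats.view(b * n_loops, n_views, -1)
        feats = self.loop_norm(feats)            # normalize features within each loop
        # The LSTM models the ordered sequence of views along each loop direction.
        _, (h, _) = self.lstm(feats)
        loop_desc = h[-1].view(b, n_loops, -1)   # one descriptor per loop
        # Fuse loop-level descriptors into a shape-level descriptor
        # (max-pooling here is an assumption, not the paper's choice).
        return loop_desc.max(dim=1).values
```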


2015, Vol. 48 (8), pp. 2500-2512
Author(s): David Pickup, Xianfang Sun, Paul L. Rosin, Ralph R. Martin

Author(s): Heyu Zhou, Weizhi Nie, Wenhui Li, Dan Song, An-An Liu

2D image-based 3D shape retrieval has become a hot research topic owing to its wide industrial applications and academic significance. However, existing view-based 3D shape retrieval methods are restricted by two limitations: 1) they learn common-class features while neglecting instance-level visual characteristics, and 2) they narrow the global domain variations while ignoring the local semantic variations within each category. To overcome these problems, we propose a novel hierarchical instance feature alignment (HIFA) method for this task. HIFA consists of two modules: cross-modal instance feature learning and hierarchical instance feature alignment. Specifically, we first use a CNN to extract both 2D image and multi-view features. Then, we maximize the mutual information between the input data and the high-level features to preserve as many visual characteristics of each individual instance as possible. To mix the features of the two domains, we enforce feature alignment at both the global domain level and the local semantic level. To narrow the global domain variations, we impose an identical large-norm restriction on the expected feature norms of both the 2D and 3D domains, which facilitates feature transferability. To narrow the local variations, we minimize the distance between the two centroids of the same class from different domains to obtain semantic consistency. Extensive experiments on two popular and novel datasets, MI3DOR and MI3DOR-2, validate the superiority of HIFA for the 2D image-based 3D shape retrieval task.
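The two alignment levels can be made concrete with a short sketch. Below, the global norm alignment and the local class-centroid alignment are written as standalone PyTorch loss terms; the target norm value and the exact distance functions are assumptions, and the mutual-information term of the cross-modal instance feature learning module is omitted.

```python
import torch
import torch.nn.functional as F

def norm_alignment_loss(feat_2d, feat_3d, target_norm=25.0):
    """Global-level sketch: push the expected feature norms of both
    domains toward the same large value (target_norm is an assumed
    hyperparameter, not taken from the paper)."""
    return ((feat_2d.norm(dim=1).mean() - target_norm) ** 2
            + (feat_3d.norm(dim=1).mean() - target_norm) ** 2)

def centroid_alignment_loss(feat_2d, labels_2d, feat_3d, labels_3d, num_classes):
    """Local-level sketch: for each class present in both batches,
    minimize the distance between its 2D and 3D feature centroids."""
    loss, count = feat_2d.new_zeros(()), 0
    for c in range(num_classes):
        m2, m3 = labels_2d == c, labels_3d == c
        if m2.any() and m3.any():
            loss = loss + F.mse_loss(feat_2d[m2].mean(0), feat_3d[m3].mean(0))
            count += 1
    return loss / max(count, 1)
```

In training, these two terms would be added to the task loss with weighting coefficients; the weights, like the rest of this sketch, are left as tunable assumptions.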


Author(s): Anran Qi, Yulia Gryaditskaya, Jifei Song, Yongxin Yang, Yonggang Qi, ...

2009, Vol. 28 (1), pp. 41-55
Author(s): Julien Tierny, Jean-Philippe Vandeborre, Mohamed Daoudi
