scholarly journals Siamese CNN-BiLSTM Architecture for 3D Shape Representation Learning

Author(s):  
Guoxian Dai ◽  
Jin Xie ◽  
Yi Fang

Learning a 3D shape representation from a collection of its rendered 2D images has been extensively studied. However, existing view-based techniques have not yet fully exploited the information among all the views of projections. In this paper, by employing recurrent neural network to efficiently capture features across different views, we propose a siamese CNN-BiLSTM network for 3D shape representation learning. The proposed method minimizes a discriminative loss function to learn a deep nonlinear transformation, mapping 3D shapes from the original space into a nonlinear feature space. In the transformed space, the distance of 3D shapes with the same label is minimized, otherwise the distance is maximized to a large margin. Specifically, the 3D shapes are first projected into a group of 2D images from different views. Then convolutional neural network (CNN) is adopted to extract features from different view images, followed by a bidirectional long short-term memory (LSTM) to aggregate information across different views. Finally, we construct the whole CNN-BiLSTM network into a siamese structure with contrastive loss function. Our proposed method is evaluated on two benchmarks, ModelNet40 and SHREC 2014, demonstrating superiority over the state-of-the-art methods.

Author(s):  
Zhizhong Han ◽  
Mingyang Shang ◽  
Yu-Shen Liu ◽  
Matthias Zwicker

In this paper, we present a novel unsupervised representation learning approach for 3D shapes, which is an important research challenge as it avoids the manual effort required for collecting supervised data. Our method trains an RNNbased neural network architecture to solve multiple view inter-prediction tasks for each shape. Given several nearby views of a shape, we define view inter-prediction as the task of predicting the center view between the input views, and reconstructing the input views in a low-level feature space. The key idea of our approach is to implement the shape representation as a shape-specific global memory that is shared between all local view inter-predictions for each shape. Intuitively, this memory enables the system to aggregate information that is useful to better solve the view inter-prediction tasks for each shape, and to leverage the memory as a viewindependent shape representation. Our approach obtains the best results using a combination of L2 and adversarial losses for the view inter-prediction task. We show that VIP-GAN outperforms state-of-the-art methods in unsupervised 3D feature learning on three large-scale 3D shape benchmarks.


Author(s):  
Yutong Feng ◽  
Yifan Feng ◽  
Haoxuan You ◽  
Xibin Zhao ◽  
Yue Gao

Mesh is an important and powerful type of data for 3D shapes and widely studied in the field of computer vision and computer graphics. Regarding the task of 3D shape representation, there have been extensive research efforts concentrating on how to represent 3D shapes well using volumetric grid, multi-view and point cloud. However, there is little effort on using mesh data in recent years, due to the complexity and irregularity of mesh data. In this paper, we propose a mesh neural network, named MeshNet, to learn 3D shape representation from mesh data. In this method, face-unit and feature splitting are introduced, and a general architecture with available and effective blocks are proposed. In this way, MeshNet is able to solve the complexity and irregularity problem of mesh and conduct 3D shape representation well. We have applied the proposed MeshNet method in the applications of 3D shape classification and retrieval. Experimental results and comparisons with the state-of-the-art methods demonstrate that the proposed MeshNet can achieve satisfying 3D shape classification and retrieval performance, which indicates the effectiveness of the proposed method on 3D shape representation.


2021 ◽  
Vol 14 (4) ◽  
pp. 702-713
Author(s):  
N. Prabakaran ◽  
Rajasekaran Palaniappan ◽  
R. Kannadasan ◽  
Satya Vinay Dudi ◽  
V. Sasidhar

PurposeWe propose a Machine Learning (ML) approach that will be trained from the available financial data and is able to gain the trends over the data and then uses the acquired knowledge for a more accurate forecasting of financial series. This work will provide a more precise results when weighed up to aged financial series forecasting algorithms. The LSTM Classic will be used to forecast the momentum of the Financial Series Index and also applied to its commodities. The network will be trained and evaluated for accuracy with various sizes of data sets, i.e. weekly historical data of MCX, GOLD, COPPER and the results will be calculated.Design/methodology/approachDesirable LSTM model for script price forecasting from the perspective of minimizing MSE. The approach which we have followed is shown below. (1) Acquire the Dataset. (2) Define your training and testing columns in the dataset. (3) Transform the input value using scalar. (4) Define the custom loss function. (5) Build and Compile the model. (6) Visualise the improvements in results.FindingsFinancial series is one of the very aged techniques where a commerce person would commerce financial scripts, make business and earn some wealth from these companies that vend a part of their business on trading manifesto. Forecasting financial script prices is complex tasks that consider extensive human–computer interaction. Due to the correlated nature of financial series prices, conventional batch processing methods like an artificial neural network, convolutional neural network, cannot be utilised efficiently for financial market analysis. We propose an online learning algorithm that utilises an upgraded of recurrent neural networks called long short-term memory Classic (LSTM). The LSTM Classic is quite different from normal LSTM as it has customised loss function in it. This LSTM Classic avoids long-term dependence on its metrics issues because of its unique internal storage unit structure, and it helps forecast financial time series. Financial Series Index is the combination of various commodities (time series). This makes Financial Index more reliable than the financial time series as it does not show a drastic change in its value even some of its commodities are affected. This work will provide a more precise results when weighed up to aged financial series forecasting algorithms.Originality/valueWe had built the customised loss function model by using LSTM scheme and have experimented on MCX index and as well as on its commodities and improvements in results are calculated for every epoch that we run for the whole rows present in the dataset. For every epoch we can visualise the improvements in loss. One more improvement that can be done to our model that the relationship between price difference and directional loss is specific to other financial scripts. Deep evaluations can be done to identify the best combination of these for a particular stock to obtain better results.


Author(s):  
Jianwen Jiang ◽  
Di Bao ◽  
Ziqiang Chen ◽  
Xibin Zhao ◽  
Yue Gao

3D shape retrieval has attracted much attention and become a hot topic in computer vision field recently.With the development of deep learning, 3D shape retrieval has also made great progress and many view-based methods have been introduced in recent years. However, how to represent 3D shapes better is still a challenging problem. At the same time, the intrinsic hierarchical associations among views still have not been well utilized. In order to tackle these problems, in this paper, we propose a multi-loop-view convolutional neural network (MLVCNN) framework for 3D shape retrieval. In this method, multiple groups of views are extracted from different loop directions first. Given these multiple loop views, the proposed MLVCNN framework introduces a hierarchical view-loop-shape architecture, i.e., the view level, the loop level, and the shape level, to conduct 3D shape representation from different scales. In the view-level, a convolutional neural network is first trained to extract view features. Then, the proposed Loop Normalization and LSTM are utilized for each loop of view to generate the loop-level features, which considering the intrinsic associations of the different views in the same loop. Finally, all the loop-level descriptors are combined into a shape-level descriptor for 3D shape representation, which is used for 3D shape retrieval. Our proposed method has been evaluated on the public 3D shape benchmark, i.e., ModelNet40. Experiments and comparisons with the state-of-the-art methods show that the proposed MLVCNN method can achieve significant performance improvement on 3D shape retrieval tasks. Our MLVCNN outperforms the state-of-the-art methods by the mAP of 4.84% in 3D shape retrieval task. We have also evaluated the performance of the proposed method on the 3D shape classification task where MLVCNN also achieves superior performance compared with recent methods.


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 723 ◽  
Author(s):  
Divish Rengasamy ◽  
Mina Jafari ◽  
Benjamin Rothwell ◽  
Xin Chen ◽  
Grazziela P. Figueredo

Deep learning has been employed to prognostic and health management of automotive and aerospace with promising results. Literature in this area has revealed that most contributions regarding deep learning is largely focused on the model’s architecture. However, contributions regarding improvement of different aspects in deep learning, such as custom loss function for prognostic and health management are scarce. There is therefore an opportunity to improve upon the effectiveness of deep learning for the system’s prognostics and diagnostics without modifying the models’ architecture. To address this gap, the use of two different dynamically weighted loss functions, a newly proposed weighting mechanism and a focal loss function for prognostics and diagnostics task are investigated. A dynamically weighted loss function is expected to modify the learning process by augmenting the loss function with a weight value corresponding to the learning error of each data instance. The objective is to force deep learning models to focus on those instances where larger learning errors occur in order to improve their performance. The two loss functions used are evaluated using four popular deep learning architectures, namely, deep feedforward neural network, one-dimensional convolutional neural network, bidirectional gated recurrent unit and bidirectional long short-term memory on the commercial modular aero-propulsion system simulation data from NASA and air pressure system failure data for Scania trucks. Experimental results show that dynamically-weighted loss functions helps us achieve significant improvement for remaining useful life prediction and fault detection rate over non-weighted loss function predictions.


Sensors ◽  
2020 ◽  
Vol 20 (12) ◽  
pp. 3539 ◽  
Author(s):  
Chang-Cheng Lo ◽  
Ching-Hung Lee ◽  
Wen-Cheng Huang

This study aimed to propose a prognostic method based on a one-dimensional convolutional neural network (1-D CNN) with clustering loss by classification training. The 1-D CNN was trained by collecting the vibration signals of normal and malfunction data in hybrid loss function (i.e., classification loss in output and clustering loss in feature space). Subsequently, the obtained feature was adopted to estimate the status for prognosis. The open bearing dataset and established gear platform were utilized to validate the functionality and feasibility of the proposed model. Moreover, the experimental platform was used to simulate the gear mechanism of the semiconductor robot to conduct a practical experiment to verify the accuracy of the model estimation. The experimental results demonstrate the performance and effectiveness of the proposed method.


2019 ◽  
Vol 2019 (1) ◽  
pp. 143-148
Author(s):  
Futa Matsushita ◽  
Ryo Takahasshi ◽  
Mari Tsunomura ◽  
Norimichi Tsumura

3D reconstruction is used for inspection of industrial products. The demand for measuring 3D shapes is increased. There are many methods for 3D reconstruction using RGB images. However, it is difficult to reconstruct 3D shape using RGB images with gloss. In this paper, we use the deep neural network to remove the gloss from the image group captured by the RGB camera, and reconstruct the 3D shape with high accuracy than conventional method. In order to do the evaluation experiment, we use CG of simple shape and create images which changed geometry such as illumination direction. We removed gloss on these images and corrected defect parts after gloss removal for accurately estimating 3D shape. Finally, we compared 3D estimation using proposed method and conventional method by photo metric stereo. As a result, we show that the proposed method can estimate 3D shape more accurately than the conventional method.


Author(s):  
Zhizhong Han ◽  
Mingyang Shang ◽  
Xiyang Wang ◽  
Yu-Shen Liu ◽  
Matthias Zwicker

Jointly learning representations of 3D shapes and text is crucial to support tasks such as cross-modal retrieval or shape captioning. A recent method employs 3D voxels to represent 3D shapes, but this limits the approach to low resolutions due to the computational cost caused by the cubic complexity of 3D voxels. Hence the method suffers from a lack of detailed geometry. To resolve this issue, we propose Y2Seq2Seq, a view-based model, to learn cross-modal representations by joint reconstruction and prediction of view and word sequences. Specifically, the network architecture of Y2Seq2Seq bridges the semantic meaning embedded in the two modalities by two coupled “Y” like sequence-tosequence (Seq2Seq) structures. In addition, our novel hierarchical constraints further increase the discriminability of the cross-modal representations by employing more detailed discriminative information. Experimental results on cross-modal retrieval and 3D shape captioning show that Y2Seq2Seq outperforms the state-of-the-art methods.


Sign in / Sign up

Export Citation Format

Share Document