Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-Based Sequence to Sequence Network

Author(s):  
Xinhai Liu ◽  
Zhizhong Han ◽  
Yu-Shen Liu ◽  
Matthias Zwicker

Exploring contextual information in the local region is important for shape understanding and analysis. Existing studies often employ hand-crafted or explicit ways to encode contextual information of local regions. However, it is hard to capture fine-grained contextual information in hand-crafted or explicit manners, such as the correlation between different areas in a local region, which limits the discriminative ability of learned features. To resolve this issue, we propose a novel deep learning model for 3D point clouds, named Point2Sequence, to learn 3D shape features by capturing fine-grained contextual information in a novel implicit way. Point2Sequence employs a novel sequence learning model for point clouds to capture the correlations by aggregating multi-scale areas of each local region with attention. Specifically, Point2Sequence first learns the feature of each area scale in a local region. Then, it captures the correlation between area scales in the process of aggregating all area scales using a recurrent neural network (RNN) based encoder-decoder structure, where an attention mechanism is proposed to highlight the importance of different area scales. Experimental results show that Point2Sequence achieves state-of-the-art performance in shape classification and segmentation tasks.
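The attention step described in this abstract can be illustrated with a small sketch. It is not the authors' network: the RNN encoder-decoder is omitted, and the dot-product scoring, the `query` vector standing in for a decoder state, and the toy feature values are all assumptions made for illustration. It only shows how softmax weights can highlight some area scales over others when their features are aggregated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_scales(scale_features, query):
    """Aggregate per-scale feature vectors into one context vector.

    scale_features: list of T vectors, one per area scale, each of length D.
    query: length-D vector standing in for a decoder state (hypothetical).
    Returns (weights, context).
    """
    # Dot-product score between the query and each scale feature (assumed scoring).
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in scale_features]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the scale features, one output value per dimension.
    context = [
        sum(w * f[d] for w, f in zip(weights, scale_features)) for d in range(dim)
    ]
    return weights, context


# Toy features for three area scales of one local region (illustrative values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend_over_scales(features, query=[1.0, 1.0])
```

In this toy input the third scale scores highest against the query, so it receives the largest weight and dominates the aggregated vector.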

2021 ◽  
Vol 13 (15) ◽  
pp. 3021
Author(s):  
Bufan Zhao ◽  
Xianghong Hua ◽  
Kegen Yu ◽  
Xiaoxing He ◽  
Weixing Xue ◽  
...  

Urban object segmentation and classification are critical data processing steps in scene understanding, intelligent vehicles, and high-precision 3D maps. Semantic segmentation of 3D point clouds is the foundational step in object recognition. To identify intersecting objects and improve classification accuracy, this paper proposes a segment-based classification method for 3D point clouds. The method first divides points into multi-scale supervoxels and groups them using the proposed inverse node graph (IN-Graph) construction, which needs no prior information about the nodes and instead divides supervoxels by judging the connection state of the edges between them. The method reaches minimum global energy by graph cutting, obtains structural segments that are as complete as possible, and retains boundaries at the same time. A random forest classifier is then used for supervised classification. To deal with the mislabeling of scattered fragments, a higher-order CRF with small-label cluster optimization is proposed to refine the classification results. Experiments were carried out on a mobile laser scan (MLS) point dataset and a terrestrial laser scan (TLS) point dataset, and the results show that overall accuracies of 97.57% and 96.39% were obtained on the two datasets. The boundaries of objects were retained well, and the method achieved good results in the classification of cars and motorcycles. Further experimental analyses verified the advantages of the proposed method and demonstrated its practicability and versatility.


2021 ◽  
Vol 14 (6) ◽  
pp. 863-863
Author(s):  
Supun Nakandala ◽  
Yuhao Zhang ◽  
Arun Kumar

We discovered that there was an inconsistency in the communication cost formulation for the decentralized fine-grained training method in Table 2 of our paper [1]. We used Horovod as the archetype for decentralized fine-grained approaches, and its correct communication cost is higher than what we had reported. So, we amend the communication cost of decentralized fine-grained to [EQUATION]


2020 ◽  
Vol 12 (3) ◽  
pp. 543 ◽  
Author(s):  
Małgorzata Jarząbek-Rychard ◽  
Dong Lin ◽  
Hans-Gerd Maas

Targeted energy management and control is becoming an increasing concern in the building sector. Automatic analyses of thermal data, which minimize the subjectivity of the assessment and allow for large-scale inspections, are therefore of high interest. In this study, we propose an approach for the supervised extraction of façade openings (windows and doors) from photogrammetric 3D point clouds attributed with RGB and thermal infrared (TIR) information. The novelty of the proposed approach lies in combining thermal information with other available characteristics of the data for a classification performed directly in 3D space. Images acquired in the visible and thermal infrared spectra serve as input data for the camera pose estimation and the reconstruction of the 3D scene geometry. To investigate the relevance of different information types to the classification performance, a Random Forest algorithm is applied to various sets of computed features. The best feature combination is then used as input for a Conditional Random Field that enables us to incorporate contextual information and consider the interaction between the points. The evaluation executed on a per-point level shows that the fusion of all available information types, together with context consideration, allows us to extract objects with 90% completeness and 95% correctness. The corresponding assessment on a per-object level shows 97% completeness and 88% accuracy.
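The abstract reports completeness and correctness without stating formulas. A minimal sketch, assuming the definitions commonly used in photogrammetry (completeness as recall, correctness as precision), is shown below. The function name and the counts in the example are hypothetical and are not taken from the paper.

```python
def completeness_and_correctness(tp, fp, fn):
    """Return (completeness, correctness) from detection counts.

    Assumed definitions, not stated in the abstract:
      completeness = TP / (TP + FN)   (share of reference objects that were found)
      correctness  = TP / (TP + FP)   (share of extracted objects that are right)
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("counts must include at least one reference and one extracted object")
    completeness = tp / (tp + fn)
    correctness = tp / (tp + fp)
    return completeness, correctness


# Illustrative counts only: 90 correct detections, 10 false alarms, 10 misses.
completeness, correctness = completeness_and_correctness(tp=90, fp=10, fn=10)
```

Reporting both values matters because a classifier can raise one at the expense of the other, for example by labeling more points as openings.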


Sensors ◽  
2014 ◽  
Vol 14 (12) ◽  
pp. 24156-24173 ◽  
Author(s):  
Min Lu ◽  
Yulan Guo ◽  
Jun Zhang ◽  
Yanxin Ma ◽  
Yinjie Lei

2017 ◽  
Vol 29 (5) ◽  
pp. 1209-1224 ◽  
Author(s):  
Baowei Lin ◽  
Fasheng Wang ◽  
Fangda Zhao ◽  
Yi Sun

2022 ◽  
Vol 14 (1) ◽  
pp. 27
Author(s):  
Junda Li ◽  
Chunxu Zhang ◽  
Bo Yang

Current two-stage object detectors extract the local visual features of Regions of Interest (RoIs) for object recognition and bounding-box regression. However, using only local visual features loses global contextual dependencies, which help to recognize objects with featureless appearances and to suppress false detections. To tackle this problem, a simple framework, named Global Contextual Dependency Network (GCDN), is presented to enhance the classification ability of two-stage detectors. GCDN consists mainly of two components: the Context Representation Module (CRM) and the Context Dependency Module (CDM). The CRM constructs multi-scale context representations, so that contextual information can be explored at different scales. The CDM is designed to capture global contextual dependencies, and GCDN includes multiple CDMs. Each CDM uses local RoI features and a single-scale context representation to generate single-scale contextual RoI features via the attention mechanism. Finally, the contextual RoI features generated independently by the parallel CDMs are combined with the original RoI features to help classification. Experiments on the MS-COCO 2017 benchmark dataset show that the approach brings consistent improvements for two-stage detectors.
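The data flow described in this abstract (one attention step per context scale, then fusion with the original RoI feature) can be sketched as follows. This is not the GCDN implementation: the scaled dot-product scoring, the element-wise addition used for fusion, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_roi_feature(roi, context):
    """Attend from one RoI feature over the context vectors of a single scale."""
    scale = math.sqrt(len(roi))
    # Scaled dot-product scores between the RoI feature and each context vector.
    scores = [sum(r * c for r, c in zip(roi, vec)) / scale for vec in context]
    weights = softmax(scores)
    # Weighted sum of the context vectors, one output value per dimension.
    return [
        sum(w * vec[d] for w, vec in zip(weights, context)) for d in range(len(roi))
    ]


def fuse(roi, contexts_per_scale):
    """Add each scale's contextual feature to the original RoI feature (assumed fusion)."""
    fused = list(roi)
    for context in contexts_per_scale:
        ctx_feat = contextual_roi_feature(roi, context)
        fused = [f + c for f, c in zip(fused, ctx_feat)]
    return fused


# Toy inputs: one 2-D RoI feature and context vectors at two scales.
roi = [1.0, 0.0]
contexts = [
    [[1.0, 0.0], [0.0, 1.0]],              # coarse scale: two context vectors
    [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]],  # finer scale: three context vectors
]
fused = fuse(roi, contexts)
```

Each scale is processed independently of the others, which mirrors the parallel modules in the description; only the final fusion step sees all scales.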

