Multiscale Receptive Fields Graph Attention Network for Point Cloud Classification

Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xi-An Li ◽  
Li-Yan Wang ◽  
Jian Lu

Understanding point clouds for classification or segmentation remains challenging because of their irregular and sparse structure. PointNet, a groundbreaking architecture for point cloud processing, learns shape features directly from unordered 3D point clouds and achieves favorable performance on the classification task: 86% mean accuracy (MA) and 89.2% overall accuracy (OA). However, this model fails to consider the fine-grained semantic information of local structure in the point cloud. This paper therefore proposes a multiscale receptive fields graph attention network (MRFGAT) that exploits the semantic features of local patches, so that the learned feature map captures the abundant feature information of the point cloud. The proposed MRFGAT architecture is tested on the ModelNet datasets, and the results show state-of-the-art performance in shape classification: it outperforms the GAPNet model (Chen et al.) by 0.1% in OA and is competitive with the DGCNN model (Wang et al.) in MA.
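The two metrics quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual convention in point cloud classification: overall accuracy is the fraction of all samples classified correctly, and mean accuracy is the unweighted average of per-class accuracies. The function name and the toy labels are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def overall_and_mean_accuracy(y_true, y_pred):
    """Return (OA, MA) for two equal-length sequences of class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = defaultdict(int)  # correctly classified samples per true class
    total = defaultdict(int)    # all samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # OA: every sample counts equally.
    oa = sum(correct.values()) / len(y_true)
    # MA: every class counts equally, regardless of its size.
    ma = sum(correct[c] / total[c] for c in total) / len(total)
    return oa, ma


# Toy labels (hypothetical): class 0 is frequent, class 1 is rare.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
oa, ma = overall_and_mean_accuracy(y_true, y_pred)
```

On imbalanced data the two numbers diverge, which is why a model can lead on one metric and only match a competitor on the other.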


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5279
Author(s):  
Yang Li ◽  
Huahu Xu ◽  
Junsheng Xiao

Language-based person search retrieves images of a target person using a natural language description and is a challenging fine-grained cross-modal retrieval task. A novel hybrid attention network is proposed for the task. The network has three components. First, a cubic attention mechanism for the person image combines cross-layer spatial attention and channel attention. It can fully excavate both important midlevel details and key high-level semantics to obtain a more discriminative fine-grained feature representation of a person image. Second, a text attention network for the language description is based on a bidirectional LSTM (BiLSTM) and a self-attention mechanism. It can better learn the bidirectional semantic dependency and capture the key words of sentences, so as to extract the context information and key semantic features of the language description more effectively and accurately. Third, a cross-modal attention mechanism and a joint loss function for cross-modal learning pay more attention to the relevant parts between text and image features. They better exploit both the cross-modal and intra-modal correlation and better address the problem of cross-modal heterogeneity. Extensive experiments have been conducted on the CUHK-PEDES dataset. Our approach obtains higher performance than state-of-the-art approaches, demonstrating the advantage of the approach we propose.



Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5455
Author(s):  
Yufeng Yang ◽  
Yixiao Ma ◽  
Jing Zhang ◽  
Xin Gao ◽  
Min Xu

Point sets are a major type of 3D structure representation, characterized by data availability and compactness. Most earlier deep learning-based point set models pay equal attention to different point set regions and channels, and thus have limited ability to focus on the small regions and specific channels that are important for characterizing the object of interest. In this paper, we introduce a novel model named Attention-based Point Network (AttPNet). It uses an attention mechanism for both global feature masking and channel weighting to focus on characteristic regions and channels. There are two branches in our model. The first branch calculates an attention mask for every point. The second branch uses convolution layers to abstract global features from point sets, where a channel attention block is adapted to focus on important channels. Evaluations on the ModelNet40 benchmark dataset show that our model outperforms the existing best model in classification tasks by 0.7% without voting. In addition, experiments on augmented data demonstrate that our model is robust to rotational perturbations and missing points. We also design an Electron Cryo-Tomography (ECT) point cloud dataset and further demonstrate our model’s ability in dealing with fine-grained structures on the ECT dataset.
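The idea of channel weighting can be sketched generically. The example below is not AttPNet's block; it is a simplified squeeze-and-gate scheme in which each channel is averaged over all points, passed through a sigmoid gate, and used to rescale that channel. The function name and the per-channel `weights` and `biases` are hypothetical stand-ins for learned parameters.

```python
import math


def channel_attention(features, weights, biases):
    """Rescale each channel of an N x C feature matrix by a gate in (0, 1).

    features: list of N rows, each a list of C floats (one row per point).
    weights, biases: length-C lists (hypothetical learned gate parameters).
    """
    n = len(features)
    c = len(features[0])
    # Squeeze: summarize each channel by its mean over all points.
    means = [sum(row[j] for row in features) / n for j in range(c)]
    # Gate: a sigmoid maps each channel summary to a weight in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-(weights[j] * means[j] + biases[j])))
             for j in range(c)]
    # Rescale: important channels keep most of their magnitude.
    scaled = [[row[j] * gates[j] for j in range(c)] for row in features]
    return scaled, gates


# Two points with two channels each (toy values, not from the paper).
features = [[1.0, 2.0], [3.0, 4.0]]
scaled, gates = channel_attention(features, weights=[1.0, -1.0], biases=[0.0, 0.0])
```

With these toy parameters the first channel receives a gate close to 1 and the second a gate close to 0, so the second channel is suppressed in `scaled`.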



Author(s):  
Zhen-Liang Ni ◽  
Gui-Bin Bian ◽  
Guan-An Wang ◽  
Xiao-Hu Zhou ◽  
Zeng-Guang Hou ◽  
...  

Surgical instrument segmentation is crucial for computer-assisted surgery. Unlike common object segmentation, it is more challenging because of the large illumination variation and scale variation in surgical scenes. In this paper, we propose a bilinear attention network with adaptive receptive fields to address these two issues. To deal with the illumination variation, the bilinear attention module models global contexts and semantic dependencies between pixels by capturing second-order statistics. With them, semantic features in challenging areas can be inferred from their neighbors, and the distinction between various semantics can be boosted. To adapt to the scale variation, our adaptive receptive field module aggregates multi-scale features and selects receptive fields adaptively. Specifically, it models the semantic relationships between channels to choose feature maps with appropriate scales, changing the receptive field of subsequent convolutions. The proposed network achieves the best performance, 97.47% mean IoU, on Cata7. It also takes first place on EndoVis 2017, exceeding the second place by 10.10% mean IoU.
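The mean IoU figures reported here can be read against a minimal sketch of the metric. It assumes the common definition (per-class intersection over union, averaged over the classes that appear in the ground truth or the prediction); individual benchmarks may average differently, for example per image. The function name and toy labels are hypothetical.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over flat sequences of per-pixel labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must be equal in length")
    ious = []
    for c in range(num_classes):
        # Pixels where both ground truth and prediction say class c.
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        # Pixels where either ground truth or prediction says class c.
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # skip classes absent from both sequences
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in the labels")
    return sum(ious) / len(ious)


# Six pixels, three classes (toy values, not from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
miou = mean_iou(y_true, y_pred, num_classes=3)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5 even though four of the six pixels are labeled correctly.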



2021 ◽  
pp. 1-16
Author(s):  
Ma Qihang ◽  
Zh Jian ◽  
Zhang Jiahao

Local information coding helps capture the fine-grained features of the point cloud. The point cloud coding mechanism should be applicable to point cloud data in different formats. However, the local features of the point cloud are directly affected by the attributes, size, and scale of the object. This paper proposes an Adaptive Locally-Coded point cloud classification and segmentation Network coupled with a Genetic Algorithm (ALCN-GA), which can automatically adjust the size of the search cube to complete network training. ALCN-GA can adapt to the features of 3D data at different points; its adjustment mechanism is realized by designing a robust crossover and mutation strategy. The proposed method is tested on the ModelNet40 and S3DIS datasets. In classification, the overall accuracy and average accuracy are 89.5% and 86.5%, respectively; in segmentation, the overall accuracy and mIoU are 80.34% and 51.05%. Compared with PointNet, the average accuracy in classification and the mIoU in segmentation improve by about 10% and 11%, respectively.



Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3281
Author(s):  
Xu He ◽  
Yong Yin

Recently, deep learning-based techniques have shown great power in image inpainting, especially when dealing with square holes. However, they fail to generate plausible results inside the missing regions for irregular and large holes, as there is a lack of understanding between missing regions and their existing counterparts. To overcome this limitation, we combine two non-local mechanisms, a contextual attention module (CAM) and an implicit diversified Markov random fields (ID-MRF) loss, with a multi-scale architecture that uses several dense fusion blocks (DFB) based on the dense combination of dilated convolutions to guide the generative network to restore discontinuous and continuous large masked areas. To prevent color discrepancies and grid-like artifacts, we apply the ID-MRF loss to improve the visual appearance by comparing similarities of long-distance feature patches. To further capture the long-term relationship of different regions in large missing regions, we introduce the CAM. Although CAM can create plausible results by reconstructing refined features, it depends on the initial predicted results. Hence, we employ the DFB to obtain larger and more effective receptive fields, which helps predict more precise and fine-grained information for CAM. Extensive experiments on two widely used datasets demonstrate that our proposed framework significantly outperforms the state-of-the-art approaches both quantitatively and qualitatively.
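Why dilated convolutions enlarge the receptive field can be shown with a small calculation. The sketch assumes a plain stack of stride-1 convolutions, where each layer with kernel size k and dilation d adds (k - 1) * d to the receptive field; it does not model the dense connections of the paper's DFB. The function name and layer configurations are hypothetical.

```python
def stacked_receptive_field(layers):
    """Receptive field (along one axis) of stacked stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs, applied in order.
    """
    rf = 1  # a single output position sees one input position before any layer
    for kernel_size, dilation in layers:
        if kernel_size < 1 or dilation < 1:
            raise ValueError("kernel_size and dilation must be positive")
        # A dilated kernel spans (kernel_size - 1) * dilation + 1 positions.
        rf += (kernel_size - 1) * dilation
    return rf


# Three 3-wide layers without dilation versus with dilations 1, 2, 4.
plain = stacked_receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = stacked_receptive_field([(3, 1), (3, 2), (3, 4)])
```

With the same number of layers and parameters, the dilated stack covers 15 positions per axis instead of 7.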





2021 ◽  
Vol 13 (10) ◽  
pp. 1985
Author(s):  
Emre Özdemir ◽  
Fabio Remondino ◽  
Alessandro Golkar

With recent advances in technology, deep learning is being applied to more and more tasks. In particular, point cloud processing and classification have been studied for some time, and various methods have been developed. Some of the available classification approaches are based on a specific data source, such as LiDAR, while others focus on specific scenarios, such as indoor scenes. A major general issue is computational efficiency (in terms of power consumption, memory requirement, and training/inference time). In this study, we propose an efficient framework (named TONIC) that can work with any kind of aerial data source (LiDAR or photogrammetry) and does not require high computational power while achieving accuracy on par with current state-of-the-art methods. We also test our framework for its generalization ability, showing that it can learn from one dataset and predict on unseen aerial scenarios.



2021 ◽  
Vol 151 ◽  
pp. 180-186
Author(s):  
Ruibin Gu ◽  
Qiuxia Wu ◽  
Wing W.Y. Ng ◽  
Hongbin Xu ◽  
Zhiyong Wang

