HCNET: A Point Cloud Object Detection Network Based on Height and Channel Attention

2021 ◽  
Vol 13 (24) ◽  
pp. 5071
Author(s):  
Jing Zhang ◽  
Jiajun Wang ◽  
Da Xu ◽  
Yunsong Li

The use of LiDAR point clouds for accurate three-dimensional perception is crucial for realizing high-level autonomous driving systems. Considering the drawbacks of current point cloud object-detection algorithms, this paper proposes HCNet, an algorithm that combines an attention mechanism with adaptive adjustment, starting from feature fusion to overcome the sparse and uneven distribution of point clouds. Inspired by the basic idea of the attention mechanism, we propose a feature-fusion structure, the HC module, in which height attention and channel attention are weighted in parallel to fuse features from multiple pseudo-images. The use of several weighting mechanisms enhances the expressiveness of the feature information. Additionally, we designed an adaptively adjusted detection head that also counteracts point cloud sparsity from the perspective of original-information fusion and reduces the interference caused by the uneven distribution of the point cloud through adaptive adjustment. The results show that HCNet achieves better accuracy than other one-stage networks, and even two-stage R-CNN-based networks, under several detection metrics, while running at 30 FPS. For hard samples in particular, it outperforms many existing algorithms.
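Although the paper's code is not reproduced here, the parallel height/channel weighting can be illustrated with a minimal sketch. The module name, layer sizes, and fusion-by-sum below are assumptions for illustration, not the authors' implementation:

```python
# Hypothetical sketch of a parallel height/channel attention block for a
# BEV pseudo-image of shape (B, C, H, W); all sizes are illustrative.
import torch
import torch.nn as nn

class HCAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel branch: squeeze spatial dims, excite per channel (SE-style).
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        # Height branch: squeeze channel/width dims, weight each height row.
        self.height_conv = nn.Sequential(
            nn.Conv1d(1, 1, kernel_size=3, padding=1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Per-channel weights from a global average pool.
        cw = self.channel_mlp(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        # Per-height-row weights from pooling over channels and width.
        hw = self.height_conv(x.mean(dim=(1, 3)).unsqueeze(1)).view(b, 1, h, 1)
        # Parallel weighting, fused here by summation.
        return x * cw + x * hw
```

Applied to a pillar- or voxel-based pseudo-image, each branch produces multiplicative weights and the two weighted copies are fused before the detection head.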

Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4093 ◽  
Author(s):  
Jun Xu ◽  
Yanxin Ma ◽  
Songhua He ◽  
Jiahua Zhu

Three-dimensional (3D) object detection is an important research topic in 3D computer vision, with significant applications in many fields, such as autonomous driving, robotics, and human–computer interaction. However, low precision remains an urgent problem in 3D object detection. To address it, we present a framework for 3D object detection in point clouds. Specifically, a purpose-designed backbone network fuses low-level and high-level features, making full use of their complementary information. Moreover, the two-dimensional (2D) Generalized Intersection over Union (GIoU) is extended to 3D and used as part of the loss function in our framework. Experiments on Car, Cyclist, and Pedestrian detection were conducted on the KITTI benchmark, and the results in average precision (AP) show the effectiveness of the proposed network.
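The 2D Generalized IoU extends naturally to volumes. A minimal sketch for the axis-aligned case follows; oriented boxes, as used on KITTI, additionally require a rotated-box overlap in the BEV plane, which is omitted here:

```python
import numpy as np

def giou_3d_axis_aligned(a, b):
    """GIoU for two axis-aligned 3D boxes given as
    (xmin, ymin, zmin, xmax, ymax, zmax)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    # Intersection volume.
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    union = vol_a + vol_b - inter
    iou = inter / union
    # Smallest enclosing axis-aligned box C.
    c_lo = np.minimum(a[:3], b[:3])
    c_hi = np.maximum(a[3:], b[3:])
    vol_c = np.prod(c_hi - c_lo)
    # GIoU = IoU - |C \ (A ∪ B)| / |C|.
    return iou - (vol_c - union) / vol_c
```

The corresponding loss term is 1 − GIoU, which, unlike plain IoU, still provides a gradient when the two boxes do not overlap.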


2021 ◽  
Vol 13 (22) ◽  
pp. 4497
Author(s):  
Jianjun Zou ◽  
Zhenxin Zhang ◽  
Dong Chen ◽  
Qinghua Li ◽  
Lan Sun ◽  
...  

Point cloud registration is the foundation and a key step for many vital applications, such as digital cities, autonomous driving, passive positioning, and navigation. The diversity of spatial objects and the structural complexity of object surfaces are the main challenges of the registration problem. In this paper, we propose a graph attention capsule model (named GACM) for the efficient registration of terrestrial laser scanning (TLS) point clouds in urban scenes, which fuses graph attention convolution and a three-dimensional (3D) capsule network to extract local point cloud features and obtain 3D feature descriptors. These descriptors account for differences in spatial structure and point density across objects and make the spatial features of ground objects more prominent. During training, we used both matched and non-matched points to train the model. At test time, the points in the neighborhood of each keypoint are fed to the trained network to obtain feature descriptors; correspondences are then established with a K-dimensional (KD) tree, and the rotation and translation matrix is estimated with the random sample consensus (RANSAC) algorithm. Experiments show that the proposed method achieves more efficient registration results and higher robustness than other frontier registration methods in the pairwise registration of point clouds.
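The descriptor-matching and RANSAC alignment step is generic and can be sketched independently of the GACM network. Below, src_desc and dst_desc stand in for the descriptors produced by the trained network; the function names, iteration count, and inlier threshold are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def rigid_from_triplet(P, Q):
    """Kabsch algorithm: least-squares R, t mapping points P onto Q (both (k, 3))."""
    cp, cq = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Q - cq))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # fix reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_register(src_pts, dst_pts, src_desc, dst_desc,
                    iters=2000, thresh=0.1):
    # Match keypoints by nearest neighbour in descriptor space (KD tree).
    _, idx = cKDTree(dst_desc).query(src_desc)
    matched_dst = dst_pts[idx]
    best_R, best_t, best_inl = np.eye(3), np.zeros(3), 0
    rng = np.random.default_rng(0)
    for _ in range(iters):
        sample = rng.choice(len(src_pts), 3, replace=False)
        R, t = rigid_from_triplet(src_pts[sample], matched_dst[sample])
        resid = np.linalg.norm(src_pts @ R.T + t - matched_dst, axis=1)
        inl = int((resid < thresh).sum())
        if inl > best_inl:
            best_R, best_t, best_inl = R, t, inl
    return best_R, best_t, best_inl
```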


Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 927
Author(s):  
Zhiyu Wang ◽  
Li Wang ◽  
Liang Xiao ◽  
Bin Dai

Three-dimensional object detection based on the LiDAR point cloud plays an important role in autonomous driving. The point cloud distribution of an object varies greatly with distance, observation angle, and occlusion level. Moreover, different types of LiDARs use different projection-angle settings, producing entirely different point cloud distributions; models pre-trained on one annotated dataset may therefore degrade on others. In this paper, we propose an object detection method based on an unsupervised adaptive network, which requires no additional annotation of the target domain. Our adaptive object detection network consists of a general object detection network, a global feature adaptation network, and a special subcategory instance adaptation network. We divide the source-domain data into subcategories and use a multi-label discriminator to assign labels dynamically to the target-domain data. We evaluated our approach on the KITTI object benchmark and showed that the proposed unsupervised adaptive method achieves a remarkable improvement in adaptation capability.
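One common way to realize such a discriminator is adversarial training through a gradient-reversal layer. The sketch below is a generic PyTorch pattern under that assumption, not the authors' released architecture; class names and layer sizes are hypothetical:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reversed, scaled gradient in backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class SubcategoryDiscriminator(nn.Module):
    """Predicts which source subcategory (or target label) a feature comes from."""
    def __init__(self, feat_dim: int, num_subcategories: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, num_subcategories))

    def forward(self, feats: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
        # Reversed gradient pushes the detector toward domain-invariant features.
        return self.net(GradReverse.apply(feats, lam))
```

Source features carry their subcategory labels, target features receive dynamically assigned ones, and the reversed gradient aligns the two domains in feature space.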


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6927
Author(s):  
Qingsheng Chen ◽  
Cien Fan ◽  
Weizheng Jin ◽  
Lian Zou ◽  
Fangyu Li ◽  
...  

Three-dimensional object detection from point cloud data is becoming more and more significant, especially for autonomous driving applications. However, LiDAR rarely captures the complete structure of an object in a real scene because of its scanning characteristics. Although existing methods have made great progress, most of them ignore prior information about object structure, such as symmetry. In this paper, we therefore use the symmetry of the object to complete the missing parts of the point cloud before detection. Specifically, we propose a two-stage detection framework. In the first stage, we adopt an encoder–decoder structure to generate the symmetry points of the foreground points; the symmetry points and the non-empty voxel centers together form an enhanced point cloud. In the second stage, the enhanced point cloud is fed into the baseline, an anchor-based region proposal network, to generate the detection results. Extensive experiments on the challenging KITTI benchmark show the effectiveness of our method, which outperforms some previous state-of-the-art methods on both 3D and BEV (bird's-eye-view) object detection.
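The core geometric operation, reflecting foreground points across a symmetry plane, is easy to state. In the paper the symmetry points are predicted by the encoder–decoder; the sketch below instead assumes a known plane (e.g., the vertical plane through a vehicle's estimated heading axis) purely for illustration:

```python
import numpy as np

def mirror_points(points, plane_point, plane_normal):
    """Reflect (N, 3) points across the plane through plane_point with
    normal plane_normal, producing candidate symmetry points."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = (points - plane_point) @ n           # signed distance to the plane
    return points - 2.0 * d[:, None] * n     # reflected copies

# Enhanced cloud = original foreground points + their mirrored counterparts,
# concatenated with the non-empty voxel centers before the second stage.
```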


Signals ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 98-107
Author(s):  
Yiran Li ◽  
Han Xie ◽  
Hyunchul Shin

Three-dimensional (3D) object detection is essential in autonomous driving. A 3D LiDAR sensor can capture objects such as vehicles, cyclists, pedestrians, and other objects on the road. Although LiDAR generates point clouds in 3D space, it lacks the fine resolution of 2D imagery, so LiDAR–camera fusion has gradually become a practical method for 3D object detection. Previous strategies focused on extracting voxel points and fusing feature maps, but the biggest challenge remains extracting enough edge information to detect small objects. To address this, we found attention modules to be beneficial for detecting small objects. In this work, we developed Frustum ConvNet with attention modules for fusing camera images and LiDAR point clouds. Multilayer perceptron (MLP) layers and tanh activation functions were used in the attention modules, which were built on PointNet to perform multilayer edge detection for 3D object detection. Compared with the well-known Frustum ConvNet baseline, our method achieved competitive results, with improvements of 0.27%, 0.43%, and 0.36% in average precision (AP) for 3D object detection in the easy, moderate, and hard cases, respectively, and improvements of 0.21%, 0.27%, and 0.01% in AP for bird's-eye-view (BEV) object detection in the same cases on the KITTI detection benchmarks. Our method also obtained the best AP in four cases on the indoor SUN RGB-D dataset for 3D object detection.
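As a rough illustration of an MLP-plus-tanh attention module over PointNet-style features (the dimensions and residual-style reweighting below are assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class PointAttention(nn.Module):
    """Per-point attention weights from an MLP with tanh activations."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Tanh())

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) point features from a PointNet-style backbone.
        w = self.mlp(feats)            # (B, N, 1) attention scores in [-1, 1]
        return feats * (1.0 + w)       # emphasize or suppress individual points
```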


2017 ◽  
Vol 14 (01) ◽  
pp. 1650028 ◽  
Author(s):  
Dimitrios Kanoulas ◽  
Jinoh Lee ◽  
Darwin G. Caldwell ◽  
Nikos G. Tsagarakis

Detecting affordances on objects is one of the main open problems in robotic manipulation. This paper presents a new method to represent and localize grasp affordances as bounded curved contact patches (paraboloids) of the size of the robotic hand. In particular, given a three-dimensional (3D) point cloud from a range sensor, a set of potential grasps is localized on a detected object by a fast contact-patch fitting and validation process. For the object detection, three standard methods from the literature are used and compared. The potential grasps on the object are then refined to a single affordance using their shape (size and curvature) and pose (reachability and minimum torque effort) properties, with respect to the robot and the manipulation task. We apply the proposed method to a circular valve-turning task, verifying its ability to localize grasp affordances accurately and rapidly under significant uncertainty in the environment. We experimentally validate the method with the humanoid robot COMAN on 10 circular control valves fixed on a wall, from five different viewpoints and robot poses for each valve. We compare the reliability of the introduced local grasp affordance method to a baseline that relies only on object detection, illustrating the superiority of our approach for the valve-turning task.
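The patch-fitting step can be sketched as a linear least-squares problem in a local surface frame; the quadratic model below is a simplified illustration of bounded-paraboloid fitting, not the paper's exact formulation:

```python
import numpy as np

def fit_paraboloid(points):
    """Least-squares fit of z = a*x^2 + b*y^2 + c*x*y + d*x + e*y + f
    to an (N, 3) local patch expressed in a frame whose z axis roughly
    follows the surface normal. Returns coefficients and RMS residual."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x * x, y * y, x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    rms = np.sqrt(np.mean((A @ coeffs - z) ** 2))
    return coeffs, rms
```

Validation would then check the residual and the principal curvatures (here 2a and 2b) against the hand's size before accepting the patch as a grasp candidate.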


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 140
Author(s):  
Jinxuan Xu ◽  
Qian Xie ◽  
Honghua Chen ◽  
Jun Wang

Real-time consistent plane detection (RCPD) from structured point cloud sequences facilitates various high-level computer vision and robotic tasks, yet it remains a challenge. Existing plane detection techniques suffer from long running times or imprecise detection results; meanwhile, plane labels are not consistent across the image sequence because planes are lost in the detection stage. To resolve these issues, we propose a novel superpixel-based real-time plane detection approach that simultaneously keeps plane labels consistent across frames. In summary, our method has the following key contributions: (i) a real-time plane detection algorithm that extracts planes from raw structured three-dimensional (3D) point clouds collected by depth sensors; (ii) a superpixel-based segmentation method that makes each detected plane match its actual boundary exactly; and (iii) a robust strategy that recovers missing planes by exploiting contextual correspondence information in adjacent frames. Extensive visual and numerical experiments demonstrate that our method outperforms state-of-the-art methods in terms of efficiency and accuracy.
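The per-region plane fit at the heart of such a pipeline is a total-least-squares problem. A minimal sketch follows; the superpixel segmentation and temporal label propagation are omitted:

```python
import numpy as np

def fit_plane(points):
    """Total-least-squares plane through an (N, 3) point set: returns the
    centroid, the unit normal (direction of smallest variance), and the
    mean absolute point-to-plane distance."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = Vt[-1]                     # right-singular vector of min variance
    residual = np.abs((points - centroid) @ normal).mean()
    return centroid, normal, residual
```

Running this fit per superpixel and merging regions with compatible normals and offsets is one plausible way to obtain precise plane boundaries in real time.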


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Wanyi Zhang ◽  
Xiuhua Fu ◽  
Wei Li

3D object detection based on point cloud data in unmanned driving scenes has long been a research hotspot in autonomous-driving sensing technology. With the development and maturation of deep neural network technology, neural-network-based detection of three-dimensional object targets has begun to show great advantages. Experimental results show that the mismatch between anchors and training samples affects detection accuracy, a problem that has not been well solved. The contributions of this paper are as follows. First, deformable convolution is introduced into a point cloud object detection network for the first time, which enhances the network's adaptability to vehicles with different orientations and shapes. Second, a new method of generating anchors in the RPN is proposed, which effectively prevents mismatch between anchors and ground truth and removes the angle-classification loss from the loss function. Compared with the state-of-the-art method, both the AP and AOS of the detection results are improved.
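Deformable convolution is available off the shelf in torchvision; the sketch below shows how one might insert it into a BEV detection backbone, with sampling offsets predicted from the input features. The module name and wiring are ours, not the paper's:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """A 3x3 deformable convolution whose sampling offsets are predicted
    from the input feature map, letting the kernel adapt to object shape."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2 offsets (dx, dy) per position in the 3x3 kernel.
        self.offset_pred = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset_pred(x))
```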


2021 ◽  
Vol 13 (15) ◽  
pp. 2868
Author(s):  
Yonglin Tian ◽  
Xiao Wang ◽  
Yu Shen ◽  
Zhongzheng Guo ◽  
Zilei Wang ◽  
...  

Three-dimensional information perception from point clouds is of vital importance for improving the ability of machines to understand the world, especially for autonomous driving and unmanned aerial vehicles, yet data annotation for point clouds is one of the most challenging and costly tasks. In this paper, we propose a closed-loop, virtual–real interactive point cloud generation and model-upgrading framework called Parallel Point Clouds (PPCs). To the best of our knowledge, this is the first time the training process has been changed from an open-loop to a closed-loop mechanism: feedback from the evaluation results is used to update the training dataset, benefiting from the flexibility of artificial scenes. Under this framework, a point-based LiDAR simulation model is proposed, which greatly simplifies the scanning operation. In addition, a group-based placing method is put forward to integrate hybrid point clouds by locating candidate positions for virtual objects in real scenes. Taking advantage of CAD models and mobile LiDAR devices, two hybrid point cloud datasets, i.e., ShapeKITTI and MobilePointClouds, are built for 3D detection tasks. With almost zero labor cost for annotating newly added objects, the models (PointPillars) trained with ShapeKITTI and MobilePointClouds achieved 78.6% and 60.0%, respectively, of the 3D-detection average precision of the model trained with real data.
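A point-based LiDAR simulation can be approximated by angular binning of densely sampled CAD surfaces: keep one (nearest) return per azimuth–elevation cell, mimicking one beam per angular step. The resolutions below are illustrative, not the paper's settings:

```python
import numpy as np

def simulate_lidar(surface_pts, az_res_deg=0.2, el_res_deg=0.4):
    """Bin densely sampled (N, 3) surface points by (azimuth, elevation) as
    seen from a sensor at the origin; keep the nearest return per bin."""
    x, y, z = surface_pts.T
    r = np.linalg.norm(surface_pts, axis=1)
    az = np.degrees(np.arctan2(y, x))
    el = np.degrees(np.arcsin(z / np.maximum(r, 1e-9)))
    bins = np.stack([np.round(az / az_res_deg).astype(int),
                     np.round(el / el_res_deg).astype(int)], axis=1)
    order = np.argsort(r)                    # nearest return wins per bin
    _, first = np.unique(bins[order], axis=0, return_index=True)
    return surface_pts[order][first]
```

Virtual objects scanned this way can then be placed into real background scans to assemble hybrid training frames with labels known by construction.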


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 884
Author(s):  
Chia-Ming Tsai ◽  
Yi-Horng Lai ◽  
Yung-Da Sun ◽  
Yu-Jen Chung ◽  
Jau-Woei Perng

Numerous sensors can obtain images or point cloud data on land; in water, however, the rapid attenuation of electromagnetic signals and the lack of light restrict sensing. This study extends two- and three-dimensional detection technologies to an underwater application: detecting abandoned tires. A three-dimensional acoustic sensor, the BV5000, is used to collect underwater point cloud data, and pre-processing steps are proposed to remove noise and the seabed from the raw data. The point clouds are then processed into two data types: a 2D image and a 3D point cloud. Deep learning methods of the corresponding dimensions are used to train the models. In the two-dimensional method, the point cloud is transformed into a bird's-eye-view image, and the Faster R-CNN and YOLOv3 network architectures are used to detect tires. In the three-dimensional method, the point cloud associated with a tire is cut out from the raw data and used as training data, and the PointNet and PointConv network architectures are used for tire classification. The results show that both approaches provide good accuracy.
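The BEV rasterization used in the two-dimensional branch can be sketched as a simple height-map projection. Ranges and resolution below are illustrative; the paper's exact encoding is not reproduced here:

```python
import numpy as np

def point_cloud_to_bev(points, x_range=(0.0, 20.0), y_range=(-10.0, 10.0),
                       res=0.05):
    """Rasterize an (N, 3) point cloud into a bird's-eye-view height image
    usable by a 2D detector such as Faster R-CNN or YOLOv3."""
    m = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
         (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[m]
    cols = ((pts[:, 0] - x_range[0]) / res).astype(int)
    rows = ((pts[:, 1] - y_range[0]) / res).astype(int)
    h = int(round((y_range[1] - y_range[0]) / res))
    w = int(round((x_range[1] - x_range[0]) / res))
    bev = np.zeros((h, w), dtype=np.float32)
    # Keep the maximum height per occupied cell; empty cells stay 0.
    np.maximum.at(bev, (rows, cols), pts[:, 2])
    return bev
```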

