Scale-Sensitive Feature Reassembly Network for Pedestrian Detection

Sensors, 2021, Vol 21 (12), pp. 4189
Author(s): Xiaoting Yang, Qiong Liu

Serious scale variation is a key challenge in pedestrian detection. Most works typically employ a feature pyramid network to detect objects at diverse scales. Such a method suffers from information loss during channel unification. Inadequate sampling of the backbone network also affects the power of pyramidal features. Moreover, an arbitrary RoI (region of interest) allocation scheme of these detectors incurs coarse RoI representation, which becomes worse under the dilemma of small pedestrian relative scale (PRS). In this paper, we propose a novel scale-sensitive feature reassembly network (SSNet) for pedestrian detection in road scenes. Specifically, a multi-parallel branch sampling module is devised with flexible receptive fields and an adjustable anchor stride to improve the sensitivity to pedestrians imaged at multiple scales. Meanwhile, a context enhancement fusion module is also proposed to alleviate information loss by injecting various spatial context information into the original features. For more accurate prediction, an adaptive reassembly strategy is designed to obtain recognizable RoI features in the proposal refinement stage. Extensive experiments are conducted on CityPersons and Caltech datasets to demonstrate the effectiveness of our method. The detection results show that our SSNet surpasses the baseline method significantly by integrating lightweight modules and achieves competitive performance with other methods without bells and whistles.
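As a rough illustration of what an anchor stride controls, the sketch below places anchor centers on a feature-map grid. The function name `anchor_centers`, the half-cell offset, and the example sizes are assumptions made for illustration; the abstract does not describe how SSNet implements its adjustable stride.

```python
import numpy as np

def anchor_centers(feat_h, feat_w, stride):
    """Return (x, y) anchor centers in image pixels for a feat_h x feat_w grid.

    Hypothetical helper: each feature-map cell contributes one center,
    placed in the middle of the stride x stride image patch it covers.
    """
    ys = (np.arange(feat_h) + 0.5) * stride
    xs = (np.arange(feat_w) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)  # both arrays have shape (feat_h, feat_w)
    return np.stack([cx.ravel(), cy.ravel()], axis=1)

# A smaller stride samples the image more densely, which helps small pedestrians.
centers = anchor_centers(2, 3, stride=8)
```

Halving the stride quadruples the number of anchor centers over the same image area, which is the trade-off an adjustable stride exposes.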

Sensors, 2021, Vol 21 (5), pp. 1820
Author(s): Xiaotao Shao, Qing Wang, Wei Yang, Yun Chen, Yi Xie, ...

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To address heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) to enhance the features of occluded targets and improve the detection accuracy. MFPN includes two modules, namely double feature pyramid network (FPN) integrated with ResNet (DFR) and repulsion loss of minimum (RLM). We propose the double FPN, which improves the architecture to further enhance the semantic information and contours of occluded pedestrians and provides a new way to extract features of occluded targets. The features extracted by our network are better separated and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced into the loss function to keep predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance and a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method improves the performance of the pedestrian detection system.
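To make the idea of keeping a predicted box away from an unrelated ground truth concrete, the sketch below computes an intersection-over-ground-truth overlap, a quantity that a repulsion-style penalty could be built on. This is an assumption made for illustration: the abstract does not give the formulation of RLM, and the function name and box layout `(x0, y0, x1, y1)` are hypothetical.

```python
def intersection_over_ground_truth(pred, gt):
    """Overlap of a predicted box with a ground-truth box, divided by the GT area.

    Boxes are (x0, y0, x1, y1). Illustrative only; not the paper's RLM term.
    """
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter_w * inter_h / gt_area if gt_area > 0 else 0.0

# A prediction that drifts onto a neighbouring pedestrian's box has a larger
# overlap, so a loss that grows with this value pushes it away.
overlap = intersection_over_ground_truth((0, 0, 2, 2), (1, 1, 3, 3))
```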


2012, Vol 542-543, pp. 937-940
Author(s): Ping Shu Ge, Guo Kai Xu, Xiu Chun Zhao, Peng Song, Lie Guo

To locate pedestrians faster and more accurately, a pedestrian detection method based on histograms of oriented gradients (HOG) in a region of interest (ROI) is introduced. The features are extracted in the ROI where the pedestrian's legs may exist, which helps to reduce the dimension of the feature vector and simplify the calculation. Then the vertical edge symmetry of the pedestrian's legs is fused to confirm the detection. Experimental results indicate that this method can achieve ideal accuracy with lower processing time compared to the traditional method.
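The sketch below shows, under simplifying assumptions, how restricting gradient-orientation histograms to an ROI keeps the feature vector short. It computes a single 9-bin unsigned-orientation histogram over one rectangle; the function name, the `(top, left, height, width)` ROI layout, and the single-cell simplification are assumptions, and a full HOG descriptor would add cells, blocks, and block normalization.

```python
import numpy as np

def roi_orientation_histogram(image, roi, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations in an ROI.

    image: 2-D grayscale array. roi: (top, left, height, width), hypothetical layout.
    """
    top, left, h, w = roi
    patch = np.asarray(image, dtype=float)[top:top + h, left:left + w]
    gy, gx = np.gradient(patch)  # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical edge (as along a leg) produces horizontal gradients, which land in
# the first orientation bin.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
hist = roi_orientation_histogram(img, (0, 0, 8, 8))
```

Because only the ROI is processed, the descriptor length depends on the ROI size and bin count, not on the full frame.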


Mathematics, 2021, Vol 9 (21), pp. 2815
Author(s): Shih-Hung Yang, Yao-Mao Cheng, Jyun-We Huang, Yon-Ping Chen

Automatic fingerspelling recognition tackles the communication barrier between deaf and hearing individuals. However, the accuracy of fingerspelling recognition is reduced by high intra-class variability and low inter-class variability. In the existing methods, regular convolutional kernels, which have limited receptive fields (RFs) and often cannot detect subtle discriminative details, are applied to learn features. In this study, we propose a receptive field-aware network with finger attention (RFaNet) that highlights the finger regions and builds inter-finger relations. To highlight the discriminative details of these fingers, RFaNet reweights the low-level features of the hand depth image with those of the non-forearm image and improves finger localization, even when the wrist is occluded. RFaNet captures neighboring and inter-region dependencies between fingers in high-level features. An atrous convolution procedure enlarges the RFs at multiple scales and a non-local operation computes the interactions between multi-scale feature maps, thereby facilitating the building of inter-finger relations. Thus, the representation of a sign is invariant to viewpoint changes, which are primarily responsible for intra-class variability. On an American Sign Language fingerspelling dataset, RFaNet achieved 1.77% higher classification accuracy than state-of-the-art methods. RFaNet achieved effective transfer learning when the number of labeled depth images was insufficient. The fingerspelling representation of a depth image can be effectively transferred from large- to small-scale datasets via highlighting the finger regions and building inter-finger relations, thereby reducing the requirement for expensive fingerspelling annotations.
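The way atrous (dilated) convolution enlarges the receptive field can be checked with a few lines of arithmetic. The sketch below uses the standard relation k_eff = k + (k - 1)(d - 1) for a kernel of size k and dilation d; the function names and the example layer stacks are assumptions for illustration and do not reproduce RFaNet's architecture.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def receptive_field(layers):
    """Receptive field of a stack of conv layers given as (kernel, stride, dilation)."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (effective_kernel_size(kernel, dilation) - 1) * jump
        jump *= stride
    return rf

# Two 3x3 layers: raising the second layer's dilation from 1 to 2 widens the
# receptive field from 5 to 7 without adding parameters.
plain = receptive_field([(3, 1, 1), (3, 1, 1)])
dilated = receptive_field([(3, 1, 1), (3, 1, 2)])
```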


2012, Vol 2012, pp. 1-8
Author(s): Zhengchao Dong, Feng Liu, Alayar Kangarlu, Bradley S. Peterson

Multisection magnetic resonance spectroscopic imaging is a widely used pulse sequence that has distinct advantages over other spectroscopic imaging sequences, such as dynamic shimming, large region-of-interest coverage within slices, and rapid data acquisition. It has limitations, however, in the number of slices that can be acquired in realistic scan times and information loss from spacing between slices. In this paper, we synergize the multi-section spectroscopic imaging pulse sequence with multichannel coil technology to overcome these limitations. These combined techniques now permit elimination of the gaps between slices and acquisition of a larger number of slices to realize the whole brain metabolite mapping without incurring the penalties of longer repetition times (and therefore longer acquisition times) or lower signal-to-noise ratios.


2021
Author(s): Rinju Alice John

Nowadays, people are increasingly distracted by their devices whenever they enter a crossroad, which can result in a fatal accident or injury. This motivated the need to implement a reliable pedestrian detection system. To optimize the system, a crossroad scenario is considered where the driver is taking a right turn and a smart camera is used to capture consecutive pictures of the pedestrian. The consecutive frames are studied using the region of interest (ROI) method and the Gaussian mixture model method. Once the detected pedestrian enters the region of interest within 2 meters, a warning and automatic brake system is initiated to prevent the accident. Finally, the results of the proposed methods are compared with the shape-based detection technique (Wei Zhang, [12]) in terms of processing speed and performance rate. The performance rate was above 90% and processing speed was about 1 sec for the proposed methods.
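The trigger condition described above can be sketched as a simple rule: the pedestrian must be inside the ROI and closer than 2 meters. The function name, the pixel-space ROI layout `(x0, y0, x1, y1)`, and the assumption that a distance estimate in meters is already available are all hypothetical; the abstract does not say how the distance is measured.

```python
def should_warn(pedestrian_xy, roi, distance_m, threshold_m=2.0):
    """Return True when the pedestrian is inside the ROI and nearer than the threshold.

    pedestrian_xy: (x, y) image position of the detection.
    roi: (x0, y0, x1, y1) rectangle in the same pixel coordinates.
    distance_m: estimated distance to the pedestrian in meters (assumed given).
    """
    x, y = pedestrian_xy
    x0, y0, x1, y1 = roi
    inside = x0 <= x <= x1 and y0 <= y <= y1
    return inside and distance_m < threshold_m

# Inside the ROI at 1.5 m: the warning and braking logic would be triggered.
trigger = should_warn((5, 5), (0, 0, 10, 10), distance_m=1.5)
```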


Sensors, 2019, Vol 19 (5), pp. 1089
Author(s): Ye Wang, Zhenyi Liu, Weiwen Deng

Region proposal network (RPN) based object detection, such as Faster Regions with CNN (Faster R-CNN), has gained considerable attention due to its high accuracy and fast speed. However, it has room for improvement when used in special application situations, such as on-board vehicle detection. The original RPN locates multiscale anchors uniformly on each pixel of the last feature map and classifies whether an anchor is part of the foreground or background with one pixel in the last feature map. The receptive field of each pixel in the last feature map is fixed in the original Faster R-CNN and does not coincide with the anchor size. Hence, only a certain part can be seen for large vehicles and too much useless information is contained in the feature for small vehicles. This reduces detection accuracy. Furthermore, the perspective projection results in the vehicle bounding box size becoming related to the bounding box position, thereby reducing the effectiveness and accuracy of the uniform anchor generation method. This reduces both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROI) are generated. The ROI pooling layer projects an ROI to the last feature map and forms a new feature map with a fixed size for final classification and box regression. The number of feature map pixels in the projected region can also influence the detection performance, but this is not accurately controlled in former works. In this paper, the original Faster R-CNN is optimized, especially for on-board vehicle detection, to address the above-mentioned problems. The proposed method is tested on the KITTI dataset and the result shows a significant improvement without too many tricky parameter adjustments and training skills. The proposed method can also be used on other objects with obvious foreshortening effects, such as on-board pedestrian detection. The basic idea of the proposed method does not rely on concrete implementation and thus, most deep learning based object detectors with multiscale feature maps can be optimized with it.
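The ROI pooling step described above can be sketched in a few lines: a projected ROI of arbitrary size is divided into a fixed grid of bins and each bin is max-pooled. This is a simplified single-channel NumPy version with integer ROI coordinates; the function name, the `(y0, x0, y1, x1)` layout, and the bin-splitting rule are assumptions, and it is not the implementation used in the paper.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool an ROI of a 2-D feature map into a fixed output_size grid.

    roi: (y0, x0, y1, x1) in feature-map pixels, end-exclusive (hypothetical layout).
    """
    y0, x0, y1, x1 = roi
    region = feature_map[y0:y1, x0:x1]
    out_h, out_w = output_size
    ys = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    xs = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    out = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Force every bin to cover at least one pixel of the region.
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max()
    return out

fm = np.arange(16).reshape(4, 4)
pooled = roi_max_pool(fm, (0, 0, 4, 4))  # each 2x2 bin keeps its maximum
```

A small ROI spreads very few feature-map pixels over the fixed grid, while a large one squeezes many pixels into each bin, which is why the number of pixels in the projected region affects detection quality.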

