An Improved Object Detection Method using Feature Map Refinement and Anchor Optimization

Author(s):  
Yuxia Wang ◽  
Wenzhu Yang ◽  
Tongtong Yuan ◽  
Qian Li

Lower detection accuracy and insufficient ability to detect small objects are the main problems of region-free object detection algorithms. To address these problems, an improved object detection method using feature map refinement and anchor optimization is proposed. Firstly, a reverse fusion operation is performed on each object detection layer, providing the lower layers with more semantic information by fusing detection features at different levels. Secondly, a self-attention module is used to refine each detection feature map, calibrating the features between channels and enhancing the expressive ability of local features. In addition, an anchor optimization model is introduced on each feature layer associated with anchors, yielding anchors that have a higher probability of containing an object and that match the object's location and size more closely. In this model, semantic features are used to confirm and remove negative anchors to reduce the object search space, and preliminary adjustments are made to the locations and sizes of the anchors. Comprehensive experimental results on the PASCAL VOC detection dataset demonstrate the effectiveness of the proposed method. In particular, with VGG-16 and a low-dimension 300×300 input size, the proposed method achieves a mAP of 79.1% on the VOC 2007 test set with an inference speed of 24.7 milliseconds per image.
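The anchor optimization step described above (removing likely-negative anchors and making preliminary location/size adjustments) can be sketched in pure Python. This is an illustrative toy version under assumed conventions, not the authors' implementation: the function name, the `neg_thresh` value, and the standard box-delta parameterization are assumptions.

```python
import math

def refine_anchors(anchors, scores, deltas, neg_thresh=0.3):
    """Drop likely-negative anchors and apply preliminary box adjustments.

    anchors: list of (cx, cy, w, h) boxes; scores: objectness probabilities;
    deltas: (dx, dy, dw, dh) offsets predicted per anchor.
    """
    kept = []
    for (cx, cy, w, h), p, (dx, dy, dw, dh) in zip(anchors, scores, deltas):
        if p < neg_thresh:  # confirmed negative: remove from the search space
            continue
        # standard box-delta parameterization: shift the centre, scale w/h
        kept.append((cx + dx * w, cy + dy * h,
                     w * math.exp(dw), h * math.exp(dh)))
    return kept
```

A low-scoring anchor is discarded entirely, while a surviving anchor is nudged toward the predicted object before the final detection stage runs.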

2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Guo X. Hu ◽  
Zhong Yang ◽  
Lei Hu ◽  
Li Huang ◽  
Jia M. Han

Existing object detection algorithms based on deep convolutional neural networks need to carry out multilevel convolution and pooling operations over the entire image in order to extract deep semantic features of the image. Such detection models obtain good results for large objects. However, they fail to detect small objects that have low resolution and are heavily influenced by noise, because the features produced by the repeated convolution operations of existing models do not fully represent the essential characteristics of small objects. In this paper, we achieve good detection accuracy by extracting features at different convolution levels of the object and using these multiscale features to detect small objects. In our detection model, we extract image features from the third, fourth, and fifth convolution stages, respectively, and then concatenate these three scales of features into a one-dimensional vector. The vector is used to classify objects with classifiers and to locate objects by bounding-box regression. In testing, the detection accuracy of our model for small objects is 11% higher than that of state-of-the-art models. In addition, we used the model to detect aircraft in remote sensing images and achieved good results.
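The concatenation of three convolution stages into one descriptor can be sketched as follows; this is a minimal illustration with assumed nested-list feature maps (C × H × W), not the authors' network code.

```python
def build_multiscale_vector(conv3, conv4, conv5):
    """Concatenate flattened feature maps from three convolution stages
    into a single one-dimensional descriptor for the classifier/regressor."""
    def flatten(fmap):  # fmap: nested lists, shape channels x height x width
        return [v for channel in fmap for row in channel for v in row]
    return flatten(conv3) + flatten(conv4) + flatten(conv5)
```

The resulting vector mixes fine, shallow-layer detail (useful for small objects) with deeper semantic features before classification and regression.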


2021 ◽  
Vol 11 (13) ◽  
pp. 6016
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

For autonomous vehicles, it is critical to be aware of the driving environment to avoid collisions and drive safely. The recent evolution of convolutional neural networks has contributed significantly to accelerating the development of object detection techniques that enable autonomous vehicles to handle rapid changes in various driving environments. However, collisions in an autonomous driving environment can still occur due to undetected obstacles and various perception problems, particularly occlusion. Thus, we propose a robust object detection algorithm for environments in which objects are truncated or occluded, by employing RGB image and light detection and ranging (LiDAR) bird’s eye view (BEV) representations. This structure combines independent detection results obtained in parallel through “you only look once” networks using an RGB image and a height map converted from the BEV representations of LiDAR’s point cloud data (PCD). The region proposal of an object is determined via non-maximum suppression, which suppresses the bounding boxes of adjacent regions. A performance evaluation of the proposed scheme was performed using the KITTI vision benchmark suite dataset. The results demonstrate that the detection accuracy achieved by integrating PCD BEV representations is superior to that achieved when only an RGB camera is used. In addition, robustness is improved by significantly enhancing detection accuracy even when the target objects are partially occluded when viewed from the front, which demonstrates that the proposed algorithm outperforms the conventional RGB-based model.
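Merging the two parallel branches via non-maximum suppression can be sketched in pure Python. This is a generic greedy NMS over the pooled detections of both branches, offered as an assumption about the fusion step rather than the paper's exact procedure; the IoU threshold is illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(rgb_dets, bev_dets, iou_thresh=0.5):
    """Merge detections from the RGB and LiDAR-BEV branches; overlapping
    boxes are suppressed so the higher-scoring proposal survives.
    Each detection is a (box, score) pair."""
    pool = sorted(rgb_dets + bev_dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in pool:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

When both branches fire on the same object, only the more confident box remains; objects seen by a single branch (e.g., one occluded in the RGB view but visible in the BEV height map) survive unchanged.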


2021 ◽  
Author(s):  
Ibrahim Mohammad Hussain Rahman

<p>The human visual attention (HVA) system encompasses a set of interconnected neurological modules that are responsible for analyzing visual stimuli by attending to the regions that are salient. Two contrasting biological mechanisms exist in HVA systems: bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection. Very few computational models have been proposed to model top-down attention, mainly for three reasons. The first is that the functionality of the top-down process involves many influential factors. The second is that top-down responses vary from task to task. Finally, many biological aspects of the top-down process are not yet well understood. For these reasons, it is difficult to devise a generalized top-down model that could be applied to all high-level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task, target object detection. Target object detection is an essential step in analyzing images to further perform complex visual tasks. It has not been investigated thoroughly when modelling top-down saliency and hence constitutes the main application domain of this thesis. The thesis investigates methods to model top-down attention through various kinds of high-level data acquired from images. Furthermore, it investigates different strategies to dynamically combine bottom-up and top-down processes to improve detection accuracy, as well as the computational efficiency of existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency:
1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The proposed model assigns weights to the features of a novel image by extracting a contextual descriptor of the image. The contextual descriptor tunes the weighting of low-level features to maximize detection accuracy. By incorporating context into the feature weighting mechanism, we improve the quality of the weights assigned to these features.
2. Two modules of target features combined with contextual weighting to improve detection accuracy for the target object. In this proposed model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, a drastic increase in detection accuracy is achieved in images with complex backgrounds and a variety of target objects.
3. A combination model for top-down and bottom-up attention based on feature interaction. This model provides a dynamic way of combining both processes by formulating the problem as feature selection. The feature selection exploits the interaction between these features, yielding a robust set of features that maximizes both the detection accuracy and the overall efficiency of the system.
4. A feature map quality score estimation model that can accurately predict the detection accuracy score of any previously unseen feature map without the need for ground-truth data. The model extracts various local, global, geometrical, and statistical characteristics from a feature map. These characteristics guide a regression model that estimates the quality of a novel map.
5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime. If the estimation model can predict the quality score of any novel feature map accurately, then it is possible to perform dynamic feature map integration based on the estimated value. We propose two frameworks for feature map integration using the estimation model. The proposed integration framework achieves higher human fixation prediction accuracy with fewer feature maps than combining all feature maps.
The work proposed in this thesis provides new directions in modelling top-down saliency for target object detection. In addition, the dynamic approaches for combining top-down and bottom-up attention show considerable improvements over existing approaches in both efficiency and accuracy.</p>
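The runtime integration idea in points 4 and 5 (use a predicted quality score to pick and weight only the most useful maps) can be sketched as follows. The weighting scheme, `top_k` choice, and function name are illustrative assumptions; the thesis's actual integration frameworks and regression-based score estimator are not reproduced here.

```python
def integrate_feature_maps(feature_maps, quality_scores, top_k=2):
    """Combine only the top-k feature maps, weighted by their estimated
    quality scores, instead of averaging all available maps.
    feature_maps: equally sized 2-D lists; quality_scores: floats."""
    ranked = sorted(zip(quality_scores, feature_maps),
                    key=lambda t: t[0], reverse=True)[:top_k]
    total = sum(q for q, _ in ranked)
    h, w = len(ranked[0][1]), len(ranked[0][1][0])
    out = [[0.0] * w for _ in range(h)]
    for q, fmap in ranked:
        for i in range(h):
            for j in range(w):
                out[i][j] += (q / total) * fmap[i][j]
    return out
```

Because low-quality maps are dropped before integration, the combined saliency map can match or beat an all-maps average while computing far fewer maps.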


Recognition and detection of objects in observed scenes is a natural biological capability. Animals and humans perform it effortlessly in daily life, to move without collisions, to find food, to avoid threats, and so on. However, comparable computer techniques and algorithms for scene analysis are not so straightforward, despite their remarkable progress. Object detection is the process of finding or recognizing instances of objects (for example faces, dogs, or buildings) in digital images or videos. It is a fundamental task in computer vision. To detect instances of an object and images belonging to an object category, object detection methods usually use a learning algorithm and extracted features. This paper proposes a method for moving object detection and vehicle detection.


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1235
Author(s):  
Yang Yang ◽  
Hongmin Deng

In order to make the classification and regression of single-stage detectors more accurate, this paper proposes an object detection algorithm named Global Context You-Only-Look-Once v3 (GC-YOLOv3), based on You-Only-Look-Once (YOLO). Firstly, a better cascading model with learnable semantic fusion between the feature extraction network and the feature pyramid network is designed to improve detection accuracy using a global context block. Secondly, the information to be retained is screened by combining three feature maps of different scales. Finally, a global self-attention mechanism is used to highlight the useful information in feature maps while suppressing irrelevant information. Experiments show that GC-YOLOv3 reaches a maximum mean Average Precision (mAP)@0.5 of 55.5 on the Common Objects in Context (COCO) 2017 test-dev set and that its mAP is 5.1% higher than that of the YOLOv3 algorithm on the Pascal Visual Object Classes (PASCAL VOC) 2007 test set, indicating that the proposed GC-YOLOv3 model performs strongly on both the PASCAL VOC and COCO datasets.
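The channel re-weighting behind a global self-attention/global-context block can be sketched in a squeeze-style pure-Python form. This is a simplified stand-in, not the GC-YOLOv3 module: a real block learns its weights, whereas here a softmax over per-channel global averages is assumed purely for illustration.

```python
import math

def channel_attention(fmap):
    """Re-weight each channel of a feature map by a softmax over its
    global average, highlighting informative channels and suppressing
    the rest. fmap is a list of channels, each a 2-D list."""
    means = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
             for ch in fmap]
    exps = [math.exp(m) for m in means]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [[[w * v for v in row] for row in ch]
            for w, ch in zip(weights, fmap)]
```

Channels whose global statistics stand out receive larger weights, so their activations dominate the fused map passed to the detection heads.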


Author(s):  
Seung-Hwan Bae

Region-based object detection infers object regions for one or more categories in an image. Due to recent advances in deep learning and region proposal methods, object detectors based on convolutional neural networks (CNNs) have flourished and provided promising detection results. However, detection accuracy is often degraded by the low discriminability of object CNN features caused by occlusions and inaccurate region proposals. In this paper, we therefore propose a region decomposition and assembly detector (R-DAD) for more accurate object detection. In the proposed R-DAD, we first decompose an object region into multiple small regions. To jointly capture the entire appearance and the part details of the object, we extract CNN features within the whole object region and the decomposed regions. We then learn the semantic relations between the object and its parts by combining the multi-region features stage by stage with region assembly blocks, and use the combined high-level semantic features for object classification and localization. In addition, for more accurate region proposals, we propose a multi-scale proposal layer that can generate object proposals at various scales. We integrate the R-DAD into several feature extractors and demonstrate distinct performance improvements on PASCAL07/12 and MSCOCO18 compared to recent convolutional detectors.
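The first step, decomposing an object region into part regions, can be sketched as below. The specific half-box decomposition and the region names are assumptions for illustration; the paper's decomposition scheme may differ.

```python
def decompose_region(box):
    """Split an object region (x1, y1, x2, y2) into the whole box plus
    left/right and top/bottom halves, so CNN features can later be pooled
    from the entire object and from its parts."""
    x1, y1, x2, y2 = box
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    return {
        "whole":  (x1, y1, x2, y2),
        "left":   (x1, y1, mx, y2),
        "right":  (mx, y1, x2, y2),
        "top":    (x1, y1, x2, my),
        "bottom": (x1, my, x2, y2),
    }
```

Features extracted from these sub-regions would then be merged stage by stage (the "assembly" half of R-DAD) so that a partially occluded part does not dominate the whole-object representation.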


2020 ◽  
Vol 143 (4) ◽  
Author(s):  
Tie Zhang ◽  
Peizhong Ge ◽  
Yanbiao Zou ◽  
Yingwu He

Abstract To ensure human safety during human–robot cooperation, this paper proposes a robot collision detection method without external sensors based on time-series analysis (TSA). First, based on the characteristics of the robot's external torque, the internal variation of the external torque sequence during robot motion is analyzed. Next, a time-series model of the external torque is constructed, which predicts the external torque from the robot's historical motion information and generates a dynamic threshold. Then, the detailed process of time-series analysis for collision detection is described. Finally, a real-machine experiment scheme for the proposed real-time collision detection algorithm is designed and used to perform experiments with a six degrees-of-freedom (6DOF) articulated industrial robot. The results show that the proposed method achieves a detection accuracy of 100% and that, compared with an existing collision detection method based on a fixed symmetric threshold, the proposed TSA-based method has a smaller detection delay and better eliminates the sensitivity difference of collision detection in different directions.
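The predict-then-threshold loop can be sketched in pure Python. This toy version substitutes a moving-average predictor and a residual-spread threshold for the paper's actual time-series model; the window size and the factor `k` are illustrative assumptions.

```python
import statistics

def detect_collision(torques, window=5, k=3.0):
    """Flag a collision when the measured external torque deviates from a
    moving-average prediction by more than k standard deviations of the
    recent history (a dynamic threshold, not a fixed symmetric one).
    Returns the indices at which a collision is detected."""
    alarms = []
    for t in range(window, len(torques)):
        history = torques[t - window:t]
        pred = sum(history) / window          # simple stand-in predictor
        thresh = k * (statistics.pstdev(history) + 1e-6)
        if abs(torques[t] - pred) > thresh:
            alarms.append(t)
    return alarms
```

Because the threshold tracks the recent torque statistics, a sudden spike stands out even in motion directions where the nominal torque level (and thus a fixed threshold's margin) would differ.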


2020 ◽  
pp. 1-16
Author(s):  
Ling Zhang ◽  
Yan Zhuang ◽  
Zhan Hua ◽  
Lin Han ◽  
Cheng Li ◽  
...  

BACKGROUND: Thyroid ultrasonography is widely used to diagnose thyroid nodules in clinics. Automatic localization of nodules can promote the development of intelligent thyroid diagnosis and reduce the workload of radiologists. However, besides the low contrast and high noise of ultrasound images, thyroid nodules are diverse in shape and vary greatly in size. Thus, thyroid nodule detection in ultrasound images is still a challenging task. OBJECTIVE: This study proposes an automatic detection algorithm to locate nodules in B-mode and Doppler ultrasound images. The method can be used to screen thyroid nodules and provide a basis for subsequent automatic segmentation and intelligent diagnosis. METHODS: We develop and optimize an improved YOLOv3 model for detecting thyroid nodules in B-mode and Doppler ultrasound images. Improvements include (1) using the high-resolution network (HRNet) as the base network to gradually extract high-level semantic features and reduce missed detections and misdetections, (2) optimizing the loss function for single-target detection such as nodules, and (3) obtaining the anchor boxes by clustering the bounding boxes of real nodules in the dataset. RESULTS: Experimental results on 8000 clinical ultrasound images show that the new method developed and tested in this study can effectively detect thyroid nodules, achieving 94.53% mean precision and 95.00% mean recall. CONCLUSIONS: The study demonstrates a new automated method that achieves high detection accuracy and effectively locates thyroid nodules in various ultrasound images without any user interaction, which indicates its potential clinical value for thyroid nodule screening.
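Improvement (3), obtaining anchor boxes by clustering real nodule boxes, can be sketched with a toy k-means over (width, height) pairs. YOLO-style anchor clustering typically uses an IoU-based distance; plain Euclidean distance and the naive initialization below are simplifying assumptions for illustration.

```python
def cluster_anchor_boxes(sizes, k=3, iters=20):
    """Cluster (w, h) pairs of ground-truth nodule boxes into k anchor
    sizes: a simplified YOLO-style anchor clustering using Euclidean
    distance instead of the usual 1 - IoU distance."""
    centers = sizes[:k]                      # naive initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, h in sizes:
            d = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centers]
            groups[d.index(min(d))].append((w, h))
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
    return sorted(centers)
```

Anchors derived this way match the size distribution of real nodules, which helps the detector cope with the large size variation the abstract mentions.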


2021 ◽  
Author(s):  
Zhenyu Wang ◽  
Senrong Ji ◽  
Duokun Yin

Abstract Recently, using image sensing devices to analyze air quality has attracted much attention from researchers. To keep real-time factory smoke under universal social supervision, this paper proposes an efficient smoke detection algorithm that runs on mobile platforms and is based on image analysis techniques. Since most smoke images in real scenes exhibit challenging variations, they are difficult for existing object detection methods. To this end, we introduce a two-stage smoke detection (TSSD) algorithm based on a lightweight framework, in which prior knowledge and contextual information are modeled into a relation-guided module to reduce the smoke search space, thereby significantly alleviating the shortcomings of single-stage methods. Experimental results show that the TSSD algorithm robustly improves the detection accuracy of the single-stage method and is compatible with different input image resolutions. Compared with various state-of-the-art detection methods, the mean AP of the TSSD model reaches 59.24%, even surpassing the current detection model Faster R-CNN. In addition, the detection speed of the proposed model reaches 50 ms per image (20 FPS), which meets real-time requirements and allows deployment on mobile terminal carriers. The model can be widely used in scenes with smoke detection requirements, offering great potential for practical environmental applications.
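The two-stage structure (a cheap prior-guided filter pruning the search space before the heavier classifier runs) can be sketched generically. Both `prior_score` and `classify` are caller-supplied placeholders here; the actual relation-guided module and lightweight classifier of TSSD are not reproduced.

```python
def two_stage_detect(regions, prior_score, classify, prior_thresh=0.5):
    """Stage 1 prunes candidate regions with a cheap prior/contextual
    score; stage 2 runs the (more expensive) classifier only on the
    survivors. Returns the regions accepted by both stages."""
    candidates = [r for r in regions if prior_score(r) >= prior_thresh]
    return [r for r in candidates if classify(r)]
```

The benefit is purely computational shape: the expensive second stage sees only regions the prior considers plausible, which is what lets a two-stage design stay fast enough for a mobile platform.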


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0255135
Author(s):  
Chunming Wu ◽  
Xin Ma ◽  
Xiangxu Kong ◽  
Haichao Zhu

The reliability of insulators directly affects the stable operation of the electric power system, and the detection of defective insulators has always been an important issue in smart grid systems. However, traditional transmission line detection methods have low accuracy and poor real-time performance. We present an insulator defect detection method based on CenterNet. In order to improve detection efficiency, we simplify the backbone network. In addition, an attention mechanism is utilized to suppress useless information and improve the accuracy of network detection. In image preprocessing, the blurring of some detected images would cause those samples to be discarded, so we use a super-resolution reconstruction algorithm to reconstruct the blurred images and enhance the dataset. The results show that the AP of the proposed method reaches 96.16% and the inference speed reaches 30 FPS on an NVIDIA GTX 1080. Compared with Faster R-CNN, YOLOv3, RetinaNet, and FSAF, the detection accuracy of the proposed method is greatly improved, which fully proves the effectiveness of the proposed method.
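Deciding which images are blurred enough to route to the super-resolution step requires a sharpness test; a common proxy is the variance of a Laplacian response. The function below is an assumed preprocessing sketch, not the paper's pipeline, and the threshold is illustrative.

```python
def needs_reconstruction(image, sharpness_thresh=5.0):
    """Return True if an image looks too blurred and should be sent to
    super-resolution reconstruction instead of being discarded.
    Uses the variance of a 4-neighbour Laplacian as a sharpness proxy.
    image: 2-D list of grayscale values."""
    h, w = len(image), len(image[0])
    responses = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap = (image[i - 1][j] + image[i + 1][j] + image[i][j - 1]
                   + image[i][j + 1] - 4 * image[i][j])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    var = sum((r - mean) ** 2 for r in responses) / len(responses)
    return var < sharpness_thresh
```

A flat (blurred) image produces near-zero Laplacian variance and is flagged for reconstruction; a sharp image with strong local contrast passes through untouched, so blurred samples enlarge the dataset instead of being thrown away.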

