Optimized loss functions for object detection and application on nighttime vehicle detection

Author(s):  
Shang Jiang ◽  
Haoran Qin ◽  
Bingli Zhang ◽  
Jieyu Zheng

The loss function is a crucial factor affecting detection precision in the object detection task. In this paper, we optimize the two loss functions for classification and localization simultaneously. Firstly, we reconstruct the classification loss function by incorporating the prediction results of the localization branch, aiming to establish a correlation between the localization and classification subnetworks. In existing studies, this correlation is established only among positive samples and is applied to improve the localization accuracy of predicted boxes; in contrast, this paper utilizes the correlation to identify hard negative samples and then puts emphasis on classifying them, so that the overall misclassification rate for negative samples is reduced. Besides, a novel localization loss named MIoU is proposed by incorporating a Mahalanobis distance between the predicted box and the target box, eliminating the gradient inconsistency problem of the DIoU loss and further improving localization accuracy. Finally, the proposed methods are applied to train networks for nighttime vehicle detection. Experimental results show that detection accuracy is markedly improved with our proposed loss functions without hurting detection speed.
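The abstract does not give the exact MIoU formulation. As a rough sketch of the family it belongs to, the following DIoU-style loss penalizes a scaled center distance on top of the IoU term; the axis-wise scaling by the enclosing-box extents stands in for the paper's Mahalanobis distance and is an assumption, not the authors' definition.

```python
def iou(b1, b2):
    # boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def miou_like_loss(pred, target):
    # 1 - IoU plus a Mahalanobis-like center distance: squared center
    # offsets scaled per-axis by the enclosing box (hypothetical diagonal
    # covariance), in the spirit of the MIoU loss described above.
    cxp, cyp = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cxt, cyt = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    ew = max(pred[2], target[2]) - min(pred[0], target[0])
    eh = max(pred[3], target[3]) - min(pred[1], target[1])
    d2 = ((cxp - cxt) / ew) ** 2 + ((cyp - cyt) / eh) ** 2
    return 1.0 - iou(pred, target) + d2
```

For perfectly overlapping boxes the loss is zero, and it grows as the predicted box drifts away from the target.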

2021 ◽  
Vol 104 (2) ◽  
pp. 003685042110113
Author(s):  
Xianghua Ma ◽  
Zhenkun Yang

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. It is widely recognized that although lightweight object detectors have a high detection speed, their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: their different aspect ratios can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, namely the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features at different scales by allocating learnable weighted parameters to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into state-of-the-art backbones such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO 2014 datasets. Experimental results show that detection precision can be greatly improved while a fast detection speed is maintained.
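The adaptive merge described above can be sketched as a softmax-normalized weighted sum of the three branch outputs. The branch feature maps here are plain nested lists and the scalar weights per branch are an assumption; in the actual network they would be learned parameters following the asymmetric convolutions.

```python
import math

def softmax(ws):
    m = max(ws)
    es = [math.exp(w - m) for w in ws]
    s = sum(es)
    return [e / s for e in es]

def mac_merge(branches, weights):
    """Merge the outputs of three asymmetric-convolution branches with
    learnable scalar weights (softmax-normalized) -- a sketch of the
    adaptive fusion in the MAC block, not the exact implementation."""
    a = softmax(weights)
    h, w = len(branches[0]), len(branches[0][0])
    return [[sum(a[k] * branches[k][i][j] for k in range(len(branches)))
             for j in range(w)] for i in range(h)]
```

With equal weights the merge reduces to a plain average of the three branches; training would push the weights toward whichever aspect ratio helps most at that scale.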


2021 ◽  
Vol 13 (10) ◽  
pp. 1925
Author(s):  
Shengzhou Xiong ◽  
Yihua Tan ◽  
Yansheng Li ◽  
Cai Wen ◽  
Pei Yan

Object detection in remote sensing images (RSIs) is one of the basic tasks in the field of remote sensing image automatic interpretation. In recent years, deep object detection frameworks for natural scene images (NSIs) have been introduced into object detection on RSIs, and detection performance has improved significantly because of their powerful feature representation. However, there are still many challenges concerning the particularities of remote sensing objects. One of the main challenges is the missed detection of small objects, which occupy less than five percent of the pixels of the large objects. Generally, existing algorithms deal with this problem through multi-scale feature fusion based on a feature pyramid. However, the benefits of this strategy are limited, considering that the location information of small objects has vanished from the feature map by the time the detection task is processed at the end of the network. In this study, we propose a subtask attention network (StAN), which handles the detection task directly on the shallow layers of the network. First, StAN contains one shared feature branch and two subtask attention branches, a semantic auxiliary subtask and a detection subtask, based on the multi-task attention network (MTAN). Second, the detection branch uses only low-level features, in consideration of small objects. Third, an attention map guidance mechanism is put forward to optimize the network while preserving its identification ability. Fourth, the multi-dimensional sampling module (MdS), global multi-view channel weights (GMulW) and target-guided pixel attention (TPA) are designed to further improve detection accuracy in complex scenes. Experimental results on the NWPU VHR-10 dataset and DOTA dataset demonstrated that the proposed algorithm achieved SOTA performance and that the missed detection of small objects decreased. On the other hand, ablation experiments also proved the effects of MdS, GMulW and TPA.
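The abstract does not define GMulW precisely. As a generic stand-in, a squeeze-and-excitation-style global channel weighting captures the idea of deriving per-channel weights from a global view of each feature map; the pooling and gating choices below are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def global_channel_weights(feature_maps):
    """Squeeze-and-excitation-style reweighting: global average pooling per
    channel, a sigmoid gate, then scaling of that channel. A generic
    stand-in for GMulW, whose exact form the abstract does not specify."""
    scaled = []
    for ch in feature_maps:                          # ch: 2-D list (H x W)
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        g = sigmoid(mean)                            # per-channel gate in (0, 1)
        scaled.append([[g * v for v in row] for row in ch])
    return scaled
```

Channels whose global response is strong are passed through nearly unchanged, while weakly responding channels are suppressed.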


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7422
Author(s):  
Xiaowei He ◽  
Rao Cheng ◽  
Zhonglong Zheng ◽  
Zeji Wang

For small objects in traffic scenes, general object detection algorithms have low detection accuracy, high model complexity, and slow detection speed. To solve the above problems, an improved algorithm (named YOLO-MXANet) is proposed in this paper. Complete-Intersection over Union (CIoU) is utilized to improve the loss function, promoting the positioning accuracy of small objects. In order to reduce the complexity of the model, we present a lightweight yet powerful backbone network (named SA-MobileNeXt) that incorporates channel and spatial attention. Our approach can extract expressive features more effectively by applying the Shuffle Channel and Spatial Attention (SCSA) module to the SandGlass Block (SGBlock) module while adding only a small number of parameters. In addition, a data enhancement method combining Mosaic and Mixup is employed to improve the robustness of the trained model. A Multi-scale Feature Enhancement Fusion (MFEF) network is proposed to better fuse the extracted features. Finally, the SiLU activation function is utilized to optimize the Convolution-Batchnorm-Leaky ReLU (CBL) module and the SGBlock module to accelerate the convergence of the model. Ablation experiments on the KITTI dataset show that each improvement is effective. The improved algorithm reduces the complexity of the model and raises detection speed while improving object detection accuracy. Comparative experiments on the KITTI and CCTSDB datasets against other algorithms show that our algorithm also has certain advantages.
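CIoU is a published loss (Zheng et al.), so its form can be written out: the IoU term plus a normalized center-distance term plus an aspect-ratio consistency term. The sketch below is a minimal reference implementation for axis-aligned boxes, not the YOLO-MXANet code.

```python
import math

def ciou_loss(pred, target):
    """Complete-IoU loss: 1 - IoU + rho^2/c^2 + alpha*v, with boxes given
    as (x1, y1, x2, y2)."""
    # intersection over union
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    wp, hp = pred[2] - pred[0], pred[3] - pred[1]
    wt, ht = target[2] - target[0], target[3] - target[1]
    iou = inter / (wp * hp + wt * ht - inter)
    # squared center distance over squared enclosing-box diagonal
    cd2 = ((pred[0] + pred[2] - target[0] - target[2]) ** 2 +
           (pred[1] + pred[3] - target[1] - target[3]) ** 2) / 4.0
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    diag2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    v = (4.0 / math.pi ** 2) * (math.atan(wt / ht) - math.atan(wp / hp)) ** 2
    alpha = v / ((1.0 - iou) + v) if v > 0 else 0.0
    return 1.0 - iou + cd2 / diag2 + alpha * v
```

Unlike plain IoU loss, the extra terms keep the gradient informative even when the boxes do not overlap, which is what helps small-object localization.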


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Zhibin Cheng ◽  
Fuquan Zhang

In this paper, a novel anchor-based flower detection method combined with an attention mechanism is proposed to detect flowers in a smart garden in AIoT more accurately and quickly. While many researchers have paid much attention to flower classification in existing studies, the issue of flower detection has been largely overlooked. The problem we address therefore concerns the design and application of a new flower detector. Firstly, a new end-to-end anchor-based flower detection method is built into the network architecture to make it more precise and fast, and a loss function and an attention mechanism are introduced into our model to suppress unimportant features. Secondly, our flower detection algorithm can be deployed on mobile devices. A series of investigations shows that our flower detection method performs strongly: its detection accuracy is comparable to the state of the art, and its detection speed is faster at the same time. It makes a major contribution to flower detection in computer vision.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Teng Liu ◽  
Cheng Xu ◽  
Hongzhe Liu ◽  
Xuewei Li ◽  
Pengfei Wang

Security perception systems based on 5G-V2X have become an indispensable part of smart city construction. However, the detection speed of traditional deep learning models is slow, and the low-latency characteristics of 5G networks cannot be fully utilized. In order to improve the safety perception ability of 5G-V2X systems and increase the detection speed of vehicle perception, a vehicle perception model is proposed. First, an adaptive feature extraction method is adopted to enhance the expression of small-scale features and improve the feature extraction ability for small-scale targets. Then, by improving the feature fusion method, shallow information is fused layer by layer to solve the problem of feature loss. Finally, an attention enhancement method is introduced to strengthen center point prediction and solve the problem of target occlusion. Experimental results on the UA-DETRAC dataset show a good detection effect. Compared with the vehicle detection capability before the improvement, detection accuracy and speed are greatly improved, which effectively strengthens the security perception capability of the 5G-V2X system and thereby promotes the construction of smart cities.


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6485
Author(s):  
Delia-Georgiana Stuparu ◽  
Radu-Ioan Ciobanu ◽  
Ciprian Dobre

In order to improve traffic in large cities and to avoid congestion, advanced methods of detecting and predicting vehicle behaviour are needed. Such methods require complex information regarding the number of vehicles on the roads, their positions, directions, etc. One way to obtain this information is by analyzing overhead images collected by satellites or drones, and extracting information from them through intelligent machine learning models. Thus, in this paper we propose and present a one-stage object detection model for finding vehicles in satellite images using the RetinaNet architecture and the Cars Overhead With Context dataset. By analyzing the results obtained by the proposed model, we show that it has a very good vehicle detection accuracy and a very low detection time, which indicates that it can successfully extract information from real-time satellite or drone imagery.
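RetinaNet's defining component is the focal loss, which lets a one-stage detector cope with the extreme foreground/background imbalance in dense anchor grids. A minimal per-anchor version, using the default hyperparameters from the RetinaNet paper:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one anchor: down-weights easy examples by
    (1 - p_t)^gamma so training focuses on hard ones. p is the predicted
    foreground probability in (0, 1), y the 0/1 ground-truth label;
    alpha and gamma are the defaults from the RetinaNet paper."""
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

An easy positive (p close to 1) contributes almost nothing, while a hard positive (p close to 0) dominates the gradient; with gamma = 0 the expression reduces to alpha-weighted cross-entropy.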


Author(s):  
A. Howie ◽  
D.W. McComb

The bulk loss function Im(−1/ε(ω)), a well established tool for the interpretation of valence loss spectra, is being progressively adapted to the wide variety of inhomogeneous samples of interest to the electron microscopist. Proportionality between n, the local valence electron density, and ε − 1 (Sellmeyer's equation) has sometimes been assumed but may not be valid even in homogeneous samples. Figs. 1 and 2 show the experimentally measured bulk loss functions for three pure silicates of different specific gravity ρ: quartz (ρ = 2.66), coesite (ρ = 2.93) and a zeolite (ρ = 1.79). Clearly, despite the substantial differences in density, the shift of the prominent loss peak is very small and far less than that predicted by scaling ε for quartz with Sellmeyer's equation, or even the somewhat smaller shift given by the Clausius-Mossotti (CM) relation, which assumes proportionality between n (or ρ in this case) and (ε − 1)/(ε + 2). Both theories overestimate the rise in the peak height for coesite and underestimate the increase at high energies.
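The two quantities compared in this passage are easy to evaluate numerically for any complex permittivity ε = ε₁ + iε₂: the loss function is Im(−1/ε) = ε₂/(ε₁² + ε₂²), and the CM relation involves (ε − 1)/(ε + 2). A small sketch for checking such scalings:

```python
def bulk_loss(eps):
    """Energy-loss function Im(-1/eps) for a complex permittivity eps;
    algebraically equal to eps2 / (eps1**2 + eps2**2)."""
    return (-1.0 / eps).imag

def clausius_mossotti_ratio(eps):
    """(eps - 1)/(eps + 2), the quantity the Clausius-Mossotti relation
    takes to be proportional to the valence electron density n."""
    return (eps - 1.0) / (eps + 2.0)
```

Scaling a material's density under either assumption amounts to scaling ε − 1 (Sellmeyer) or (ε − 1)/(ε + 2) (CM) by the density ratio and recomputing Im(−1/ε), which is the comparison the measured spectra contradict.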


2006 ◽  
Vol 27 (4) ◽  
pp. 218-228 ◽  
Author(s):  
Paul Rodway ◽  
Karen Gillies ◽  
Astrid Schepman

This study examined whether individual differences in the vividness of visual imagery influenced performance on a novel long-term change detection task. Participants were presented with a sequence of pictures, with each picture and its title displayed for 17 s, and then presented with changed or unchanged versions of those pictures and asked to detect whether the picture had been changed. Cuing the retrieval of the picture's image, by presenting the picture's title before the arrival of the changed picture, facilitated change detection accuracy. This suggests that the retrieval of the picture's representation immunizes it against overwriting by the arrival of the changed picture. The high and low vividness participants did not differ in overall levels of change detection accuracy. However, in replication of Gur and Hilgard (1975), high vividness participants were significantly more accurate at detecting salient changes to pictures compared to low vividness participants. The results suggest that vivid images are not characterised by a high level of detail and that vivid imagery enhances memory for the salient aspects of a scene but not all of the details of a scene. Possible causes of this difference, and how they may lead to an understanding of individual differences in change detection, are considered.


Author(s):  
Konstantin A. Elshin ◽  
Elena I. Molchanova ◽  
Marina V. Usoltseva ◽  
Yelena V. Likhoshway

Using the TensorFlow Object Detection API, an approach to identifying and registering the Baikal diatom species Synedra acus subsp. radians has been tested. A set of images was assembled and training was conducted. It is shown that after 15,000 training iterations the total value of the loss function reached 0.04. At the same time, the classification accuracy is 95%, and the accuracy of bounding box construction is also 95%.


2021 ◽  
Vol 13 (9) ◽  
pp. 1779
Author(s):  
Xiaoyan Yin ◽  
Zhiqun Hu ◽  
Jiafeng Zheng ◽  
Boyong Li ◽  
Yuanyuan Zuo

Radar beam blockage is an important error source that affects the quality of weather radar data. An echo-filling network (EFnet) based on a deep learning algorithm is proposed to correct the echo intensity in the occluded area of the Nanjing S-band new-generation weather radar (CINRAD/SA). The training dataset is constructed from labels, which are the echo intensities at the 0.5° elevation in the unblocked area, and from input features, which are the intensities in the cube spanning multiple elevations and gates corresponding to the location of the bottom labels. Two loss functions are applied to compile the network: one is the common mean square error (MSE), and the other is a self-defined loss function that increases the weight of strong echoes. Considering that the radar beam broadens with distance and height, the 0.5° elevation scan is divided into six range bands of 25 km each to train separate models. The models are evaluated by three indicators: explained variance (EVar), mean absolute error (MAE), and correlation coefficient (CC). Two cases are presented to compare the effect of the echo-filling model under the different loss functions. The results suggest that EFnet can effectively correct the echo reflectivity and improve data quality in the occluded area, and that the self-defined loss function gives better results for strong echoes.
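A loss in the spirit of the self-defined one described above can be sketched as an MSE whose per-gate weight depends on the true reflectivity; the 35 dBZ threshold and 4x weight below are illustrative assumptions, not the paper's values.

```python
def weighted_mse(pred, target, threshold=35.0, strong_weight=4.0):
    """MSE with a larger weight on gates whose true reflectivity (dBZ)
    exceeds a threshold, so strong echoes dominate the loss. The threshold
    and weight are hypothetical, not taken from the paper."""
    total = 0.0
    for p, t in zip(pred, target):
        w = strong_weight if t >= threshold else 1.0
        total += w * (p - t) ** 2
    return total / len(pred)
```

Compared to plain MSE, errors on strong-echo gates are amplified, pushing the network to reconstruct convective cores more faithfully at the cost of slightly looser fits in weak-echo regions.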

