Universal Optimization Strategies for Object Detection Networks

Author(s):  
Ziyu Shi ◽  
Haichang Gao ◽  
Yiwen Tang ◽  
Han Zheng ◽  
Shuai Kang ◽  
...  

With the development of deep learning technologies, object detection algorithms have made significant progress in both detection speed and detection performance. However, the detection speed of current detection networks still does not meet the requirements of real-world applications in some scenarios. In this paper, we propose a faster non-maximum suppression (FNMS) algorithm that reduces the processing time by a large margin while achieving the same detection precision as the traditional non-maximum suppression (NMS) algorithm. Moreover, we attempt to adopt additional lightweight network structures to improve the speed of the detection network. By combining our FNMS algorithm with other network optimization strategies, we are able to improve the detection speed of YOLO v3 on the DOTA dataset by 165%.
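
For reference, the traditional greedy NMS that FNMS is compared against can be sketched as follows. This is a minimal illustration of the baseline only, not the authors' FNMS, whose details the abstract does not give. The function names, the (x1, y1, x2, y2) box format, and the default threshold of 0.5 are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit candidates in descending score order.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too strongly.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

The pairwise IoU comparisons give this baseline a quadratic worst case in the number of candidate boxes, which is the kind of processing time a faster variant would target.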

2019 ◽  
Vol 3 (5) ◽  
Author(s):  
Qirui Dong

YOLOv3 aims to improve on the detection speed and accuracy of existing detection models. It predicts the center coordinates (x, y) of each bounding box, together with its length and width, through multiple layers of a VGG Convolutional Neural Network (VGG-CNN), and uses the lightweight Darknet framework to process images faster. Our model removes some of YOLOv3’s complex and computationally intensive procedures and refines its algorithms while maintaining the efficiency and accuracy of object detection. As a result, it achieves higher-quality results on large-scale object detection tasks, with fewer detection errors.
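
The bounding-box prediction described above can be illustrated with the decoding step of the original YOLOv3 formulation, in which raw network outputs (tx, ty, tw, th) are turned into a box using the grid-cell offset and an anchor prior. This is a minimal sketch of that standard step, not of the modified model in this abstract. The function name, argument names, pixel-unit anchors, and the stride of 32 are assumptions.

```python
import math
from typing import Tuple


def sigmoid(v: float) -> float:
    """Logistic function, used to keep the center inside its grid cell."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(tx: float, ty: float, tw: float, th: float,
               cell_x: int, cell_y: int,
               anchor_w: float, anchor_h: float,
               stride: float = 32.0) -> Tuple[float, float, float, float]:
    """Convert raw outputs to (center_x, center_y, width, height) in pixels.

    cell_x, cell_y: column and row of the grid cell making the prediction.
    anchor_w, anchor_h: anchor prior size, assumed here to be in pixels.
    stride: assumed downsampling factor between input image and grid.
    """
    bx = (sigmoid(tx) + cell_x) * stride  # center x, offset within the cell
    by = (sigmoid(ty) + cell_y) * stride  # center y, offset within the cell
    bw = anchor_w * math.exp(tw)          # width scales the anchor prior
    bh = anchor_h * math.exp(th)          # height scales the anchor prior
    return bx, by, bw, bh
```

With all raw outputs at zero, the decoded box sits at the middle of its grid cell and has exactly the anchor's size.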


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2611
Author(s):  
Andrew Shepley ◽  
Greg Falzon ◽  
Christopher Lawson ◽  
Paul Meek ◽  
Paul Kwan

Image data is one of the primary sources of ecological data used in biodiversity conservation and management worldwide. However, classifying and interpreting large numbers of images is time and resource expensive, particularly in the context of camera trapping. Deep learning models have been used to achieve this task but are often not suited to specific applications due to their inability to generalise to new environments and inconsistent performance. Models need to be developed for specific species cohorts and environments, but the technical skills required to achieve this are a key barrier to the accessibility of this technology to ecologists. Thus, there is a strong need to democratize access to deep learning technologies by providing an easy-to-use software application allowing non-technical users to train custom object detectors. U-Infuse addresses this issue by providing ecologists with the ability to train customised models using publicly available images and/or their own images without specific technical expertise. Auto-annotation and annotation editing functionalities minimize the constraints of manually annotating and pre-processing large numbers of images. U-Infuse is a free and open-source software solution that supports both multiclass and single class training and object detection, allowing ecologists to access deep learning technologies usually only available to computer scientists, on their own device, customised for their application, without sharing intellectual property or sensitive data. It provides ecological practitioners with the ability to (i) easily achieve object detection within a user-friendly GUI, generating a species distribution report, and other useful statistics, (ii) custom train deep learning models using publicly available and custom training data, (iii) achieve supervised auto-annotation of images for further training, with the benefit of editing annotations to ensure quality datasets. 
Broad adoption of U-Infuse by ecological practitioners will improve ecological image analysis and processing by allowing significantly more image data to be processed with minimal expenditure of time and resources, particularly for camera trap images. Ease of training and use of transfer learning means domain-specific models can be trained rapidly, and frequently updated without the need for computer science expertise, or data sharing, protecting intellectual property and privacy.


Author(s):  
Samuel Humphries ◽  
Trevor Parker ◽  
Bryan Jonas ◽  
Bryan Adams ◽  
Nicholas J Clark

Quick identification of buildings and roads is critical for the execution of tactical US military operations in an urban environment. To this end, a gridded, referenced satellite image of an objective, often referred to as a gridded reference graphic or GRG, has become a standard product developed during intelligence preparation of the environment. At present, operational units identify key infrastructure by hand through the work of individual intelligence officers. Recent advances in Convolutional Neural Networks, however, allow this process to be streamlined through the use of object detection algorithms. In this paper, we describe an object detection algorithm designed to quickly identify and label both buildings and road intersections present in an image. Our work leverages both the U-Net architecture and the SpaceNet data corpus to produce an algorithm that accurately identifies a wide range of buildings and different types of roads. In addition to predicting buildings and roads, our model numerically labels each building by means of a contour-finding algorithm. Most importantly, the dual U-Net model is capable of predicting buildings and roads on a diverse set of test images and using these predictions to produce clean GRGs.
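
The numerical labeling of buildings can be illustrated with a small sketch. It uses connected-component labeling on a binary building mask, swapped in here for the contour-finding step the abstract mentions, since the authors' exact procedure is not given. The function name, the 4-connectivity choice, and the list-of-lists mask format are assumptions.

```python
from collections import deque
from typing import List, Tuple


def label_buildings(mask: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Assign 1, 2, 3, ... to each 4-connected blob of nonzero pixels.

    Returns the label image (0 = background) and the number of blobs.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and labels[y][x] == 0:
                # Unlabeled building pixel: start a new blob and flood it.
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Labels are assigned in row-major scan order, so the numbering follows the position of each blob's first pixel from top-left to bottom-right.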


2021 ◽  
Vol 7 (4) ◽  
pp. 64
Author(s):  
Tanguy Ophoff ◽  
Cédric Gullentops ◽  
Kristof Van Beeck ◽  
Toon Goedemé

Object detection models are usually trained and evaluated on highly complicated, challenging academic datasets, which results in deep networks requiring lots of computations. However, a lot of operational use-cases consist of more constrained situations: they have a limited number of classes to be detected, less intra-class variance, less lighting and background variance, constrained or even fixed camera viewpoints, etc. In these cases, we hypothesize that smaller networks could be used without deteriorating the accuracy. However, there are multiple reasons why this does not happen in practice. Firstly, overparameterized networks tend to learn better, and secondly, transfer learning is usually used to reduce the necessary amount of training data. In this paper, we investigate how much we can reduce the computational complexity of a standard object detection network in such constrained object detection problems. As a case study, we focus on a well-known single-shot object detector, YoloV2, and combine three different techniques to reduce the computational complexity of the model without reducing its accuracy on our target dataset. To investigate the influence of the problem complexity, we compare two datasets: a prototypical academic (Pascal VOC) and a real-life operational (LWIR person detection) dataset. The three optimization steps we exploited are: swapping all the convolutions for depth-wise separable convolutions, performing pruning, and using weight quantization. The results of our case study indeed substantiate our hypothesis that the more constrained a problem is, the more the network can be optimized. On the constrained operational dataset, combining these optimization techniques allowed us to reduce the computational complexity by a factor of 349, compared to only a factor of 9.8 on the academic dataset. 
When running a benchmark on an Nvidia Jetson AGX Xavier, our fastest model runs more than 15 times faster than the original YoloV2 model, whilst increasing the accuracy by 5% Average Precision (AP).
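
The first of the three optimization steps can be made concrete with a rough operation count. The sketch below compares multiply-accumulate operations for a standard convolution against a depth-wise separable one (a depth-wise k×k convolution followed by a 1×1 point-wise convolution). It covers this single step only, not pruning or quantization, so it is not expected to reproduce the reported factors. The function names and the example layer sizes are assumptions.

```python
def standard_conv_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Multiply-accumulates of a k x k convolution over an h x w output."""
    return k * k * c_in * c_out * h * w


def separable_conv_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Multiply-accumulates of depth-wise k x k plus point-wise 1 x 1."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def reduction_factor(k: int, c_in: int, c_out: int, h: int, w: int) -> float:
    """How many times fewer operations the separable version needs."""
    return (standard_conv_macs(k, c_in, c_out, h, w)
            / separable_conv_macs(k, c_in, c_out, h, w))


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 32 x 32 output.
    print(reduction_factor(3, 64, 128, 32, 32))
```

The ratio simplifies to 1 / (1/c_out + 1/k²), so for 3×3 kernels the saving from this step alone approaches, but never exceeds, a factor of 9.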


2021 ◽  
Vol 104 (2) ◽  
pp. 003685042110113
Author(s):  
Xianghua Ma ◽  
Zhenkun Yang

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. However, it is widely recognized that although lightweight object detectors have a high detection speed, their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: their differing aspect ratios can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features of different scales by allocating learnable weight parameters to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into a state-of-the-art backbone such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO 2014 datasets. Experimental results show that the detection precision can be greatly improved while a fast detection speed is maintained.
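
The idea of running asymmetric convolutions in parallel and merging them with learnable weights can be sketched on a single-channel feature map. This is an illustration under stated assumptions, not the authors' MAC block: the kernel shapes (1×3, 3×1, 1×5), the softmax normalization of the branch weights, and all function names are hypothetical choices, since the abstract does not specify them.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Zero-padded 2D cross-correlation; output has the input's size.

    Assumes odd kernel height and width so the kernel has a center.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def softmax(values: Sequence[float]) -> List[float]:
    """Normalize branch logits into positive weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def merge_branches(image: Matrix, kernels: Sequence[Matrix],
                   branch_logits: Sequence[float]) -> Matrix:
    """Run each kernel as a parallel branch, then take a weighted sum."""
    weights = softmax(branch_logits)  # assumed normalization scheme
    branches = [conv2d_same(image, k) for k in kernels]
    h, w = len(image), len(image[0])
    return [[sum(wt * b[y][x] for wt, b in zip(weights, branches))
             for x in range(w)] for y in range(h)]
```

In a trained network the kernel values and branch logits would be learned parameters; here they are plain arguments so the merge step can be inspected in isolation.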


Author(s):  
Riichi Kudo ◽  
Kahoko Takahashi ◽  
Takeru Inoue ◽  
Kohei Mizuno

Various smart connected devices are emerging, such as automated driving cars, autonomous robots, and remote-controlled construction vehicles. These devices have vision systems to conduct their operations without collision. Thanks to the great advances in deep learning technologies, machine vision technology for perceiving self-position and/or the surrounding environment is becoming more accessible. The accurate perception information of these smart connected devices makes it possible to predict wireless link quality (LQ). This paper proposes an LQ prediction scheme that applies machine learning to HD camera output to forecast the influence of surrounding mobile objects on LQ. The proposed scheme utilizes object detection based on deep learning and learns the relationship between the detected object position information and the LQ. Outdoor experiments show that the proposed LQ prediction scheme can predict the throughput well for around 1 s into the future in a 5.6-GHz wireless LAN channel.


2021 ◽  
Author(s):  
Alexis Koulidis ◽  
Mohamed Abdullatif ◽  
Ahmed Galal Abdel-Kader ◽  
Mohammed-ilies Ayachi ◽  
Shehab Ahmed ◽  
...  

Surface data measurement and analysis are an established means of detecting drillstring low-frequency torsional vibration or stick-slip. The industry has also developed models that link surface torque and downhole drill bit rotational speed. Cameras provide an alternative noninvasive approach to existing wired/wireless sensors used to gather such surface data. The results of a preliminary field assessment of drilling dynamics utilizing camera-based drillstring monitoring are presented in this work. Detection and timing of events from the video are performed using computer vision techniques and object detection algorithms. A real-time interest point tracker utilizing homography estimation and sparse optical flow point tracking is deployed. We use a fully convolutional deep neural network trained to detect interest points and compute their accompanying descriptors. The detected points and descriptors are matched across video sequences and used for drillstring rotation detection and speed estimation. When the drillstring's vibration is invisible to the naked eye, the point tracking algorithm is preceded by a motion amplification function based on another deep convolutional neural network. We have clearly demonstrated the potential of camera-based noninvasive approaches to surface drillstring dynamics data acquisition and analysis. Through the application of real-time object detection algorithms on rig video feed, surface events were detected and timed. We were also able to estimate drillstring rotary speed and motion profile. Torsional drillstring modes can be identified and correlated with drilling parameters and bottomhole assembly design. A novel vibration array sensing approach based on a multi-point tracking algorithm is also proposed. A vibration threshold setting was utilized to enable an additional motion amplification function providing seamless assessment for multi-scale vibration measurement. 
Cameras have typically been devices for acquiring images/videos for offline automated assessment (recently) or online manual monitoring (mainly). This work has shown how fog/edge computing makes it possible for these cameras to be "conscious" and "intelligent," and hence to play a critical role in the automation/digitalization of drilling rigs. We showcase their preliminary application as drilling dynamics and rig operations sensors in this work. Cameras are an ideal sensor for a drilling environment since they can be installed anywhere on a rig to perform large-scale live video analytics on drilling processes.

