Justifying the Importance of Color Cues in Object Detection: A Case Study on Pedestrian

2012
pp. 387-397
Author(s):
Qingyuan Wang
Junbiao Pang
Lei Qin
Shuqiang Jiang
Qingming Huang
2021
Vol 7 (4)
pp. 64
Author(s):
Tanguy Ophoff
Cédric Gullentops
Kristof Van Beeck
Toon Goedemé

Object detection models are usually trained and evaluated on highly complicated, challenging academic datasets, which results in deep networks that require a lot of computation. However, many operational use cases involve far more constrained situations: a limited number of classes to detect, less intra-class variance, less lighting and background variance, constrained or even fixed camera viewpoints, etc. In these cases, we hypothesize that smaller networks could be used without deteriorating accuracy. However, there are multiple reasons why this does not happen in practice: firstly, overparameterized networks tend to learn better, and secondly, transfer learning is usually used to reduce the amount of training data needed. In this paper, we investigate how much the computational complexity of a standard object detection network can be reduced in such constrained object detection problems. As a case study, we focus on a well-known single-shot object detector, YoloV2, and combine three different techniques to reduce the computational complexity of the model without reducing its accuracy on our target dataset. To investigate the influence of problem complexity, we compare two datasets: a prototypical academic dataset (Pascal VOC) and a real-life operational one (LWIR person detection). The three optimization steps we exploit are: swapping all convolutions for depth-wise separable convolutions, pruning, and weight quantization. The results of our case study indeed substantiate our hypothesis that the more constrained a problem is, the more the network can be optimized. On the constrained operational dataset, combining these optimization techniques allowed us to reduce the computational complexity by a factor of 349, compared to only a factor of 9.8 on the academic dataset.
When running a benchmark on an Nvidia Jetson AGX Xavier, our fastest model runs more than 15 times faster than the original YoloV2 model, whilst increasing the accuracy by 5% Average Precision (AP).
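The first of the three optimization steps above, swapping standard convolutions for depth-wise separable ones, can be illustrated with a quick FLOP count. The layer shape below is a hypothetical example, not taken from the actual YoloV2 configuration:

```python
# Rough multiply-accumulate comparison between a standard convolution and a
# depth-wise separable convolution (depth-wise k x k followed by point-wise 1x1).

def conv_flops(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

def dw_separable_flops(h, w, c_in, c_out, k):
    """MACs of a depth-wise k x k convolution plus a 1x1 point-wise convolution."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

# Hypothetical mid-network layer: 26x26 feature map, 256 -> 512 channels, 3x3 kernel
std = conv_flops(26, 26, 256, 512, 3)
sep = dw_separable_flops(26, 26, 256, 512, 3)
print(f"standard: {std:,}  separable: {sep:,}  reduction: {std / sep:.1f}x")
```

For a 3x3 kernel the separable variant is roughly 8-9x cheaper, which is why it is an attractive first step before pruning and quantization.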


2020
Vol 13 (1)
pp. 23
Author(s):
Wei Zhao
William Yamada
Tianxin Li
Matthew Digman
Troy Runge

In recent years, precision agriculture has been researched as a promising means to increase crop production with fewer inputs, meeting the growing demand for agricultural products. Computer vision-based crop detection with unmanned aerial vehicle (UAV)-acquired images is a critical tool for precision agriculture. However, object detection using deep learning algorithms relies on a significant amount of manually prelabeled training data as ground truth. Field object detection, such as bale detection, is especially difficult because of (1) long-period image acquisition under different illumination conditions and seasons; (2) limited existing prelabeled data; and (3) few pretrained models and little prior research to use as references. This work increases bale detection accuracy with limited data collection and labeling by building an innovative algorithm pipeline. First, an object detection model is trained using 243 images captured under good illumination conditions in the fall from croplands. Next, domain adaptation (DA), a kind of transfer learning, is applied to synthesize training data under diverse environmental conditions with automatic labels. Finally, the object detection model is optimized with the synthesized datasets. The case study shows the proposed method improves bale detection performance, including recall, mean average precision (mAP), and F measure (F1 score), from averages of 0.59, 0.7, and 0.7 (object detection alone) to averages of 0.93, 0.94, and 0.89 (object detection + DA), respectively. This approach could easily be scaled to many other crop field objects and will significantly contribute to precision agriculture.
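As a small sketch of how the reported detection metrics relate, the function below derives precision, recall, and F1 from matched-detection counts, assuming an IoU threshold has already labeled each detection as a true or false positive. The counts are hypothetical, not the study's actual data:

```python
# Precision, recall, and F1 from true-positive, false-positive, and
# false-negative counts, guarding against division by zero.

def detection_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one detection run."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts: 93 of 100 bales found, with 15 false alarms
p, r, f1 = detection_metrics(tp=93, fp=15, fn=7)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

With these made-up counts the recall and F1 land near the paper's reported averages (0.93 and 0.89), showing how a high recall with a modest number of false alarms yields a slightly lower F1.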


Author(s):
Muhammad Lanang Afkaar Ar
Sulthan Muzakki Adytia S
Yudhistira Nugraha
Farizah Rizka R
Andy Ernesto
...

2018
Vol 2 (2)
pp. 55-67
Author(s):
C. H. Wu
G. T. S. Ho
K. L. Yung
W. W. Y. Tam
W. H. Ip

Author(s):  
Gopal Sakarkar
Rashmi Baitule

Automated or robot-assisted harvesting is an evolving research domain that mixes aspects of machine vision and machine intelligence. When combined with robotics, image processing has proven to be an efficient method of analysis in various application areas, notably agriculture. Most of this work has been applied to robots that pick and sort various fruits and vegetables. Identification and classification remain a serious obstacle for computer vision, demanding near-human levels of recognition. The goal of this survey is to classify and briefly review the literature on harvesting robots that use different techniques and computer analysis of images of fruits and vegetables in agricultural activities, covering 25 articles published within the last three decades. The reviewed approaches take into consideration various types of fruit. Much research on this subject has been conducted in recent years, either implementing simple computer vision techniques such as color-based clustering or using other sensors such as LWIR, hyperspectral, or 3D. Current advances in computer vision offer a broad range of advanced object detection techniques that could dramatically increase the efficiency of fruit detection from RGB images. Performance evaluation metrics obtained in various experiments are highlighted for the reviewed techniques, helping researchers to choose among and build new computer vision applications for fruit images.
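The simple color-based techniques mentioned above can be as basic as a per-pixel color rule. A minimal pure-Python sketch, with an illustrative red-dominance threshold that is an assumption, not a value from any surveyed paper:

```python
# Naive color rule for spotting ripe red fruit against green foliage in an
# RGB image: a pixel is "fruit" when red exceeds both green and blue by a margin.

def is_ripe_fruit_pixel(r, g, b, margin=40):
    """Red-dominance test on 0-255 channel values; margin is illustrative."""
    return r - max(g, b) > margin

# Tiny synthetic "image": two red fruit pixels, a leaf pixel, and a soil pixel
pixels = [(210, 60, 50), (190, 80, 70), (60, 150, 40), (120, 100, 90)]
mask = [is_ripe_fruit_pixel(*p) for p in pixels]
print(mask)
```

Real systems typically work in HSV or Lab color spaces and cluster rather than threshold, but the principle of separating fruit from background by color is the same.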


2020
Vol 6 (9)
pp. 97
Author(s):
Md Abul Ehsan Bhuiyan
Chandi Witharana
Anna K. Liljedahl
Benjamin M. Jones
Ronald Daanen
...

Deep learning (DL) convolutional neural networks (CNNs) have been rapidly adopted in very high spatial resolution (VHSR) satellite image analysis. DLCNN-based computer vision (CV) applications primarily aim at everyday object detection from standard red, green, blue (RGB) imagery, while earth science remote sensing applications focus on geo-object detection and classification from multispectral (MS) imagery. MS imagery includes RGB plus narrow spectral channels from the near- and/or middle-infrared regions of the reflectance spectrum. The central objective of this exploratory study is to understand to what degree MS band statistics govern DLCNN model predictions. We scaffold our analysis on a case study that uses Arctic tundra permafrost landform features called ice-wedge polygons (IWPs) as candidate geo objects. We chose Mask R-CNN as the DLCNN architecture to detect IWPs from eight-band Worldview-02 VHSR satellite imagery. A systematic experiment was designed to understand the impact of choosing the optimal three-band combination on model prediction. We tested five cohorts of three-band combinations, coupled with statistical measures to gauge the spectral variability of the input MS bands. The candidate scenes produced high model detection accuracies, with F1 scores ranging from 0.89 to 0.95, for two different band combinations (coastal blue, blue, green (1,2,3) and green, yellow, red (3,4,5)). The mapping workflow discerned the IWPs with low random and systematic error, on the order of 0.17–0.19 and 0.20–0.21, respectively, for band combination (1,2,3). Results suggest that the prediction accuracy of the Mask R-CNN model is significantly influenced by the input MS bands. Overall, our findings accentuate the importance of considering the image statistics of the input MS bands and carefully selecting optimal bands for DLCNN predictions when DLCNN architectures are restricted to three spectral channels.
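The band-selection idea above, gauging spectral variability before committing three MS bands to a three-channel network, can be sketched by ranking every three-band combination by a simple statistic. The per-band means and standard deviations below are made-up illustrative numbers, not Worldview-02 values:

```python
# Rank three-band combinations of an eight-band image by their average
# coefficient of variation (std/mean), a crude proxy for spectral variability.

from itertools import combinations
from statistics import mean

# Hypothetical (mean, std) reflectance statistics for bands indexed 1..8
band_stats = {1: (320, 45), 2: (310, 60), 3: (295, 70), 4: (280, 52),
              5: (260, 66), 6: (240, 30), 7: (450, 40), 8: (430, 35)}

def combo_variability(bands):
    """Average coefficient of variation across the selected bands."""
    return mean(std / mu for mu, std in (band_stats[b] for b in bands))

ranked = sorted(combinations(band_stats, 3), key=combo_variability, reverse=True)
print(ranked[0])  # the most spectrally variable three-band combination
```

Whether high variability actually predicts detection accuracy is exactly the kind of question the study probes; this sketch only shows the mechanics of scoring band cohorts.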

