Investigating the Potential of Network Optimization for a Constrained Object Detection Problem

2021 ◽  
Vol 7 (4) ◽  
pp. 64
Author(s):  
Tanguy Ophoff ◽  
Cédric Gullentops ◽  
Kristof Van Beeck ◽  
Toon Goedemé

Object detection models are usually trained and evaluated on highly complicated, challenging academic datasets, which results in deep networks requiring lots of computations. However, a lot of operational use-cases consist of more constrained situations: they have a limited number of classes to be detected, less intra-class variance, less lighting and background variance, constrained or even fixed camera viewpoints, etc. In these cases, we hypothesize that smaller networks could be used without deteriorating the accuracy. However, there are multiple reasons why this does not happen in practice. Firstly, overparameterized networks tend to learn better, and secondly, transfer learning is usually used to reduce the necessary amount of training data. In this paper, we investigate how much we can reduce the computational complexity of a standard object detection network in such constrained object detection problems. As a case study, we focus on a well-known single-shot object detector, YoloV2, and combine three different techniques to reduce the computational complexity of the model without reducing its accuracy on our target dataset. To investigate the influence of the problem complexity, we compare two datasets: a prototypical academic one (Pascal VOC) and a real-life operational one (LWIR person detection). The three optimization steps we exploited are swapping all convolutions for depth-wise separable convolutions, pruning, and weight quantization. The results of our case study indeed substantiate our hypothesis that the more constrained a problem is, the more the network can be optimized. On the constrained operational dataset, combining these optimization techniques allowed us to reduce the computational complexity by a factor of 349, compared to only a factor of 9.8 on the academic dataset. When running a benchmark on an Nvidia Jetson AGX Xavier, our fastest model runs more than 15 times faster than the original YoloV2 model, whilst increasing the accuracy by 5% Average Precision (AP).
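For illustration, the first of these optimization steps, replacing standard convolutions with depth-wise separable ones, can be sketched in PyTorch as below; the layer sizes and activation choices are assumptions for the example, not the authors' exact YoloV2 configuration.

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, kernel_size=3, stride=1):
    """Replace a standard convolution with a depthwise + pointwise pair."""
    return nn.Sequential(
        # Depthwise: one filter per input channel (groups=in_ch).
        nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                  padding=kernel_size // 2, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.LeakyReLU(0.1, inplace=True),
        # Pointwise: 1x1 convolution mixes the channels.
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

# Example: a 3x3, 256->512 convolution costs 256*512*9 multiply-accumulates
# per output position; the separable version costs 256*9 + 256*512,
# roughly an 8.8x reduction for this layer.
block = depthwise_separable(256, 512)
```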

2020 ◽  
Vol 13 (1) ◽  
pp. 23
Author(s):  
Wei Zhao ◽  
William Yamada ◽  
Tianxin Li ◽  
Matthew Digman ◽  
Troy Runge

In recent years, precision agriculture has been researched to increase crop production with fewer inputs, as a promising means to meet the growing demand for agricultural products. Computer vision-based crop detection with unmanned aerial vehicle (UAV)-acquired images is a critical tool for precision agriculture. However, object detection using deep learning algorithms relies on a significant amount of manually prelabeled training data as ground truth. Detecting field objects, such as bales, is especially difficult because of (1) long-period image acquisitions under different illumination conditions and seasons; (2) limited existing prelabeled data; and (3) few pretrained models or prior studies to use as references. This work increases bale detection accuracy with limited data collection and labeling by building an innovative algorithm pipeline. First, an object detection model is trained using 243 images captured under good illumination conditions in fall from croplands. In addition, domain adaptation (DA), a kind of transfer learning, is applied to synthesize training data under diverse environmental conditions with automatic labels. Finally, the object detection model is optimized with the synthesized datasets. The case study shows that the proposed method improves bale detection performance, including the recall, mean average precision (mAP), and F measure (F1 score), from averages of 0.59, 0.7, and 0.7 (object detection alone) to averages of 0.93, 0.94, and 0.89 (object detection + DA), respectively. This approach could be easily scaled to many other crop field objects and will significantly contribute to precision agriculture.
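As a reference for the reported metrics, a minimal sketch of precision, recall, and F1 at a fixed IoU threshold is shown below; the counts are hypothetical.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 from true/false positives and false negatives
    at a fixed IoU threshold (mAP additionally averages precision over
    recall levels and classes)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical counts for a bale-detection test set.
print(detection_metrics(tp=93, fp=6, fn=7))
```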


1988 ◽  
Vol 20 (11) ◽  
pp. 1461-1470 ◽  
Author(s):  
J Malczewski ◽  
W Ogryczak

In this paper the authors present an application of optimization techniques to the real-life problem of the reorganization of health-service areas. The problem is formulated as a linear programming problem with three objective functions. The values of the three objective functions proved to vary significantly depending on the assumed hierarchy of the objectives. Nevertheless, the multiobjective analysis based on parametric techniques was found to provide a compromise solution that implied a significant improvement in the performance of the health-care system.
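The parametric multiobjective analysis described here can be illustrated with a weighted-sum scalarization in Python; the objective coefficients and the single constraint below are illustrative placeholders, not the authors' health-service model.

```python
import numpy as np
from scipy.optimize import linprog

# Three illustrative objective vectors (e.g. travel distance, cost,
# under-served population), all to be minimized.
c1 = np.array([4.0, 2.0, 5.0])
c2 = np.array([1.0, 3.0, 2.0])
c3 = np.array([3.0, 1.0, 4.0])

A_ub = [[-1.0, -1.0, -1.0]]   # illustrative coverage constraint: x1 + x2 + x3 >= 10
b_ub = [-10.0]

def compromise(weights):
    """Weighted-sum scalarization of the three objectives."""
    c = weights[0] * c1 + weights[1] * c2 + weights[2] * c3
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
    return res.x, (c1 @ res.x, c2 @ res.x, c3 @ res.x)

# Varying the weights yields different trade-offs between the objectives.
for w in [(1, 0, 0), (0.5, 0.3, 0.2), (1/3, 1/3, 1/3)]:
    print(w, compromise(w))
```

Sweeping the weight vector traces out different compromise solutions, which is the essence of the parametric approach mentioned above.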


2020 ◽  
Vol 12 (3) ◽  
pp. 458 ◽  
Author(s):  
Ugur Alganci ◽  
Mehmet Soydas ◽  
Elif Sertel

Object detection from satellite images has been a challenging problem for many years. With the development of effective deep learning algorithms and advancements in hardware systems, higher accuracies have been achieved in the detection of various objects from very high-resolution (VHR) satellite images. This article provides a comparative evaluation of the state-of-the-art convolutional neural network (CNN)-based object detection models, namely Faster R-CNN, Single Shot Multi-box Detector (SSD), and You Only Look Once-v3 (YOLO-v3), to cope with the limited number of labeled data and to automatically detect airplanes in VHR satellite images. Data augmentation with rotation, rescaling, and cropping was applied to the training images to artificially increase the amount of training data from satellite images. Moreover, non-maximum suppression (NMS) was introduced at the end of the SSD and YOLO-v3 flows to remove the multiple detections near each object in overlapping areas. The trained networks were applied to five independent VHR test images that cover airports and their surroundings to evaluate their performance objectively. Accuracy assessment results of the test regions proved that the Faster R-CNN architecture provided the highest accuracy according to the F1 scores, average precision (AP) metrics, and visual inspection of the results. YOLO-v3 ranked second, with slightly lower performance but a balanced trade-off between accuracy and speed. SSD provided the lowest detection performance, but was better at object localization. The results were also evaluated with respect to object size, which showed that large- and medium-sized airplanes were detected with higher accuracy.
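The NMS post-processing step added to the SSD and YOLO-v3 flows is, in its standard greedy form, roughly the following; the IoU threshold of 0.5 is an assumed value.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any remaining box that overlaps it by more than iou_thresh."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current best box with the remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```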


Author(s):  
Kevin Riou ◽  
Jingwen Zhu ◽  
Suiyi Ling ◽  
Mathis Piquet ◽  
Vincent Truffault ◽  
...  

2019 ◽  
Vol 11 (20) ◽  
pp. 2376 ◽  
Author(s):  
Li ◽  
Zhang ◽  
Wu

Object detection in remote sensing images, on a satellite or aircraft, has important economic and military significance and is full of challenges. This task requires not only accurate and efficient algorithms, but also high-performance, low-power hardware architecture. However, existing deep learning based object detection algorithms require further optimization for small-object detection, reduced computational complexity, and smaller parameter size. Meanwhile, general-purpose processors cannot achieve good power efficiency, and previous deep learning processor designs still leave parallelism unexploited. To address these issues, we propose an efficient context-based feature fusion single-shot multibox detector (CBFF-SSD) framework, using lightweight MobileNet as the backbone network to reduce parameters and computational complexity, and adding feature fusion units and detection feature maps to enhance the recognition of small objects and improve detection accuracy. Based on the analysis and optimization of the calculation of each layer in the algorithm, we propose an efficient hardware architecture for a deep learning processor with multiple neural processing units (NPUs) composed of 2D processing elements (PEs), which can simultaneously calculate multiple output feature maps. The parallel architecture, hierarchical on-chip storage organization, and local registers are used to achieve parallel processing and the sharing and reuse of data, making the processor's computation more efficient. Extensive experiments and comprehensive evaluations on the public NWPU VHR-10 dataset, and comparisons with some state-of-the-art approaches, demonstrate the effectiveness and superiority of the proposed framework. Moreover, to evaluate the performance of the proposed hardware architecture, we implement it on a Xilinx XC7Z100 field programmable gate array (FPGA) and test it on the proposed CBFF-SSD and VGG16 models. Experimental results show that our processor is more power-efficient than general-purpose central processing units (CPUs) and graphics processing units (GPUs), and has better performance density than other state-of-the-art FPGA-based designs.
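A feature fusion unit of the kind described, merging a deep, semantically strong feature map with a shallower, higher-resolution one before detection, might look roughly like the PyTorch sketch below; the channel and spatial sizes are assumptions, not the exact CBFF-SSD design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusionUnit(nn.Module):
    """Fuse a deep (semantically strong, low-resolution) feature map with a
    shallow (high-resolution) one to help detect small objects."""
    def __init__(self, shallow_ch, deep_ch, out_ch):
        super().__init__()
        self.reduce_deep = nn.Conv2d(deep_ch, out_ch, 1)
        self.reduce_shallow = nn.Conv2d(shallow_ch, out_ch, 1)
        self.smooth = nn.Conv2d(2 * out_ch, out_ch, 3, padding=1)

    def forward(self, shallow, deep):
        # Upsample the deep map to the shallow map's spatial size, then
        # concatenate and smooth before it feeds a detection head.
        deep_up = F.interpolate(self.reduce_deep(deep),
                                size=shallow.shape[-2:], mode='bilinear',
                                align_corners=False)
        fused = torch.cat([self.reduce_shallow(shallow), deep_up], dim=1)
        return self.smooth(fused)

# Example: fuse a 38x38 shallow map with a 19x19 deep map.
fusion = FeatureFusionUnit(shallow_ch=512, deep_ch=1024, out_ch=256)
out = fusion(torch.randn(1, 512, 38, 38), torch.randn(1, 1024, 19, 19))
```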


Author(s):  
Eleonora FIORE ◽  
Giuliano SANSONE ◽  
Chiara Lorenza REMONDINO ◽  
Paolo Marco TAMBORRINI

Interest in offering Entrepreneurship Education (EE) to all kinds of university students is increasing. Therefore, universities are increasing the number of entrepreneurship courses intended for students from different fields of study and with different education levels. Through a single case study of the Contamination Lab of Turin (CLabTo), we suggest how EE may be taught to all kinds of university students. We have combined design methods with EE to create a practical-oriented entrepreneurship course which allows students to work in transdisciplinary teams through a learning-by-doing approach on real-life projects. Professors from different departments have been included to create a multidisciplinary environment. We have drawn on programme assessment data, including pre- and post-surveys. Overall, we have found a positive effect of the programme on the students’ entrepreneurial skills. However, when the data was broken down according to the students’ fields of study and education levels, mixed results emerged.


2018 ◽  
Vol 60 (1) ◽  
pp. 55-65
Author(s):  
Krystyna Ilmurzyńska

Abstract This article investigates the suitability of traditional and participatory planning approaches in managing the process of spatial development of existing housing estates, based on the case study of Warsaw’s Ursynów Północny district. The basic assumption of the article is that, due to the lack of government schemes targeted at the restructuring of large housing estates, it is the business environment that drives spatial transformations and through that shapes the development of participation. Consequently, the article focuses on the reciprocal relationships between spatial transformations and participatory practices. Analysis of Ursynów Północny against the background of other estates indicates that it presents more endangered qualities than issues to be tackled. The article therefore concentrates on the potential of the housing estate and the good practices which can be tracked throughout its lifetime. Furthermore, it addresses real-life processes, including privatisation, development pressure, formal planning procedures and participatory budgeting. In the conclusion, it attempts to interpret the existing spatial structure of the estate as a potential framework for a participatory approach.


2014 ◽  
Vol 30 (2) ◽  
pp. 113-126 ◽  
Author(s):  
Dominic Detzen ◽  
Tobias Stork genannt Wersborg ◽  
Henning Zülch

ABSTRACT This case originates from a real-life business situation and illustrates the application of impairment tests in accordance with IFRS and U.S. GAAP. In the first part of the case study, students examine conceptual questions of impairment tests under IFRS and U.S. GAAP with respect to applicable accounting standards, definitions, value concepts, and frequency of application. In addition, the case encourages students to discuss the impairment regime from an economic point of view. The second part of the instructional resource continues to provide instructors with the flexibility of applying U.S. GAAP and/or IFRS when students are asked to test a long-lived asset for impairment and, if necessary, allocate any potential impairment. This latter part demonstrates that impairment tests require professional judgment that students are to exercise in the case.


2019 ◽  
Vol 9 (6) ◽  
pp. 1128 ◽  
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods relies on numerous labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study, highlighting the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) based on VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method’s effectiveness. Experiments show that the pretraining strategy can improve overall accuracy by 10% compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the approach is further verified on another dataset, from Hurricane Irma, and it is concluded that the proposed method is feasible.
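A minimal sketch of the listed augmentation strategies, using OpenCV and NumPy with assumed parameter values, is given below; for detection training, the bounding-box labels would of course have to be transformed alongside the mirrored and rotated images.

```python
import cv2
import numpy as np

def augment(image):
    """Generate mirrored, rotated, blurred and noisy variants of one image,
    following the augmentation strategies listed above."""
    variants = [cv2.flip(image, 1)]                      # horizontal mirror
    h, w = image.shape[:2]
    for angle in (90, 180, 270):                         # rotations
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        variants.append(cv2.warpAffine(image, m, (w, h)))
    variants.append(cv2.GaussianBlur(image, (5, 5), 0))  # Gaussian blur
    noise = np.random.normal(0, 10, image.shape)         # Gaussian noise
    variants.append(np.clip(image + noise, 0, 255).astype(np.uint8))
    return variants
```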


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1718
Author(s):  
Chien-Hsing Chou ◽  
Yu-Sheng Su ◽  
Che-Ju Hsu ◽  
Kong-Chang Lee ◽  
Ping-Hsuan Han

In this study, we designed a four-dimensional (4D) audiovisual entertainment system called Sense. This system comprises a scene recognition system and hardware modules that provide haptic sensations to users while they watch movies and animations at home. In the scene recognition system, we used Google Cloud Vision to detect common scene elements in a video, such as fire, explosions, wind, and rain, and to further determine whether the scene depicts hot weather, rain, or snow. Additionally, for animated videos, we applied deep learning with a single-shot multibox detector to detect whether the animated video contained scenes of fire-related objects. The hardware module was designed to provide six types of haptic sensations, arranged in line symmetry, to provide a better user experience. Based on the object detection results from the scene recognition system, the system generates the corresponding haptic sensations. The system integrates deep learning, auditory signals, and haptic sensations to provide an enhanced viewing experience.
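A minimal sketch of how recognized scene elements could be mapped to haptic outputs is shown below; the label names, effect channels, and confidence threshold are hypothetical, not the actual Sense configuration.

```python
# Hypothetical mapping from detected scene labels to haptic effect channels.
HAPTIC_MAP = {
    "fire": "heat",
    "explosion": "vibration",
    "wind": "fan",
    "rain": "water_mist",
    "snow": "cool_air",
}

def haptic_effects(detected_labels, confidences, threshold=0.6):
    """Return the haptic effects to trigger for labels detected above a
    confidence threshold."""
    return {HAPTIC_MAP[label]
            for label, conf in zip(detected_labels, confidences)
            if conf >= threshold and label in HAPTIC_MAP}

print(haptic_effects(["fire", "rain", "car"], [0.9, 0.7, 0.8]))
```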

