Towards real-time object detection on edge with deep neural networks

2018
Author(s): Zhi Zhang

Despite being a core research topic for decades, object detection still receives increasing attention due to its irreplaceable importance in a wide variety of applications. Object detectors based on deep neural networks have achieved significantly improved accuracy in recent years. However, effective real-world deployment of these models is still in its early days. In this dissertation, we focus on object detection models that tackle real-world problems that were out of reach only a few years ago. We also aim to make object detectors mobile, so that detectors no longer need to run on workstations or cloud services, which introduce unfavorable latency. To achieve these goals, we address the problem in two phases, application and deployment, and have conducted thorough research in both areas. Our contributions include inter-frame information fusion, model knowledge distillation, advanced model flow control for progressive inference, and hardware-oriented model design and optimization. More specifically, we propose a novel cross-frame verification scheme for a spatiotemporally fused object detection model for sequential images and videos, operating in a propose-and-reject fashion. To compress models through learning and to resolve domain-specific shortages of training data, we improve the learning algorithm to handle insufficient labeled data by searching for optimal guidance paths from pre-trained models. To further reduce inference cost, we design a progressive neural network that runs at flexible cost, enabled by an RNN-style decision controller at runtime. We also recognize the awkward model deployment problem, especially for object detection models that require extensive customized layers. In response, we propose an end-to-end neural network that substitutes pure neural network components for traditional post-processing operations. Finally, we apply operator decomposition together with graph-level and on-device optimization to achieve real-time object detection on low-power edge devices. All of these works achieve state-of-the-art performance and have been converted into successful applications.
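The abstract describes the knowledge-distillation component only at a high level; as a hedged illustration of the general technique, a minimal PyTorch sketch of a standard distillation loss follows (the function name, temperature, and weighting are illustrative assumptions, not values from the thesis):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend a soft-target KL term (teacher guidance) with hard-label CE.

    T and alpha are illustrative hyperparameters, not taken from the thesis.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term's magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

The temperature softens both distributions so the student learns from the teacher's relative class confidences rather than only its top prediction.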

2020, Vol 10 (6), pp. 2104
Author(s): Michał Tomaszewski, Paweł Michalski, Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with a limited quantity of input data. The possibility of using a limited learning set was achieved by developing a detailed scenario of the task, which strictly defined the conditions of detector operation for the convolutional neural network under consideration. The described solution utilizes known deep neural network architectures in the process of learning and object detection. The article compares detection results from the most popular deep neural networks while maintaining a limited training set composed of a specific number of selected images from a diagnostic video. The analyzed input material was recorded during an inspection flight conducted along high-voltage lines. The object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model obtained a worse AP result; however, the number of input samples influences its results significantly less than it does for other CNN models, which, in the authors' assessment, is a desired feature in the case of a limited training set.
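As a hedged sketch of the kind of experiment described (not the authors' code), fine-tuning a pre-trained Faster R-CNN from torchvision for a single "insulator" class could look like the following; the dummy tensors stand in for the roughly 60 annotated video frames:

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a COCO-pretrained Faster R-CNN and swap in a two-class head
# (background + insulator). Hyperparameters are illustrative.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

# One dummy annotated frame stands in for the limited training set.
images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 300.0, 340.0]]),
            "labels": torch.tensor([1])}]

loss_dict = model(images, targets)   # per-component detection losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```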


Author(s): Akash Kumar, Dr. Amita Goel, Prof. Vasudha Bahl and Prof. Nidhi Sengar

Object detection is a core task in the field of computer vision. An object detection model recognizes real-world objects present either in a captured image or in real-time video, where the objects can belong to any of a number of classes, such as humans, animals, and everyday objects. This project implements an object detection algorithm called You Only Look Once (YOLOv3). The YOLO architecture is extremely fast compared to all previous methods. YOLOv3 applies a single neural network to the given image, divides the image into regions, and predicts bounding boxes for them. These boxes are weighted by the predicted probabilities. After non-maximum suppression, the model returns the recognized objects together with their bounding boxes. YOLO trains on full images and performs object detection on them directly.
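The non-maximum suppression step mentioned above can be made concrete with a short sketch; this is the standard greedy algorithm with an illustrative IoU threshold, not the exact YOLOv3 implementation:

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop heavily overlapping ones.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    """
    order = scores.argsort()[::-1]        # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]  # drop near-duplicates
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 150, 150]], dtype=float)
scores = np.array([0.9, 0.6, 0.8])
print(non_max_suppression(boxes, scores))  # keeps boxes 0 and 2
```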


2020, Vol 226, pp. 02020
Author(s): Alexey V. Stadnik, Pavel S. Sazhin, Slavomir Hnatic

The performance of neural networks is one of the most important topics in the field of computer vision. In this work, we analyze the speed of object detection using the well-known YOLOv3 neural network architecture in different frameworks under different hardware configurations. We obtain results that allow us to formulate preliminary qualitative conclusions about the feasibility of various hardware scenarios for solving tasks in real-time environments.
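A minimal sketch of how such a speed measurement can be set up with OpenCV's DNN module (the config and weight file names, input size, and iteration count are assumptions; the paper's exact frameworks and devices are not reproduced here):

```python
import time
import numpy as np
import cv2

# Load YOLOv3 from Darknet files (assumed to be present locally).
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)  # swap backends to compare hardware paths

frame = np.random.randint(0, 255, (416, 416, 3), dtype=np.uint8)
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)

timings = []
for _ in range(100):
    net.setInput(blob)
    t0 = time.perf_counter()
    net.forward(net.getUnconnectedOutLayersNames())  # full detection forward pass
    timings.append(time.perf_counter() - t0)

print(f"mean latency: {np.mean(timings) * 1e3:.1f} ms "
      f"({1.0 / np.mean(timings):.1f} FPS)")
```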


Sensors, 2021, Vol 21 (16), pp. 5323
Author(s): Yongsu Kim, Hyoeun Kang, Naufal Suryanto, Harashta Tatimma Larasati, Afifatul Mukaroh, ...

Deep neural networks (DNNs), especially those used in computer vision, are highly vulnerable to adversarial attacks, such as adversarial perturbations and adversarial patches. Adversarial patches, often considered more appropriate for real-world attacks, are attached to the target object or its surroundings to deceive the target system. However, most previous research employed adversarial patches that are conspicuous to human vision, making them easy to identify and counter. Previously, the spatially localized perturbation GAN (SLP-GAN) was proposed, in which the perturbation is added only to the most representative area of the input images, creating a spatially localized adversarial camouflage patch that excels in visual fidelity and is, therefore, difficult for human vision to detect. In this study, the method was extended, as eSLP-GAN, to deceive both classifiers and object detection systems. Specifically, the loss function was modified for greater compatibility with object-detection attacks and to increase robustness in the real world. Furthermore, the applicability of the proposed method was tested on the CARLA simulator for a more authentic real-world attack scenario.
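The modified loss function is not spelled out in the abstract; as a hedged sketch of a typical detector-attack objective in this spirit, one can combine a term that suppresses the detector's confidence for the target object with a smoothness prior on the patch (all names and weights below are illustrative, not eSLP-GAN's actual formulation):

```python
import torch

def total_variation(patch):
    """Smoothness prior commonly used to keep adversarial patches printable."""
    return (patch[:, :, 1:, :] - patch[:, :, :-1, :]).abs().mean() + \
           (patch[:, :, :, 1:] - patch[:, :, :, :-1]).abs().mean()

def patch_attack_loss(target_scores, patch, tv_weight=1e-3):
    """Hypothetical objective: minimize the detector's confidence for the
    target class on detections overlapping the patch, keep the patch smooth.

    target_scores: tensor of target-class confidences from the detector.
    """
    return target_scores.max() + tv_weight * total_variation(patch)

# Toy usage with random stand-ins for detector outputs and a 3x64x64 patch.
scores = torch.rand(10, requires_grad=True)
patch = torch.rand(1, 3, 64, 64, requires_grad=True)
loss = patch_attack_loss(scores, patch)
loss.backward()  # gradients would drive a patch/generator update step
```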


Author(s): Ulas Isildak, Alessandro Stella, Matteo Fumagalli

Balancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, detecting recent balancing selection from genomic data is challenging, as its signatures are qualitatively similar to those left by ongoing positive selection. In this study, we developed and implemented two deep neural networks and tested their performance in predicting loci under recent selection, whether due to balancing selection or an incomplete sweep, from population genomic data. Specifically, we generated forward-in-time simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). The ANN received as input multiple summary statistics calculated on the locus of interest, while the CNN was applied directly to the matrix of haplotypes. We found that both architectures have high accuracy in identifying loci under recent selection. The CNN generally outperformed the ANN in distinguishing between signals of balancing selection and incomplete sweeps and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false positive rate for the CNN than for the ANN. We finally deployed the CNN within the MEFV gene region and identified several common variants predicted to be under an incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to Familial Mediterranean Fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterise signals of selection on intermediate-frequency variants, an analysis currently inaccessible to commonly used strategies.
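As a hedged sketch of the CNN branch, a small PyTorch network that consumes a haplotype matrix (haplotypes x SNPs, 0/1 alleles) and outputs class scores for, say, neutrality, balancing selection, and incomplete sweep; the layer sizes are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class HaplotypeCNN(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # makes the net size-agnostic
        )
        self.classifier = nn.Linear(64 * 4 * 4, n_classes)

    def forward(self, x):  # x: (batch, 1, n_haplotypes, n_snps)
        return self.classifier(self.features(x).flatten(1))

# Toy batch: 8 loci, 198 haplotypes, 400 SNP columns of 0/1 alleles.
x = torch.randint(0, 2, (8, 1, 198, 400)).float()
logits = HaplotypeCNN()(x)
print(logits.shape)  # torch.Size([8, 3])
```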


2021
Author(s): Efstratios Kontellis, Christos Troussas, Akrivi Krouska, Cleo Sgouropoulou

The COVID-19 pandemic provoked many changes in our everyday life. For instance, wearing protective face masks has become a new norm and an essential measure, imposed by countries worldwide; during these times, people must wear masks to enter buildings. In view of this compelling need, the objective of this paper is to create a real-time face mask detector that uses image recognition technology to identify: (i) whether a human face can be detected in a video stream, and (ii) whether the detected face is wearing an object that looks like a face mask, and whether that mask is properly worn. Our face mask detection model uses the OpenCV deep neural network (DNN) module, TensorFlow, and the MobileNetV2 architecture as an image classifier and, after training, achieved 99.64% accuracy.
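A hedged sketch of the classifier side of such a pipeline, using MobileNetV2 as a frozen feature extractor with a small binary head; the input size, learning rate, and training call are assumptions, not the paper's exact configuration:

```python
import tensorflow as tf

# MobileNetV2 pretrained on ImageNet, used as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(mask worn)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# Face crops from the OpenCV DNN face detector would be fed here, e.g.:
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```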


Author(s): S. Spiegel, J. Chen

Abstract. Deep neural networks (DNNs) and convolutional neural networks (CNNs) have demonstrated greater robustness and accuracy in classifying two-dimensional images and three-dimensional point clouds than more traditional machine learning approaches. However, their main drawback is the need for large quantities of semantically labeled training data, which are often out of reach for those with resource constraints. In this study, we evaluated the use of simulated 3D point clouds for training a CNN to segment and classify 3D point clouds of real-world urban environments. The simulation involved collecting light detection and ranging (LiDAR) data using a simulated 16-channel laser scanner within the CARLA (Car Learning to Act) autonomous vehicle gaming environment. We used this labeled data to train the Kernel Point Convolution (KPConv) segmentation network for point clouds (KP-FCNN), which we tested on real-world LiDAR data from the NPM3D benchmark data set. Our results showed that high accuracy can be achieved using data collected in a simulator.
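A hedged skeleton of the sim-to-real training loop described follows; the real KP-FCNN is far more involved, and the `KPFCNN` class below is only a shape-compatible stand-in, not the actual KPConv model:

```python
import torch
import torch.nn as nn

class KPFCNN(nn.Module):
    """Placeholder per-point segmentation model (NOT the real KPConv network)."""
    def __init__(self, n_classes):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))

    def forward(self, points):   # points: (batch, n_points, 3)
        return self.mlp(points)  # per-point class logits

model = KPFCNN(n_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Synthetic tensors stand in for simulator-labeled LiDAR sweeps.
points = torch.randn(2, 4096, 3)           # xyz coordinates
labels = torch.randint(0, 10, (2, 4096))   # per-point semantic labels

logits = model(points)
loss = criterion(logits.reshape(-1, 10), labels.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
# Evaluation would then run the trained model on real scans (e.g., NPM3D).
```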


2021, Vol 8
Author(s): Namiko Saito, Tetsuya Ogata, Hiroki Mori, Shingo Murata, Shigeki Sugano

We propose a tool-use model that enables a robot to act toward a provided goal. It is important to consider the features of four factors (tools, objects, actions, and effects) at the same time, because they are related to each other and one factor can influence the others. The tool-use model is constructed with deep neural networks (DNNs) using multimodal sensorimotor data: image, force, and joint angle information. To allow the robot to learn tool use, we collect training data by controlling the robot to perform various object operations, using several tools with multiple actions that lead to different effects. The tool-use model is then trained, learning sensorimotor coordination and acquiring the relationships among tools, objects, actions, and effects in its latent space. We can give the robot a task goal by providing an image showing the target placement and orientation of the object. Using the goal image with the tool-use model, the robot detects the features of tools and objects and automatically determines how to act to reproduce the target effects. The robot then generates actions that adjust to real-time situations, even when the tools and objects are unknown and more complicated than those used in training.
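A hedged sketch of the multimodal structure described: per-modality encoders fused into a shared latent space from which the next motor command is predicted. All dimensions and layer choices are illustrative assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

class ToolUseModel(nn.Module):
    def __init__(self, latent_dim=64, n_joints=7):
        super().__init__()
        self.image_enc = nn.Sequential(          # camera image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.force_enc = nn.Linear(6, 32)        # 6-axis force/torque encoder
        self.joint_enc = nn.Linear(n_joints, 32) # joint-angle encoder
        self.fusion = nn.Linear(32 + 32 + 32, latent_dim)
        self.policy = nn.Linear(latent_dim, n_joints)  # next joint angles

    def forward(self, image, force, joints):
        z = torch.cat([self.image_enc(image),
                       torch.relu(self.force_enc(force)),
                       torch.relu(self.joint_enc(joints))], dim=1)
        latent = torch.tanh(self.fusion(z))  # shared latent representation
        return self.policy(latent), latent

model = ToolUseModel()
action, latent = model(torch.randn(1, 3, 64, 64),  # image
                       torch.randn(1, 6),          # force/torque
                       torch.randn(1, 7))          # joint angles
```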


Electronics, 2021, Vol 10 (3), pp. 342
Author(s): Fabio Martinelli, Fiammetta Marulli, Francesco Mercaldo, Antonella Santone

The proliferation of infotainment systems in today's vehicles provides an inexpensive and easy-to-deploy platform for gathering information about the vehicle under analysis. With the aim of providing an architecture that increases safety and security in the automotive context, in this paper we propose a fully connected neural network architecture over position-based features to detect, in real time: (i) the driver, (ii) the driving style, and (iii) the path. An experimental analysis performed on real-world data shows that the proposed method obtains encouraging results.
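A hedged sketch of such an architecture: one fully connected backbone over position-based features with three classification heads. Feature and class counts are assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class DrivingNet(nn.Module):
    def __init__(self, n_features=20, n_drivers=5, n_styles=3, n_paths=4):
        super().__init__()
        self.backbone = nn.Sequential(           # shared feature extractor
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU())
        self.driver_head = nn.Linear(64, n_drivers)  # (i) driver identity
        self.style_head = nn.Linear(64, n_styles)    # (ii) driving style
        self.path_head = nn.Linear(64, n_paths)      # (iii) path

    def forward(self, x):
        h = self.backbone(x)
        return self.driver_head(h), self.style_head(h), self.path_head(h)

driver_logits, style_logits, path_logits = DrivingNet()(torch.randn(1, 20))
```

Sharing a backbone across the three tasks lets one pass over the position features serve all three real-time predictions.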


2021, Vol 3 (3), pp. 662-671
Author(s): Jonas Herskind Sejr, Peter Schneider-Kamp, Naeem Ayoub

Due to their impressive performance, deep neural networks for object detection in images have become a prevalent choice. Given the complexity of the neural network models used, users of these algorithms are typically given no hint as to how the objects were found. It remains unclear, for example, whether an object is detected based on what it looks like or based on the context in which it is located. We have developed an algorithm, Surrogate Object Detection Explainer (SODEx), that can explain any object detection algorithm using any classification explainer. We evaluate SODEx qualitatively and quantitatively by detecting objects in the COCO dataset with YOLOv4 and explaining these detections with LIME. This empirical evaluation not only demonstrates the value of explainable object detection, it also provides valuable insights into how YOLOv4 detects objects.
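A hedged sketch of the surrogate idea: wrap the detector as a binary classifier ("is this particular detection still found?") so that LIME's image explainer can attribute it. `detector.detect` is a hypothetical interface, not SODEx's actual API:

```python
import numpy as np
from lime import lime_image

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def make_classifier_fn(detector, target_box, target_class, iou_threshold=0.5):
    """Turn one detection into a binary classification over perturbed images."""
    def classifier_fn(images):            # (N, H, W, 3) batch from LIME
        probs = []
        for img in images:
            score = 0.0
            for box, cls, conf in detector.detect(img):  # hypothetical API
                if cls == target_class and iou(box, target_box) > iou_threshold:
                    score = max(score, conf)  # detection survived perturbation
            probs.append([1.0 - score, score])
        return np.array(probs)
    return classifier_fn

explainer = lime_image.LimeImageExplainer()
# explanation = explainer.explain_instance(
#     image, make_classifier_fn(detector, target_box, target_class),
#     top_labels=1, num_samples=1000)
```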

