Deep Learning-Based Real-Time Multiple-Object Detection and Tracking from Aerial Imagery via a Flying Robot with GPU-Based Embedded Devices

Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3371 ◽  
Author(s):  
Hossain ◽  
Lee

In recent years, demand has been increasing for target detection and tracking from aerial imagery via drones using onboard sensors and devices. We propose an effective method for this application based on a deep learning framework. A state-of-the-art embedded hardware system enables small flying robots to carry out the real-time onboard computation necessary for object tracking. Two types of embedded modules were developed: one was designed using a Jetson TX or AGX Xavier, and the other was based on an Intel Neural Compute Stick. Both are suitable for real-time onboard computing on small drones with limited space. A comparative analysis of current state-of-the-art deep learning-based multi-object detection algorithms was carried out on the designated GPU-based embedded computing modules to obtain detailed metrics on frame rates and computation power. We also introduce an effective tracking approach for moving objects. The tracking algorithm extends simple online and real-time tracking (SORT) with a deep learning-based association metric (Deep SORT), and uses a hypothesis tracking methodology with Kalman filtering. In addition, a guidance system that tracks the target position using a GPU-based algorithm is introduced. Finally, we demonstrate the effectiveness of the proposed algorithms in real-time experiments with a small multi-rotor drone.
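The two ingredients named in the abstract, Kalman filtering for motion and an appearance-based association metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `Kalman1D`, the function `associate`, the noise values, and the distance threshold are hypothetical, and greedy matching stands in for the optimal assignment step a full tracker would normally use.

```python
import math

class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity)."""

    def __init__(self, pos, var=10.0, q=0.01, r=1.0):
        self.p, self.v = pos, 0.0
        self.P = [[var, 0.0], [0.0, var]]   # state covariance
        self.q, self.r = q, r               # process / measurement noise (assumed values)

    def predict(self, dt=1.0):
        (a, b), (c, d) = self.P
        self.p += dt * self.v
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                      # innovation variance
        k0, k1 = a / s, c / s               # Kalman gain
        y = z - self.p                      # innovation (measurement residual)
        self.p += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P, with H = [1, 0]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]

def cosine_distance(u, v):
    """1 - cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (nu * nv)

def associate(track_feats, det_feats, max_dist=0.2):
    """Greedily match tracks to detections by appearance distance (a simplification)."""
    pairs = sorted((cosine_distance(t, d), i, j)
                   for i, t in enumerate(track_feats)
                   for j, d in enumerate(det_feats))
    matches, used_t, used_d = [], set(), set()
    for dist, i, j in pairs:
        if dist > max_dist:
            break                           # remaining pairs are even farther apart
        if i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return matches
```

In a tracker of this kind, each track would call `predict()` once per frame, `associate` would pair tracks with new detections, and each matched track would then call `update()` with the measured position.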


Author(s):  
Sankar K. Pal ◽  
Anima Pramanik ◽  
J. Maiti ◽  
Pabitra Mitra


2020 ◽  
Author(s):  
Yang Liu

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] With the rapid development of deep learning in computer vision, especially deep convolutional neural networks (CNNs), significant advances have been made in recent years on object recognition and detection in images. Highly accurate detection results have been achieved for large objects, whereas detection accuracy on small objects remains low. This dissertation focuses on investigating deep learning methods for small object detection in images and proposing new methods with improved performance. First, we conducted a comprehensive review of existing deep learning methods for small object detection, in which we summarized and categorized major techniques and models, identified major challenges, and listed some future research directions. Existing techniques were categorized into using contextual information, combining multiple feature maps, creating sufficient positive examples, and balancing foreground and background examples. Methods developed in four related areas (generic object detection, face detection, object detection in aerial imagery, and segmentation) were summarized and compared. In addition, the performance of several leading deep learning methods for small object detection, including YOLOv3, Faster R-CNN, and SSD, was evaluated on three large benchmark image datasets of small objects. Experimental results showed that Faster R-CNN performed the best, while YOLOv3 was a close second. Furthermore, a new deep learning method, called Retina-context Net, was proposed and outperformed state-of-the-art one-stage deep learning models, including SSD, YOLOv3, and RetinaNet, on the COCO and SUN benchmark datasets. Secondly, we created a new dataset for bird detection, called Little Birds in Aerial Imagery (LBAI), from real-life aerial imagery. LBAI contains birds with sizes ranging from 10 by 10 pixels to 40 by 40 pixels.
We adapted and applied several state-of-the-art deep learning models to LBAI, including object detection models such as YOLOv2, SSH, and Tiny Face, and instance segmentation models such as U-Net and Mask R-CNN. Our empirical results illustrated the strengths and weaknesses of these methods, showing that SSH performed the best for easy cases, whereas Tiny Face performed the best for hard cases with cluttered backgrounds. Among small instance segmentation methods, U-Net achieved slightly better performance than Mask R-CNN. Thirdly, we proposed a new graph neural network-based object detection algorithm, called GODM, to take the spatial information of candidate objects into consideration in small object detection. Instead of detecting small objects independently as existing deep learning methods do, GODM treats the candidate bounding boxes generated by existing object detectors as nodes and creates edges based on the spatial or semantic relationships between the candidate bounding boxes. GODM contains four major components: node feature generation, graph generation, node class labelling, and a graph convolutional neural network model. Several graph generation methods were proposed. Experimental results on the LBDA dataset show that GODM significantly outperformed the existing state-of-the-art object detector Faster R-CNN, by up to 12% in accuracy. Finally, we proposed a new computer vision-based grass analysis method using machine learning. To deal with variations in lighting conditions, a two-stage segmentation strategy is proposed for grass coverage computation based on a blackboard background. On a real-world dataset we collected from natural environments, the proposed method was robust to varying environments, lighting, and colors. For grass detection and coverage computation, the error rate was just 3%.
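The graph-generation idea described for GODM (candidate boxes as nodes, edges from spatial relationships) might be sketched as follows. This is an illustrative assumption, not the dissertation's method: the function names, the center-distance rule, the 50-pixel threshold, and the mean-aggregation step are all hypothetical stand-ins for the graph generation and graph convolution components the abstract mentions.

```python
import math

def box_center(box):
    """Center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def build_spatial_graph(boxes, max_dist=50.0):
    """Adjacency list linking boxes whose centers lie within max_dist pixels."""
    centers = [box_center(b) for b in boxes]
    adj = {i: [] for i in range(len(boxes))}
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if math.dist(centers[i], centers[j]) <= max_dist:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def aggregate(features, adj):
    """One mean-aggregation step: each node averages itself with its neighbours."""
    out = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in adj[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

# Two nearby candidate boxes and one isolated box.
boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (500, 500, 510, 510)]
graph = build_spatial_graph(boxes)
smoothed = aggregate([[1.0], [0.0], [0.5]], graph)
```

Here the two nearby boxes share information through their edge, while the isolated box keeps its own feature unchanged.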



2020 ◽  
Vol 8 (6) ◽  
pp. 3162-3165

Detecting and classifying objects in a single frame that contains several objects is a cumbersome task. With the advancement of deep learning techniques, accuracy has increased significantly. This paper aims to implement a state-of-the-art custom algorithm for detecting and classifying objects in a single frame, with the goal of attaining high accuracy with real-time performance. The proposed system uses the SSD architecture coupled with MobileNet to achieve maximum accuracy. The system will be fast enough to detect and recognize multiple objects even at 30 FPS.



2011 ◽  
Vol 403-408 ◽  
pp. 4968-4973
Author(s):  
Rajendra Kachhava ◽  
Vivek Srivastava ◽  
Rajkumar Jain ◽  
Ekta Chaturvedi

In this paper we propose a real-time tracking system that uses multiple cameras for surveillance and security. Tracking is used extensively in computer vision applications such as video surveillance, authentication systems, robotics, the pre-stage of MPEG-4 image compression, and gesture-based user interfaces. The key components of a tracking system for surveillance are feature extraction, background subtraction, and identification of the extracted object. Video surveillance, object detection, and tracking have drawn increasing interest in recent years. Object tracking can be understood as the problem of finding the path (i.e., trajectory) of an object, and can be defined as a procedure for identifying the position of the object in each frame of a video. Building on previous work on single-object detection with a single stationary camera, we extend the concept to track multiple objects with multiple cameras, and to maintain a security system in which multiple cameras monitor and identify a person in an indoor environment. The present study mainly aims to provide security and to detect moving objects in real-time video sequences and live video streams. It is based on a robust algorithm for human body detection and tracking in videos captured with multiple cameras.
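Background subtraction and per-frame position finding, two of the components named above, can be sketched in a few lines. This is a generic running-average approach on grayscale frames stored as nested lists, not the algorithm from the paper; the function names, the learning rate `alpha`, and the threshold are assumptions chosen for illustration.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than the threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def centroid(mask):
    """Mean (x, y) of the foreground pixels, or None if there are none."""
    pts = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

def track(frames, background):
    """Return the trajectory: one centroid (or None) per frame."""
    trajectory = []
    for frame in frames:
        trajectory.append(centroid(foreground_mask(background, frame)))
        background = update_background(background, frame)
    return trajectory
```

The list returned by `track` corresponds to the trajectory described in the abstract: the object's position in each frame of the video.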





2021 ◽  
Vol 13 (12) ◽  
pp. 2417
Author(s):  
Savvas Karatsiolis ◽  
Andreas Kamilaris ◽  
Ian Cole

Estimating the height of buildings and vegetation in single aerial images is a challenging problem. A task-focused Deep Learning (DL) model that combines architectural features from successful DL models (U-NET and Residual Networks) and learns the mapping from a single aerial image to a normalized Digital Surface Model (nDSM) was proposed. The model was trained on aerial images whose corresponding Digital Surface Models (DSM) and Digital Terrain Models (DTM) were available and was then used to infer the nDSM of images with no elevation information. The model was evaluated with a dataset covering a large area of Manchester, UK, as well as the 2018 IEEE GRSS Data Fusion Contest LiDAR dataset. The results suggest that the proposed DL architecture is suitable for the task and surpasses other state-of-the-art DL approaches by a large margin.
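The training target mentioned above, the nDSM, is conventionally the per-pixel difference between the surface model and the terrain model, i.e. the height of buildings and vegetation above ground. A minimal sketch on nested lists follows; the function name and the clipping of negative values to zero are assumptions, not details taken from the paper.

```python
def normalized_dsm(dsm, dtm):
    """Per-pixel height above ground: nDSM = DSM - DTM, clipped at zero (assumed)."""
    return [[max(s - t, 0.0) for s, t in zip(srow, trow)]
            for srow, trow in zip(dsm, dtm)]

# Elevations in metres: a 5 m object, bare ground, a 12.5 m object, and a small dip.
dsm = [[105.0, 100.0], [112.5, 99.5]]
dtm = [[100.0, 100.0], [100.0, 100.0]]
ndsm = normalized_dsm(dsm, dtm)
```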


