Survey and Performance Analysis of Object Detection in Challenging Environments

Author(s): Muhammad Ahmed, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, ...

Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of generic object detection systems decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) objects are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development of generic object detection algorithms, notable progress has been observed in object detection in challenging environments. However, there is no consolidated reference covering the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview of recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking them on three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.

Sensors, 2021, Vol 21 (15), pp. 5116
Author(s): Muhammad Ahmed, Khurram Azeem Hashmi, Alain Pagani, Marcus Liwicki, Didier Stricker, ...

Recent progress in deep learning has led to accurate and efficient generic object detection networks. Training of highly reliable models depends on large datasets with highly textured and rich images. However, in real-world scenarios, the performance of the generic object detection system decreases when (i) occlusions hide the objects, (ii) objects are present in low-light images, or (iii) they are merged with background information. In this paper, we refer to all these situations as challenging environments. With the recent rapid development in generic object detection algorithms, notable progress has been observed in the field of deep learning-based object detection in challenging environments. However, there is no consolidated reference to cover the state of the art in this domain. To the best of our knowledge, this paper presents the first comprehensive overview, covering recent approaches that have tackled the problem of object detection in challenging environments. Furthermore, we present a quantitative and qualitative performance analysis of these approaches and discuss the currently available challenging datasets. Moreover, this paper investigates the performance of current state-of-the-art generic object detection algorithms by benchmarking results on the three well-known challenging datasets. Finally, we highlight several current shortcomings and outline future directions.


2021, Vol 11 (11), pp. 4894
Author(s): Anna Scius-Bertrand, Michael Jungo, Beat Wolf, Andreas Fischer, Marc Bui

The current state of the art in automatic transcription of historical manuscripts is typically limited by the need for human-annotated training samples, which are required to train machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between the text in the scanned image and its existing Unicode counterpart; this correspondence can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems that reduce the required amount of annotation. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages rendered with a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.
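The self-training adaptation described in this abstract can be illustrated with a small pseudo-labeling step. The sketch below is a hypothetical illustration, not the authors' method: it assumes a confidence threshold `min_score`, a hypothetical `Detection` record, vertical text columns read right to left, and a simple rule that a page is pseudo-labeled only when the number of confident detections equals the length of its known transcription.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    x: float      # box centre, pixels
    y: float
    score: float  # detector confidence in [0, 1]


def pseudo_label_page(detections, transcription, column_width, min_score=0.5):
    """Hypothetical self-training step: pair confident detections with the
    page's known transcription, or return None to skip the page this round.

    Assumes vertical columns read right-to-left, top-to-bottom, with columns
    roughly `column_width` pixels apart (an illustrative assumption).
    """
    confident = [d for d in detections if d.score >= min_score]
    if len(confident) != len(transcription):
        return None  # counts disagree, so the alignment would be unreliable
    # Sort by column index (rightmost first), then top-to-bottom in a column.
    ordered = sorted(confident, key=lambda d: (-int(d.x // column_width), d.y))
    return list(zip(transcription, ordered))
```

Pages that pass the check yield (character, box) pairs that could serve as training labels for the next round; the remaining pages are skipped until the detector improves.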


2021, Vol 11 (23), pp. 11241
Author(s): Ling Li, Fei Xue, Dong Liang, Xiaofei Chen

Concealed object detection in terahertz imaging is an urgent need for public security and counter-terrorism. So far, there is no public terahertz imaging dataset for evaluating object detection algorithms. This paper provides a public dataset for evaluating multi-object detection algorithms in active terahertz imaging. Because of high sample similarity and poor imaging quality, object detection on this dataset is much more difficult than on the public object detection datasets commonly used in computer vision. Since the traditional hard example mining approach is designed for two-stage detectors and cannot be applied directly to one-stage detectors, this paper designs an image-based Hard Example Mining (HEM) scheme for RetinaNet. Several state-of-the-art detectors, including YOLOv3, YOLOv4, FRCN-OHEM, and RetinaNet, are evaluated on this dataset. Experimental results show that RetinaNet achieves the best mAP and that HEM further improves the model's performance. The parameters affecting the detection metrics of individual images are summarized and analyzed in the experiments.
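Image-based hard example mining differs from OHEM, which ranks region proposals inside a two-stage detector, in that whole images are ranked. A minimal sketch, assuming (hypothetically) that hardness is the per-image training loss from the previous epoch and that hard images are simply repeated in the next epoch; the paper's actual selection rule may differ.

```python
import random


def build_hem_epoch(image_losses, hard_fraction=0.25, repeat=2, seed=0):
    """Hypothetical image-level hard example mining for a one-stage detector.

    image_losses: dict mapping image id -> detection loss measured on that
    image in the last epoch (for RetinaNet, e.g. focal loss plus
    box-regression loss).
    Returns a shuffled list of image ids for the next epoch in which the
    hardest `hard_fraction` of images appear `repeat` times.
    """
    ranked = sorted(image_losses, key=image_losses.get, reverse=True)
    n_hard = max(1, int(len(ranked) * hard_fraction))
    hard = ranked[:n_hard]
    # Every image appears once; hard images get (repeat - 1) extra copies.
    epoch = list(ranked) + hard * (repeat - 1)
    random.Random(seed).shuffle(epoch)
    return epoch
```

Oversampling whole images keeps the one-stage detector's dense training unchanged, since only the order and frequency of images in the data stream are altered.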


2020
Author(s): Andrey De Aguiar Salvi, Rodrigo Coelho Barros

Recent research on Convolutional Neural Networks (CNNs) focuses on creating models with fewer parameters and a smaller storage size while preserving the model's ability to perform its task. This allows the best CNNs to automate tasks on limited devices with constraints on processing power, memory, or energy consumption. There are many different approaches in the literature: removing parameters, reducing floating-point precision, creating smaller models that mimic larger models, neural architecture search (NAS), etc. With all these possibilities, it is challenging to say which approach provides a better trade-off between model reduction and performance, owing to differences between the approaches, their respective models, the benchmark datasets, and variations in training details. Therefore, this article contributes to the literature by comparing three state-of-the-art model compression approaches applied to a well-known convolutional object detector, YOLOv3. Our experimental analysis shows that, by pruning parameters, it is possible to create a reduced version of YOLOv3 with 90% fewer parameters that still outperforms the original model. We also create models that require only 0.43% of the original model's inference effort.
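Parameter pruning, one of the compression families compared in the paper, can be illustrated with unstructured magnitude pruning. This sketch uses the smallest-|w| criterion as an assumption; the abstract does not state which pruning criterion produced the 90% reduction.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|.

    weights: flat list of floats (e.g. one conv layer's kernel, flattened).
    Returns (pruned_weights, mask) where mask[i] is False for pruned entries.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    mask = [i not in pruned_idx for i in range(len(weights))]
    pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return pruned, mask
```

Calling `magnitude_prune(w, 0.9)` keeps the largest 10% of weights. Zeroing weights this way only shrinks storage if the model is saved in a sparse format; reducing inference cost on dense hardware generally requires structured pruning, which removes whole filters or channels.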


2020, Vol 34 (07), pp. 10778-10785
Author(s): Linpu Fang, Hang Xu, Zhili Liu, Sarah Parisot, Zhenguo Li

Object detectors trained on fully annotated data currently yield state-of-the-art performance but require expensive manual annotations. On the other hand, weakly supervised detectors have much lower performance and cannot be used reliably in a realistic setting. In this paper, we study the hybrid-supervised object detection problem, which aims to train a high-quality detector with only a limited amount of fully annotated data while fully exploiting cheap data with image-level labels. State-of-the-art methods typically take an iterative approach, alternating between generating pseudo-labels and updating a detector. This paradigm requires careful manual hyper-parameter tuning to mine good pseudo-labels at each round and is quite time-consuming. To address these issues, we present EHSOD, an end-to-end hybrid-supervised object detection system that can be trained in one shot on both fully and weakly annotated data. Specifically, based on a two-stage detector, we propose two modules to fully utilize the information from both kinds of labels: (1) a CAM-RPN module that finds foreground proposals guided by a class activation heat-map, and (2) a hybrid-supervised cascade module that further refines bounding-box positions and classification with the help of an auxiliary head compatible with image-level data. Extensive experiments demonstrate the effectiveness of the proposed method, which achieves comparable results on multiple object detection benchmarks with only 30% fully annotated data, e.g., 37.5% mAP on COCO. We will release the code and the trained models.
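The class activation heat-map that guides CAM-RPN can be computed with the standard CAM formula, M_c(y, x) = sum over k of w_k^c * f_k(y, x), where f_k is the k-th feature map of the last convolutional layer and w_k^c is the classifier weight for class c. The sketch below implements that formula; the `box_cam_score` helper, which scores a proposal by its mean activation, is a hypothetical illustration and not EHSOD's actual proposal mechanism.

```python
def class_activation_map(features, class_weights):
    """Standard CAM: M_c(y, x) = sum_k w_k^c * f_k(y, x).

    features: K feature maps, each an H x W list of lists (last conv layer).
    class_weights: the K classifier weights for class c (applied after
    global average pooling in the image-level classifier).
    """
    K = len(features)
    H, W = len(features[0]), len(features[0][0])
    return [[sum(class_weights[k] * features[k][y][x] for k in range(K))
             for x in range(W)] for y in range(H)]


def box_cam_score(cam, box):
    """Hypothetical proposal score: mean CAM value inside box (x0, y0, x1, y1),
    given in feature-map cells with exclusive upper bounds."""
    x0, y0, x1, y1 = box
    vals = [cam[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(vals) / len(vals)
```

Because the CAM needs only image-level class labels to train the classifier weights, a heat-map like this is available for weakly annotated images, which is what makes it useful for guiding proposals on that data.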


2020
Author(s): Fetulhak Abdurahman, Kinde Fante Anlay, Mohammed Aliy

Background: Manual microscopic examination is still the "gold standard" for malaria diagnosis. The challenge with manual microscopy is that the accuracy, consistency, and speed of diagnosis depend on the skill of the laboratory technician, and highly skilled laboratory technicians are difficult to find in remote areas of developing countries. To alleviate this problem, this paper proposes and investigates state-of-the-art one-stage and two-stage object detection algorithms for automated malaria parasite screening from thick blood slides.
Methods: YOLOV3 and YOLOV4 are state-of-the-art object detectors in terms of both accuracy and speed; however, they are not optimized for detecting small objects such as malaria parasites in microscopic images. To deal with this, we modified the YOLOV3 and YOLOV4 models by increasing the feature scale and adding more detection layers, without notably decreasing their detection speed. We propose one modified YOLOV4 model, called YOLOV4-MOD, and two modified YOLOV3 models, called YOLOV3-MOD1 and YOLOV3-MOD2. In addition, we generated new anchor box scales and sizes using the K-means clustering algorithm to exploit the models' ability to learn small-object detection.
Results: The proposed modified YOLOV3 and YOLOV4 algorithms are evaluated on a publicly available malaria dataset and achieve state-of-the-art accuracy, exceeding the performance of their original versions, Faster R-CNN, and SSD in terms of mean average precision (mAP), recall, precision, F1 score, and average IoU. At 608 × 608 input resolution, YOLOV4-MOD achieves the best detection performance of all the models, with an mAP of 96.32%. At the same input resolution, YOLOV3-MOD2 and YOLOV3-MOD1 achieve mAPs of 96.14% and 95.46%, respectively.
Conclusions: The experimental results demonstrate that the proposed modified YOLOV3 and YOLOV4 models are reliable enough to be applied to detecting malaria parasites in images captured by a smartphone camera over the microscope eyepiece. The proposed system can easily be deployed in low-resource settings, where it can save lives.
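Anchor generation with K-means is commonly done, as in YOLOv2 and YOLOv3, by clustering ground-truth box widths and heights with the distance d = 1 - IoU rather than Euclidean distance, so that large boxes do not dominate the clustering. A minimal sketch under that assumption (the abstract does not say which distance the authors used); the cluster centres are updated with the mean here, and `iters` and `seed` are illustrative parameters.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (w, h), assuming they share the same centre."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs into k anchors with d = 1 - IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Nearest centroid = highest IoU, i.e. smallest 1 - IoU.
            best = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            clusters[best].append(b)
        new = []
        for i, c in enumerate(clusters):
            if not c:  # keep an empty cluster's centroid unchanged
                new.append(centroids[i])
                continue
            new.append((sum(w for w, _ in c) / len(c),
                        sum(h for _, h in c) / len(c)))
        if new == centroids:
            break  # assignments are stable
        centroids = new
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])
```

Standard YOLOv3 uses nine anchors split across three detection scales, so `kmeans_anchors(boxes, k=9)` matches that layout; a model with extra detection layers would presumably need a larger k.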


2020, Vol 28 (S2)
Author(s): Asmida Ismail, Siti Anom Ahmad, Azura Che Soh, Mohd Khair Hassan, Hazreen Haizi Harith

An object detection system is a computer technology, related to image processing and computer vision, that detects instances of semantic objects of a certain class in digital images and videos. The system consists of two main processes: classification and detection. Once an object instance has been classified and detected, it is possible to obtain further information, including recognizing the specific instance, tracking the object over an image sequence, and extracting further information about the object and the scene. This paper presents a performance analysis of a deep learning object detector built by combining a Convolutional Neural Network (CNN) for object classification with classic object detection algorithms. MiniVGGNet is the network architecture used for object classification, and its training data were collected from a specific indoor building environment. For object detection, sliding windows and image pyramids were used to localize and detect objects at different locations and scales, and non-maxima suppression (NMS) was used to obtain the final bounding box for each object. In the experiments, the network's classification accuracy is 80% to 90%, and the system takes less than 15 s per frame to detect an object. The results show that combining a classic object detection method with a deep learning classification approach is reasonable and efficient. The method works in some specific use cases and effectively addresses the inaccurate classification and detection of typical features.
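The classic detection pipeline described here (image pyramid, sliding window, classifier, NMS) can be sketched as follows. The window size, step, scale factor and IoU threshold are illustrative assumptions, and the classifier (MiniVGGNet in the paper) is left out: each candidate box would be cropped, resized to the classifier's input size, and kept only if its class probability exceeds a threshold before NMS.

```python
def pyramid_scales(width, height, scale=1.5, min_size=(64, 64)):
    """Yield successive image sizes, each 1/scale of the previous one."""
    while width >= min_size[0] and height >= min_size[1]:
        yield width, height
        width, height = int(width / scale), int(height / scale)


def sliding_windows(width, height, step, win):
    """Yield (x, y, w, h) windows that fit inside an image of the given size."""
    for y in range(0, height - win[1] + 1, step):
        for x in range(0, width - win[0] + 1, step):
            yield x, y, win[0], win[1]


def candidate_boxes(width, height, win=(64, 64), step=16, scale=1.5):
    """All sliding-window boxes over the pyramid, mapped back to
    original-image coordinates as (x0, y0, x1, y1)."""
    for w, h in pyramid_scales(width, height, scale, win):
        factor = width / w  # how much this pyramid level was shrunk
        for x, y, ww, wh in sliding_windows(w, h, step, win):
            yield (x * factor, y * factor, (x + ww) * factor, (y + wh) * factor)


def iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep
```

The pyramid lets a fixed-size window (and a fixed-input classifier) find objects of different sizes, while NMS collapses the many overlapping windows that fire on the same object into a single box.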


2020, Vol 34 (07), pp. 12460-12467
Author(s): Liang Xie, Chao Xiang, Zhengxu Yu, Guodong Xu, Zheng Yang, ...

LiDAR point clouds and RGB images are both essential for 3D object detection, so many state-of-the-art 3D detection algorithms are dedicated to fusing these two types of data effectively. However, their fusion methods, based on the Bird's Eye View (BEV) or voxel format, are not accurate. In this paper, we propose a novel fusion approach named the Point-based Attentive Cont-conv Fusion (PACF) module, which fuses multi-sensor features directly on 3D points. In addition to continuous convolution, we add Point-Pooling and Attentive Aggregation to make the fused features more expressive. Moreover, based on the PACF module, we propose a 3D multi-sensor multi-task network called Pointcloud-Image RCNN (PI-RCNN), which handles the image segmentation and 3D object detection tasks. PI-RCNN employs a segmentation sub-network to extract full-resolution semantic feature maps from images and then fuses the multi-sensor features via the PACF module. Benefiting from the effectiveness of the PACF module and the expressive semantic features from the segmentation module, PI-RCNN substantially improves 3D object detection. We demonstrate the effectiveness of the PACF module and PI-RCNN on the KITTI 3D Detection benchmark, where our method achieves state-of-the-art results on the 3D AP metric.
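Point-based fusion such as PACF starts by associating each LiDAR point with image features. The sketch below shows only that association step: projecting points with a 3×4 camera projection matrix `P` and reading the semantic feature at the projected pixel. It assumes the points are already in the rectified camera frame and uses nearest-neighbour lookup; PACF's continuous convolution, Point-Pooling and Attentive Aggregation are not reproduced.

```python
def project_points(points, P):
    """Project 3D points (x, y, z) in camera coordinates to pixels (u, v)
    with a 3x4 projection matrix P. Points behind the camera give None."""
    out = []
    for x, y, z in points:
        hom = (x, y, z, 1.0)  # homogeneous coordinates
        u_, v_, w_ = (sum(P[r][c] * hom[c] for c in range(4)) for r in range(3))
        out.append((u_ / w_, v_ / w_) if w_ > 0 else None)
    return out


def gather_point_features(points, P, feature_map):
    """Attach to each 3D point the image feature at its projected pixel
    (nearest-neighbour lookup); points projecting outside the image, or
    lying behind the camera, get None."""
    H, W = len(feature_map), len(feature_map[0])
    feats = []
    for uv in project_points(points, P):
        if uv is None:
            feats.append(None)
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        feats.append(feature_map[v][u] if 0 <= u < W and 0 <= v < H else None)
    return feats
```

Because the features are gathered per point rather than rasterized into a BEV or voxel grid, each fused feature stays attached to an exact 3D location, which is the property the abstract contrasts with BEV- and voxel-based fusion.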

