A Practice for Object Detection Using YOLO Algorithm

Author(s):  
Dr. Suwarna Gothane

When we look at images or videos, we can locate and identify objects of interest within moments. Passing this capability on to computers is precisely what object detection is: locating an object and identifying it. Object detection has found application in a wide variety of domains, such as video surveillance, image retrieval systems, and autonomous vehicles. Various algorithms can be used for object detection, but here we focus on YOLOv3. YOLO stands for "You Only Look Once". The YOLO model is highly accurate and detects the objects present in a frame. YOLO follows a fundamentally different approach from region-based detectors: instead of classifying selected regions one at a time, it applies a single neural network to the entire image to predict bounding boxes and their probabilities. YOLO is one deep convolutional neural network that splits the input image into a grid of cells, so unlike plain image classification or face detection, each grid cell has an associated vector in the output that tells us whether an object exists in that cell, the class of that object, and the predicted bounding box for it. The model learns progressively, so its prediction accuracy improves with training. In operation, it makes many predictions per frame and keeps only the most confident ones, discarding the rest; even an object occupying only a few pixels is taken into consideration. More precisely, the model places candidate bounding boxes across the whole frame, scores each one, and picks the boxes with the highest confidence scores. All of this happens within a very short time frame, which is why this model is well suited to real-time use.
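
To make the per-cell output vector concrete, here is a minimal Python sketch that decodes a YOLO-style grid into boxes. The tensor layout, the toy box scaling, and the 0.5 threshold are illustrative assumptions, not the exact YOLOv3 formulation (which also uses anchor priors).

```python
import numpy as np

def decode_yolo_grid(output, num_classes, conf_thresh=0.5):
    """Decode a YOLO-style output of shape (S, S, 5 + num_classes):
    each grid cell holds [tx, ty, tw, th, objectness, class scores...]."""
    S = output.shape[0]
    boxes = []
    for row in range(S):
        for col in range(S):
            cell = output[row, col]
            objectness = 1.0 / (1.0 + np.exp(-cell[4]))      # sigmoid
            if objectness < conf_thresh:
                continue
            # Box centre is an offset from the cell's top-left corner,
            # normalised to the whole image.
            cx = (col + 1.0 / (1.0 + np.exp(-cell[0]))) / S
            cy = (row + 1.0 / (1.0 + np.exp(-cell[1]))) / S
            w, h = np.exp(cell[2]) / S, np.exp(cell[3]) / S   # toy scaling
            class_id = int(np.argmax(cell[5:]))
            boxes.append((cx, cy, w, h, objectness, class_id))
    return boxes
```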

2021, Vol. 11 (13), pp. 6016
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

For autonomous vehicles, it is critical to be aware of the driving environment to avoid collisions and drive safely. The recent evolution of convolutional neural networks has contributed significantly to accelerating the development of object detection techniques that enable autonomous vehicles to handle rapid changes in various driving environments. However, collisions in an autonomous driving environment can still occur due to undetected obstacles and various perception problems, particularly occlusion. Thus, we propose a robust object detection algorithm for environments in which objects are truncated or occluded, employing RGB images and light detection and ranging (LiDAR) bird’s eye view (BEV) representations. This structure combines independent detection results obtained in parallel through “you only look once” networks using an RGB image and a height map converted from the BEV representation of LiDAR’s point cloud data (PCD). The region proposal of an object is determined via non-maximum suppression, which suppresses the bounding boxes of adjacent regions. A performance evaluation of the proposed scheme was performed using the KITTI vision benchmark suite dataset. The results demonstrate that the detection accuracy achieved by integrating PCD BEV representations is superior to that of an RGB camera alone. In addition, robustness is improved: detection accuracy is significantly enhanced even when target objects are partially occluded from the front view, which demonstrates that the proposed algorithm outperforms the conventional RGB-based model.
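
A minimal sketch of the fusion step, assuming both YOLO branches emit boxes in a common image frame: the two detection lists are pooled and greedy non-maximum suppression keeps the highest-scoring box among overlapping neighbours. Thresholds and the detection format are illustrative, not the authors' implementation.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse_detections(rgb_dets, bev_dets, iou_thresh=0.5):
    """Greedy NMS over the union of the two detectors' outputs.
    Each detection is (box, score); a higher-scoring box suppresses
    overlapping lower-scoring ones."""
    dets = sorted(rgb_dets + bev_dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```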


2021, Vol. 2021 (1)
Author(s):  
Kaliappan Madasamy ◽  
Vimal Shanmuganathan ◽  
Vijayalakshmi Kandasamy ◽  
Mi Young Lee ◽  
Manikandan Thangadurai

Computer vision is an interdisciplinary domain for object detection. Object detection plays a vital part in assisting surveillance, vehicle detection, and pose estimation. In this work, we propose a novel deep “you only look once” (deep YOLO V3) approach to detect multiple objects. This approach looks at the entire frame during both the training and test phases. It follows a regression-based technique that uses a probabilistic model to locate objects. We construct 106 convolutional layers followed by 2 fully connected layers, with an 812 × 812 × 3 input size, to detect small drones. We pre-train the convolutional layers for classification at half the resolution and then double the resolution for detection. The number of filters in each layer is set to 16; the last scale layer uses more than 16 filters to improve small-object detection. The construction uses up-sampling to merge fine-grained detail back into the feature maps and rescale features at specific locations; this effectively increases the feature sampling rate, which is what allows small objects to be detected. This YOLO architecture is preferred because it requires less memory and computation than simply adding more filters. The proposed system is designed and trained for a single class, drone, and object detection and tracking are performed with the embedded-system-based deep YOLO. The proposed YOLO approach predicts multiple bounding boxes per grid cell with better accuracy. The model has been trained on a large number of small drones under different conditions, such as open fields and marine environments with complex backgrounds.
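
A minimal PyTorch sketch of the up-sampling idea: a coarse feature map is up-sampled and concatenated with a finer one from an earlier layer, so the small-object detection scale sees higher-resolution features. Channel counts and spatial sizes below are illustrative assumptions, not the paper's exact 106-layer configuration.

```python
import torch
import torch.nn as nn

class UpsampleFuse(nn.Module):
    """YOLOv3-style scale fusion: upsample a coarse feature map and
    concatenate it with a finer one, so the detection head at this
    scale sees higher-resolution detail (useful for small objects
    such as distant drones)."""
    def __init__(self, coarse_ch, fine_ch):
        super().__init__()
        self.reduce = nn.Conv2d(coarse_ch, coarse_ch // 2, kernel_size=1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, coarse, fine):
        x = self.up(self.reduce(coarse))       # match the finer spatial size
        return torch.cat([x, fine], dim=1)     # fuse along the channel axis

# Illustrative shapes only, using 16 filters as in the paper's setup.
fuse = UpsampleFuse(coarse_ch=32, fine_ch=16)
coarse = torch.randn(1, 32, 26, 26)
fine = torch.randn(1, 16, 52, 52)
print(fuse(coarse, fine).shape)  # torch.Size([1, 32, 52, 52])
```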


2021, Vol. 2021, pp. 1-15
Author(s):  
Sultan S. Alshamrani ◽  
Nishant Jha ◽  
Deepak Prashar

In 5G and beyond-5G (B5G) systems, Ultra-Reliable Low-Latency Communication (URLLC) represents the key enabler for a range of modern technologies supporting Industry 4.0 applications, such as transportation and healthcare. Real-world implementation of URLLC can drive major transformations in areas like autonomous driving, road safety, and efficient traffic management. Furthermore, URLLC contributes to the objective of fully autonomous cars on the road that can respond to dynamic traffic patterns by collaborating with other vehicles and the surrounding environment rather than relying solely on local data. The main requirement for this is that information be transferred among vehicles reliably within an extremely short time frame. In this paper, we implement and analyze a Multiaccess Edge Computing- (MEC-) based architecture for 5G autonomous vehicles built on baseband units (BBUs). We performed Monte Carlo simulations and plotted curves of propagation latency, handling latency, and total latency as functions of vehicle density. We also plotted the reliability curve to double-check our findings. When the RSU density is held constant, propagation latency is directly proportional to vehicle density, but when vehicle density is fixed, propagation latency is inversely proportional to RSU density. Likewise, with constant RSU density, handling latency grows in proportion to vehicle density, but with fixed vehicle density it is inversely proportional to RSU density. Total latency behaves like propagation latency; that is, it is also directly proportional to vehicle density.
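
The latency trends can be illustrated with a toy Monte Carlo model in Python; the delay distributions and all constants below are assumptions made up for illustration, not the paper's simulation parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_latency(vehicle_density, rsu_density, trials=10_000):
    """Toy Monte Carlo model: propagation latency grows with the number
    of vehicles contending per RSU; handling latency models queueing at
    the MEC/BBU. All constants are illustrative assumptions."""
    vehicles_per_rsu = vehicle_density / rsu_density
    # Propagation: per-hop delay plus contention proportional to load.
    prop = rng.exponential(0.1, trials) * vehicles_per_rsu     # ms
    # Handling: processing time scaled by queue length at the edge.
    handle = rng.gamma(2.0, 0.05, trials) * vehicles_per_rsu   # ms
    total = prop + handle
    return prop.mean(), handle.mean(), total.mean()

for density in (10, 50, 100):  # vehicles per km^2, RSU density fixed
    p, h, t = simulate_latency(density, rsu_density=5)
    print(f"density={density:3d}: prop={p:.2f} ms, handle={h:.2f} ms, total={t:.2f} ms")
```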


2020, Vol. 34 (07), pp. 12557-12564
Author(s):  
Zhenbo Xu ◽  
Wei Zhang ◽  
Xiaoqing Ye ◽  
Xiao Tan ◽  
Wei Yang ◽  
...  

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating the 3D pose of distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The ZoomNet pipeline begins with an ordinary 2D object detection model used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we propose learning part locations as complementary features to improve resistance to occlusion, and we put forward the 3D fitting score to better estimate 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (improving APbv (IoU=0.7) by 9.4% over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations such as pixel-wise part locations, we also present our KFG dataset, which augments KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
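
The adaptive zooming idea can be sketched as follows: crop an instance box, resize it to a unified resolution, and scale the pinhole intrinsics so the zoomed crop remains geometrically consistent for disparity estimation. The target resolution and the exact intrinsics update are assumptions for illustration; ZoomNet's actual formulation may differ.

```python
import cv2
import numpy as np

def adaptive_zoom(image, box, K, target=(256, 256)):
    """Crop a 2D instance box, resize it to a unified resolution, and
    adjust the pinhole intrinsics K accordingly, so geometry estimated
    on the zoomed crop stays consistent with the camera model."""
    x1, y1, x2, y2 = box
    crop = cv2.resize(image[y1:y2, x1:x2], target)
    sx = target[0] / (x2 - x1)
    sy = target[1] / (y2 - y1)
    K_new = K.astype(float).copy()
    K_new[0, 0] *= sx                   # fx scales with horizontal zoom
    K_new[1, 1] *= sy                   # fy scales with vertical zoom
    K_new[0, 2] = (K[0, 2] - x1) * sx   # principal point shifts with the crop
    K_new[1, 2] = (K[1, 2] - y1) * sy
    return crop, K_new
```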


2021, Vol. 12 (1), pp. 281
Author(s):  
Jaesung Jang ◽  
Hyeongyu Lee ◽  
Jong-Chan Kim

For safe autonomous driving, deep neural network (DNN)-based perception systems play essential roles, and a vast amount of driving images must be manually collected and labeled with ground truth (GT) for training and validation purposes. Having observed the high cost and unavoidable human errors of manual GT generation, this study presents an open-source automatic GT generation tool, CarFree, based on the Carla autonomous driving simulator. In doing so, we aim to democratize the daunting task of object detection dataset generation in particular, which has been feasible only for big companies or institutes due to its high cost. CarFree comprises (i) a data extraction client that automatically collects relevant information from the Carla simulator’s server and (ii) post-processing software that produces precise 2D bounding boxes of vehicles and pedestrians on the gathered driving images. Our evaluation results show that CarFree can generate a considerable number of realistic driving images along with their GTs in a reasonable time. Moreover, using synthesized training images with artificially created unusual weather and lighting conditions, which are difficult to obtain in real-world driving scenarios, CarFree significantly improves object detection accuracy in the real world, particularly in harsh environments. With CarFree, we expect its users to generate a variety of object detection datasets in hassle-free ways.
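
CarFree's post-processing derives precise 2D boxes from the simulator's 3D ground truth; the sketch below shows the generic geometry behind such a step, projecting a 3D box's corners through a pinhole camera and taking the tight 2D box around them. Function names, matrix conventions, and clipping rules are illustrative assumptions, not the CarFree or Carla API.

```python
import numpy as np

def project_3d_box_to_2d(corners_world, world_to_cam, K, img_w, img_h):
    """Project the 8 corners of a 3D bounding box into the image and
    take the tight 2D box around them - the core idea behind deriving
    2D GT from a simulator's 3D ground truth (pinhole model)."""
    corners = np.hstack([corners_world, np.ones((8, 1))])   # homogeneous
    cam = (world_to_cam @ corners.T)[:3]                    # camera frame
    cam = cam[:, cam[2] > 0]                                # drop points behind camera
    if cam.shape[1] == 0:
        return None
    px = K @ cam                                            # pixel coordinates
    px = px[:2] / px[2]
    x1, y1 = px.min(axis=1)
    x2, y2 = px.max(axis=1)
    # Clip to the image; reject boxes fully outside the frame.
    x1, x2 = np.clip([x1, x2], 0, img_w - 1)
    y1, y2 = np.clip([y1, y2], 0, img_h - 1)
    if x2 <= x1 or y2 <= y1:
        return None
    return int(x1), int(y1), int(x2), int(y2)
```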


2021, Vol. 11 (24), pp. 11630
Author(s):  
Yan Zhou ◽  
Sijie Wen ◽  
Dongli Wang ◽  
Jinzhen Mu ◽  
Irampaye Richard

Object detection is one of the key algorithms in automatic driving systems. To address the false detection and missed detection of small and occluded objects in automatic driving scenarios, an improved Faster-RCNN object detection algorithm is proposed. First, deformable convolution and a spatial attention mechanism are used to improve the ResNet-50 backbone network and enhance feature extraction for small objects; then, an improved feature pyramid structure is introduced to reduce the loss of features during fusion. Three cascade detectors are introduced to solve the problem of IOU (Intersection-Over-Union) threshold mismatch, and side-aware boundary localization is applied for bounding-box regression. Finally, Soft-NMS (Soft Non-Maximum Suppression) is used to remove redundant bounding boxes and obtain the best results. The experimental results show that the improved Faster-RCNN better detects small and occluded objects, with accuracy 7.7% and 4.1% higher than the baseline on the eight categories selected from the COCO2017 and BDD100k datasets, respectively.
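
For reference, here is a minimal sketch of Gaussian Soft-NMS, the classic formulation in which overlapping boxes have their scores decayed rather than being discarded outright; the sigma and threshold values are illustrative, and this is a generic sketch rather than the paper's exact code.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: rather than discarding boxes that overlap a
    higher-scoring box (hard NMS), decay their scores by
    exp(-iou^2 / sigma), so partially occluded neighbours can survive."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    keep, idxs = [], list(range(len(scores)))
    while idxs:
        best = max(idxs, key=lambda i: scores[i])
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            scores[i] *= np.exp(-(iou(boxes[best], boxes[i]) ** 2) / sigma)
        idxs = [i for i in idxs if scores[i] >= score_thresh]
    return keep  # indices of retained boxes
```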


Author(s):  
Akash Kumar, Dr. Amita Goel, Prof. Vasudha Bahl and Prof. Nidhi Sengar

Object detection is a field of study within computer vision. An object detection model recognizes real-world objects present either in a captured image or in real-time video, where the objects can belong to any of several classes, such as humans, animals, or everyday objects. This project is an implementation of an object detection algorithm called You Only Look Once (YOLO v3). The YOLO architecture is extremely fast compared to all previous methods. The YOLOv3 model applies a single neural network to the given image; the network divides the image into regions and predicts bounding boxes for each, weighted by the predicted probabilities. After non-maximum suppression, it returns the recognized objects together with their bounding boxes. YOLO trains on full images and performs object detection on them directly.
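
A minimal inference sketch with OpenCV's DNN module, assuming the standard publicly released Darknet yolov3.cfg/yolov3.weights files, a hypothetical input image, and illustrative thresholds; it mirrors the pipeline described above (single forward pass, box decoding, then non-maximum suppression) rather than this project's exact code.

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()

img = cv2.imread("street.jpg")  # hypothetical test image
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

boxes, confidences, class_ids = [], [], []
for output in net.forward(layer_names):
    for det in output:               # det = [cx, cy, bw, bh, obj, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        conf = float(scores[class_id])
        if conf > 0.5:
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(class_id)

# Non-maximum suppression keeps the highest-confidence box per object.
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
print(f"{len(keep)} objects kept after NMS")
```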


Author(s):  
N. Kozonek ◽  
N. Zeller ◽  
H. Bock ◽  
M. Pfeifle

In this paper, we present a sensor fusion framework for the detection and classification of objects in autonomous driving applications. The presented method uses a state-of-the-art convolutional neural network (CNN) to detect and classify objects from RGB images. The 2D bounding boxes calculated by the CNN are fused with the 3D point cloud measured by Lidar sensors. An accurate sensor cross-calibration is used to map the Lidar points into the image, where they are assigned to the 2D bounding boxes. A one-dimensional K-means algorithm is applied to separate object points from foreground and background and to calculate accurate 3D centroids for all detected objects. The proposed algorithm is tested on real-world data and shows stable and reliable object detection and centroid estimation in different kinds of situations.
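
As a sketch of the depth-clustering step, the following one-dimensional 2-means separates the Lidar points that fall inside a 2D box into a near (object) and a far (background) cluster; the paper's actual K-means configuration, including how it handles foreground points, may differ.

```python
import numpy as np

def split_object_points(depths, iters=20):
    """One-dimensional 2-means on the depths of the Lidar points inside
    a 2D box: the nearer cluster is taken as the object, the farther
    one as background. Returns the object points and their mean depth,
    a 1D stand-in for the 3D centroid computation."""
    centers = np.array([depths.min(), depths.max()], dtype=float)  # simple init
    for _ in range(iters):
        labels = np.abs(depths[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):                # keep old centre if cluster empties
                centers[k] = depths[labels == k].mean()
    object_pts = depths[labels == centers.argmin()]
    return object_pts, object_pts.mean()
```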


Electronics, 2021, Vol. 10 (23), pp. 2903
Author(s):  
Razvan Bocu ◽  
Dorin Bocu ◽  
Maksim Iavich

The relatively complex task of detecting 3D objects is essential in the realm of autonomous driving. The related algorithmic processes generally produce an output consisting of a series of 3D bounding boxes placed around specific objects of interest. The scientific literature usually suggests combining the data generated by different sensors or data acquisition devices in order to work around the inherent limitations of any single device. Nevertheless, there are practical issues that cannot be addressed reliably and efficiently through this strategy, such as a limited field of view and the low point density of acquired data. This paper reports a contribution that analyzes the possibility of efficiently and effectively performing 3D object detection in a cooperative fashion. The described approach is evaluated using driving data collected through a partnership with several car manufacturers. Considering their real-world relevance, two driving contexts are analyzed: a roundabout and a T-junction. The evaluation shows that cooperative perception is able to isolate more than 90% of the 3D entities, compared to approximately 25% when singular sensing devices are used. The experimental setup that generated the data described in this paper, and the related 3D object detection system, are currently in active use by the respective car manufacturers’ research groups to fine-tune and improve their autonomous cars’ driving modules.


Informatics, 2020, Vol. 17 (2), pp. 7-16
Author(s):  
R. P. Bohush ◽  
I. Yu. Zakharava ◽  
S. V. Ablameyko

In this paper, an algorithm for object detection in high-resolution images is proposed. The approach uses a multiscale image representation followed by block processing with overlap. Object detection with a convolutional neural network is performed for each block. The number of pyramid layers is limited by the CNN input size and the resolution of the input image. Splitting into overlapping blocks, which improves classification and detection accuracy, is performed on each pyramid layer except the highest one. Detected regions are merged into one if they overlap strongly and share the same class. Experimental results for the algorithm are presented in the paper.
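
A minimal sketch of the overlapping block split, assuming a 512-pixel tile and 128-pixel overlap (both hypothetical values); per-block detections are mapped back to full-image coordinates via each block's stored offset before the class-aware merging step.

```python
import numpy as np

def tile_with_overlap(image, tile=512, overlap=128):
    """Split a high-resolution image into overlapping blocks and record
    each block's offset so per-block detections can be mapped back to
    full-image coordinates."""
    h, w = image.shape[:2]
    step = tile - overlap
    blocks = []
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            y2, x2 = min(y + tile, h), min(x + tile, w)
            blocks.append(((x, y), image[y:y2, x:x2]))
    return blocks

# A detection (x1, y1, x2, y2) from a block at offset (ox, oy) maps back as:
#   (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
```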

