bounding boxes
Recently Published Documents

TOTAL DOCUMENTS: 395 (FIVE YEARS: 287)
H-INDEX: 14 (FIVE YEARS: 8)

Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 148
Author(s):  
Nikita Andriyanov ◽  
Ilshat Khasanshin ◽  
Daniil Utkin ◽  
Timur Gataullin ◽  
Stefan Ignar ◽  
...  

Despite the great capabilities of modern neural network architectures for object detection and recognition, the output of such models is limited to the local (pixel) coordinates of objects' bounding boxes in the image and their predicted classes. However, several practical tasks require more complete information about the object. In particular, for robotic apple picking, it is necessary to know exactly where and how far to move the grabber. To determine the real position of an apple relative to the image-registration source, it is proposed to use an Intel RealSense depth camera and aggregate information from its depth and brightness channels. Apple detection is carried out using the YOLOv3 architecture; then, based on the distance to the object and its localization in the image, the relative distances are calculated for all coordinates. To determine the coordinates of apples, simple linear transformations map the pixel coordinates into a symmetric coordinate system. Estimating the position in a symmetric coordinate system gives not only the magnitude of the shift but also the location of the object relative to the camera. The proposed approach yields position estimates with high accuracy: the approximate root mean square error is 7–12 mm, depending on the range and axis. As for precision and recall, the first is 100% and the second is 90%.
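The pixel-to-position conversion described above can be sketched with a standard pinhole camera model. The function name and the intrinsics values (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not the authors' exact transform:

```python
def apple_position(bbox, depth_m, fx, fy, cx, cy):
    """Estimate an apple's 3D position (metres) relative to the camera
    from its detected bounding box and the depth-channel reading.

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    fx, fy, cx, cy: pinhole intrinsics of the RGB-D camera.
    """
    u = (bbox[0] + bbox[2]) / 2.0          # box centre, pixel coords
    v = (bbox[1] + bbox[3]) / 2.0
    # Shift to a symmetric (camera-centred) coordinate system: the
    # principal point becomes the origin, so the sign encodes direction.
    x = (u - cx) * depth_m / fx            # + right of camera, - left
    y = (v - cy) * depth_m / fy            # + below camera, - above
    return x, y, depth_m

# A box centred on the principal point lies straight ahead of the camera.
x, y, z = apple_position((300, 220, 340, 260), 0.85,
                         fx=615.0, fy=615.0, cx=320.0, cy=240.0)
```

Because the origin sits at the principal point, the sign of `x` and `y` directly tells the grabber which way to move.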


Author(s):  
Jiajia Liao ◽  
Yujun Liu ◽  
Yingchao Piao ◽  
Jinhe Su ◽  
Guorong Cai ◽  
...  

Abstract Recent advances in camera-equipped drone applications have increased the demand for visual object detection algorithms with deep learning for aerial images. A single deep learning model has several limitations in accuracy. Inspired by the fact that ensemble learning can significantly improve a model's generalization ability in the machine learning field, we introduce a novel integration strategy that combines the inference results of two different methods without non-maximum suppression. In this paper, a global and local ensemble network (GLE-Net) is proposed to increase the quality of predictions by considering the global weights for different models and adjusting the local weights for bounding boxes. Specifically, the global module assigns different weights to models. In the local module, we group the bounding boxes that correspond to the same object into a cluster. Each cluster generates a final predicted box and takes the highest score in the cluster as the score of that box. Experiments on the VisDrone2019 benchmark show the promising performance of GLE-Net compared with the baseline network.
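The local module's clustering step can be sketched as follows. The IoU threshold and the coordinate-averaging rule are assumptions for illustration, not GLE-Net's exact weighting scheme:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def fuse(detections, iou_thr=0.5):
    """Group boxes (possibly from several models) that overlap the same
    object into clusters, then emit one averaged box per cluster,
    scored by the cluster maximum."""
    clusters = []
    for box, score in sorted(detections, key=lambda d: -d[1]):
        for c in clusters:
            if iou(box, c[0][0]) >= iou_thr:   # joins an existing cluster
                c.append((box, score))
                break
        else:                                   # starts a new cluster
            clusters.append([(box, score)])
    fused = []
    for c in clusters:
        avg = tuple(sum(b[i] for b, _ in c) / len(c) for i in range(4))
        fused.append((avg, max(s for _, s in c)))
    return fused

# Two detectors fire on the same object; a third box is a separate object.
fused = fuse([((0, 0, 10, 10), 0.9),
              ((1, 1, 11, 11), 0.8),
              ((50, 50, 60, 60), 0.7)])
```

Unlike non-maximum suppression, no overlapping box is discarded outright; every box contributes to its cluster's fused coordinates.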


Sensors ◽  
2022 ◽  
Vol 22 (1) ◽  
pp. 319
Author(s):  
Xin Chen ◽  
Jinghong Liu ◽  
Fang Xu ◽  
Zhihua Xie ◽  
Yujia Zuo ◽  
...  

Aircraft detection in remote sensing images (RSIs), widely used in both military and civilian fields, has drawn widespread attention in recent years. However, complex backgrounds and variations in aircraft pose and size make effective detection difficult. In this paper, we propose a novel aircraft target detection scheme based on small training samples. The scheme is coarse-to-fine and consists of two main stages: region proposal and target identification. First, in the region proposal stage, a circular intensity filter, designed around the characteristics of the aircraft target, quickly locates the centers of multi-scale suspicious aircraft targets in the RSI pyramid; the target regions are then extracted by adding bounding boxes. This step yields a small number of high-quality candidate regions. Second, in the target identification stage, we propose a novel rotation-invariant feature that combines a rotation-invariant histogram of oriented gradients with a vector of locally aggregated descriptors (VLAD). The feature characterizes the aircraft target well regardless of its rotation and can effectively remove false alarms. Experiments conducted on the Remote Sensing Object Detection (RSOD) dataset compare the proposed method with other advanced methods. The results show that the proposed method can quickly and accurately detect aircraft targets in RSIs and achieves better performance.
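One plausible reading of a circular intensity filter is a center-versus-ring contrast score evaluated at every pixel and radius of the pyramid. The sampling scheme below is a hypothetical sketch of that idea, not the paper's exact filter design:

```python
import math

def circular_intensity_response(img, cx, cy, r, n_samples=16):
    """Score how strongly (cx, cy) looks like the centre of a bright
    object of radius ~r on a darker background: the intensity at the
    centre minus the mean intensity sampled on a circle of radius r.
    img is a 2D list of grey values (rows of pixels)."""
    ring = []
    for k in range(n_samples):
        t = 2 * math.pi * k / n_samples
        x = int(round(cx + r * math.cos(t)))
        y = int(round(cy + r * math.sin(t)))
        ring.append(img[y][x])
    inner = img[int(cy)][int(cx)]
    return inner - sum(ring) / len(ring)

# A single bright pixel on a dark background responds only at its centre.
img = [[0] * 21 for _ in range(21)]
img[10][10] = 255
```

A real implementation would sweep this response over the image pyramid and keep local maxima as candidate aircraft centers.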


2021 ◽  
Vol 12 (1) ◽  
pp. 281
Author(s):  
Jaesung Jang ◽  
Hyeongyu Lee ◽  
Jong-Chan Kim

For safe autonomous driving, deep neural network (DNN)-based perception systems play essential roles, and a vast amount of driving images must be manually collected and labeled with ground truth (GT) for training and validation purposes. After observing the manual GT generation's high cost and unavoidable human errors, this study presents an open-source automatic GT generation tool, CarFree, based on the Carla autonomous driving simulator. With it, we aim to democratize the daunting task of object detection dataset generation in particular, which was previously feasible only for big companies or institutes due to its high cost. CarFree comprises (i) a data extraction client that automatically collects relevant information from the Carla simulator's server and (ii) post-processing software that produces precise 2D bounding boxes of vehicles and pedestrians on the gathered driving images. Our evaluation results show that CarFree can generate a considerable number of realistic driving images along with their GTs in a reasonable time. Moreover, using synthesized training images with artificially made unusual weather and lighting conditions, which are difficult to obtain in real-world driving scenarios, CarFree significantly improves object detection accuracy in the real world, particularly in harsh environments. With CarFree, we expect its users to generate a variety of object detection datasets in hassle-free ways.
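The post-processing step, turning the simulator's 3D object information into 2D image boxes, can be sketched with a standard pinhole projection. The function and intrinsics here are illustrative assumptions, not CarFree's actual API:

```python
def project_to_2d_bbox(corners_cam, fx, fy, cx, cy):
    """Project the eight 3D corners of a vehicle's bounding box, given in
    camera coordinates (x right, y down, z forward, metres), onto the
    image plane and take the enclosing axis-aligned 2D box."""
    us, vs = [], []
    for x, y, z in corners_cam:
        if z <= 0:            # behind the camera; a real tool would clip
            continue
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)
    if not us:
        return None
    return min(us), min(vs), max(us), max(vs)

# A 1 m cube centred 10 m straight ahead of the camera.
corners = [(sx, sy, 10.0 + sz)
           for sx in (-0.5, 0.5)
           for sy in (-0.5, 0.5)
           for sz in (-0.5, 0.5)]
box = project_to_2d_bbox(corners, fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
```

Note that the near face of the cube (smaller `z`) dominates the box extent, which is why projecting all eight corners, rather than just the object center, is necessary for a tight box.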


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 76
Author(s):  
Jongsub Yu ◽  
Hyukdoo Choi

This paper presents an object detector with depth estimation using monocular camera images. Previous detection studies have typically focused on detecting objects with 2D or 3D bounding boxes. A 3D bounding box consists of the center point, its size parameters, and heading information. However, predicting such complex output compositions generally degrades a model's performance, and it is not necessary for risk assessment in autonomous driving. We focus on predicting a single depth per object, which is essential for risk assessment in autonomous driving. Our network architecture is based on YOLO v4, a fast and accurate one-stage object detector, with an additional channel added to the output layer for depth estimation. To train depth prediction, we extract the closest depth from the 3D bounding box coordinates of the ground truth labels in the dataset. Our model is compared with the latest 3D object detection studies on the KITTI object detection benchmark. As a result, we show that our model achieves higher detection performance and detection speed than existing models with comparable depth accuracy.
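Extracting the per-object depth target from a ground-truth 3D box can be sketched as below. The KITTI-style parameterization (camera frame: x right, y down, z forward; yaw about the vertical axis) is assumed, and the helper name is illustrative:

```python
import math

def closest_depth(center, size, yaw):
    """Depth training target: the smallest camera-frame z among the
    corners of a ground-truth 3D box. Height does not affect z, so only
    the four footprint corners need to be checked.

    center: (x, y, z) of the box centre in metres.
    size:   (length, width, height).
    yaw:    heading angle about the vertical axis, radians."""
    cx, cy, cz = center
    l, w, h = size
    zs = []
    for sx in (-0.5, 0.5):
        for sz in (-0.5, 0.5):
            dx, dz = sx * l, sz * w
            # rotate the footprint corner offset by the heading angle
            rz = -dx * math.sin(yaw) + dz * math.cos(yaw)
            zs.append(cz + rz)
    return min(zs)
```

For a car 20 m ahead, the nearest corner, not the box center, defines the risk-relevant distance, which is why the minimum over corners is used.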


2021 ◽  
Vol 163 (1) ◽  
pp. 23
Author(s):  
Kaiming Cui ◽  
Junjie Liu ◽  
Fabo Feng ◽  
Jifeng Liu

Abstract Deep learning techniques have been well explored in the transiting exoplanet field; however, previous work mainly focuses on classification and inspection. In this work, we develop a novel detection algorithm based on a well-proven object detection framework from the computer vision field. By training the network on the light curves of the confirmed Kepler exoplanets, our model yields about 90% precision and recall for identifying transits with a signal-to-noise ratio higher than 6 (with the confidence threshold set to 0.6). With a slightly lower confidence threshold, recall can reach higher than 95%. We also transfer the trained model to the TESS data and obtain similar performance. The results of our algorithm match the intuition of human visual perception, making it useful for finding single-transit candidates. Moreover, the parameters of the output bounding boxes can also help to find multiplanet systems. Our network and detection functions are implemented in the Deep-Transit toolkit, an open-source Python package hosted on GitHub and PyPI.
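Mapping a predicted box on the light-curve "image" back to physical transit parameters can be sketched as follows. The axis convention (time along x) and the cadence handling are assumptions for illustration; Deep-Transit's actual API may differ:

```python
def box_to_transit(box, t0, dt):
    """Translate a detection box on a light-curve 'image' back to transit
    parameters, assuming the x axis is time (column index * cadence dt,
    offset by the start time t0).

    box: (x_min, y_min, x_max, y_max) in pixel units."""
    x_min, _, x_max, _ = box
    start = t0 + x_min * dt
    end = t0 + x_max * dt
    mid = 0.5 * (start + end)       # transit mid-time
    duration = end - start          # transit duration
    return mid, duration

# A box spanning columns 100-150 at a 0.02-day cadence.
mid, dur = box_to_transit((100, 0, 150, 10), t0=0.0, dt=0.02)
```

Mid-times recovered this way from several boxes can be checked for distinct periodicities, which is how box parameters can hint at multiplanet systems.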


Electronics ◽  
2021 ◽  
Vol 10 (24) ◽  
pp. 3092
Author(s):  
Yonghui Liang ◽  
Yuqing He ◽  
Junkai Yang ◽  
Weiqi Jin ◽  
Mingqi Liu

Accurate localization of surrounding vehicles helps drivers perceive the surrounding environment and can be obtained from two parameters: depth and direction angle. This research presents a new, efficient monocular-vision-based pipeline to obtain a vehicle's location. We propose a plug-and-play convolutional block combined with a basic target detection algorithm to improve the accuracy of the vehicle's bounding boxes. The boxes are then transformed into actual depth and angle through a conversion method derived from monocular imaging geometry and camera parameters. Experimental results on the KITTI dataset show the high accuracy and efficiency of the proposed method. The mAP increased by about 2% with an additional inference time of less than 5 ms. The average depth error was about 4% for near objects and about 7% for far objects, and the average angle error was about two degrees.
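The geometric conversion from a refined box to depth and direction angle can be sketched with similar-triangles pinhole relations. The known-object-height assumption and the parameter names are illustrative, not necessarily the paper's exact derivation:

```python
import math

def depth_and_angle(bbox, real_height_m, fx, fy, cx):
    """Convert a vehicle's 2D box into distance and bearing using
    pinhole geometry, assuming the object's real-world height is known
    (e.g. an average car height).

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    fx, fy: focal lengths in pixels; cx: principal-point x coordinate."""
    x_min, y_min, x_max, y_max = bbox
    pixel_h = y_max - y_min
    depth = fy * real_height_m / pixel_h     # similar triangles
    u = 0.5 * (x_min + x_max)
    angle = math.atan((u - cx) / fx)         # bearing from the optical axis
    return depth, angle

# A 1.5 m-tall car appearing 75 px tall, centred on the optical axis.
d, a = depth_and_angle((310, 200, 330, 275), 1.5, fx=750.0, fy=750.0, cx=320.0)
```

The depth error grows with distance under this model because a one-pixel box error corresponds to a larger metric error for small `pixel_h`, consistent with the larger far-range error reported above.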


2021 ◽  
Vol 13 (24) ◽  
pp. 4999
Author(s):  
Boyong He ◽  
Xianjiang Li ◽  
Bo Huang ◽  
Enhui Gu ◽  
Weijie Guo ◽  
...  

As a data-driven approach, deep learning requires a large amount of annotated training data to obtain a sufficiently accurate and generalized model, especially in the field of computer vision. However, compared with generic object recognition datasets, aerial image datasets are more challenging to acquire and more expensive to label. Obtaining a large amount of high-quality aerial image data for object recognition and image understanding is therefore an urgent problem. Existing studies show that synthetic data can effectively reduce the amount of training data required. In this paper, we therefore propose the first synthetic aerial image dataset for ship recognition, called UnityShip. The dataset contains over 100,000 synthetic images and 194,054 ship instances, covering 79 ship models in ten categories and six large virtual scenes with different time periods, weather environments, and altitudes. The annotations include environmental information, instance-level horizontal bounding boxes, oriented bounding boxes, and the type and ID of each ship, providing a basis for object detection, oriented object detection, fine-grained recognition, and scene recognition. To investigate the applications of UnityShip, the synthetic data were validated for model pre-training and data augmentation using three object detection algorithms and six existing real-world ship detection datasets. Our experimental results show that, for small- and medium-sized real-world datasets, the synthetic data improve both model pre-training and data augmentation, showing the value and potential of synthetic data in aerial image recognition and understanding tasks.
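The relation between the two annotation formats, oriented and horizontal boxes, can be sketched by enclosing the rotated rectangle in its axis-aligned hull. The (centre, size, angle) parameterization is a common convention assumed here, not necessarily UnityShip's exact annotation schema:

```python
import math

def obb_to_hbb(cx, cy, w, h, theta):
    """Enclose an oriented bounding box (centre (cx, cy), size (w, h),
    rotation theta in radians) in the axis-aligned horizontal box used
    for plain object detection."""
    half_w = 0.5 * (abs(w * math.cos(theta)) + abs(h * math.sin(theta)))
    half_h = 0.5 * (abs(w * math.sin(theta)) + abs(h * math.cos(theta)))
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h

# An unrotated 4x2 box reduces to the familiar corner form.
hbb = obb_to_hbb(10.0, 10.0, 4.0, 2.0, 0.0)
```

For elongated ships at 45 degrees the horizontal box can be much larger than the ship itself, which is why the dataset ships both annotation types.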


2021 ◽  
Author(s):  
Prashanth Pillai ◽  
Purnaprajna Mangsuli

Abstract In the O&G (Oil & Gas) industry, unstructured data sources such as technical reports on hydrocarbon production, daily drilling, well construction, etc. contain valuable information. This information, however, is conveyed through various formats such as tables, forms, text, and figures. Detecting these different entities in documents is essential for building a structured representation of the information within and for automated processing of documents at scale. Our work presents a document layout analysis workflow that detects and localizes different entities with a deep-learning-based framework. The workflow comprises an object-detection pipeline based on transformers that identifies the spatial location of entities in a document page. Its key elements include a residual network backbone for feature extraction and an encoder-decoder transformer based on the latest detection transformers (DETR) to predict object bounding boxes and category labels. Object detection is formulated as a direct set prediction task using bipartite matching, eliminating conventional operations like anchor box generation and non-maximal suppression. The limited availability of public document layout datasets that incorporate the artifacts observed in historical O&G technical reports is often a major challenge, which we address with a novel training data augmentation methodology. The dense occurrence of elements on a page can introduce uncertainties that result in bounding boxes cutting through text content; we therefore adopt a bounding box post-processing methodology that refines the box coordinates to minimize such undercuts. The proposed document layout analysis pipeline was trained to detect entity types such as headings, text blocks, tables, forms, and images/charts in a document page.
A wide range of pages from lithology, stratigraphy, drilling, and field development reports were used for model training, including a considerable number of historical scanned reports. The trained object-detection model was evaluated on a test dataset prepared from the O&G reports. DETR demonstrated superior performance compared with Mask R-CNN on our dataset.
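The bounding-box post-processing idea, growing a predicted box so its edges do not undercut a text line, can be sketched as below. Representing text lines as vertical (top, bottom) intervals is an assumption for illustration, not the authors' exact method:

```python
def refine_box(box, text_lines):
    """Post-process a predicted layout box so its horizontal edges do not
    cut through a text line: any line the edge falls inside is pulled
    fully into the box (i.e. the box grows rather than truncates text).

    box:        (x1, y1, x2, y2) with y increasing downwards.
    text_lines: list of (top, bottom) vertical extents of text lines."""
    x1, y1, x2, y2 = box
    for top, bottom in text_lines:
        if top < y1 < bottom:      # box top slices through this line
            y1 = top
        if top < y2 < bottom:      # box bottom slices through this line
            y2 = bottom
    return x1, y1, x2, y2

# A box whose top and bottom edges each cut into a text line.
refined = refine_box((0, 15, 100, 52), [(10, 20), (30, 40), (50, 60)])
```

Growing (rather than shrinking) the box is one of two possible snap directions; which is preferable depends on whether downstream extraction tolerates extra context better than missing text.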

