bounding boxes
Recently Published Documents

TOTAL DOCUMENTS: 395 (FIVE YEARS: 287)
H-INDEX: 14 (FIVE YEARS: 8)

Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 148
Author(s):  
Nikita Andriyanov ◽  
Ilshat Khasanshin ◽  
Daniil Utkin ◽  
Timur Gataullin ◽  
Stefan Ignar ◽  
...  

Despite the great capabilities of modern neural network architectures for object detection and recognition, the output of such models is limited to the local (pixel) coordinates of objects' bounding boxes in the image and their predicted classes. However, several practical tasks require more complete information about the object. In particular, for robotic apple picking, it is necessary to know exactly where and how far to move the grabber. To determine the real position of an apple relative to the image-registration source, it is proposed to use an Intel RealSense depth camera and aggregate information from its depth and brightness channels. Apple detection is carried out using the YOLOv3 architecture; then, based on the distance to the object and its localization in the image, the relative distances are calculated for all coordinates. To determine the coordinates of apples, simple linear transformations map the pixel coordinates into a symmetric coordinate system. Estimating the position in a symmetric coordinate system gives not only the magnitude of the shift but also the location of the object relative to the camera. The proposed approach yields position estimates with high accuracy: the approximate root mean square error is 7–12 mm, depending on the range and axis. As for precision and recall, the first is 100% and the second is 90%.
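The pixel-to-position conversion described above can be sketched with a standard pinhole camera model. The function name and the intrinsics values (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not the authors' exact transform:

```python
def apple_position(bbox, depth_m, fx, fy, cx, cy):
    """Estimate an apple's 3D position (metres) relative to the camera
    from its detected bounding box and the depth-channel reading.

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    fx, fy, cx, cy: pinhole intrinsics of the RGB-D camera.
    """
    u = (bbox[0] + bbox[2]) / 2.0          # box centre, pixel coords
    v = (bbox[1] + bbox[3]) / 2.0
    # Shift to a symmetric (camera-centred) coordinate system: the
    # principal point becomes the origin, so the sign encodes direction.
    x = (u - cx) * depth_m / fx            # + right of camera, - left
    y = (v - cy) * depth_m / fy            # + below camera, - above
    return x, y, depth_m

# A box centred on the principal point lies straight ahead of the camera.
x, y, z = apple_position((300, 220, 340, 260), 0.85,
                         fx=615.0, fy=615.0, cx=320.0, cy=240.0)
```

Because the origin sits at the principal point, the sign of `x` and `y` directly tells the grabber which way to move.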


Author(s):  
Jiajia Liao ◽  
Yujun Liu ◽  
Yingchao Piao ◽  
Jinhe Su ◽  
Guorong Cai ◽  
...  

Abstract Recent advances in camera-equipped drone applications have increased the demand for visual object detection algorithms with deep learning for aerial images. A single deep learning model has several limitations in accuracy. Inspired by the fact that ensemble learning can significantly improve a model's generalization ability in the machine learning field, we introduce a novel integration strategy that combines the inference results of two different methods without non-maximum suppression. In this paper, a global and local ensemble network (GLE-Net) is proposed to increase the quality of predictions by considering the global weights for different models and adjusting the local weights for bounding boxes. Specifically, the global module assigns different weights to models. In the local module, we group the bounding boxes that correspond to the same object into a cluster. Each cluster generates a final predicted box and takes the highest score in the cluster as the score of that box. Experiments on the VisDrone2019 benchmark show the promising performance of GLE-Net compared with the baseline network.
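The local module's clustering step can be sketched as follows. The IoU threshold and the coordinate-averaging rule are assumptions for illustration, not GLE-Net's exact weighting scheme:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def fuse(detections, iou_thr=0.5):
    """Group boxes (possibly from several models) that overlap the same
    object into clusters, then emit one averaged box per cluster,
    scored by the cluster maximum."""
    clusters = []
    for box, score in sorted(detections, key=lambda d: -d[1]):
        for c in clusters:
            if iou(box, c[0][0]) >= iou_thr:   # joins an existing cluster
                c.append((box, score))
                break
        else:                                   # starts a new cluster
            clusters.append([(box, score)])
    fused = []
    for c in clusters:
        avg = tuple(sum(b[i] for b, _ in c) / len(c) for i in range(4))
        fused.append((avg, max(s for _, s in c)))
    return fused

# Two detectors fire on the same object; a third box is a separate object.
fused = fuse([((0, 0, 10, 10), 0.9),
              ((1, 1, 11, 11), 0.8),
              ((50, 50, 60, 60), 0.7)])
```

Unlike non-maximum suppression, no overlapping box is discarded outright; every box contributes to its cluster's fused coordinates.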


Sensors ◽  
2022 ◽  
Vol 22 (1) ◽  
pp. 319
Author(s):  
Xin Chen ◽  
Jinghong Liu ◽  
Fang Xu ◽  
Zhihua Xie ◽  
Yujia Zuo ◽  
...  

Aircraft detection in remote sensing images (RSIs), widely used in both military and civilian fields, has drawn widespread attention in recent years. However, complex backgrounds and variations in aircraft pose and size make effective detection difficult. In this paper, we propose a novel aircraft target detection scheme based on small training samples. The scheme is coarse-to-fine and consists of two main stages: region proposal and target identification. First, in the region proposal stage, a circular intensity filter, designed around the characteristics of the aircraft target, quickly locates the centers of multi-scale suspicious aircraft targets in the RSI pyramid; the target regions are then extracted by adding bounding boxes. This step yields a small number of high-quality candidate regions. Second, in the target identification stage, we propose a novel rotation-invariant feature that combines a rotation-invariant histogram of oriented gradients with a vector of locally aggregated descriptors (VLAD). The feature characterizes the aircraft target well regardless of its rotation and can effectively remove false alarms. Experiments conducted on the Remote Sensing Object Detection (RSOD) dataset compare the proposed method with other advanced methods. The results show that the proposed method can quickly and accurately detect aircraft targets in RSIs and achieves better performance.
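One plausible reading of a circular intensity filter is a center-versus-ring contrast score evaluated at every pixel and radius of the pyramid. The sampling scheme below is a hypothetical sketch of that idea, not the paper's exact filter design:

```python
import math

def circular_intensity_response(img, cx, cy, r, n_samples=16):
    """Score how strongly (cx, cy) looks like the centre of a bright
    object of radius ~r on a darker background: the intensity at the
    centre minus the mean intensity sampled on a circle of radius r.
    img is a 2D list of grey values (rows of pixels)."""
    ring = []
    for k in range(n_samples):
        t = 2 * math.pi * k / n_samples
        x = int(round(cx + r * math.cos(t)))
        y = int(round(cy + r * math.sin(t)))
        ring.append(img[y][x])
    inner = img[int(cy)][int(cx)]
    return inner - sum(ring) / len(ring)

# A single bright pixel on a dark background responds only at its centre.
img = [[0] * 21 for _ in range(21)]
img[10][10] = 255
```

A real implementation would sweep this response over the image pyramid and keep local maxima as candidate aircraft centers.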


2021 ◽  
Vol 12 (1) ◽  
pp. 281
Author(s):  
Jaesung Jang ◽  
Hyeongyu Lee ◽  
Jong-Chan Kim

For safe autonomous driving, deep neural network (DNN)-based perception systems play essential roles, and a vast amount of driving images must be manually collected and labeled with ground truth (GT) for training and validation purposes. After observing the manual GT generation's high cost and unavoidable human errors, this study presents an open-source automatic GT generation tool, CarFree, based on the Carla autonomous driving simulator. With it, we aim to democratize the daunting task of object detection dataset generation in particular, which was previously feasible only for big companies or institutes due to its high cost. CarFree comprises (i) a data extraction client that automatically collects relevant information from the Carla simulator's server and (ii) post-processing software that produces precise 2D bounding boxes of vehicles and pedestrians on the gathered driving images. Our evaluation results show that CarFree can generate a considerable number of realistic driving images along with their GTs in a reasonable time. Moreover, using synthesized training images with artificially made unusual weather and lighting conditions, which are difficult to obtain in real-world driving scenarios, CarFree significantly improves object detection accuracy in the real world, particularly in harsh environments. With CarFree, we expect its users to generate a variety of object detection datasets in hassle-free ways.
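The post-processing step, turning the simulator's 3D object information into 2D image boxes, can be sketched with a standard pinhole projection. The function and intrinsics here are illustrative assumptions, not CarFree's actual API:

```python
def project_to_2d_bbox(corners_cam, fx, fy, cx, cy):
    """Project the eight 3D corners of a vehicle's bounding box, given in
    camera coordinates (x right, y down, z forward, metres), onto the
    image plane and take the enclosing axis-aligned 2D box."""
    us, vs = [], []
    for x, y, z in corners_cam:
        if z <= 0:            # behind the camera; a real tool would clip
            continue
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)
    if not us:
        return None
    return min(us), min(vs), max(us), max(vs)

# A 1 m cube centred 10 m straight ahead of the camera.
corners = [(sx, sy, 10.0 + sz)
           for sx in (-0.5, 0.5)
           for sy in (-0.5, 0.5)
           for sz in (-0.5, 0.5)]
box = project_to_2d_bbox(corners, fx=1000.0, fy=1000.0, cx=500.0, cy=500.0)
```

Note that the near face of the cube (smaller `z`) dominates the box extent, which is why projecting all eight corners, rather than just the object center, is necessary for a tight box.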


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 76
Author(s):  
Jongsub Yu ◽  
Hyukdoo Choi

This paper presents an object detector with depth estimation using monocular camera images. Previous detection studies have typically focused on detecting objects with 2D or 3D bounding boxes. A 3D bounding box consists of the center point, its size parameters, and heading information. However, predicting such complex output compositions generally degrades a model's performance, and it is not necessary for risk assessment in autonomous driving. We focus on predicting a single depth per object, which is essential for risk assessment in autonomous driving. Our network architecture is based on YOLO v4, a fast and accurate one-stage object detector, with an additional channel added to the output layer for depth estimation. To train depth prediction, we extract the closest depth from the 3D bounding box coordinates of the ground truth labels in the dataset. Our model is compared with the latest 3D object detection studies on the KITTI object detection benchmark. As a result, we show that our model achieves higher detection performance and detection speed than existing models with comparable depth accuracy.
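Extracting the per-object depth target from a ground-truth 3D box can be sketched as below. The KITTI-style parameterization (camera frame: x right, y down, z forward; yaw about the vertical axis) is assumed, and the helper name is illustrative:

```python
import math

def closest_depth(center, size, yaw):
    """Depth training target: the smallest camera-frame z among the
    corners of a ground-truth 3D box. Height does not affect z, so only
    the four footprint corners need to be checked.

    center: (x, y, z) of the box centre in metres.
    size:   (length, width, height).
    yaw:    heading angle about the vertical axis, radians."""
    cx, cy, cz = center
    l, w, h = size
    zs = []
    for sx in (-0.5, 0.5):
        for sz in (-0.5, 0.5):
            dx, dz = sx * l, sz * w
            # rotate the footprint corner offset by the heading angle
            rz = -dx * math.sin(yaw) + dz * math.cos(yaw)
            zs.append(cz + rz)
    return min(zs)
```

For a car 20 m ahead, the nearest corner, not the box center, defines the risk-relevant distance, which is why the minimum over corners is used.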


2021 ◽  
Vol 163 (1) ◽  
pp. 23
Author(s):  
Kaiming Cui ◽  
Junjie Liu ◽  
Fabo Feng ◽  
Jifeng Liu

Abstract Deep learning techniques have been well explored in the transiting exoplanet field; however, previous work mainly focuses on classification and inspection. In this work, we develop a novel detection algorithm based on a well-proven object detection framework from the computer vision field. By training the network on the light curves of the confirmed Kepler exoplanets, our model yields about 90% precision and recall for identifying transits with a signal-to-noise ratio higher than 6 (with the confidence threshold set to 0.6). With a slightly lower confidence threshold, recall can reach higher than 95%. We also transfer the trained model to the TESS data and obtain similar performance. The results of our algorithm match the intuition of human visual perception, making it useful for finding single-transit candidates. Moreover, the parameters of the output bounding boxes can also help to find multiplanet systems. Our network and detection functions are implemented in the Deep-Transit toolkit, an open-source Python package hosted on GitHub and PyPI.
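Mapping a predicted box on the light-curve "image" back to physical transit parameters can be sketched as follows. The axis convention (time along x) and the cadence handling are assumptions for illustration; Deep-Transit's actual API may differ:

```python
def box_to_transit(box, t0, dt):
    """Translate a detection box on a light-curve 'image' back to transit
    parameters, assuming the x axis is time (column index * cadence dt,
    offset by the start time t0).

    box: (x_min, y_min, x_max, y_max) in pixel units."""
    x_min, _, x_max, _ = box
    start = t0 + x_min * dt
    end = t0 + x_max * dt
    mid = 0.5 * (start + end)       # transit mid-time
    duration = end - start          # transit duration
    return mid, duration

# A box spanning columns 100-150 at a 0.02-day cadence.
mid, dur = box_to_transit((100, 0, 150, 10), t0=0.0, dt=0.02)
```

Mid-times recovered this way from several boxes can be checked for distinct periodicities, which is how box parameters can hint at multiplanet systems.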


Electronics ◽  
2021 ◽  
Vol 10 (24) ◽  
pp. 3092
Author(s):  
Yonghui Liang ◽  
Yuqing He ◽  
Junkai Yang ◽  
Weiqi Jin ◽  
Mingqi Liu

Accurate localization of surrounding vehicles helps drivers perceive the surrounding environment and can be obtained from two parameters: depth and direction angle. This research presents a new, efficient monocular-vision-based pipeline to obtain a vehicle's location. We propose a plug-and-play convolutional block combined with a basic target detection algorithm to improve the accuracy of the vehicle's bounding boxes. The boxes are then transformed into actual depth and angle through a conversion method derived from monocular imaging geometry and camera parameters. Experimental results on the KITTI dataset show the high accuracy and efficiency of the proposed method. The mAP increased by about 2% with an additional inference time of less than 5 ms. The average depth error was about 4% for near objects and about 7% for far objects, and the average angle error was about two degrees.
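The geometric conversion from a refined box to depth and direction angle can be sketched with similar-triangles pinhole relations. The known-object-height assumption and the parameter names are illustrative, not necessarily the paper's exact derivation:

```python
import math

def depth_and_angle(bbox, real_height_m, fx, fy, cx):
    """Convert a vehicle's 2D box into distance and bearing using
    pinhole geometry, assuming the object's real-world height is known
    (e.g. an average car height).

    bbox: (x_min, y_min, x_max, y_max) in pixels.
    fx, fy: focal lengths in pixels; cx: principal-point x coordinate."""
    x_min, y_min, x_max, y_max = bbox
    pixel_h = y_max - y_min
    depth = fy * real_height_m / pixel_h     # similar triangles
    u = 0.5 * (x_min + x_max)
    angle = math.atan((u - cx) / fx)         # bearing from the optical axis
    return depth, angle

# A 1.5 m-tall car appearing 75 px tall, centred on the optical axis.
d, a = depth_and_angle((310, 200, 330, 275), 1.5, fx=750.0, fy=750.0, cx=320.0)
```

The depth error grows with distance under this model because a one-pixel box error corresponds to a larger metric error for small `pixel_h`, consistent with the larger far-range error reported above.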


2021 ◽  
Vol 13 (24) ◽  
pp. 4999
Author(s):  
Boyong He ◽  
Xianjiang Li ◽  
Bo Huang ◽  
Enhui Gu ◽  
Weijie Guo ◽  
...  

As a data-driven approach, deep learning requires a large amount of annotated training data to obtain a sufficiently accurate and generalized model, especially in the field of computer vision. However, compared with generic object recognition datasets, aerial image datasets are more challenging to acquire and more expensive to label. Obtaining a large amount of high-quality aerial image data for object recognition and image understanding is therefore an urgent problem. Existing studies show that synthetic data can effectively reduce the amount of training data required. In this paper, we therefore propose the first synthetic aerial image dataset for ship recognition, called UnityShip. The dataset contains over 100,000 synthetic images and 194,054 ship instances, covering 79 ship models in ten categories and six large virtual scenes with different time periods, weather environments, and altitudes. The annotations include environmental information, instance-level horizontal bounding boxes, oriented bounding boxes, and the type and ID of each ship, providing a basis for object detection, oriented object detection, fine-grained recognition, and scene recognition. To investigate the applications of UnityShip, the synthetic data were validated for model pre-training and data augmentation using three object detection algorithms and six existing real-world ship detection datasets. Our experimental results show that, for small- and medium-sized real-world datasets, the synthetic data improve both model pre-training and data augmentation, showing the value and potential of synthetic data in aerial image recognition and understanding tasks.
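The relation between the two annotation formats, oriented and horizontal boxes, can be sketched by enclosing the rotated rectangle in its axis-aligned hull. The (centre, size, angle) parameterization is a common convention assumed here, not necessarily UnityShip's exact annotation schema:

```python
import math

def obb_to_hbb(cx, cy, w, h, theta):
    """Enclose an oriented bounding box (centre (cx, cy), size (w, h),
    rotation theta in radians) in the axis-aligned horizontal box used
    for plain object detection."""
    half_w = 0.5 * (abs(w * math.cos(theta)) + abs(h * math.sin(theta)))
    half_h = 0.5 * (abs(w * math.sin(theta)) + abs(h * math.cos(theta)))
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h

# An unrotated 4x2 box reduces to the familiar corner form.
hbb = obb_to_hbb(10.0, 10.0, 4.0, 2.0, 0.0)
```

For elongated ships at 45 degrees the horizontal box can be much larger than the ship itself, which is why the dataset ships both annotation types.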


2021 ◽  
Author(s):  
Prashanth Pillai ◽  
Purnaprajna Mangsuli

Abstract In the O&G (Oil & Gas) industry, unstructured data sources such as technical reports on hydrocarbon production, daily drilling, well construction, etc. contain valuable information. This information, however, is conveyed through various formats such as tables, forms, text, and figures. Detecting these different entities in documents is essential for building a structured representation of the information within and for automated processing of documents at scale. Our work presents a document layout analysis workflow that detects and localizes different entities with a deep-learning-based framework. The workflow comprises an object-detection pipeline based on transformers that identifies the spatial location of entities in a document page. Its key elements include a residual network backbone for feature extraction and an encoder-decoder transformer based on the latest detection transformers (DETR) to predict object bounding boxes and category labels. Object detection is formulated as a direct set prediction task using bipartite matching, eliminating conventional operations like anchor box generation and non-maximal suppression. The limited availability of public document layout datasets that incorporate the artifacts observed in historical O&G technical reports is often a major challenge, which we address with a novel training data augmentation methodology. The dense occurrence of elements on a page can introduce uncertainties that result in bounding boxes cutting through text content; we therefore adopt a bounding box post-processing methodology that refines the box coordinates to minimize such undercuts. The proposed document layout analysis pipeline was trained to detect entity types such as headings, text blocks, tables, forms, and images/charts in a document page.
A wide range of pages from lithology, stratigraphy, drilling, and field development reports were used for model training, including a considerable number of historical scanned reports. The trained object-detection model was evaluated on a test dataset prepared from the O&G reports. DETR demonstrated superior performance compared with Mask R-CNN on our dataset.
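The bounding-box post-processing idea, growing a predicted box so its edges do not undercut a text line, can be sketched as below. Representing text lines as vertical (top, bottom) intervals is an assumption for illustration, not the authors' exact method:

```python
def refine_box(box, text_lines):
    """Post-process a predicted layout box so its horizontal edges do not
    cut through a text line: any line the edge falls inside is pulled
    fully into the box (i.e. the box grows rather than truncates text).

    box:        (x1, y1, x2, y2) with y increasing downwards.
    text_lines: list of (top, bottom) vertical extents of text lines."""
    x1, y1, x2, y2 = box
    for top, bottom in text_lines:
        if top < y1 < bottom:      # box top slices through this line
            y1 = top
        if top < y2 < bottom:      # box bottom slices through this line
            y2 = bottom
    return x1, y1, x2, y2

# A box whose top and bottom edges each cut into a text line.
refined = refine_box((0, 15, 100, 52), [(10, 20), (30, 40), (50, 60)])
```

Growing (rather than shrinking) the box is one of two possible snap directions; which is preferable depends on whether downstream extraction tolerates extra context better than missing text.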

