Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes

Author(s):  
Hao Yang ◽  
Hao Wu ◽  
Hao Chen


2020 ◽ 
Vol 12 (18) ◽  
pp. 3053 ◽  
Author(s):  
Thorsten Hoeser ◽  
Felix Bachofer ◽  
Claudia Kuenzer

In Earth observation (EO), large-scale land-surface dynamics are traditionally analyzed by investigating aggregated classes. The increase in data with a very high spatial resolution enables investigations at a fine-grained feature level, which can help us better understand the dynamics of land surfaces by taking object dynamics into account. To extract fine-grained features and objects, the most popular deep-learning model for image analysis, the convolutional neural network (CNN), is commonly used. In this review, we provide a comprehensive overview of the impact of deep learning on EO applications by reviewing 429 studies on image segmentation and object detection with CNNs. We extensively examine the spatial distribution of study sites, the employed sensors, the used datasets and CNN architectures, and give a thorough overview of EO applications that use CNNs. Our main finding is that CNNs are in an advanced transition phase from computer vision to EO. Building on this, we argue that in the near future, investigations that analyze object dynamics with CNNs will have a significant impact on EO research. With a focus on EO applications in this Part II, we complete the methodological review provided in Part I.


2021 ◽  
Vol 13 (13) ◽  
pp. 2459
Author(s):  
Yangyang Li ◽  
Heting Mao ◽  
Ruijiao Liu ◽  
Xuan Pei ◽  
Licheng Jiao ◽  
...  

Object detection in remote sensing images has been widely used in military and civilian fields and is a challenging task due to the complex background, large-scale variation, and dense arrangement of objects in arbitrary orientations. In addition, existing object detection methods rely on increasingly deep networks, which adds considerable computational overhead and parameters and is unfavorable for deployment on edge devices. In this paper, we propose a lightweight keypoint-based oriented object detector for remote sensing images. First, we propose a semantic transfer block (STB) for merging shallow and deep features, which reduces noise and restores semantic information. Then, the proposed adaptive Gaussian kernel (AGK) adapts to objects of different scales and further improves detection performance. Finally, we propose a distillation loss tailored to object detection to obtain a lightweight student network. Experiments on the HRSC2016 and UCAS-AOD datasets show that the proposed method adapts to objects of different scales, obtains accurate bounding boxes, and reduces the influence of complex backgrounds. The comparison with mainstream methods shows that our method achieves comparable performance while remaining lightweight.
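The distillation loss above is specific to the paper; as a general point of reference, a standard response-based distillation term combines a temperature-softened teacher/student divergence with the ordinary task loss. A minimal PyTorch sketch under that assumption (T and alpha are illustrative hyperparameters, not values from the paper):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Generic response-based distillation: a softened KL term plus the
    ordinary cross-entropy task loss. T and alpha are assumed
    hyperparameters, not values from the paper."""
    # Softened distributions; the T*T factor keeps gradient magnitudes
    # comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard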


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1686 ◽  
Author(s):  
Feng Yang ◽  
Wentong Li ◽  
Haiwei Hu ◽  
Wanyi Li ◽  
Peng Wang

Accurate and robust detection of multi-class objects in very high resolution (VHR) aerial images plays a significant role in many real-world applications. Traditional detection methods have made remarkable progress with horizontal bounding boxes (HBBs) thanks to CNNs. However, HBB detection methods still exhibit limitations, including missed detections and redundant detection regions, especially for densely distributed and strip-like objects. In addition, large scale variations and diverse backgrounds pose further challenges. To address these problems, an effective region-based object detection framework named Multi-scale Feature Integration Attention Rotation Network (MFIAR-Net) is proposed for aerial images with oriented bounding boxes (OBBs), which promotes the integration of the inherent multi-scale pyramid features to generate a discriminative feature map. Meanwhile, a double-path feature attention network supervised by ground-truth mask information is introduced to guide the network to focus on object regions and suppress irrelevant noise. To boost rotation regression and classification performance, we present a robust Rotation Detection Network, which generates an efficient OBB representation. Extensive experiments and comprehensive evaluations on two publicly available datasets demonstrate the effectiveness of the proposed framework.
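For reference, a common way to encode the oriented bounding boxes mentioned above is the five-parameter form (cx, cy, w, h, theta); the sketch below converts it to four corner points with a rotation matrix. This is a standard convention, not necessarily the exact OBB representation used by MFIAR-Net:

import numpy as np

def obb_to_corners(cx, cy, w, h, theta):
    """Convert an oriented box (center, size, rotation in radians) to its
    four corner points, counter-clockwise. One common convention; the
    paper's exact OBB encoding may differ."""
    # Axis-aligned corners around the origin.
    corners = np.array([[-w / 2, -h / 2],
                        [ w / 2, -h / 2],
                        [ w / 2,  h / 2],
                        [-w / 2,  h / 2]])
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Rotate each corner, then translate to the box center.
    return corners @ rot.T + np.array([cx, cy])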


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 537 ◽  
Author(s):  
Liquan Zhao ◽  
Shuaiyang Li

The ‘You Only Look Once’ v3 (YOLOv3) method is among the most widely used deep learning-based object detection methods. It uses the k-means cluster method to estimate the initial width and height of the predicted bounding boxes. With this method, the estimated width and height are sensitive to the initial cluster centers, and the processing of large-scale datasets is time-consuming. To address these problems, a new cluster method for estimating the initial width and height of the predicted bounding boxes has been developed. First, it randomly selects a width-and-height pair from the ground-truth boxes as the first initial cluster center. Second, it constructs Markov chains based on this initial center and uses the final point of each Markov chain as one of the remaining initial centers. In the construction of the Markov chains, the intersection over union (IoU), rather than the Euclidean (square-root) distance, is used to compute the distance between the selected initial centers and each candidate point. Finally, the method continually updates the cluster centers with new batches of width and height values sampled from only part of the dataset. Our simulation results show that the new method converges faster when initializing the width and height of the predicted bounding boxes and selects more representative initial widths and heights. Our proposed method achieves better performance than the YOLOv3 method in terms of recall, mean average precision, and F1-score.
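As a point of reference for the distance measure described above, the IoU between two width-and-height pairs can be computed by anchoring both boxes at a common origin, as in YOLO-style anchor clustering; a minimal sketch (the Markov-chain sampling itself is omitted):

import numpy as np

def iou_wh(box, clusters):
    """IoU between one (w, h) pair and k cluster (w, h) pairs, with all
    boxes anchored at a common origin as in YOLO anchor clustering."""
    inter = np.minimum(box[0], clusters[:, 0]) * np.minimum(box[1], clusters[:, 1])
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union

def iou_distance(box, clusters):
    # Distance used for cluster assignment: d = 1 - IoU.
    return 1.0 - iou_wh(box, clusters)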


2021 ◽  
Author(s):  
Jaku Rabinder Rakshit Pally ◽  
Vidya Samadi

Due to the importance of object detection in video analysis and image annotation, it is widely utilized in a number of computer vision tasks such as face recognition, autonomous vehicles, activity recognition, object tracking, and identity verification. Object detection involves not only the classification and identification of objects within images but also their localization and tracing, by creating bounding boxes around the objects and labelling them with their respective prediction scores. Here, we leverage and discuss how connected vision systems can be used to embed cameras, image processing, Edge Artificial Intelligence (AI), and data connectivity capabilities for flood label detection. We adopted the engineering definition of label detection, in which a label is a sequence of discrete, measurable observations obtained using a capturing device such as a web camera or smartphone. We built a Big Data service of around 1000 images (an image annotation service), including image geolocation information, from various flooding events in the Carolinas (USA), with a total of eight different object categories. Our platform provides several smart AI tools and task configurations that can detect objects’ edges or contours, which can be manually adjusted with a threshold setting so as to best segment the image. The tool can train on the dataset and predict labels for large-scale datasets, which can be used as an object detector to drastically reduce the time spent per object, particularly for real-time image-based flood forecasting. This research is funded by the US National Science Foundation (NSF).
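As a rough illustration of threshold-adjustable contour extraction of the kind the platform describes (generic OpenCV calls; the function name and default threshold are assumptions, not the platform's actual API):

import cv2

def extract_contours(image_path, threshold=127):
    """Binarize an image at an adjustable threshold and return the object
    contours; a generic sketch, not the platform's actual tooling."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # The threshold can be tuned per image to best separate the object
    # from the background before contours are traced.
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours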


2019 ◽  
Vol 22 (3) ◽  
pp. 365-380 ◽  
Author(s):  
Matthias Olthaar ◽  
Wilfred Dolfsma ◽  
Clemens Lutz ◽  
Florian Noseleit

In a competitive business environment at the Bottom of the Pyramid, smallholders supplying global value chains may be thought to be at the whim of downstream large-scale players and local market forces, leaving no room for strategic entrepreneurial behavior. In such a context we test the relationship between the use of strategic resources and firm performance. We adopt resource-based theory and show that seemingly homogeneous smallholders deploy resources differently and, consequently, some do outperform others. We argue that resource-based theory yields a more fine-grained understanding of smallholder performance than the approaches generally applied in agricultural economics. We develop a mixed-method approach that allows one to pinpoint relevant, industry-specific resources and allows for empirical identification of the relative contribution of each resource to competitive advantage. The results show that proper use of quality labor, storage facilities, time of selling, and availability of animals are key capabilities.


Geosciences ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 41
Author(s):  
Tim Jurisch ◽  
Stefan Cantré ◽  
Fokke Saathoff

A variety of studies have recently demonstrated the applicability of different dried, fine-grained dredged materials as replacement materials for erosion-resistant sea dike covers. In Rostock, Germany, a large-scale field experiment was conducted in which different dredged materials were tested with regard to installation technology, stability, turf development, infiltration, and erosion resistance. The infiltration experiments to study the development of a seepage line in the dike body yielded unexpected measurement results. Due to the high complexity of the problem, standard geo-hydraulic models proved unable to reproduce these results. Therefore, different methods of inverse infiltration modeling were applied, such as the parameter estimation tool (PEST) and the AMALGAM algorithm. In the paper, the two approaches are compared and discussed. A sensitivity analysis confirmed the presumption of non-linear model behavior for the infiltration problem, and the eigenvalue ratio indicates that the dike infiltration is an ill-posed problem. Although this complicates the inverse modeling (e.g., termination in local minima), parameter sets close to an optimum were found with both the PEST and the AMALGAM algorithms. Together with the field measurement data, this information supports the assessment of the effective material properties of the dredged materials used as dike cover material.
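Inverse modeling of this kind fits model parameters by minimizing the misfit between simulated and measured seepage heads. The sketch below illustrates the principle with scipy.optimize.least_squares and a hypothetical one-dimensional forward model; it is not PEST or AMALGAM, and simulate_seepage and its parameters are invented for illustration:

import numpy as np
from scipy.optimize import least_squares

def simulate_seepage(params, times):
    """Hypothetical forward model: seepage head as a saturating response
    with a conductivity-like rate k and a capacity h_max."""
    k, h_max = params
    return h_max * (1.0 - np.exp(-k * times))

def residuals(params, times, measured):
    # Misfit between simulated and measured heads, the quantity an
    # inverse infiltration model minimizes.
    return simulate_seepage(params, times) - measured

times = np.linspace(0.0, 48.0, 25)  # hours
measured = simulate_seepage([0.15, 1.2], times) + np.random.normal(0, 0.02, times.size)
fit = least_squares(residuals, x0=[0.05, 1.0], args=(times, measured),
                    bounds=([1e-4, 0.1], [1.0, 5.0]))
print(fit.x)  # estimated (k, h_max)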


Author(s):  
Anil S. Baslamisli ◽  
Partha Das ◽  
Hoang-An Le ◽  
Sezer Karaoglu ◽  
Theo Gevers

In general, intrinsic image decomposition algorithms interpret shading as one unified component including all photometric effects. As shading transitions are generally smoother than reflectance (albedo) changes, these methods may fail to distinguish strong photometric effects from reflectance variations. Therefore, in this paper, we propose to decompose the shading component into direct shading (illumination) and indirect shading (ambient light and shadows) subcomponents. The aim is to distinguish strong photometric effects from reflectance variations. An end-to-end deep convolutional neural network (ShadingNet) is proposed that operates in a fine-to-coarse manner with a specialized fusion and refinement unit exploiting the fine-grained shading model. It is designed to learn specific reflectance cues separated from specific photometric effects to analyze the disentanglement capability. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with fine-grained intrinsic image ground truths. Large-scale experiments show that our approach using fine-grained shading decompositions outperforms state-of-the-art algorithms utilizing unified shading on the NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS, and SRD datasets.
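For orientation, the classical intrinsic model factors an image into reflectance and shading, I = R * S (element-wise); ShadingNet further splits S into direct and indirect subcomponents. A minimal numpy sketch of the unified model and a reconstruction check (the additive direct-plus-indirect split is an illustrative assumption, not necessarily the paper's exact fine-grained shading model):

import numpy as np

def compose(reflectance, shading):
    """Classical intrinsic image model: I = R * S (element-wise)."""
    return reflectance * shading

# Illustrative split of shading into subcomponents; the additive
# direct + indirect form below is an assumption for illustration only.
rng = np.random.default_rng(0)
reflectance = rng.uniform(0.2, 0.9, (4, 4))
direct = rng.uniform(0.5, 1.0, (4, 4))    # direct illumination
indirect = rng.uniform(0.0, 0.3, (4, 4))  # ambient light and shadows
shading = direct + indirect
image = compose(reflectance, shading)
assert np.allclose(image / shading, reflectance)  # exact recovery given S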


2021 ◽  
Vol 11 (13) ◽  
pp. 6016
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

For autonomous vehicles, it is critical to be aware of the driving environment to avoid collisions and drive safely. The recent evolution of convolutional neural networks has contributed significantly to accelerating the development of object detection techniques that enable autonomous vehicles to handle rapid changes in various driving environments. However, collisions in an autonomous driving environment can still occur due to undetected obstacles and various perception problems, particularly occlusion. Thus, we propose a robust object detection algorithm for environments in which objects are truncated or occluded, employing RGB images and light detection and ranging (LiDAR) bird’s eye view (BEV) representations. This structure combines independent detection results obtained in parallel through “you only look once” networks using an RGB image and a height map converted from the BEV representation of LiDAR point cloud data (PCD). The region proposal for an object is determined via non-maximum suppression, which suppresses the bounding boxes of adjacent regions. A performance evaluation of the proposed scheme was performed using the KITTI vision benchmark suite dataset. The results demonstrate that detection accuracy with integrated PCD BEV representations is superior to that achieved with an RGB camera alone. In addition, detection accuracy is significantly enhanced even when the target objects are partially occluded from the front view, demonstrating that the proposed algorithm outperforms the conventional RGB-based model.
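Non-maximum suppression as used above is standard; a minimal greedy sketch that keeps the highest-scoring box and suppresses neighbours whose IoU with it exceeds a threshold:

import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression. boxes: (N, 4) as (x1, y1, x2, y2);
    scores: (N,). Returns indices of the boxes kept."""
    order = scores.argsort()[::-1]  # candidates, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top box with the remaining candidates.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop neighbours that overlap the kept box too much.
        order = order[1:][iou <= iou_thresh]
    return keep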

