Diagnosing state-of-the-art object proposal methods

Author(s):  
Hongyuan Zhu ◽  
Shijian Lu ◽  
Jianfei Cai ◽  
Guangqing Lee

2021 ◽  
Author(s):  
Da-Ren Chen ◽  
Wei-Min Chiu

Abstract Machine learning techniques have been used to increase the accuracy of crack detection in road surfaces. Most studies fail to consider variable illumination conditions on the target of interest (ToI) and focus only on detecting the presence or absence of road cracks. This paper proposes a new road crack detection method, IlumiCrack, which integrates Gaussian mixture models (GMM) and object detection CNN models. This work provides the following contributions: 1) For the first time, a large-scale road crack image dataset covering a range of illumination conditions (e.g., day and night) is prepared using a dashcam. 2) Based on GMM, experimental evaluations with two to four brightness levels are conducted to determine the optimal classification. 3) The IlumiCrack framework integrates state-of-the-art object detection methods with a CNN to classify road crack images into eight types with high accuracy. Experimental results show that IlumiCrack outperforms state-of-the-art R-CNN object detection frameworks.
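
As a rough illustration of the GMM-based brightness classification step, the sketch below fits a Gaussian mixture over per-image mean intensities and assigns each image to a brightness level. The single-feature design, the component count `k`, and the routing comment are assumptions for illustration, not the authors' exact pipeline.

```python
# Minimal sketch: classify images into brightness levels with a GMM,
# then route each image to a detector tuned for its level.
import numpy as np
from sklearn.mixture import GaussianMixture

def brightness_features(images):
    # One scalar feature per image: mean grayscale intensity.
    return np.array([img.mean() for img in images]).reshape(-1, 1)

def fit_brightness_gmm(images, k=3):
    # Fit k brightness levels (e.g., night / dusk / day) over the dataset.
    gmm = GaussianMixture(n_components=k, random_state=0)
    gmm.fit(brightness_features(images))
    return gmm

def brightness_level(gmm, image):
    # Assign a single image to its most likely brightness component.
    return int(gmm.predict(np.array([[image.mean()]]))[0])

# Hypothetical usage: pick the detector matching the frame's brightness.
# detector = detectors_by_level[brightness_level(gmm, frame)]
```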


2020 ◽  
Vol 11 ◽  
Author(s):  
Hao Lu ◽  
Zhiguo Cao

Plant counting runs through almost every stage of agricultural production, from seed breeding, germination, cultivation, fertilization, and pollination to yield estimation and harvesting. With the prevalence of digital cameras, graphics processing units, and deep learning-based computer vision technology, plant counting has gradually shifted from traditional manual observation to vision-based automated solutions. One popular solution is a state-of-the-art object detection technique called Faster R-CNN, where plant counts are estimated from the number of detected bounding boxes. It has become a standard configuration for many plant counting systems in plant phenotyping. Faster R-CNN, however, is computationally expensive, particularly when dealing with high-resolution images. Unfortunately, high-resolution imagery is frequently used in modern plant phenotyping platforms such as unmanned aerial vehicles, engendering inefficient image analysis. Such inefficiency largely limits the throughput of a phenotyping system. The goal of this work is hence to provide an effective and efficient tool for high-throughput plant counting from high-resolution RGB imagery. In contrast to conventional object detection, we advocate another promising paradigm termed object counting, where plant counts are directly regressed from images without detecting bounding boxes. In this work, by profiling the computational bottleneck, we implement a fast version of the state-of-the-art plant counting model TasselNetV2 with several minor yet effective modifications, and we provide insights into why these modifications make sense. This fast version, TasselNetV2+, runs an order of magnitude faster than TasselNetV2, achieving around 30 fps at an image resolution of 1980 × 1080 while retaining the same level of counting accuracy. We validate its effectiveness on three plant counting tasks: wheat ear counting, maize tassel counting, and sorghum head counting. To encourage the use of this tool, our implementation has been made available online at https://tinyurl.com/TasselNetV2plus.
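
To make the counting-by-regression paradigm concrete, here is a minimal PyTorch sketch in which a small fully convolutional network predicts a non-negative local count map whose sum is the image-level count. The architecture is illustrative and far simpler than TasselNetV2+.

```python
# Minimal sketch of regression-based counting: no bounding boxes,
# just a count map summed into a per-image total.
import torch
import torch.nn as nn

class TinyCounter(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1), nn.ReLU(),  # ReLU keeps local counts non-negative
        )

    def forward(self, x):
        count_map = self.features(x)         # (N, 1, H/4, W/4) local counts
        return count_map.sum(dim=(1, 2, 3))  # one scalar count per image

model = TinyCounter()
batch = torch.rand(2, 3, 256, 256)  # dummy RGB crops
print(model(batch))                 # predicted counts, no detection needed
```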


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4668
Author(s):  
Gregor Koporec ◽  
Andrej Košir ◽  
Aleš Leonardis ◽  
Janez Perš

This work examines the differences between a human and a machine in object recognition tasks. A machine is useful only insofar as its output classification labels are correct, i.e., match the dataset-provided labels. However, discrepancies often occur because the dataset label differs from the one a human would expect. To correct this, the concept of the target user population is introduced. The paper presents a complete methodology for either adapting the output of a pre-trained, state-of-the-art object classification algorithm to the target population or inferring a proper, user-friendly categorization from the target population. This process is called ‘user population re-targeting’. The methodology includes a set of specially designed population tests, which provide crucial data about the categorization that the target population prefers. The transformation between the dataset-bound categorization and the new, population-specific categorization is called the ‘Cognitive Relevance Transform’. Experiments on well-known datasets show that the target population preferred the transformed categorization by a large margin, that the performance of human observers is probably better than previously thought, and that the outcome of re-targeting may be difficult to predict without actual tests on the target population.
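
As a toy illustration of applying such a transform to a classifier's output, the sketch below merges dataset-level class probabilities into population-preferred categories via a label mapping. The mapping shown is invented for illustration; in practice it would come from the population tests.

```python
# Minimal sketch: re-target dataset labels to a population-preferred
# categorization by summing the probability mass of merged classes.
import numpy as np

POPULATION_MAP = {            # dataset label -> population category (hypothetical)
    "tabby_cat": "cat",
    "tiger_cat": "cat",
    "golden_retriever": "dog",
}

def retarget(probs, dataset_labels, mapping):
    # Merge probability mass of dataset classes into population classes.
    merged = {}
    for p, label in zip(probs, dataset_labels):
        merged[mapping[label]] = merged.get(mapping[label], 0.0) + p
    return max(merged, key=merged.get)

labels = ["tabby_cat", "tiger_cat", "golden_retriever"]
print(retarget(np.array([0.3, 0.4, 0.3]), labels, POPULATION_MAP))  # -> 'cat'
```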


Author(s):  
Hui Ying ◽  
Zhaojin Huang ◽  
Shu Liu ◽  
Tianjia Shao ◽  
Kun Zhou

Current instance segmentation methods can be categorized into segmentation-based methods and proposal-based methods. The former performs segmentation first and then clusters pixels into instances, while the latter detects objects first and then predicts a mask for each object proposal. In this work, we propose a single-stage method, named EmbedMask, that unifies the two paradigms by combining their advantages, achieving good instance segmentation performance while producing high-resolution masks at high speed. EmbedMask introduces two newly defined embeddings for mask prediction: a pixel embedding and a proposal embedding. During training, we force the pixel embedding to be close to its coupled proposal embedding when they belong to the same instance. During inference, pixels are assigned to the mask of a proposal if their embeddings are similar. This mechanism brings several benefits. First, pixel-level clustering enables EmbedMask to generate high-resolution masks and avoids the complicated two-stage mask prediction. Second, the proposal embedding simplifies and strengthens the clustering procedure, so our method achieves higher speed and better performance than segmentation-based methods. Without bells and whistles, EmbedMask outperforms the state-of-the-art instance segmentation method Mask R-CNN on the challenging COCO dataset, obtaining more detailed masks at a higher speed.
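
A minimal sketch of the inference-time mask assignment described above, assuming a Gaussian similarity over embedding distance and a fixed threshold (both illustrative choices; the paper's exact formulation may differ):

```python
# Minimal sketch: a pixel joins a proposal's mask when its embedding
# is close enough to the proposal embedding.
import torch

def assign_mask(pixel_emb, proposal_emb, sigma=1.0, thresh=0.5):
    # pixel_emb: (H, W, D) per-pixel embeddings
    # proposal_emb: (D,) embedding of one detected proposal
    dist2 = ((pixel_emb - proposal_emb) ** 2).sum(dim=-1)  # (H, W)
    similarity = torch.exp(-dist2 / (2 * sigma ** 2))      # in (0, 1]
    return similarity > thresh                             # boolean mask

pixel_emb = torch.randn(64, 64, 8)   # dummy embeddings, D = 8
proposal_emb = torch.randn(8)
mask = assign_mask(pixel_emb, proposal_emb)
print(mask.shape, mask.sum().item())  # full-resolution mask, no RoI crop
```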


Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 93 ◽  
Author(s):  
Tan Zhang ◽  
Zhenhai Huang ◽  
Weijie You ◽  
Jiatao Lin ◽  
Xiaolong Tang ◽  
...  

Reliable and robust systems to detect and harvest fruits and vegetables in unstructured environments are crucial for harvesting robots. In this paper, we propose an autonomous system that harvests most types of crops with peduncles. A geometric approach is first applied to obtain the cutting point on the peduncle from the fruit bounding box, which we detect with an adapted version of the state-of-the-art object detector Mask Region-based Convolutional Neural Network (Mask R-CNN). We designed a novel gripper that simultaneously clamps and cuts the peduncles of crops without contacting the flesh. We conducted experiments with a robotic manipulator to evaluate how effectively the proposed harvesting system handles most crops in real laboratory environments.
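
As a hypothetical illustration of a geometric cutting-point rule, the sketch below places the cut a fixed fraction of the fruit's height above the top edge of its detected bounding box, where a peduncle would be expected. The offset factor is an assumption, not the paper's calibrated value.

```python
# Minimal sketch: derive a peduncle cutting point from a fruit box.
def peduncle_cut_point(box, offset_ratio=0.15):
    # box: (x_min, y_min, x_max, y_max) in image coordinates,
    # with y growing downward, as is conventional for images.
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0                       # horizontal fruit center
    cut_y = y_min - offset_ratio * (y_max - y_min)   # just above the fruit
    return cx, cut_y

print(peduncle_cut_point((120, 80, 200, 190)))  # -> (160.0, 63.5)
```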


2020 ◽  
Vol 34 (10) ◽  
pp. 13789-13790 ◽  
Author(s):  
Anurag Garg ◽  
Niket Tandon ◽  
Aparna S. Varde

Can we automatically predict failures of an object detection model on images from a target domain? We characterize the errors of a state-of-the-art object detection model on the currently popular smart mobility domain and find that a large number of errors can be identified using spatial commonsense. We propose a system that automatically identifies a large number of such errors based on commonsense knowledge. Our system does not require any new annotations and can still find object detection errors with high accuracy (more than 80% when measured by humans). This work lays the foundation for answering exciting research questions on domain adaptation, including the ability to automatically create adversarial datasets for a target domain.
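
As a toy illustration of how spatial commonsense can flag detection errors, the sketch below checks detected boxes against a hand-written rule. The rule and labels are invented for illustration; the actual system derives its checks from commonsense knowledge.

```python
# Minimal sketch: flag detections that violate a spatial commonsense rule.
def box_bottom(box):
    return box[3]  # y_max; larger y is lower in the image

def flag_commonsense_errors(detections, image_height):
    # detections: list of (label, (x_min, y_min, x_max, y_max))
    flagged = []
    for label, box in detections:
        # Example rule: in a street-level photo, road vehicles should sit
        # in the lower image region, not float near the top of the frame.
        if label in {"car", "bus", "bicycle"}:
            if box_bottom(box) < 0.3 * image_height:
                flagged.append((label, box, "vehicle floats near the sky"))
    return flagged

dets = [("car", (50, 10, 120, 60)), ("car", (40, 400, 200, 560))]
print(flag_commonsense_errors(dets, image_height=600))  # flags the first car
```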

