Spherical Criteria for Fast and Accurate 360° Object Detection

2020 ◽  
Vol 34 (07) ◽  
pp. 12959-12966
Author(s):  
Pengyu Zhao ◽  
Ansheng You ◽  
Yuanxing Zhang ◽  
Jiaying Liu ◽  
Kaigui Bian ◽  
...  

With the advance of omnidirectional panoramic technology, 360° imagery has become increasingly popular in the past few years. To better understand 360° content, many works resort to 360° object detection, and various criteria have been proposed to bound the objects and compute the intersection-over-union (IoU) between bounding boxes based on the common equirectangular projection (ERP) or perspective projection (PSP). However, the existing 360° criteria are either inaccurate or inefficient for real-world scenarios. In this paper, we introduce novel spherical criteria for fast and accurate 360° object detection, including both spherical bounding boxes and spherical IoU (SphIoU). Based on the spherical criteria, we propose a novel two-stage 360° detector, i.e., Reprojection R-CNN, which combines the advantages of both ERP and PSP, yielding efficient and accurate 360° object detection. To validate the design of the spherical criteria and Reprojection R-CNN, we construct two unbiased synthetic datasets for training and evaluation. Experimental results reveal that, compared with the existing criteria, the two-stage detector with spherical criteria achieves the best mAP results at the same inference speed, demonstrating that the spherical criteria are more suitable for 360° object detection. Moreover, Reprojection R-CNN outperforms the previous state-of-the-art methods by over 30% on mAP with competitive speed, which confirms the efficiency and accuracy of the design.
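As a rough illustration of the spherical criteria (not the paper's closed-form SphIoU), the IoU of two spherical bounding boxes, each given as a center longitude/latitude plus angular width and height, can be estimated by Monte-Carlo sampling on the sphere; the box-membership test below is a simplified latitude-longitude band check:

```python
import math
import random

def in_sph_box(lon, lat, box):
    """Check whether a point (lon, lat) in radians lies inside a
    spherical bounding box (c_lon, c_lat, width, height), where width
    and height are angular extents. Longitude difference is wrapped."""
    c_lon, c_lat, w, h = box
    d_lon = (lon - c_lon + math.pi) % (2 * math.pi) - math.pi
    return abs(d_lon) <= w / 2 and abs(lat - c_lat) <= h / 2

def sph_iou(box_a, box_b, n=20000, seed=0):
    """Monte-Carlo estimate of spherical IoU: sample points uniformly
    on the sphere (latitude via arcsin of a uniform z) and count
    membership in each box."""
    rng = random.Random(seed)
    inter = union = 0
    for _ in range(n):
        lon = rng.uniform(-math.pi, math.pi)
        lat = math.asin(rng.uniform(-1.0, 1.0))  # uniform on the sphere
        a = in_sph_box(lon, lat, box_a)
        b = in_sph_box(lon, lat, box_b)
        inter += a and b
        union += a or b
    return inter / union if union else 0.0
```

Identical boxes give an IoU of 1 and disjoint boxes give 0; the paper derives SphIoU analytically rather than by sampling.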

2021 ◽  
Vol 12 (1) ◽  
pp. 31-39
Author(s):  
D. D. Rukhovich ◽  

Deep learning-based detectors usually produce a redundant set of object bounding boxes, including many duplicate detections of the same object. These boxes are then filtered using non-maximum suppression (NMS) in order to select exactly one bounding box per object of interest. This greedy scheme is simple and provides sufficient accuracy for isolated objects, but often fails in crowded environments, since one needs to both preserve boxes for different objects and suppress duplicate detections. In this work, we develop an alternative iterative scheme, where a new subset of objects is detected at each iteration. Detected boxes from the previous iterations are passed to the network at the following iterations to ensure that the same object is not detected twice. This iterative scheme can be applied to both one-stage and two-stage object detectors with only minor modifications of the training and inference procedures. We perform extensive experiments with two different baseline detectors on four datasets and show significant improvement over the baselines, leading to state-of-the-art performance on the CrowdHuman and WiderPerson datasets.
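The greedy NMS baseline that the iterative scheme replaces can be sketched in a few lines (a generic implementation, not the authors' code):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining
    box that overlaps it above `thresh`, and repeat on the rest."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

In a crowd, two genuinely distinct people can overlap above the threshold, which is exactly the failure mode the iterative scheme addresses.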


Author(s):  
Yunhong Gong ◽  
Yanan Sun ◽  
Dezhong Peng ◽  
Peng Chen ◽  
Zhongtai Yan ◽  
...  

The COVID-19 pandemic has caused global alarm. With the advances in artificial intelligence, COVID-19 testing capabilities have been greatly expanded, and the burden on hospital resources has been significantly alleviated. Over the past years, computer vision research has focused on convolutional neural networks (CNNs), which can significantly improve image analysis ability. However, CNN architectures are usually designed manually with rich expertise that is scarce in practice. Evolutionary algorithms (EAs) can automatically search for proper CNN architectures and optimize the related hyperparameters. The networks searched by EAs can be used to effectively process COVID-19 computed tomography images without expert knowledge or manual setup. In this paper, we propose a novel EA-based algorithm with a dynamic search space to design optimal CNN architectures for diagnosing COVID-19 before the pathogenic test. The experiments are performed on the COVID-CT data set against a series of state-of-the-art CNN models. They demonstrate that the architecture searched by the proposed EA-based algorithm achieves the best performance, yet without any preprocessing operations. Furthermore, we found through experimentation that intensive use of batch normalization may deteriorate performance. This contrasts with common practice in manually designed CNN architectures and will help experts handcrafting CNN models to achieve the best performance without any preprocessing operations.
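A minimal sketch of the EA idea, with a variable-length encoding standing in for the dynamic search space; the fitness here is a toy proxy, whereas the paper's fitness would be validation accuracy from training the decoded CNN on COVID-CT:

```python
import random

def fitness(arch):
    """Toy proxy score over an architecture encoded as a list of layer
    widths; it merely prefers ~3 layers of width near 64. In the paper
    this would be validation accuracy of the trained network."""
    return -abs(len(arch) - 3) - sum(abs(w - 64) for w in arch) / 64

def mutate(arch, rng):
    """Variable-length mutation: tweak a width, add, or remove a layer,
    so the effective search space grows and shrinks over time."""
    arch = list(arch)
    op = rng.random()
    if op < 0.6 and arch:
        i = rng.randrange(len(arch))
        arch[i] = max(8, arch[i] + rng.choice([-16, 16]))
    elif op < 0.8 and len(arch) < 8:
        arch.insert(rng.randrange(len(arch) + 1), rng.choice([16, 32, 64, 128]))
    elif len(arch) > 1:
        arch.pop(rng.randrange(len(arch)))
    return arch

def evolve(pop_size=20, generations=30, seed=0):
    """Truncation-selection EA: keep the better half, refill with
    mutants of random survivors."""
    rng = random.Random(seed)
    pop = [[rng.choice([16, 32, 64, 128])
            for _ in range(rng.randint(1, 5))] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)
```

Because survivors are carried over unchanged, the best fitness is non-decreasing across generations.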


2020 ◽  
Vol 68 ◽  
pp. 311-364
Author(s):  
Francesco Trovo ◽  
Stefano Paladino ◽  
Marcello Restelli ◽  
Nicola Gatti

Multi-Armed Bandit (MAB) techniques have been successfully applied to many classes of sequential decision problems in the past decades. However, non-stationary settings -- very common in real-world applications -- have received little attention so far, and theoretical guarantees on the regret are known only for some frequentist algorithms. In this paper, we propose an algorithm, namely Sliding-Window Thompson Sampling (SW-TS), for non-stationary stochastic MAB settings. Our algorithm is based on Thompson Sampling and exploits a sliding-window approach to tackle, in a unified fashion, two different forms of non-stationarity studied separately so far: abrupt change and smooth change. In the former, the reward distributions are constant during sequences of rounds, and their changes may be arbitrary and happen at unknown rounds, while, in the latter, the reward distributions smoothly evolve over rounds according to unknown dynamics. Under mild assumptions, we provide upper bounds on the dynamic pseudo-regret of SW-TS for the abruptly changing environment, for the smoothly changing one, and for the setting in which both forms of non-stationarity are present. Furthermore, we empirically show that SW-TS dramatically outperforms state-of-the-art algorithms even when the forms of non-stationarity are considered separately, as previously studied in the literature.
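A minimal sketch of SW-TS for Bernoulli arms (the Beta-Bernoulli setup is standard; the window size and horizon below are illustrative choices, not the paper's): Beta posteriors are rebuilt from only the most recent `window` rounds, so the learner forgets stale observations and can track change-points:

```python
import random
from collections import deque

def sw_ts(arm_probs, horizon=3000, window=500, seed=1):
    """Sliding-Window Thompson Sampling for Bernoulli arms.
    Returns the number of pulls of each arm over the horizon."""
    rng = random.Random(seed)
    recent = deque(maxlen=window)          # (arm, reward) pairs in the window
    pulls = [0] * len(arm_probs)
    for _ in range(horizon):
        wins = [1] * len(arm_probs)        # Beta(1, 1) priors
        losses = [1] * len(arm_probs)
        for arm, r in recent:              # posterior from the window only
            wins[arm] += r
            losses[arm] += 1 - r
        samples = [rng.betavariate(wins[a], losses[a])
                   for a in range(len(arm_probs))]
        arm = samples.index(max(samples))  # play the optimistic sample
        reward = 1 if rng.random() < arm_probs[arm] else 0
        recent.append((arm, reward))
        pulls[arm] += 1
    return pulls
```

With a stationary pair of arms the window merely adds a little extra exploration; its value shows up when `arm_probs` changes mid-run, since evidence older than `window` rounds no longer pins the posterior to the stale best arm.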


Author(s):  
Devin Pierce ◽  
Shulan Lu ◽  
Derek Harter

The past decade has witnessed incredible advances in building highly realistic and richly detailed simulated worlds. We readily endorse the common-sense assumption that people will be better equipped for solving real-world problems if they are trained in near-life, even if virtual, scenarios. The past decade has also witnessed a significant increase in our knowledge of how the human body, as both sensor and effector, relates to cognition. Evidence shows that our mental representations of the world are constrained by the bodily states present in our moment-to-moment interactions with the world. The current study investigated whether there are differences in how people enact actions in the simulated as opposed to the real world. We developed simple parallel task environments and asked participants to perform actions embedded in a stream of continuous events (e.g., cutting a cucumber). The results showed that participants performed actions at a faster speed and came closer to incurring injury to the fingers in the avatar-enacted action environment than in the human-enacted action environment.


2020 ◽  
Vol 34 (04) ◽  
pp. 5867-5874
Author(s):  
Gan Sun ◽  
Yang Cong ◽  
Qianqian Wang ◽  
Jun Li ◽  
Yun Fu

In the past decades, spectral clustering (SC) has become one of the most effective clustering algorithms. However, most previous studies focus on spectral clustering tasks with a fixed task set, and cannot incorporate a new spectral clustering task without access to previously learned tasks. In this paper, we aim to explore the problem of spectral clustering in a lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC). Its goal is to efficiently learn a model for a new spectral clustering task by selectively transferring previously accumulated experience from a knowledge library. Specifically, the knowledge library of L2SC contains two components: 1) an orthogonal basis library, capturing latent cluster centers among the clusters in each pair of tasks; 2) a feature embedding library, embedding the feature manifold information shared among multiple related tasks. As a new spectral clustering task arrives, L2SC first transfers knowledge from both the basis library and the feature library to obtain an encoding matrix, and further redefines the library bases over time to maximize performance across all the clustering tasks. Meanwhile, a general online update formulation is derived to alternately update the basis library and the feature library. Finally, empirical experiments on several real-world benchmark datasets demonstrate that our L2SC model can effectively improve the clustering performance in comparison with other state-of-the-art spectral clustering algorithms.
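For context, the base spectral clustering step that L2SC builds on can be sketched for a two-cluster case (a generic implementation, not the L2SC mechanism itself): split the graph on the sign of the Fiedler vector of its Laplacian.

```python
import numpy as np

def spectral_bipartition(adj):
    """Basic spectral clustering into two groups: take the eigenvector
    of the second-smallest eigenvalue (the Fiedler vector) of the graph
    Laplacian L = D - A and split nodes by its sign."""
    adj = np.asarray(adj, dtype=float)
    degree = np.diag(adj.sum(axis=1))
    laplacian = degree - adj
    _, vecs = np.linalg.eigh(laplacian)    # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)

# Two triangles joined by a single weak edge: nodes 0-2 vs nodes 3-5.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = spectral_bipartition(A)
```

L2SC's contribution is orthogonal to this step: it reuses basis and feature libraries across a stream of such clustering tasks instead of solving each one from scratch.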


Buildings ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 464
Author(s):  
Dan Guo ◽  
Ming Shan ◽  
Emmanuel Kingsford Owusu

During the past two decades, critical infrastructures (CIs) have faced a growing number of challenges worldwide due to natural disasters and other disruptive events. To respond to and handle these disasters and disruptive events, the concept of resilience was introduced to CIs. In particular, many institutions and scholars have developed various types of frameworks to assess and enhance CI resilience. The purpose of this paper is to review the resilience assessment frameworks for CIs proposed by quality papers published in the past decade, to determine and analyze the common dimensions and key indicators of these frameworks, and to propose possible opportunities for future research. To achieve these goals, a comprehensive literature review was conducted, which identified 24 resilience assessment frameworks from 24 quality papers. This paper contributes to the current body of resilience research by identifying the common dimensions and key indicators of the resilience assessment frameworks proposed for CIs. In addition, this paper is beneficial to practice, because it provides a comprehensive view of the resilience assessment frameworks of CIs from the perspective of implementation, and the indicators are pragmatic and actionable in practice.


2020 ◽  
Vol 11 ◽  
Author(s):  
Hao Lu ◽  
Zhiguo Cao

Plant counting runs through almost every stage of agricultural production, from seed breeding, germination, cultivation, fertilization, and pollination to yield estimation and harvesting. With the prevalence of digital cameras, graphics processing units, and deep learning-based computer vision technology, plant counting has gradually shifted from traditional manual observation to vision-based automated solutions. One popular solution is a state-of-the-art object detection technique called Faster R-CNN, where plant counts can be estimated from the number of bounding boxes detected. It has become a standard configuration for many plant counting systems in plant phenotyping. Faster R-CNN, however, is computationally expensive, particularly when dealing with high-resolution images. Unfortunately, high-resolution imagery is frequently used in modern plant phenotyping platforms such as unmanned aerial vehicles, engendering inefficient image analysis. Such inefficiency largely limits the throughput of a phenotyping system. The goal of this work, hence, is to provide an effective and efficient tool for high-throughput plant counting from high-resolution RGB imagery. In contrast to conventional object detection, we advocate another promising paradigm termed object counting, where plant counts are directly regressed from images, without detecting bounding boxes. In this work, by profiling the computational bottleneck, we implement a fast version of a state-of-the-art plant counting model, TasselNetV2, with several minor yet effective modifications. We also provide insights into why these modifications make sense. This fast version, TasselNetV2+, runs an order of magnitude faster than TasselNetV2, achieving around 30 fps at an image resolution of 1980 × 1080, while retaining the same level of counting accuracy. We validate its effectiveness on three plant counting tasks: wheat ear counting, maize tassel counting, and sorghum head counting.
To encourage the use of this tool, our implementation has been made available online at https://tinyurl.com/TasselNetV2plus.
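The counting-by-regression paradigm can be illustrated with a toy example: instead of detecting boxes, a model regresses a count for each local block, and the image-level count is the sum of block counts. Here the block counts are computed directly from a ground-truth dot map; TasselNetV2+ regresses such local counts from RGB pixels with a CNN.

```python
import numpy as np

def local_counts(dot_map, block=8):
    """Sum ground-truth dots inside each non-overlapping block. A
    counting network would regress these block-level counts directly,
    never predicting any bounding boxes."""
    h, w = dot_map.shape
    trimmed = dot_map[:h - h % block, :w - w % block]
    return trimmed.reshape(h // block, block, w // block, block).sum(axis=(1, 3))

# Toy 32x32 "image" with 5 plants marked as dots.
dots = np.zeros((32, 32))
for y, x in [(3, 4), (10, 20), (17, 5), (25, 25), (30, 2)]:
    dots[y, x] = 1
blocks = local_counts(dots, block=8)
total = blocks.sum()   # the plant count, recovered from block counts alone
```

Because the regression target is a small grid of counts rather than per-object boxes, inference cost scales gently with image resolution, which is the efficiency argument behind this paradigm.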


Author(s):  
Jie Wen ◽  
Zheng Zhang ◽  
Yong Xu ◽  
Bob Zhang ◽  
Lunke Fei ◽  
...  

Multi-view clustering aims to partition data collected from diverse sources based on the assumption that all views are complete. However, such a prior assumption is hardly satisfied in many real-world applications, resulting in the incomplete multi-view learning problem. Existing attempts at this problem still have the following limitations: 1) the underlying semantic information of the missing views is commonly ignored; 2) the local structure of the data is not well explored; 3) the importance of different views is not effectively evaluated. To address these issues, this paper proposes a Unified Embedding Alignment Framework (UEAF) for robust incomplete multi-view clustering. In particular, a locality-preserving reconstruction term is introduced to infer the missing views such that all views can be naturally aligned. A consensus graph is adaptively learned and embedded via reverse graph regularization to guarantee the common local structure of multiple views, which in turn can further align the incomplete views and inferred views. Moreover, an adaptive weighting strategy is designed to capture the importance of different views. Extensive experimental results show that the proposed method can significantly improve the clustering performance in comparison with some state-of-the-art methods.


Author(s):  
Enze Xie ◽  
Yuhang Zang ◽  
Shuai Shao ◽  
Gang Yu ◽  
Cong Yao ◽  
...  

Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable number of false positives when applied to images captured in real-world environments. To tackle this issue, mainly inspired by Mask R-CNN, we propose in this paper an effective model for scene text detection based on Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives. Benefiting from the guidance of semantic information and a shared FPN, SPCNET obtains significantly enhanced performance while introducing marginal extra computation. Experiments on standard datasets demonstrate that our SPCNET clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on


2020 ◽  
Author(s):  
Mikel Joaristi

Unsupervised graph representation learning methods learn a numerical representation of the nodes in a graph. The generated representations encode meaningful information about the nodes' properties, making them a powerful tool for tasks in many areas of study, such as social sciences, biology, or communication networks. These methods are particularly interesting because they facilitate the direct use of standard machine learning models on graphs. Graph representation learning methods can be divided into two main categories depending on the information they encode: methods preserving the nodes' connectivity information, and methods preserving the nodes' structural information. Connectivity-based methods focus on encoding relationships between nodes, with neighboring nodes being closer together in the resulting latent space. On the other hand, structure-based methods generate a latent space where nodes serving a similar structural function in the network are encoded close to each other, independently of their being connected or even close to each other in the graph. While many works focus on preserving nodes' connectivity information, only a few study the problem of encoding nodes' structure, especially in an unsupervised way. In this dissertation, we demonstrate that properly encoding nodes' structural information is fundamental for many real-world applications, as it can be leveraged to successfully solve many tasks where connectivity-based methods fail. One concrete example is presented first. In this example, the task consists of detecting malicious entities in a real-world financial network. We show that connectivity information is not enough to solve this problem and show how leveraging structural information provides considerable performance improvements. This particular example pinpoints the need for further research in the area of structural graph representation learning, together with the limitations of the previous state-of-the-art.
We use the acquired knowledge as a starting point and inspiration for the research and development of three independent unsupervised structural graph representation learning methods: Structural Iterative Representation learning approach for Graph Nodes (SIR-GN), Structural Iterative Lexicographic Autoencoded Node Representation (SILA), and Sparse Structural Node Representation (SparseStruct). We show how each of our methods tackles specific limitations of the previous state-of-the-art in structural graph representation learning, such as scalability, representation meaning, and the lack of a formal proof guaranteeing the preservation of structural properties. We provide an extensive experimental section where we compare our three proposed methods to the current state-of-the-art in both connectivity-based and structure-based representation learning. Finally, in this dissertation, we look at extensions of the basic structural graph representation learning problem. We study the problem of temporal structural graph representation, and we also provide a method for representation explainability.
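A generic sketch of the structural (as opposed to connectivity-based) encoding idea, in the spirit of Weisfeiler-Lehman refinement rather than any of the three proposed methods: node labels start from degrees and are iteratively rehashed together with the multiset of neighbour labels, so nodes with the same structural role receive identical representations no matter how far apart they sit in the graph.

```python
def structural_repr(adj, iters=3):
    """WL-style structural encoding for a graph given as an adjacency
    dict {node: [neighbours]}. Initialise each label with the node's
    degree, then repeatedly combine it with the sorted multiset of
    neighbour labels."""
    labels = {v: len(nbrs) for v, nbrs in adj.items()}  # degree init
    for _ in range(iters):
        labels = {v: hash((labels[v],
                           tuple(sorted(labels[u] for u in adj[v]))))
                  for v in adj}
    return labels

# Path graph 0-1-2-3-4: endpoints 0 and 4 share a structural role,
# as do nodes 1 and 3, although 0 and 4 are maximally distant.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rep = structural_repr(path)
```

A connectivity-based embedding would place nodes 0 and 4 far apart because they are never neighbours; the structural encoding makes them indistinguishable, which is exactly the property exploited for tasks such as role-based anomaly detection.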

