Benchmark Dataset Based on Category Maps with Indoor–Outdoor Mixed Features for Positional Scene Recognition by a Mobile Robot

Robotics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 40
Author(s):  
Hirokazu Madokoro ◽  
Hanwool Woo ◽  
Stephanie Nix ◽  
Kazuhito Sato

This study was conducted to develop original benchmark datasets that simultaneously include indoor and outdoor visual features. Indoor scene images include outdoor features to a degree that varies greatly with time of day, weather, and season. We obtained time-series scene images using a wide field of view (FOV) camera mounted on a mobile robot moving along a 392-m route, in both directions and in three seasons, through an indoor environment surrounded by transparent glass walls and windows. For this study, we propose a unified method for extracting, characterizing, and recognizing visual landmarks that is robust to human occlusion in a real environment in which robots coexist with people. Using this method, we conducted an evaluation experiment to recognize scenes divided into as many as 64 zones of fixed interval. The results obtained with the datasets revealed the performance and characteristics of meta-parameter optimization, mapping characteristics to category maps, and recognition accuracy. Moreover, we visualized similarities between scene images using category maps and identified cluster boundaries obtained from mapping weights.
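As a minimal illustration of the fixed-interval zone labeling described above (a sketch under assumed conventions, not the authors' code), the following Python snippet assigns each image a zone index from its odometry distance along the 392-m route:

```python
# Minimal sketch (assumed, not the authors' code): assign fixed-interval
# zone labels to time-series scene images along a known route, as in the
# 64-zone recognition experiment described above.

def assign_zone_labels(distances_m, route_length_m=392.0, num_zones=64):
    """Map each image's cumulative distance (meters) along the route
    to a zone index in [0, num_zones - 1]."""
    zone_length = route_length_m / num_zones  # fixed interval per zone
    labels = []
    for d in distances_m:
        zone = int(d // zone_length)
        labels.append(min(zone, num_zones - 1))  # clamp the final sample
    return labels

# Example: three images taken at 0 m, 200 m, and 391 m along the route
print(assign_zone_labels([0.0, 200.0, 391.0]))  # -> [0, 32, 63]
```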

Author(s):  
Yiyi Zhou ◽  
Rongrong Ji ◽  
Jinsong Su ◽  
Xiangming Li ◽  
Xiaoshuai Sun

In this paper, we uncover the issue of knowledge inertia in visual question answering (VQA), which exists in most VQA models and forces them to rely mainly on the question content to “guess” the answer, regardless of the visual information. This issue not only impairs the performance of VQA models but also greatly reduces the credibility of the answer prediction. Simply highlighting the visual features in the model is not a workable remedy, since the prediction is built upon the joint modeling of two modalities and is largely influenced by the data distribution. We therefore propose Pairwise Inconformity Learning (PIL) to tackle knowledge inertia. In particular, PIL takes full advantage of the similar image pairs with diverse answers to an identical question provided in the VQA2.0 dataset. It builds a multi-modal embedding space onto which positive/negative feature pairs are projected, with the word vectors of answers modeled as anchors. In doing so, PIL strengthens the importance of visual features in prediction with a novel dynamic-margin-based triplet loss that efficiently increases the semantic discrepancy between positive and negative image pairs. To verify PIL, we plug it into a baseline VQA model as well as a set of recent VQA models, and conduct extensive experiments on two benchmark datasets, VQA1.0 and VQA2.0. Experimental results show that PIL boosts the accuracy of existing VQA models (a 1.56%–2.93% gain) with a negligible increase in parameters (0.85%–5.4%). Qualitative results also reveal the elimination of knowledge inertia in existing VQA models after applying PIL.
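A rough PyTorch sketch of what a dynamic-margin triplet loss over answer anchors and positive/negative image pairs might look like follows; the function name, the margin form, and all shapes are assumptions for illustration, not the paper's exact formulation:

```python
# Illustrative sketch in the spirit of PIL: the answer word vector acts as
# the anchor, and the positive pair's joint features should sit closer to
# it than those of the negative pair (a similar image with another answer).
import torch
import torch.nn.functional as F

def pil_triplet_loss(anchor, pos_feat, neg_feat, base_margin=0.2):
    """anchor:   (B, D) answer word embeddings (anchors)
    pos_feat: (B, D) joint features of the positive image-question pair
    neg_feat: (B, D) joint features of the similar image, different answer
    """
    d_pos = F.pairwise_distance(anchor, pos_feat)
    d_neg = F.pairwise_distance(anchor, neg_feat)
    # Assumed dynamic margin: grow the margin with the gap between the two
    # image features, pushing easily confused pairs farther apart.
    dyn_margin = base_margin * (1.0 + F.pairwise_distance(pos_feat, neg_feat))
    return F.relu(d_pos - d_neg + dyn_margin).mean()
```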


1989 ◽  
Vol 33 (2) ◽  
pp. 86-90 ◽  
Author(s):  
Loran A. Haworth ◽  
Nancy Bucher ◽  
David Runnings

Simulation scientists continually pursue improved flight simulation technology with the goal of closely replicating the “real world” physical environment. The presentation/display of visual information for flight simulation is one such area enjoying recent technical improvements that are fundamental for conducting simulated operations close to the terrain. Detailed and appropriate visual information is especially critical for Nap-Of-the-Earth (NOE) helicopter flight simulation, where the pilot maintains an “eyes-out” orientation to avoid obstructions and terrain. This paper elaborates on visually coupled Wide Field Of View Helmet-Mounted Display (WFOVHMD) system technology as a viable visual display system for helicopter simulation. In addition, the paper discusses research conducted on the NASA-Ames Vertical Motion Simulator that examined one critical research issue for helmet-mounted displays.


2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Rodrigo Munguía ◽  
Carlos López-Franco ◽  
Emmanuel Nuño ◽  
Adriana López-Franco

This work presents a method for implementing a visual simultaneous localization and mapping (SLAM) system using omnidirectional vision data, with application to autonomous mobile robots. In SLAM, a mobile robot operates in an unknown environment using only on-board sensors to simultaneously build a map of its surroundings, which it uses to track its position. SLAM is perhaps one of the most fundamental problems to solve in robotics in order to build truly autonomous mobile robots. The visual sensor used in this work is an omnidirectional vision sensor, which provides a wide field of view; this is advantageous for a mobile robot performing autonomous navigation. Since the sensor is monocular, a method to recover the depth of the features is required; to estimate the unknown depth, we propose a novel stochastic triangulation technique. The proposed system can be applied to indoor or cluttered environments to perform visual navigation when a GPS signal is not available. Experiments with synthetic and real data are presented to validate the proposal.
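For intuition, the sketch below shows plain bearing-only triangulation of a landmark from two camera poses, the deterministic core that a stochastic variant would wrap with measurement noise; the 2-D setup and all names are illustrative assumptions, not the paper's technique:

```python
# Hedged sketch of bearing-only triangulation for a monocular sensor
# (a simplified stand-in for the stochastic triangulation the paper
# proposes; the 2-D setup is an assumption for illustration).
import numpy as np

def triangulate_depth(p1, b1, p2, b2):
    """Estimate a landmark position from two camera positions p1, p2 (2-D)
    and unit bearing vectors b1, b2 observed at each position.

    Solves p1 + t1*b1 = p2 + t2*b2 in the least-squares sense;
    t1 is the landmark depth along the first bearing.
    """
    A = np.column_stack((b1, -b2))              # 2x2 system in (t1, t2)
    t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
    depth = t[0]
    return p1 + depth * b1, depth

# Example: robot moves 1 m along x; landmark seen at 45 deg, then 90 deg
p1, p2 = np.array([0.0, 0.0]), np.array([1.0, 0.0])
b1 = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
b2 = np.array([0.0, 1.0])
print(triangulate_depth(p1, b1, p2, b2))        # landmark near (1, 1)
```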


Author(s):  
Masakazu Iwamura ◽  
Yoshihiko Inoue ◽  
Kazunori Minatani ◽  
Koichi Kise

For people with visual impairment, smartphone apps that use computer vision techniques to provide visual information play an important role in supporting daily life. However, such apps can be used only under a specific condition: the user must already know where the object of interest is. In this paper, we first make this point explicit by categorizing the tasks that obtain visual information using computer vision techniques. Then, taking looking for something as a representative task in one category, we examine suitable camera systems and rotation navigation methods. For the latter, we propose novel voice navigation methods. In a user study with seven participants with visual impairment, we found that (1) a camera with a wide field of view, such as an omnidirectional camera, was preferred, and (2) users differed in their preferred navigation methods.
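As one hypothetical example of a rotation navigation method (not the paper's proposal), a voice instruction could be derived from an object's horizontal bearing in an omnidirectional image using clock-face directions:

```python
# Illustrative sketch (assumed, not the paper's implementation): convert
# the horizontal position of a detected object in an equirectangular
# panorama into a clock-face voice instruction for rotation guidance.

def bearing_to_clock_instruction(column, image_width):
    """column: x coordinate of the object in the panorama, where column 0
    is assumed to map to straight ahead and the image wraps at 360 deg."""
    bearing_deg = (column / image_width) * 360.0
    if bearing_deg > 180.0:
        bearing_deg -= 360.0                  # fold into (-180, 180]
    hour = round(bearing_deg / 30.0) % 12     # 30 degrees per clock hour
    hour = 12 if hour == 0 else hour
    return f"Object at {hour} o'clock, turn {bearing_deg:+.0f} degrees"

print(bearing_to_clock_instruction(1024, 4096))  # 90 deg -> "3 o'clock"
```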


Author(s):  
Xuelong Li ◽  
Bin Zhao ◽  
Xiaoqiang Lu

Visual information is important for the task of video captioning. However, a video contains much uncorrelated content, which can interfere with generating a correct caption. Motivated by this observation, we attempt to exploit the visual features that are most correlated with the caption. In this paper, a Multi-level Attention Model based Recurrent Neural Network (MAM-RNN) is proposed, where the MAM encodes the visual features and the RNN works as the decoder to generate the video caption. During generation, the proposed approach adaptively attends to the salient regions in each frame and to the frames correlated with the caption. Experimental results on two benchmark datasets, MSVD and Charades, demonstrate the excellent performance of the proposed approach.
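The following PyTorch sketch illustrates the general idea of two-level (region, then frame) attention; the layer names, sizes, and scoring functions are assumptions, not the MAM-RNN architecture itself:

```python
# Rough sketch of two-level attention: the decoder's previous hidden state
# first weights salient regions inside each frame, then weights the frames
# themselves, yielding one context vector per decoding step.
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    def __init__(self, feat_dim, hid_dim):
        super().__init__()
        self.region_score = nn.Linear(feat_dim + hid_dim, 1)
        self.frame_score = nn.Linear(feat_dim + hid_dim, 1)

    def forward(self, regions, h):
        # regions: (T, R, D) region features per frame; h: (H,) decoder state
        T, R, D = regions.shape
        h_r = h.expand(T, R, -1)
        a_r = torch.softmax(
            self.region_score(torch.cat([regions, h_r], -1)), dim=1)
        frames = (a_r * regions).sum(dim=1)     # (T, D) per-frame summary
        h_f = h.expand(T, -1)
        a_f = torch.softmax(
            self.frame_score(torch.cat([frames, h_f], -1)), dim=0)
        return (a_f * frames).sum(dim=0)        # (D,) context vector
```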


Author(s):  
M. G. Lagally

It has been recognized since the earliest days of crystal growth that kinetic processes of all kinds control the nature of the growth. As the technology of crystal growth has become ever more refined, with the advent of such atomistic processes as molecular beam epitaxy, chemical vapor deposition, sputter deposition, and plasma-enhanced techniques for the creation of “crystals” as little as one or a few atomic layers thick, multilayer structures, and novel materials combinations, the need to understand the mechanisms controlling the growth process is becoming more critical. Unfortunately, available techniques have not lent themselves well to obtaining a truly microscopic picture of such processes. Because of its atomic resolution on the one hand, and its achievable wide field of view on the other (of the order of micrometers), scanning tunneling microscopy (STM) gives us this opportunity. In this talk, we briefly review the types of growth kinetics measurements that can be made using STM. The use of STM for studies of kinetics is one of the more recent applications of what is itself still a very young field.


2020 ◽  
Vol 13 (6) ◽  
pp. 1-9
Author(s):  
XU Hong-gang ◽  
HAN Bing ◽  
LI Man-li ◽  
MA Hong-tao ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2203
Author(s):  
Antal Hiba ◽  
Attila Gáti ◽  
Augustin Manecy

Precise navigation is often performed by fusing data from several sensors. Among these, optical sensors use image features to obtain the position and attitude of the camera. Runway-relative navigation during final approach is a special case in which robust and continuous detection of the runway is required. This paper presents a robust threshold-marker detection method for monocular cameras and introduces an on-board real-time implementation with flight test results. Results with narrow and wide field-of-view optics are compared, and the image processing approach is also evaluated on image data captured by a different on-board system. The purely optical approach of this paper increases sensor redundancy because, unlike most robust runway detectors, it does not require input from an inertial sensor.
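As a hedged illustration of one plausible detection baseline (not the method of this paper): runway threshold bars are bright, elongated, roughly parallel blobs, so adaptive thresholding plus contour shape filtering in OpenCV can serve as a first-pass detector:

```python
# Illustrative OpenCV sketch of a simple threshold-bar detector; the
# parameters and filtering rules are assumptions, not the paper's method.
import cv2

def detect_threshold_bars(gray, min_area=50.0, min_aspect=3.0):
    """Return rotated rectangles of candidate threshold-marker stripes
    found in a grayscale image."""
    # Keep pixels brighter than their local neighborhood (bright paint
    # on darker pavement).
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 31, -10)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    bars = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue                          # reject speckle noise
        rect = cv2.minAreaRect(c)             # ((cx, cy), (w, h), angle)
        w, h = rect[1]
        if min(w, h) > 0 and max(w, h) / min(w, h) >= min_aspect:
            bars.append(rect)                 # elongated blob -> candidate
    return bars
```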


2012 ◽  
Vol 100 (13) ◽  
pp. 133701 ◽  
Author(s):  
Hewei Liu ◽  
Feng Chen ◽  
Qing Yang ◽  
Pubo Qu ◽  
Shengguan He ◽  
...  
