Object detection and mapping for service robot tasks

Robotica ◽  
2007 ◽  
Vol 25 (2) ◽  
pp. 175-187 ◽  
Author(s):  
Staffan Ekvall ◽  
Danica Kragic ◽  
Patric Jensfelt

SUMMARY: The problem studied in this paper is a mobile robot that autonomously navigates in a domestic environment, builds a map as it moves along, and localizes its position in it. In addition, the robot detects predefined objects, estimates their position in the environment, and integrates this with the localization module to automatically place the objects in the generated map. Thus, we demonstrate one of the possible strategies for the integration of spatial and semantic knowledge in a service robot scenario, where a simultaneous localization and mapping (SLAM) system and an object detection/recognition system work in synergy to provide a richer representation of the environment than would be possible with either method alone. Most SLAM systems build maps that are used only for localizing the robot. Such maps are typically based on grids or different types of features such as points and lines. The novelty is the augmentation of this process with an object-recognition system that detects objects in the environment and places them in the map generated by the SLAM system. The metric map is also split into topological entities corresponding to rooms. In this way, the user can command the robot to retrieve a certain object from a certain room. We present the results of map building and an extensive evaluation of the object detection algorithm performed in an indoor setting.
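
The core integration step, projecting an object detected relative to the robot into the map generated by SLAM, can be illustrated with a minimal 2D sketch. The pose representation and the range/bearing observation model below are illustrative assumptions, not the paper's implementation:

```python
import math
from dataclasses import dataclass

@dataclass
class Pose2D:
    x: float      # robot position in the map frame (m)
    y: float
    theta: float  # heading in the map frame (rad)

def object_to_map_frame(robot: Pose2D, obj_range: float, obj_bearing: float):
    """Project an object observed at (range, bearing) in the robot frame
    into the global map frame using the SLAM pose estimate."""
    gx = robot.x + obj_range * math.cos(robot.theta + obj_bearing)
    gy = robot.y + obj_range * math.sin(robot.theta + obj_bearing)
    return gx, gy

# Example: an object detected 1.5 m ahead, 10 degrees to the left
pose = Pose2D(x=2.0, y=3.0, theta=math.pi / 2)
print(object_to_map_frame(pose, 1.5, math.radians(10)))
```

Once an object has map coordinates, assigning it to a room reduces to checking which topological entity contains that point.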

Author(s):  
Samuel Humphries ◽  
Trevor Parker ◽  
Bryan Jonas ◽  
Bryan Adams ◽  
Nicholas J Clark

Quick identification of buildings and roads is critical for the execution of tactical US military operations in an urban environment. To this end, a gridded, referenced satellite image of an objective, often referred to as a gridded reference graphic or GRG, has become a standard product developed during intelligence preparation of the environment. At present, operational units identify key infrastructure by hand through the work of individual intelligence officers. Recent advances in Convolutional Neural Networks, however, allow this process to be streamlined through the use of object detection algorithms. In this paper, we describe an object detection algorithm designed to quickly identify and label both buildings and road intersections present in an image. Our work leverages both the U-Net architecture and the SpaceNet data corpus to produce an algorithm that accurately identifies a large breadth of buildings and different types of roads. In addition to predicting buildings and roads, our model numerically labels each building by means of a contour finding algorithm. Most importantly, the dual U-Net model is capable of predicting buildings and roads on a diverse set of test images and using these predictions to produce clean GRGs.
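
The contour-finding step used to numerically label each building can be sketched with OpenCV. The mask format and drawing details below are assumptions for illustration, not the authors' exact pipeline:

```python
import cv2
import numpy as np

def label_buildings(mask: np.ndarray) -> np.ndarray:
    """Given a binary building mask (H x W, uint8 values 0/255) from a
    segmentation network, outline each building and stamp a number at
    its centroid, as one might for a GRG-style product."""
    annotated = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for i, cnt in enumerate(contours, start=1):
        cv2.drawContours(annotated, [cnt], -1, (0, 255, 0), 2)
        m = cv2.moments(cnt)
        if m["m00"] > 0:  # skip degenerate contours with zero area
            cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
            cv2.putText(annotated, str(i), (cx, cy),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    return annotated
```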


2019 ◽  
Vol 9 (10) ◽  
pp. 2105 ◽  
Author(s):  
Guolai Jiang ◽  
Lei Yin ◽  
Shaokun Jin ◽  
Chaoran Tian ◽  
Xinbo Ma ◽  
...  

The method of simultaneous localization and mapping (SLAM) using a light detection and ranging (LiDAR) sensor is commonly adopted for robot navigation. However, consumer robots are price sensitive and often have to use low-cost sensors. Due to the poor performance of a low-cost LiDAR, error accumulates rapidly during SLAM, which can lead to large errors when building a larger map. To cope with this problem, this paper proposes a new graph optimization-based SLAM framework that combines a low-cost LiDAR sensor with a vision sensor. In the SLAM framework, a new cost function considering both scan and image data is proposed, and the Bag of Words (BoW) model with visual features is applied for loop closure detection. A 2.5D map representing both obstacles and vision features is also proposed, along with a fast relocation method based on this map. Experiments were conducted on a service robot equipped with a 360° low-cost LiDAR and a front-view RGB-D camera in a real indoor scene. The results show that the proposed method performs better than using LiDAR or camera alone, while relocation with our 2.5D map is much faster than with a traditional grid map.
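
A loose sketch of the idea behind a cost function that considers both scan and image data for one pose-graph edge; the residual and weight structure below is hypothetical and stands in for the paper's actual graph-optimization formulation:

```python
import numpy as np

def combined_edge_cost(r_scan: np.ndarray, r_visual: np.ndarray,
                       w_scan: float = 1.0, w_visual: float = 0.5) -> float:
    """Hypothetical cost for one pose-graph edge: a weighted sum of the
    scan-matching residual and the visual-feature residual. A real
    framework would minimize such terms jointly over the whole graph."""
    return (w_scan * float(r_scan @ r_scan)
            + w_visual * float(r_visual @ r_visual))

# Example: small residuals in (dx, dy, dtheta)
print(combined_edge_cost(np.array([0.02, -0.01, 0.005]),
                         np.array([0.05, 0.03, 0.0])))
```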


2021 ◽  
Vol 10 (11) ◽  
pp. 772
Author(s):  
Giulia Marchesi ◽  
Christian Eichhorn ◽  
David A. Plecher ◽  
Yuta Itoh ◽  
Gudrun Klinker

Augmented Reality (AR) has increasingly benefited from the use of Simultaneous Localization and Mapping (SLAM) systems. This technology has enabled developers to create markerless AR applications, but these lack semantic understanding of their environment. Including this information would empower AR applications to react to their surroundings more realistically. To gain semantic knowledge, focus has in recent years shifted toward fusing SLAM systems with neural networks, giving birth to the field of Semantic SLAM. Building on existing research, this paper aims to create a SLAM system that generates a 3D map using ORB-SLAM2 and enriches it with semantic knowledge originating from the Fast-SCNN network. The key novelty of our approach is a new method for improving the predictions of neural networks, employed to balance the loss of accuracy introduced by efficient real-time models. Exploiting sensor information provided by a smartphone, GPS coordinates are used to query the OpenStreetMap database. The returned information is used to determine which classes are currently absent from the environment, so that they can be removed from the network's prediction with the goal of improving its accuracy. We achieved 87.40% pixel accuracy with Fast-SCNN on our custom version of COCO-Stuff and showed an improvement by incorporating GPS data for our self-made smartphone dataset, resulting in 90.24% pixel accuracy. With use on smartphones in mind, the implementation seeks a trade-off between accuracy and efficiency; to this end, the system was carefully designed with a strong focus on lightweight neural networks. This enabled the creation of an above-real-time Semantic SLAM system that we call EnvSLAM (Environment SLAM). Our extensive evaluation demonstrates the efficiency of the system and its operability above real time (48.1 frames per second at an input image resolution of 640 × 360 pixels). Moreover, the GPS integration yields an effective improvement in the network's prediction accuracy.
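
The GPS/OpenStreetMap filtering idea can be sketched as masking absent classes out of the per-pixel prediction before the argmax. The class indices, names, and array shapes below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def filter_absent_classes(logits: np.ndarray, absent_ids: set) -> np.ndarray:
    """Suppress classes that external knowledge (e.g., an OpenStreetMap
    query around the GPS fix) says cannot be present, then take the
    per-pixel argmax. `logits` has shape (num_classes, H, W)."""
    masked = logits.copy()
    for cls in absent_ids:
        masked[cls] = -np.inf  # this class can never win the argmax
    return masked.argmax(axis=0)

# Example: 5 classes; suppose OSM says classes 2 and 4 are absent here
logits = np.random.randn(5, 360, 640).astype(np.float32)
prediction = filter_absent_classes(logits, {2, 4})
```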


2021 ◽  
Vol 9 ◽  
Author(s):  
Zhenyu Wu ◽  
Xiangyu Deng ◽  
Shengming Li ◽  
Yingshun Li

Visual Simultaneous Localization and Mapping (SLAM) systems are mainly used for real-time localization and mapping tasks of robots in various complex environments, while traditional monocular vision algorithms struggle to cope with weak textures and dynamic scenes. To solve these problems, this work presents an object detection and clustering assisted SLAM algorithm (OC-SLAM), which adopts a fast object detection algorithm to add semantic information to the image and applies geometrical constraints to the dynamic keypoints inside each prediction box to optimize the camera pose. It also uses an RGB-D camera to perform dense point cloud reconstruction with dynamic objects rejected, and applies Euclidean clustering to the dense point clouds, combined with the object detection algorithm, to jointly eliminate dynamic features. Experiments on the TUM dataset indicate that OC-SLAM improves the localization accuracy of the SLAM system in dynamic environments compared with the original algorithm, shows impressive localization performance, and can build a more precise dense point cloud map in dynamic scenes.
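
The rejection of dynamic keypoints inside prediction boxes can be sketched as a simple point-in-box filter; the keypoint and box formats below are assumptions, and the paper's additional geometric constraints are omitted:

```python
def reject_dynamic_keypoints(keypoints, dynamic_boxes):
    """Keep only keypoints that fall outside every detection box labelled
    as a dynamic object (e.g., a person), a common way to stop moving
    objects from corrupting the camera pose estimate."""
    def inside(pt, box):
        x, y = pt
        x1, y1, x2, y2 = box
        return x1 <= x <= x2 and y1 <= y <= y2
    return [pt for pt in keypoints
            if not any(inside(pt, b) for b in dynamic_boxes)]

# Example: two keypoints, one inside a dynamic-object box
kps = [(120, 80), (400, 300)]
boxes = [(100, 50, 200, 250)]   # (x1, y1, x2, y2)
print(reject_dynamic_keypoints(kps, boxes))  # [(400, 300)]
```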


2019 ◽  
Vol 63 (5) ◽  
pp. 50402-1-50402-9 ◽  
Author(s):  
Ing-Jr Ding ◽  
Chong-Min Ruan

Abstract: The acoustic-based automatic speech recognition (ASR) technique is mature and widely used in numerous applications. However, acoustic-based ASR cannot maintain its standard performance for disabled speakers with atypical facial characteristics, that is, atypical eye or mouth geometry. To address this problem, this article develops a three-dimensional (3D) sensor lip-image-based pronunciation recognition system, in which the 3D sensor is used to efficiently acquire the variations in the speaker's lip shape during the pronunciation action. In this work, two different types of 3D lip features for pronunciation recognition are presented: 3D-(x, y, z) coordinate lip features and 3D geometry lip features. For the 3D-(x, y, z) coordinate lip feature design, 18 location points around the outer and inner lips, each with 3D coordinates, are defined. For the 3D geometry lip features, eight types of features capturing the geometrical characteristics of the inner lip are developed. In addition, feature fusion combining both the 3D-(x, y, z) coordinate and 3D geometry lip features is considered. The performance and effectiveness of the presented 3D sensor lip-image-based features are evaluated using a principal component analysis (PCA)-based classification approach. Experimental results on pronunciation recognition for two different datasets, Mandarin syllables and Mandarin phrases, demonstrate the competitive performance of the presented system.
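
A hedged sketch of PCA-based classification on fused lip features: the dimensions follow the abstract (18 points × 3 coordinates, plus 8 geometry features, fused by concatenation), while the random stand-in data, component count, and nearest-neighbour classifier are assumptions rather than the authors' exact calculation:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: 200 samples of 18 lip points x 3 coordinates = 54 values,
# plus 8 geometry features; fusion here is a simple concatenation.
coord_feats = np.random.rand(200, 54)    # 3D-(x, y, z) coordinate features
geom_feats = np.random.rand(200, 8)      # 3D geometry features
labels = np.random.randint(0, 10, 200)   # 10 hypothetical classes

fused = np.hstack([coord_feats, geom_feats])
pca = PCA(n_components=20).fit(fused)            # PCA-based projection
clf = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(fused), labels)
```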


Author(s):  
V. Jagan Naveen ◽  
K. Krishna Kishore ◽  
P. Rajesh Kumar

In the modern world, human recognition systems play an important role in improving security by reducing the chances of evasion. The human ear can be used for person identification. In an empirical study of the human ear, 10,000 images were examined to establish its uniqueness. Ear-based systems are among the few biometric systems whose characteristics remain stable with age. In this paper, ear images are taken from the Mathematical Analysis of Images (AMI) ear database, and ear pattern recognition is analyzed using the expectation-maximization (EM) and k-means algorithms. Patterns of ears affected by different types of noise are recognized using the principal component analysis (PCA) algorithm.
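
A minimal sketch of the PCA and k-means steps on flattened ear images; the image size, component count, and cluster count below are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Stand-in data: 100 flattened grayscale ear images of 64x64 pixels.
ears = np.random.rand(100, 64 * 64)

# PCA projects each ear onto a small set of "eigen-ear" components;
# matching can then be done by nearest neighbour in this subspace.
pca = PCA(n_components=25)
projected = pca.fit_transform(ears)

# k-means partitions the projected patterns into groups, one rough way
# to cluster ear patterns before direct matching.
clusters = KMeans(n_clusters=10, n_init=10).fit_predict(projected)
```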


2015 ◽  
Author(s):  
Dana Rubinstein ◽  
Effi Levi ◽  
Roy Schwartz ◽  
Ari Rappoport

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1718
Author(s):  
Chien-Hsing Chou ◽  
Yu-Sheng Su ◽  
Che-Ju Hsu ◽  
Kong-Chang Lee ◽  
Ping-Hsuan Han

In this study, we designed a four-dimensional (4D) audiovisual entertainment system called Sense. This system comprises a scene recognition system and hardware modules that provide haptic sensations for users when they watch movies and animations at home. In the scene recognition system, we used Google Cloud Vision to detect common scene elements in a video, such as fire, explosions, wind, and rain, and to further determine whether the scene depicts hot weather, rain, or snow. Additionally, for animated videos, we applied deep learning with a single shot multibox detector to detect whether the animated video contained scenes of fire-related objects. The hardware module was designed to provide six types of haptic sensations, arranged with line symmetry for a better user experience. Based on the object detection results from the scene recognition system, the system generates corresponding haptic sensations. The system integrates deep learning, auditory signals, and haptic sensations to provide an enhanced viewing experience.
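
The step from recognized scene elements to haptic output can be sketched as a simple lookup from detected labels to effect channels; the mapping, channel names, and confidence threshold below are hypothetical, not the authors' configuration:

```python
# Hypothetical mapping from detected scene elements (e.g., labels returned
# by a vision API) to haptic effect channels on the hardware module.
HAPTIC_MAP = {
    "fire": "heat",
    "explosion": "vibration",
    "wind": "fan",
    "rain": "water_mist",
    "snow": "cool_air",
}

def sensations_for_labels(labels, threshold=0.7):
    """Return the set of haptic effects to trigger for one video segment,
    given (label, confidence) pairs from the scene recognition step."""
    return {HAPTIC_MAP[name] for name, score in labels
            if score >= threshold and name in HAPTIC_MAP}

print(sensations_for_labels([("fire", 0.92), ("wind", 0.55)]))  # {'heat'}
```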


Author(s):  
Louis Lecrosnier ◽  
Redouane Khemmar ◽  
Nicolas Ragot ◽  
Benoit Decoux ◽  
Romain Rossi ◽  
...  

This paper deals with the development of an Advanced Driver Assistance System (ADAS) for a smart electric wheelchair, aimed at improving the autonomy of disabled people. Our use case, built from a formal clinical study, is based on the detection, depth estimation, localization, and tracking of objects in the wheelchair's indoor environment, namely doors and door handles. The aim of this work is to provide the wheelchair with a perception layer, enabling the detection of these key points in its immediate surroundings and the construction of a short-lifespan semantic map. First, we present an adaptation of the YOLOv3 object detection algorithm to our use case. Then, we present our depth estimation approach using an Intel RealSense camera. Finally, as the third and last step of our approach, we present our 3D object tracking approach based on the SORT algorithm. To validate all these developments, we carried out different experiments in a controlled indoor environment. Detection, distance estimation, and object tracking are evaluated using our own dataset, which includes doors and door handles.
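
The depth-estimation step can be sketched with the pyrealsense2 SDK by sampling the depth frame at the centre of a detection box. The box values are hypothetical, and frame alignment and error handling are omitted for brevity; this is a sketch of the idea, not the authors' implementation:

```python
import pyrealsense2 as rs

# Read the depth at the centre of a detection box from an Intel
# RealSense camera.
pipeline = rs.pipeline()
pipeline.start(rs.config())
try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    box = (200, 150, 320, 400)               # hypothetical door-handle box
    cx = (box[0] + box[2]) // 2
    cy = (box[1] + box[3]) // 2
    distance_m = depth.get_distance(cx, cy)   # metres at the box centre
    print(f"estimated object distance: {distance_m:.2f} m")
finally:
    pipeline.stop()
```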

