Object detection and mapping for service robot tasks

Robotica ◽  
2007 ◽  
Vol 25 (2) ◽  
pp. 175-187 ◽  
Author(s):  
Staffan Ekvall ◽  
Danica Kragic ◽  
Patric Jensfelt

SUMMARY: The problem studied in this paper is a mobile robot that autonomously navigates in a domestic environment, builds a map as it moves along, and localizes its position in it. In addition, the robot detects predefined objects, estimates their position in the environment, and integrates this with the localization module to automatically place the objects in the generated map. Thus, we demonstrate one of the possible strategies for the integration of spatial and semantic knowledge in a service robot scenario, where a simultaneous localization and mapping (SLAM) system and an object detection/recognition system work in synergy to provide a richer representation of the environment than would be possible with either method alone. Most SLAM systems build maps that are used only for localizing the robot. Such maps are typically based on grids or different types of features such as points and lines. The novelty is the augmentation of this process with an object-recognition system that detects objects in the environment and places them in the map generated by the SLAM system. The metric map is also split into topological entities corresponding to rooms. In this way, the user can command the robot to retrieve a certain object from a certain room. We present the results of map building and an extensive evaluation of the object detection algorithm performed in an indoor setting.
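
The core integration step, projecting an object detected relative to the robot into the map generated by SLAM, can be illustrated with a minimal 2D sketch. The pose representation and the range/bearing observation model below are illustrative assumptions, not the paper's implementation:

```python
import math
from dataclasses import dataclass

@dataclass
class Pose2D:
    x: float      # robot position in the map frame (m)
    y: float
    theta: float  # heading in the map frame (rad)

def object_to_map_frame(robot: Pose2D, obj_range: float, obj_bearing: float):
    """Project an object observed at (range, bearing) in the robot frame
    into the global map frame using the SLAM pose estimate."""
    gx = robot.x + obj_range * math.cos(robot.theta + obj_bearing)
    gy = robot.y + obj_range * math.sin(robot.theta + obj_bearing)
    return gx, gy

# Example: an object detected 1.5 m ahead, 10 degrees to the left
pose = Pose2D(x=2.0, y=3.0, theta=math.pi / 2)
print(object_to_map_frame(pose, 1.5, math.radians(10)))
```

Once an object has map coordinates, assigning it to a room reduces to checking which topological entity contains that point.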

Author(s):  
Samuel Humphries ◽  
Trevor Parker ◽  
Bryan Jonas ◽  
Bryan Adams ◽  
Nicholas J Clark

Quick identification of buildings and roads is critical for the execution of tactical US military operations in an urban environment. To this end, a gridded, referenced satellite image of an objective, often referred to as a gridded reference graphic or GRG, has become a standard product developed during intelligence preparation of the environment. At present, operational units identify key infrastructure by hand through the work of individual intelligence officers. Recent advances in Convolutional Neural Networks, however, allow this process to be streamlined through the use of object detection algorithms. In this paper, we describe an object detection algorithm designed to quickly identify and label both buildings and road intersections present in an image. Our work leverages both the U-Net architecture and the SpaceNet data corpus to produce an algorithm that accurately identifies a large breadth of buildings and different types of roads. In addition to predicting buildings and roads, our model numerically labels each building by means of a contour finding algorithm. Most importantly, the dual U-Net model is capable of predicting buildings and roads on a diverse set of test images and using these predictions to produce clean GRGs.
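
The contour-finding step used to numerically label each building can be sketched with OpenCV. The mask format and drawing details below are assumptions for illustration, not the authors' exact pipeline:

```python
import cv2
import numpy as np

def label_buildings(mask: np.ndarray) -> np.ndarray:
    """Given a binary building mask (H x W, uint8 values 0/255) from a
    segmentation network, outline each building and stamp a number at
    its centroid, as one might for a GRG-style product."""
    annotated = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for i, cnt in enumerate(contours, start=1):
        cv2.drawContours(annotated, [cnt], -1, (0, 255, 0), 2)
        m = cv2.moments(cnt)
        if m["m00"] > 0:  # skip degenerate contours with zero area
            cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
            cv2.putText(annotated, str(i), (cx, cy),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    return annotated
```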


2019 ◽  
Vol 9 (10) ◽  
pp. 2105 ◽  
Author(s):  
Guolai Jiang ◽  
Lei Yin ◽  
Shaokun Jin ◽  
Chaoran Tian ◽  
Xinbo Ma ◽  
...  

The method of simultaneous localization and mapping (SLAM) using a light detection and ranging (LiDAR) sensor is commonly adopted for robot navigation. However, consumer robots are price sensitive and often have to use low-cost sensors. Due to the poor performance of a low-cost LiDAR, error accumulates rapidly during SLAM, which can lead to large errors when building a larger map. To cope with this problem, this paper proposes a new graph optimization-based SLAM framework that combines a low-cost LiDAR sensor with a vision sensor. In the SLAM framework, a new cost function considering both scan and image data is proposed, and the Bag of Words (BoW) model with visual features is applied for loop closure detection. A 2.5D map representing both obstacles and vision features is also proposed, along with a fast relocation method based on this map. Experiments were conducted on a service robot equipped with a 360° low-cost LiDAR and a front-view RGB-D camera in a real indoor scene. The results show that the proposed method performs better than using LiDAR or camera alone, while relocation with our 2.5D map is much faster than with a traditional grid map.
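
A loose sketch of the idea behind a cost function that considers both scan and image data for one pose-graph edge; the residual and weight structure below is hypothetical and stands in for the paper's actual graph-optimization formulation:

```python
import numpy as np

def combined_edge_cost(r_scan: np.ndarray, r_visual: np.ndarray,
                       w_scan: float = 1.0, w_visual: float = 0.5) -> float:
    """Hypothetical cost for one pose-graph edge: a weighted sum of the
    scan-matching residual and the visual-feature residual. A real
    framework would minimize such terms jointly over the whole graph."""
    return (w_scan * float(r_scan @ r_scan)
            + w_visual * float(r_visual @ r_visual))

# Example: small residuals in (dx, dy, dtheta)
print(combined_edge_cost(np.array([0.02, -0.01, 0.005]),
                         np.array([0.05, 0.03, 0.0])))
```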


2021 ◽  
Vol 10 (11) ◽  
pp. 772
Author(s):  
Giulia Marchesi ◽  
Christian Eichhorn ◽  
David A. Plecher ◽  
Yuta Itoh ◽  
Gudrun Klinker

Augmented Reality (AR) has increasingly benefited from the use of Simultaneous Localization and Mapping (SLAM) systems. This technology has enabled developers to create markerless AR applications, but these lack semantic understanding of their environment. Including this information would empower AR applications to react to their surroundings more realistically. To gain semantic knowledge, focus has in recent years shifted toward fusing SLAM systems with neural networks, giving birth to the field of Semantic SLAM. Building on existing research, this paper aims to create a SLAM system that generates a 3D map using ORB-SLAM2 and enriches it with semantic knowledge originating from the Fast-SCNN network. The key novelty of our approach is a new method for improving the predictions of neural networks, employed to balance the loss of accuracy introduced by efficient real-time models. Exploiting sensor information provided by a smartphone, GPS coordinates are used to query the OpenStreetMap database. The returned information is used to determine which classes are currently absent from the environment, so that they can be removed from the network's prediction with the goal of improving its accuracy. We achieved 87.40% pixel accuracy with Fast-SCNN on our custom version of COCO-Stuff and showed an improvement by incorporating GPS data for our self-made smartphone dataset, resulting in 90.24% pixel accuracy. With use on smartphones in mind, the implementation seeks a trade-off between accuracy and efficiency; to this end, the system was carefully designed with a strong focus on lightweight neural networks. This enabled the creation of an above-real-time Semantic SLAM system that we call EnvSLAM (Environment SLAM). Our extensive evaluation demonstrates the efficiency of the system and its operability above real time (48.1 frames per second at an input image resolution of 640 × 360 pixels). Moreover, the GPS integration yields an effective improvement in the network's prediction accuracy.
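
The GPS/OpenStreetMap filtering idea can be sketched as masking absent classes out of the per-pixel prediction before the argmax. The class indices, names, and array shapes below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def filter_absent_classes(logits: np.ndarray, absent_ids: set) -> np.ndarray:
    """Suppress classes that external knowledge (e.g., an OpenStreetMap
    query around the GPS fix) says cannot be present, then take the
    per-pixel argmax. `logits` has shape (num_classes, H, W)."""
    masked = logits.copy()
    for cls in absent_ids:
        masked[cls] = -np.inf  # this class can never win the argmax
    return masked.argmax(axis=0)

# Example: 5 classes; suppose OSM says classes 2 and 4 are absent here
logits = np.random.randn(5, 360, 640).astype(np.float32)
prediction = filter_absent_classes(logits, {2, 4})
```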


2021 ◽  
Vol 9 ◽  
Author(s):  
Zhenyu Wu ◽  
Xiangyu Deng ◽  
Shengming Li ◽  
Yingshun Li

Visual Simultaneous Localization and Mapping (SLAM) systems are mainly used for real-time localization and mapping tasks of robots in various complex environments, while traditional monocular vision algorithms struggle to cope with weak textures and dynamic scenes. To solve these problems, this work presents an object detection and clustering assisted SLAM algorithm (OC-SLAM), which adopts a fast object detection algorithm to add semantic information to the image and applies geometrical constraints to the dynamic keypoints inside each prediction box to optimize the camera pose. It also uses an RGB-D camera to perform dense point cloud reconstruction with dynamic objects rejected, and applies Euclidean clustering to the dense point clouds, combined with the object detection algorithm, to jointly eliminate dynamic features. Experiments on the TUM dataset indicate that OC-SLAM improves the localization accuracy of the SLAM system in dynamic environments compared with the original algorithm, shows impressive localization performance, and can build a more precise dense point cloud map in dynamic scenes.
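
The rejection of dynamic keypoints inside prediction boxes can be sketched as a simple point-in-box filter; the keypoint and box formats below are assumptions, and the paper's additional geometric constraints are omitted:

```python
def reject_dynamic_keypoints(keypoints, dynamic_boxes):
    """Keep only keypoints that fall outside every detection box labelled
    as a dynamic object (e.g., a person), a common way to stop moving
    objects from corrupting the camera pose estimate."""
    def inside(pt, box):
        x, y = pt
        x1, y1, x2, y2 = box
        return x1 <= x <= x2 and y1 <= y <= y2
    return [pt for pt in keypoints
            if not any(inside(pt, b) for b in dynamic_boxes)]

# Example: two keypoints, one inside a dynamic-object box
kps = [(120, 80), (400, 300)]
boxes = [(100, 50, 200, 250)]   # (x1, y1, x2, y2)
print(reject_dynamic_keypoints(kps, boxes))  # [(400, 300)]
```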


2019 ◽  
Vol 63 (5) ◽  
pp. 50402-1-50402-9 ◽  
Author(s):  
Ing-Jr Ding ◽  
Chong-Min Ruan

Abstract: The acoustic-based automatic speech recognition (ASR) technique is mature and widely used in numerous applications. However, acoustic-based ASR cannot maintain its standard performance for disabled speakers with atypical facial characteristics, that is, atypical eye or mouth geometry. To address this problem, this article develops a three-dimensional (3D) sensor lip-image-based pronunciation recognition system, in which the 3D sensor is used to efficiently acquire the variations in the speaker's lip shape during the pronunciation action. In this work, two different types of 3D lip features for pronunciation recognition are presented: 3D-(x, y, z) coordinate lip features and 3D geometry lip features. For the 3D-(x, y, z) coordinate lip feature design, 18 location points around the outer and inner lips, each with 3D coordinates, are defined. For the 3D geometry lip features, eight types of features capturing the geometrical characteristics of the inner lip are developed. In addition, feature fusion combining both the 3D-(x, y, z) coordinate and 3D geometry lip features is considered. The performance and effectiveness of the presented 3D sensor lip-image-based features are evaluated using a principal component analysis (PCA)-based classification approach. Experimental results on pronunciation recognition for two different datasets, Mandarin syllables and Mandarin phrases, demonstrate the competitive performance of the presented system.
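
A hedged sketch of PCA-based classification on fused lip features: the dimensions follow the abstract (18 points × 3 coordinates, plus 8 geometry features, fused by concatenation), while the random stand-in data, component count, and nearest-neighbour classifier are assumptions rather than the authors' exact calculation:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: 200 samples of 18 lip points x 3 coordinates = 54 values,
# plus 8 geometry features; fusion here is a simple concatenation.
coord_feats = np.random.rand(200, 54)    # 3D-(x, y, z) coordinate features
geom_feats = np.random.rand(200, 8)      # 3D geometry features
labels = np.random.randint(0, 10, 200)   # 10 hypothetical classes

fused = np.hstack([coord_feats, geom_feats])
pca = PCA(n_components=20).fit(fused)            # PCA-based projection
clf = KNeighborsClassifier(n_neighbors=3).fit(pca.transform(fused), labels)
```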


Author(s):  
V. Jagan Naveen ◽  
K. Krishna Kishore ◽  
P. Rajesh Kumar

In the modern world, human recognition systems play an important role in improving security by reducing the chances of evasion. The human ear can be used for person identification. In an empirical study of the human ear, 10,000 images were examined to establish its uniqueness. Ear-based systems are among the few biometric systems whose characteristics remain stable with age. In this paper, ear images are taken from the Mathematical Analysis of Images (AMI) ear database, and ear pattern recognition is analyzed using the expectation-maximization (EM) and k-means algorithms. Patterns of ears affected by different types of noise are recognized using the principal component analysis (PCA) algorithm.
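
A minimal sketch of the PCA and k-means steps on flattened ear images; the image size, component count, and cluster count below are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Stand-in data: 100 flattened grayscale ear images of 64x64 pixels.
ears = np.random.rand(100, 64 * 64)

# PCA projects each ear onto a small set of "eigen-ear" components;
# matching can then be done by nearest neighbour in this subspace.
pca = PCA(n_components=25)
projected = pca.fit_transform(ears)

# k-means partitions the projected patterns into groups, one rough way
# to cluster ear patterns before direct matching.
clusters = KMeans(n_clusters=10, n_init=10).fit_predict(projected)
```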


2015 ◽  
Author(s):  
Dana Rubinstein ◽  
Effi Levi ◽  
Roy Schwartz ◽  
Ari Rappoport

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1718
Author(s):  
Chien-Hsing Chou ◽  
Yu-Sheng Su ◽  
Che-Ju Hsu ◽  
Kong-Chang Lee ◽  
Ping-Hsuan Han

In this study, we designed a four-dimensional (4D) audiovisual entertainment system called Sense. This system comprises a scene recognition system and hardware modules that provide haptic sensations for users when they watch movies and animations at home. In the scene recognition system, we used Google Cloud Vision to detect common scene elements in a video, such as fire, explosions, wind, and rain, and to further determine whether the scene depicts hot weather, rain, or snow. Additionally, for animated videos, we applied deep learning with a single shot multibox detector to detect whether the animated video contained scenes of fire-related objects. The hardware module was designed to provide six types of haptic sensations, arranged with line symmetry for a better user experience. Based on the object detection results from the scene recognition system, the system generates corresponding haptic sensations. The system integrates deep learning, auditory signals, and haptic sensations to provide an enhanced viewing experience.
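
The step from recognized scene elements to haptic output can be sketched as a simple lookup from detected labels to effect channels; the mapping, channel names, and confidence threshold below are hypothetical, not the authors' configuration:

```python
# Hypothetical mapping from detected scene elements (e.g., labels returned
# by a vision API) to haptic effect channels on the hardware module.
HAPTIC_MAP = {
    "fire": "heat",
    "explosion": "vibration",
    "wind": "fan",
    "rain": "water_mist",
    "snow": "cool_air",
}

def sensations_for_labels(labels, threshold=0.7):
    """Return the set of haptic effects to trigger for one video segment,
    given (label, confidence) pairs from the scene recognition step."""
    return {HAPTIC_MAP[name] for name, score in labels
            if score >= threshold and name in HAPTIC_MAP}

print(sensations_for_labels([("fire", 0.92), ("wind", 0.55)]))  # {'heat'}
```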


Author(s):  
Louis Lecrosnier ◽  
Redouane Khemmar ◽  
Nicolas Ragot ◽  
Benoit Decoux ◽  
Romain Rossi ◽  
...  

This paper deals with the development of an Advanced Driver Assistance System (ADAS) for a smart electric wheelchair, aimed at improving the autonomy of disabled people. Our use case, built from a formal clinical study, is based on the detection, depth estimation, localization, and tracking of objects in the wheelchair's indoor environment, namely doors and door handles. The aim of this work is to provide the wheelchair with a perception layer, enabling the detection of these key points in its immediate surroundings and the construction of a short-lifespan semantic map. First, we present an adaptation of the YOLOv3 object detection algorithm to our use case. Then, we present our depth estimation approach using an Intel RealSense camera. Finally, as the third and last step of our approach, we present our 3D object tracking approach based on the SORT algorithm. To validate all these developments, we carried out different experiments in a controlled indoor environment. Detection, distance estimation, and object tracking are evaluated using our own dataset, which includes doors and door handles.
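
The depth-estimation step can be sketched with the pyrealsense2 SDK by sampling the depth frame at the centre of a detection box. The box values are hypothetical, and frame alignment and error handling are omitted for brevity; this is a sketch of the idea, not the authors' implementation:

```python
import pyrealsense2 as rs

# Read the depth at the centre of a detection box from an Intel
# RealSense camera.
pipeline = rs.pipeline()
pipeline.start(rs.config())
try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()
    box = (200, 150, 320, 400)               # hypothetical door-handle box
    cx = (box[0] + box[2]) // 2
    cy = (box[1] + box[3]) // 2
    distance_m = depth.get_distance(cx, cy)   # metres at the box centre
    print(f"estimated object distance: {distance_m:.2f} m")
finally:
    pipeline.stop()
```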

