In pixels we trust: From Pixel Labeling to Object Localization and Scene Categorization

Author(s):  
Carlos Herranz-Perdiguero ◽  
Carolina Redondo-Cabrera ◽  
Roberto J. Lopez-Sastre
2020 ◽  
Author(s):  
Gopi Krishna Erabati

The technology in the current research scenario is marching towards automation for higher productivity with accurate and precise product development. Vision and robotics are domains which work to create autonomous systems and are the key technology in the quest for mass productivity. Automation in an industry can be achieved by detecting interactive objects and estimating their pose in order to manipulate them. Object localization (i.e., pose), comprising the position and orientation of an object, is therefore of profound significance. Applications of object pose estimation range from industrial automation to the entertainment industry, and from health care to surveillance. Pose estimation of objects is essential in many cases, for example so that robots can manipulate objects, or for accurate rendering in Augmented Reality (AR), among others.

This thesis addresses the problem of object pose estimation using 3D data of the scene acquired from 3D sensors (e.g., Kinect and Orbbec Astra Pro, among others). 3D data has the advantage of being independent of object texture and invariant to illumination. The proposal is divided into two phases: an offline phase, in which the 3D model template of the object (used for pose estimation) is built with the Iterative Closest Point (ICP) algorithm, and an online phase, in which the pose of the object is estimated by aligning the scene to the model using ICP, given an initial alignment obtained from 3D descriptors such as Fast Point Feature Histograms (FPFH).

The approach we develop is to be integrated on two different platforms: 1) the humanoid robot 'Pyrene', which carries an Orbbec Astra Pro 3D sensor for data acquisition, and 2) an Unmanned Aerial Vehicle (UAV), which carries an Intel RealSense Euclid. Datasets of objects (such as an electric drill, a brick, a small cylinder, and a cake box) are acquired using Microsoft Kinect, Orbbec Astra Pro, and Intel RealSense Euclid sensors to test the performance of this technique. The objects used to test the approach are those used by the robot. The technique is tested in two scenarios: firstly, when the object is on a table, and secondly, when the object is held in a person's hand. The range of the objects from the sensor is 0.6 to 1.6 m. The technique can handle occlusions of the object by the hand (when the object is held), since ICP works even when only part of the object is visible in the scene.
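The two-stage alignment described above (a coarse, descriptor-based initial alignment refined by ICP) can be sketched in a few lines. The following is a minimal illustration using the open-source Open3D library; the library choice, file names, voxel size, and thresholds are assumptions for illustration, not the thesis's actual implementation:

import open3d as o3d

def preprocess(pcd, voxel=0.01):
    # Downsample, then compute normals and FPFH descriptors.
    down = pcd.voxel_down_sample(voxel)
    down.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
    return down, fpfh

# Hypothetical inputs: the offline model template and the current scene.
model = o3d.io.read_point_cloud("model_template.pcd")
scene = o3d.io.read_point_cloud("scene.pcd")
model_down, model_fpfh = preprocess(model)
scene_down, scene_fpfh = preprocess(scene)

# Online phase, step 1: coarse initial alignment from FPFH correspondences
# (Open3D >= 0.12 API).
dist_thr = 0.015
coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
    model_down, scene_down, model_fpfh, scene_fpfh, True, dist_thr,
    o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
    [o3d.pipelines.registration.CorrespondenceCheckerBasedOnEdgeLength(0.9),
     o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(dist_thr)],
    o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))

# Online phase, step 2: refine with ICP; tolerates partially visible objects.
pose = o3d.pipelines.registration.registration_icp(
    model_down, scene_down, dist_thr, coarse.transformation,
    o3d.pipelines.registration.TransformationEstimationPointToPlane())
print(pose.transformation)  # 4x4 pose of the object in the sensor frame

The resulting 4x4 transform is exactly the position-plus-orientation pose the robot needs for manipulation.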


ROBOT ◽  
2013 ◽  
Vol 35 (4) ◽  
pp. 439 ◽  
Author(s):  
Lin WANG ◽  
Jianfu CAO ◽  
Chongzhao HAN

2012 ◽  
Vol 13 (S1) ◽  
Author(s):  
Xin Chen ◽  
Weibing Wan ◽  
Zhiyong Yong

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Sandro L. Wiesmann ◽  
Laurent Caplette ◽  
Verena Willenbockel ◽  
Frédéric Gosselin ◽  
Melissa L.-H. Võ

Human observers can quickly and accurately categorize scenes. This remarkable ability is related to the usage of information at different spatial frequencies (SFs) following a coarse-to-fine pattern: Low SFs, conveying coarse layout information, are thought to be used earlier than high SFs, representing more fine-grained information. Alternatives to this pattern have rarely been considered. Here, we probed all possible SF usage strategies randomly with high resolution in both the SF and time dimensions at two categorization levels. We show that correct basic-level categorizations of indoor scenes are linked to the sampling of relatively high SFs, whereas correct outdoor scene categorizations are predicted by an early use of high SFs and a later use of low SFs (fine-to-coarse pattern of SF usage). Superordinate-level categorizations (indoor vs. outdoor scenes) rely on lower SFs early on, followed by a shift to higher SFs and a subsequent shift back to lower SFs in late stages. In summary, our results show no consistent pattern of SF usage across tasks and only partially replicate the diagnostic SFs found in previous studies. We therefore propose that SF sampling strategies of observers differ with varying stimulus and task characteristics, thus favouring the notion of flexible SF usage.
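As a concrete illustration of what "using" a spatial-frequency band means, a scene image can be decomposed into SF bands with a Fourier-domain filter: low bands retain only the coarse layout, high bands only the fine detail. A minimal sketch in Python/NumPy follows; the band limits are arbitrary illustrative values, not the study's parameters:

import numpy as np

def sf_bandpass(img, low_cpi, high_cpi):
    """Keep only spatial frequencies between low_cpi and high_cpi
    (in cycles per image) of a 2D grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2)  # distance from the DC component
    mask = (radius >= low_cpi) & (radius <= high_cpi)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# Coarse layout vs. fine-grained content of the same (stand-in) scene image.
scene = np.random.rand(256, 256)      # placeholder for a grayscale scene photo
coarse = sf_bandpass(scene, 0, 8)     # low SFs: global layout
fine = sf_bandpass(scene, 32, 128)    # high SFs: edges and fine detail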


Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 692
Author(s):  
Wen-Chia Tsai ◽  
Jhih-Sheng Lai ◽  
Kuan-Chou Chen ◽  
Vinay M. Shivanna ◽  
Jiun-In Guo

This paper proposes a lightweight moving-object prediction system to detect and recognize pedestrian crossings, vehicles cutting in, and vehicles ahead applying emergency brakes, based on a 3D convolution network for behavior prediction. The proposed design significantly improves on the conventional 3D convolution network (C3D) by adapting it for behavior prediction with a behavior recognition network capable of object localization, which is pivotal for detecting the behaviors of numerous moving objects; the objects detected by the YOLO v3 detection model are combined with and verified against the predictions of the proposed C3D model. Since the proposed system is a lightweight CNN model requiring far fewer parameters, it can be efficiently realized on an embedded system for real-time applications. The proposed lightweight C3D model achieves 10 frames per second (FPS) on an NVIDIA Jetson AGX Xavier and yields over 92.8% accuracy in recognizing pedestrian crossings, over 94.3% accuracy in detecting vehicle cutting-in behavior, and over 95% accuracy for vehicles applying emergency brakes.
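The combine-and-verify step described above can be pictured as matching clip-level behavior predictions against per-frame detections. The following is a hypothetical sketch of such a fusion rule; the IoU matching, threshold, and data layout are all assumptions, and the paper's actual verification logic may differ:

from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str    # e.g. "pedestrian_crossing", "cut_in", "emergency_brake"
    score: float

def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two axis-aligned boxes.
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = ((a.x2 - a.x1) * (a.y2 - a.y1) +
             (b.x2 - b.x1) * (b.y2 - b.y1) - inter)
    return inter / union if union > 0 else 0.0

def verify_behaviors(yolo_boxes, behavior_boxes, iou_thr=0.5):
    """Keep a C3D behavior prediction only if it sufficiently overlaps a
    YOLO v3 detection (hypothetical fusion rule)."""
    return [beh for beh in behavior_boxes
            if any(iou(beh, det) >= iou_thr for det in yolo_boxes)]

Gating the behavior predictions on verified detections suppresses spurious behaviors hallucinated in regions where no object was actually detected.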

