A Visual Humanoid Teleoperation Control for Approaching Target Object

Author(s):  
Muhammad Usman Keerio ◽  
Altaf Hussain Rajpar ◽  
Attaullah Khawaja ◽  
Yuepin Lu
2019 ◽  
Author(s):  
Michiru Makuuchi

Symbolic behaviours such as language, music, drawing, and dance are unique to humans and are found universally in every culture on earth [1]. These behaviours operate in different cognitive domains, but they are commonly characterised as linear sequences of symbols [2,3]. One of the most prominent features of language is hierarchical structure [4], which is also found in music [5,6] and mathematics [7]. The current study addresses whether hierarchical structure also exists in drawing. When we draw complex objects, such as a face, we draw part by part in a hierarchical manner guided by visual semantic knowledge [8]. More specifically, we predicted that hierarchical structure emerges in drawing as follows. Although the order in which the constituent parts of the target object are drawn differs amongst individuals, some parts are consistently drawn in succession, thereby forming chunks. These chunks are then further integrated with other chunks into superordinate chunks, showing differential affinity amongst chunks. This integration into ever higher chunk levels repeats until the full object is reached. We analysed the stroke order of twenty-two complex objects drawn by twenty-five young healthy adult participants with a cluster analysis [9] and demonstrated reasonable hierarchical structures. The results suggest that drawing involves the linear production of symbols with a hierarchical structure. From an evolutionary point of view, we argue that ancient engravings and paintings manifest Homo sapiens’ capability for hierarchical symbolic cognition.
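
The chunking idea described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' actual procedure (they used an established cluster analysis [9]); it is a minimal stdlib approximation in which the affinity between two parts is the fraction of participants who drew them consecutively, and chunks are merged greedily by single linkage until the full object remains. The part names and stroke orders are hypothetical.

```python
from itertools import combinations

def adjacency_affinity(orders):
    """Affinity between two parts = fraction of participants who drew them consecutively."""
    parts = sorted(orders[0])
    aff = {}
    for a, b in combinations(parts, 2):
        count = 0
        for order in orders:
            pos = {p: i for i, p in enumerate(order)}
            if abs(pos[a] - pos[b]) == 1:
                count += 1
        aff[(a, b)] = count / len(orders)
    return aff

def agglomerate(orders):
    """Greedy single-linkage clustering: repeatedly merge the pair of chunks
    with the highest affinity; the merge sequence is the hierarchy."""
    aff = adjacency_affinity(orders)
    def link(c1, c2):  # single linkage between two chunks
        return max(aff.get((min(a, b), max(a, b)), 0.0) for a in c1 for b in c2)
    chunks = [frozenset([p]) for p in sorted(orders[0])]
    merges = []
    while len(chunks) > 1:
        i, j = max(combinations(range(len(chunks)), 2),
                   key=lambda ij: link(chunks[ij[0]], chunks[ij[1]]))
        merges.append((set(chunks[i]), set(chunks[j])))
        merged = chunks[i] | chunks[j]
        chunks = [c for k, c in enumerate(chunks) if k not in (i, j)] + [merged]
    return merges

# Hypothetical stroke orders for a face from three participants.
orders = [
    ["outline", "left_eye", "right_eye", "nose", "mouth"],
    ["outline", "right_eye", "left_eye", "nose", "mouth"],
    ["outline", "left_eye", "right_eye", "mouth", "nose"],
]
merges = agglomerate(orders)
```

In this toy data the two eyes are drawn in succession by everyone, so they form the first chunk, mirroring how consistent drawing order reveals hierarchical structure.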


Author(s):  
Bochang Zou ◽  
Huadong Qiu ◽  
Yufeng Lu

The detection of spherical targets in workpiece shape clustering and fruit classification tasks is challenging. Spherical targets yield low detection accuracy in complex fields, and single-feature processing cannot reliably recognize spheres. Therefore, a novel spherical descriptor (SD) based on contour fitting and convex hull processing is proposed. The SD achieves image de-noising by combining flood filling and morphological operations. The number of polygon-fitted edges is obtained by convex hull processing based on contour extraction and fitting, and two RGB images of the same group of objects are captured from different directions. The two fitted edge counts of the same target object in the two RGB images form a two-dimensional array, and the target is labeled a sphere if both values exceed a custom threshold. A first classification result is obtained by an improved K-NN algorithm, and circle detection is then performed on that result using improved Hough circle detection; we abbreviate the combined method as the Hough transform sphere descriptor (HSD). Experiments demonstrate that spherical objects are recognized with 98.8% accuracy, and comparisons with other recent methods show that HSD achieves higher identification accuracy.
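
The core two-view decision rule — count the polygon-fitted edges of an object's contour in two RGB views and call it a sphere only if both counts exceed a threshold — can be sketched with a stdlib convex hull. This is an illustrative approximation, not the paper's implementation (which adds de-noising, improved K-NN, and improved Hough circle detection); the sample contours and the threshold of 8 edges are hypothetical.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def is_sphere(contour_a, contour_b, edge_threshold=8):
    """Two-view rule: a sphere projects to a many-sided (near-circular) hull
    in BOTH views; a box-like object does not."""
    return (len(convex_hull(contour_a)) > edge_threshold and
            len(convex_hull(contour_b)) > edge_threshold)

# Hypothetical contours: a sampled circle vs. a square with an interior point.
circle = [(round(100*math.cos(2*math.pi*t/16)), round(100*math.sin(2*math.pi*t/16)))
          for t in range(16)]
box = [(0, 0), (0, 10), (10, 10), (10, 0), (5, 5)]
```

A real pipeline would extract the contours from the two segmented RGB images; here they are synthetic point sets.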


2021 ◽  
Vol 18 (1) ◽  
pp. 172988142098573
Author(s):  
Wenjie Geng ◽  
Zhiqiang Cao ◽  
Zhonghui Li ◽  
Yingying Yu ◽  
Fengshui Jing ◽  
...  

Vision-based grasping plays an important role in enabling robots to provide better services. It remains challenging in disturbed scenes, where the target object cannot be grasped directly because of interference from other objects. In this article, a robotic grasping approach that first moves the interfering objects is proposed, based on elliptical cone-based potential fields. A single-shot multibox detector (SSD) is adopted to detect objects, and, considering the scene complexity, Euclidean clustering is also employed to obtain objects that SSD was not trained on. We then acquire the vertical projection of the point cloud of each object. Because different objects have different shapes and orientations, the vertical projection is taken along the major axis obtained by principal component analysis, from which the minimum projected envelope rectangle of each object follows. To construct continuous potential field functions, an elliptical functional representation is introduced, since among continuous closed convex curves the ellipse best matches the envelope rectangle. Guided by the design principles of continuity, same-eccentricity equivalence, and monotonicity, potential fields based on elliptical cones are designed. The current interference object to be grasped generates an attractive field, whereas the other objects generate repulsive ones, and the resultant field is used to solve for the best placement of the current interference object. The effectiveness of the proposed approach is verified by experiments.
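
The resultant-field idea — an attractive field toward the goal placement plus elliptical-cone repulsive fields around the other objects — can be sketched as follows. The specific cone shape (linear decay to zero at twice the fitted ellipse) and all gains are assumptions for illustration, not the paper's exact design.

```python
import math

def elliptical_radius(p, center, a, b, theta):
    """Normalized elliptical distance: 1.0 exactly on the fitted ellipse."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    u = dx * math.cos(theta) + dy * math.sin(theta)   # along major axis
    v = -dx * math.sin(theta) + dy * math.cos(theta)  # along minor axis
    return math.hypot(u / a, v / b)

def repulsive_cone(p, obstacle, k=1.0):
    """Elliptical cone: peaks at the obstacle center and decays linearly
    to zero at twice the fitted ellipse, keeping the field continuous."""
    r = elliptical_radius(p, obstacle["center"],
                          obstacle["a"], obstacle["b"], obstacle["theta"])
    return k * max(0.0, 2.0 - r)

def total_potential(p, goal, obstacles, k_att=0.5):
    """Resultant field: attractive toward the goal, repulsive near obstacles."""
    att = k_att * math.hypot(p[0] - goal[0], p[1] - goal[1])
    return att + sum(repulsive_cone(p, ob) for ob in obstacles)

def best_placement(candidates, goal, obstacles):
    """Choose the candidate placement minimizing the resultant potential."""
    return min(candidates, key=lambda p: total_potential(p, goal, obstacles))

# Hypothetical scene: one obstacle ellipse at the origin, goal at (5, 0).
goal = (5.0, 0.0)
obstacles = [{"center": (0.0, 0.0), "a": 1.0, "b": 0.5, "theta": 0.0}]
candidates = [(0.5, 0.0), (4.0, 0.0), (5.0, 0.0)]
placement = best_placement(candidates, goal, obstacles)
```

The ellipse parameters (center, semi-axes, orientation) would come from the PCA-aligned envelope rectangle of each object's projected point cloud.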


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2280
Author(s):  
Ching-Chang Wong ◽  
Li-Yu Yeh ◽  
Chih-Cheng Liu ◽  
Chi-Yi Tsai ◽  
Hisasuki Aoyama

In this paper, a manipulation planning method for object re-orientation based on semantic segmentation keypoint detection is proposed for a robot manipulator, enabling it to detect randomly placed objects and re-orientate them to a specified position and pose. There are two main parts: (1) a 3D keypoint detection system and (2) a manipulation planning system for object re-orientation. In the 3D keypoint detection system, an RGB-D camera obtains information about the environment and generates 3D keypoints of the target object as inputs representing its position and pose. This simplifies the 3D model representation so that manipulation planning for object re-orientation can be executed in a category-level manner by adding varied training data for the object in the training phase. In addition, 3D suction points in both the object’s current and expected poses are generated as inputs for the next stage. During that stage, the Mask Region-based Convolutional Neural Network (Mask R-CNN) algorithm performs preliminary object detection, and the image with the highest confidence index is selected as the input of the semantic segmentation system, which classifies each pixel of the picture into the corresponding pack unit of the object. After the convolutional neural network performs semantic segmentation, the Conditional Random Fields (CRFs) method runs several iterations to obtain a more accurate object recognition result. Once the target object is segmented into pack units, the center position of each pack unit can be obtained. A normal vector at each pack unit’s center point is then generated from the depth image, and the pose of the object is obtained by connecting the center points of the pack units.
In the manipulation planning system for object re-orientation, the pose of the object and the normal vector of each pack unit are first converted into the working coordinate system of the robot manipulator. Then, according to the current and expected poses of the object, the spherical linear interpolation (Slerp) algorithm generates a series of workspace movements for the robot manipulator to re-orientate the object. In addition, the pose of the object is adjusted about the z-axis of the object’s geodetic coordinate system based on image features on the object’s surface, so that the pose of the placed object approaches the desired pose. Finally, a robot manipulator and a laboratory-made vacuum suction cup are used to verify that the proposed system can complete the planned object re-orientation task.
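
The Slerp step can be illustrated with a minimal quaternion implementation. Quaternions are taken in (w, x, y, z) order, and the start/goal poses in the usage are hypothetical; the paper applies Slerp to the manipulator's workspace poses.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                 # take the shorter great-circle arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:              # nearly parallel: lerp + renormalize
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

def reorientation_path(q_start, q_goal, steps=10):
    """A series of intermediate orientations for the manipulator to follow."""
    return [slerp(q_start, q_goal, k / steps) for k in range(steps + 1)]

# Hypothetical re-orientation: identity to a 90-degree rotation about z.
q_start = (1.0, 0.0, 0.0, 0.0)
q_goal = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
path = reorientation_path(q_start, q_goal, steps=10)
```

Halfway along the path the orientation is a 45-degree rotation about z, confirming the constant angular velocity that makes Slerp suitable for smooth re-orientation.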


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1461
Author(s):  
Shun-Hsin Yu ◽  
Jen-Shuo Chang ◽  
Chia-Hung Dylan Tsai

This paper proposes an object classification method using a flexion glove and machine learning. Classification is performed based on the information obtained from a single grasp of a target object. The flexion glove is built with five flex sensors mounted on five finger sleeves and measures the flexion of individual fingers while grasping an object. Flexion signals are divided into three phases: picking, holding, and releasing. Grasping features are extracted from the holding phase to train a support vector machine. Two sets of objects are prepared for the classification test: a printed-object set and a daily-life object set. The printed-object set is for investigating grasp patterns with specified shapes and sizes, while the daily-life object set includes nine objects randomly chosen from daily life to demonstrate that the proposed method can identify a wide range of objects. Classification accuracies of 95.56% and 88.89% are achieved for the printed-object and daily-life object sets, respectively. A flexion glove that can perform object classification is thus successfully developed and is aimed at potential grasp-to-see applications, such as visual-impairment aids and recognition in dark spaces.
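
The holding-phase feature extraction can be sketched as follows. The paper trains a support vector machine on such features; here a nearest-centroid classifier stands in so the sketch stays dependency-free. The plateau heuristic (samples within 90% of peak total flexion count as "holding"), the sensor values, and the object templates are all assumptions.

```python
def holding_features(trial, frac=0.9):
    """trial: list of 5-element flexion samples, one value per finger.
    The holding phase is taken as the samples whose total flexion is within
    `frac` of the trial maximum; the feature vector is each finger's mean there."""
    totals = [sum(s) for s in trial]
    peak = max(totals)
    hold = [s for s, tot in zip(trial, totals) if tot >= frac * peak]
    n = len(hold)
    return [sum(s[i] for s in hold) / n for i in range(5)]

def nearest_object(features, templates):
    """Stand-in for the paper's SVM: pick the closest template by squared distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda name: dist(features, templates[name]))

# Hypothetical grasp trial: pick up, hold steady for three samples, release.
trial = [
    [0, 0, 0, 0, 0],
    [2, 2, 2, 2, 2],
    [4, 4, 3, 2, 1],
    [4, 4, 3, 2, 1],
    [4, 4, 3, 2, 1],
    [1, 1, 1, 1, 1],
]
templates = {"ball": [4, 4, 3, 2, 1], "pen": [1, 1, 4, 4, 4]}
label = nearest_object(holding_features(trial), templates)
```

Restricting features to the holding plateau discards the transient picking and releasing phases, which vary more across grasps than the stable hold does.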


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Georges Hattab ◽  
Adamantini Hatzipanayioti ◽  
Anna Klimova ◽  
Micha Pfeiffer ◽  
Peter Klausing ◽  
...  

Abstract Recent technological advances have made Virtual Reality (VR) attractive in both research and real-world applications such as training, rehabilitation, and gaming. Although these fields have benefited from VR technology, it remains unclear whether VR contributes to better spatial understanding and training in the context of surgical planning. In this study, we evaluated the use of VR by comparing the recall of spatial information in two learning conditions: a head-mounted display (HMD) and a desktop screen (DT). Specifically, we explored (a) a scene understanding task and then (b) a direction estimation task using two 3D models (a liver and a pyramid). In the scene understanding task, participants had to navigate the rendered 3D models by means of rotation, zoom, and transparency in order to identify the spatial relationships among their internal objects. In the subsequent direction estimation task, participants had to point at a previously identified target object, i.e., an internal sphere, on a 3D-printed version of the model using a tracked pointing tool. Results showed that the learning condition (HMD or DT) did not influence participants’ memory and confidence ratings of the models. In contrast, the model type, that is, whether the model to be recalled was a liver or a pyramid, significantly affected participants’ memory of the internal structure of the model. Furthermore, localizing the internal position of the target sphere was also unaffected by participants’ previous experience of the model via HMD or DT. Overall, the results provide novel insights into the use of VR in a surgical planning scenario and have important implications for medical learning by shedding light on the mental models we build to recall spatial structures.


2021 ◽  
Vol 11 (10) ◽  
pp. 4617
Author(s):  
Daehee Park ◽  
Cheoljun Lee

Because smartphones support various functions, users carry them everywhere. Whenever users find a moment interesting, important, or meaningful, they can record a video to preserve the memory. The main problem with recording an important moment on video is that the user has to watch the scene through the phone screen rather than seeing the actual real-world event. This stems from the uncertainty a user may feel while recording: for example, whether the recording is of high quality, or whether the target object might be missed. To overcome this, we developed a new camera application that utilizes two main algorithms, the minimum output sum of squared error (MOSSE) and histogram of oriented gradients (HOG) algorithms, to track the target object and recognize the direction of the user’s head. We assumed that the functions of the new camera application could relieve the user’s anxiety while recording a video. To test the effectiveness of the proposed application, we conducted a case study and measured users’ emotional responses and error rates in comparison with a regular camera application. The results indicate that the new camera application induces greater feelings of pleasure, excitement, and independence than a regular camera application, and it effectively reduces error rates during video recording.


Author(s):  
Gwendolyn Rehrig ◽  
Reese A. Cullimore ◽  
John M. Henderson ◽  
Fernanda Ferreira

Abstract According to the Gricean Maxim of Quantity, speakers provide the amount of information listeners require to correctly interpret an utterance, and no more (Grice in Logic and conversation, 1975). However, speakers often violate the Maxim of Quantity, especially when the redundant information improves reference precision (Degen et al. in Psychol Rev 127(4):591–621, 2020). Redundant (non-contrastive) information may facilitate real-world search if it narrows the spatial scope under consideration or improves target template specificity. The current study investigated whether non-contrastive modifiers that improve reference precision facilitate visual search in real-world scenes. In two visual search experiments, we compared search performance when perceptually relevant but non-contrastive modifiers were included in the search instruction. Participants (NExp. 1 = 48, NExp. 2 = 48) searched for a unique target object following a search instruction that contained either no modifier, a location modifier (Experiment 1: on the top left; Experiment 2: on the shelf), or a color modifier (the black lamp). In Experiment 1 only, the target was located faster when the verbal instruction included either modifier, and there was an overall benefit of color modifiers in a combined analysis of scenes and conditions common to both experiments. The results suggest that violations of the Maxim of Quantity can facilitate search when the violations include task-relevant information that either augments the target template or constrains the search space, and when at least one modifier provides a highly reliable cue. Consistent with Degen et al. (2020), we conclude that listeners benefit from non-contrastive information that improves reference precision, and engage in rational reference comprehension.
Significance statement This study investigated whether providing more information than someone needs to find an object in a photograph helps them to find that object more easily, even though it means they need to interpret a more complicated sentence. Before searching a scene, participants were either given information about where the object would be located in the scene, what color the object was, or were only told what object to search for. The results showed that providing additional information helped participants locate an object in an image more easily only when at least one piece of information communicated what part of the scene the object was in, which suggests that more information can be beneficial as long as that information is specific and helps the recipient achieve a goal. We conclude that people will pay attention to redundant information when it supports their task. In practice, our results suggest that instructions in other contexts (e.g., real-world navigation, using a smartphone app, prescription instructions, etc.) can benefit from the inclusion of what appears to be redundant information.

