Soft Missing-Feature Mask Generation for Robot Audition

2010 ◽  
Vol 1 (1) ◽  
Author(s):  
Toru Takahashi ◽  
Kazuhiro Nakadai ◽  
Kazunori Komatani ◽  
Tetsuya Ogata ◽  
Hiroshi G. Okuno

This paper describes an improvement in automatic speech recognition (ASR) for robot audition that introduces Missing Feature Theory (MFT) based on soft missing-feature masks (MFMs) to realize natural human-robot interaction. In an everyday environment, a robot’s microphones capture various sounds besides the user’s utterances. Although sound-source separation is an effective way to enhance the user’s utterances, it inevitably produces errors due to reflection and reverberation. MFT is able to cope with these errors. First, MFMs are generated based on the reliability of time-frequency components. Then ASR weights the time-frequency components according to the MFMs. We propose a new method to automatically generate soft MFMs, consisting of continuous values from 0 to 1 computed with a sigmoid function. The proposed MFM generation was implemented for the HRP-2 humanoid using HARK, our open-sourced robot audition software. Preliminary results show that the soft MFM outperformed a hard (binary) MFM in recognizing three simultaneous utterances. In a human-robot interaction task, soft MFMs reduced the minimum required interval between two adjacent loudspeakers from 60 degrees to 30 degrees.
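The abstract does not give the exact reliability measure or sigmoid parameters used in the paper; the sketch below is only a minimal illustration of the idea, with an assumed per-bin reliability input and illustrative threshold and slope values.

```python
import numpy as np

def soft_mfm(reliability, threshold=0.5, slope=10.0):
    """Map per-bin reliability scores to a soft missing-feature mask in [0, 1].

    reliability : 2D array (time x frequency) of reliability estimates,
                  e.g. a per-bin ratio of separated-speech energy to
                  estimated leakage energy (a hypothetical choice here).
    threshold, slope : sigmoid midpoint and steepness (illustrative values,
                       not taken from the paper).
    """
    return 1.0 / (1.0 + np.exp(-slope * (reliability - threshold)))

def hard_mfm(reliability, threshold=0.5):
    """Binary baseline mask for comparison: 1 if reliable, else 0."""
    return (reliability > threshold).astype(float)

# Example: MFT-based ASR weights each time-frequency feature by the mask.
rng = np.random.default_rng(0)
reliability = rng.uniform(0.0, 1.0, size=(100, 64))  # 100 frames x 64 bins
features = rng.normal(size=(100, 64))                # separated-speech features
weighted = soft_mfm(reliability) * features          # down-weights unreliable bins
```

Unlike the hard mask, the soft mask lets marginally reliable bins still contribute, which is consistent with the reported advantage on simultaneous utterances.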

2019 ◽  
Vol 39 (1) ◽  
pp. 73-99 ◽  
Author(s):  
Matt Webster ◽  
David Western ◽  
Dejanira Araiza-Illan ◽  
Clare Dixon ◽  
Kerstin Eder ◽  
...  

We present an approach for the verification and validation (V&V) of robot assistants in the context of human–robot interactions, to demonstrate their trustworthiness through corroborative evidence of their safety and functional correctness. Key challenges include the complex and unpredictable nature of the real world in which assistant and service robots operate, the limitations of available V&V techniques when used individually, and the consequent lack of confidence in the V&V results. Our approach, called corroborative V&V, addresses these challenges by combining several different V&V techniques; in this paper we use formal verification (model checking), simulation-based testing, and user validation in experiments with a real robot. This combination allows V&V of the human–robot interaction task at different levels of modeling detail and thoroughness of exploration, thus overcoming the individual limitations of each technique. We demonstrate our approach through a handover task, the most critical part of a complex cooperative manufacturing scenario, for which we propose safety and liveness requirements to verify and validate. Should the resulting V&V evidence present discrepancies, the assets (i.e., the system and requirement models) are iteratively refined and improved until the different V&V techniques corroborate one another, so that the models represent the human–robot interaction task more faithfully. Therefore, corroborative V&V affords a systematic approach to “meta-V&V,” in which different V&V techniques can be used to corroborate and check one another, increasing the level of certainty in the results of V&V.
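The paper’s concrete safety and liveness requirements are not reproduced in the abstract. As an illustration of the simulation-based-testing leg of corroborative V&V, here is a hypothetical bounded-liveness monitor over a simulated event trace; the event names and deadline are invented for the example and are not the paper’s actual requirements.

```python
from typing import Iterable, List

def check_bounded_liveness(trace: Iterable[str], trigger: str,
                           response: str, deadline: int) -> bool:
    """Check a bounded-liveness requirement over a finite event trace:
    every `trigger` event must be followed by a `response` event within
    `deadline` steps. Returns False on the first violation.
    """
    pending: List[int] = []  # steps remaining for each outstanding trigger
    for event in trace:
        pending = [t - 1 for t in pending]
        if any(t < 0 for t in pending):
            return False              # a trigger went unanswered too long
        if event == response:
            pending.clear()           # all outstanding requests satisfied
        if event == trigger:
            pending.append(deadline)
    return not pending                # unanswered triggers at trace end fail

# Illustrative handover requirement: a request must be answered by a
# completed handover within 50 simulation steps.
trace = ["idle", "handover_requested", "grasp", "extend_arm", "handover_done"]
assert check_bounded_liveness(trace, "handover_requested", "handover_done", 50)
```

In the corroborative workflow, a violation found by such a monitor in simulation would prompt re-examination of the formal model and requirements rather than standing alone as a verdict.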


Author(s):  
Marie D. Manner

We describe experiments performed with a large number of preschool children (ages 1.5 to 4 years) in a two-task study comprising an eye-tracking experiment and a human-robot interaction experiment. The resulting data, from mostly neurotypical children, forms a baseline against which to compare children with autism, allowing us to further characterize the autism phenotype. Eye-tracking results indicate a strong preference for a humanoid robot and a social being (a four-year-old girl) over other robot types. Results from the human-robot interaction task, a semi-structured play interaction between child and robot, showed that we can cluster participants based on social distances and other social-responsiveness metrics.
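The abstract does not name the clustering method or the exact metrics used; the following is a minimal sketch of how participants might be clustered from such measurements, using k-means over two hypothetical per-child features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-child features: mean child-robot distance (meters) and a
# social-responsiveness score; the paper's actual metrics are not specified.
X = np.array([
    [0.4, 8.0],
    [0.5, 7.5],
    [1.8, 2.0],
    [2.0, 1.5],
    [0.6, 6.8],
])

X_scaled = StandardScaler().fit_transform(X)  # put metrics on a common scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)  # e.g. groups approach-oriented vs. distance-keeping children
```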


2017 ◽  
Vol 12 (1) ◽  
pp. 55-64 ◽  
Author(s):  
Chien Van Dang ◽  
Tin Trung Tran ◽  
Trung Xuan Pham ◽  
Ki-Jong Gil ◽  
...  

2020 ◽  
Vol 16 (2) ◽  
pp. 1-12
Author(s):  
Ameer Badr ◽  
Alia Abdul-Hassan

With recent developments in technology and advances in artificial intelligence and machine learning, it has become possible for robots to understand and respond to voice as part of Human-Robot Interaction (HRI). A voice-based interface robot can recognize speech from humans, enabling it to interact more naturally with its human counterpart in different environments. This work presents a review of voice-based interfaces for HRI systems. The review examines voice-based perception in HRI systems from three facets: feature extraction, dimensionality reduction, and semantic understanding. For feature extraction, numerous types of features are reviewed across various domains, such as the time, frequency, cepstral (i.e., the inverse Fourier transform of the logarithm of the signal spectrum), and deep domains. For dimensionality reduction, subspace learning can eliminate the redundancies of high-dimensional features by further processing the extracted features to better reflect their semantic information. For semantic understanding, the aim is to infer objects or human behaviors from the extracted features. Numerous types of semantic understanding are reviewed, such as speech recognition, speaker recognition, speaker gender detection, speaker gender and age estimation, and speaker localization. Finally, some existing voice-based interface issues and recommendations for future work are outlined.
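As a concrete illustration of the cepstral domain as the abstract defines it (the inverse Fourier transform of the log spectrum), here is a minimal real-cepstrum front end; the frame length, window, and coefficient count are conventional choices, not values taken from the review.

```python
import numpy as np

def real_cepstrum(frame: np.ndarray, n_coeffs: int = 13) -> np.ndarray:
    """Real cepstrum of one windowed speech frame: the inverse Fourier
    transform of the log magnitude spectrum, as defined in the text.
    Returns the first `n_coeffs` coefficients as a compact feature vector.
    """
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # epsilon avoids log(0)
    cepstrum = np.fft.irfft(log_mag)
    return cepstrum[:n_coeffs]

# Example: a 25 ms frame at 16 kHz (400 samples) with a Hamming window,
# typical front-end settings (illustrative only).
sr = 16000
t = np.arange(400) / sr
frame = np.sin(2 * np.pi * 220 * t) * np.hamming(400)
feats = real_cepstrum(frame)
```

Low-order coefficients of this kind capture the spectral envelope, which is why cepstral features recur across the speech, speaker, and gender/age recognition tasks the review covers.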

