Silent Speech
Recently Published Documents

TOTAL DOCUMENTS: 200 (FIVE YEARS: 74)
H-INDEX: 17 (FIVE YEARS: 3)

2022, Vol. 12(2), pp. 827
Author(s): Ki-Seung Lee

Moderate performance in terms of intelligibility and naturalness can be obtained using previously established silent speech interface (SSI) methods. Nevertheless, a common problem with SSI has been a deficiency in estimating spectral detail, which results in synthesized speech that sounds rough, harsh, and unclear. In this study, harmonic enhancement (HE) was applied during postprocessing to alleviate this problem by emphasizing the spectral fine structure of the speech signal. To improve the subjective quality of the synthesized speech, the difference between synthesized and actual speech was quantified as a distance in the perceptual domain instead of the conventional mean square error (MSE). Two deep neural networks (DNNs), connected in a cascade, were employed to separately estimate the speech spectra and the HE filter coefficients. The DNNs were trained to incrementally and iteratively minimize both the MSE and the perceptual distance (PD). A feasibility test showed that the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI) measures improved by 17.8% and 2.9%, respectively, compared with previous methods. Subjective listening tests confirmed that the proposed method yielded perceptually preferred results compared with the conventional MSE-based method.
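
To make the training objective concrete, the sketch below shows a cascade of two small networks and a loss that mixes MSE with a perceptual-distance term, as described above. This is a minimal PyTorch illustration, not the authors' implementation; the network sizes, the weighting factor alpha, and the perceptual_distance() surrogate are assumptions.

```python
# Minimal sketch (not the authors' code): two cascaded DNNs and a combined
# MSE + perceptual-distance loss. A differentiable perceptual_distance()
# surrogate is assumed to be supplied by the caller.
import torch
import torch.nn as nn

class SpectrumDNN(nn.Module):
    """Stage 1: estimate the speech spectrum from SSI features."""
    def __init__(self, in_dim, spec_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, spec_dim),
        )

    def forward(self, x):
        return self.net(x)

class HarmonicEnhancerDNN(nn.Module):
    """Stage 2: estimate harmonic-enhancement filter coefficients."""
    def __init__(self, spec_dim, n_coeffs, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(spec_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_coeffs),
        )

    def forward(self, spec):
        return self.net(spec)

def combined_loss(pred, target, perceptual_distance, alpha=0.5):
    """Weighted sum of MSE and a (hypothetical) perceptual-distance term."""
    mse = nn.functional.mse_loss(pred, target)
    return (1 - alpha) * mse + alpha * perceptual_distance(pred, target)

# Cascade: SSI features -> spectrum estimate -> HE filter coefficients.
spec_net = SpectrumDNN(in_dim=40, spec_dim=129)
he_net = HarmonicEnhancerDNN(spec_dim=129, n_coeffs=32)
spectrum = spec_net(torch.randn(8, 40))
he_coeffs = he_net(spectrum)
```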


Sensors, 2022, Vol. 22(2), pp. 649
Author(s): David Ferreira, Samuel Silva, Francisco Curado, António Teixeira

Speech is our most natural and efficient form of communication and offers strong potential to improve how we interact with machines. However, speech communication can be limited by environmental factors (e.g., ambient noise), contextual factors (e.g., the need for privacy), or health conditions (e.g., laryngectomy) that prevent the use of audible speech. In this regard, silent speech interfaces (SSI) have been proposed as an alternative, relying on technologies that do not require the production of an acoustic signal (e.g., electromyography and video). Unfortunately, despite the variety of proposed approaches, many still face limitations that hinder everyday use, e.g., being intrusive or non-portable, or raising technical (e.g., lighting conditions for video) or privacy concerns. To address this need, this article explores contactless continuous-wave radar and assesses its potential for SSI development. A corpus of 13 European Portuguese words was acquired from four speakers, three of whom enrolled in a second acquisition session three months later. For the speaker-dependent models, trained and tested with data from each speaker using 5-fold cross-validation, average accuracies of 84.50% and 88.00% were obtained with Bagging (BAG) and Linear Regression (LR) classifiers, respectively. Additionally, recognition accuracies of 81.79% and 81.80% were achieved for the session-independent and speaker-independent experiments, respectively, establishing a promising basis for further exploring this technology for silent speech recognition.
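
As a concrete illustration of the speaker-dependent evaluation protocol described above (5-fold cross-validation with a Bagging classifier and a linear classifier), the scikit-learn sketch below runs on placeholder features; the feature dimensions, the random data, and the use of LogisticRegression as a stand-in for the paper's "LR" classifier are assumptions.

```python
# Sketch of the 5-fold cross-validation protocol with the two classifier
# families mentioned above. X and y are random placeholders standing in for
# per-utterance radar features and word labels.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression  # stand-in for the paper's "LR" classifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(260, 64))      # e.g., 13 words x 20 repetitions, 64 radar features each
y = np.repeat(np.arange(13), 20)    # word labels

for name, clf in [("BAG", BaggingClassifier(n_estimators=50, random_state=0)),
                  ("LR", LogisticRegression(max_iter=1000))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.2%}")
```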


2022
Author(s): Philip Kennedy, A. Ganesh, A.J. Cervantes

The motivation of someone who is locked-in, that is, paralyzed and mute, is to find relief for their loss of function. The data presented in this report are part of an attempt to restore one of those lost functions, namely speech. An essential feature of the development of a speech prosthetic is optimal decoding of patterns of recorded neural signals during silent or covert speech, that is, speaking 'inside the head' with no audible output due to the paralysis of the articulators. The aim of this paper is to illustrate the importance of both fast- and slow-firing single units recorded from an individual with locked-in syndrome and from an intact participant speaking silently. Long-duration electrodes were implanted in the motor speech cortex for up to 13 years in the locked-in participant. The data herein provide evidence that slow-firing single units are essential for optimal decoding accuracy. Additional evidence indicates that slow-firing single units can be conditioned in the locked-in participant five years after implantation, further supporting their role in decoding.
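
One way to probe the claim that slow-firing units matter for decoding is an ablation-style comparison: decode covert-speech classes from binned firing rates with and without the slow units. The sketch below is purely illustrative and uses random placeholder data (so accuracies sit at chance); the unit count, the 3 spikes/bin "slow" threshold, and the classifier choice are assumptions, not the authors' analysis.

```python
# Illustrative ablation sketch (not the authors' analysis): decode covert-speech
# classes from binned single-unit firing rates, with and without slow-firing units.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_units = 200, 30
unit_rates = rng.uniform(0.5, 12.0, size=n_units)           # per-unit mean rate (spikes/bin)
rates = rng.poisson(lam=unit_rates, size=(n_trials, n_units)).astype(float)
labels = rng.integers(0, 4, size=n_trials)                   # 4 covert-speech classes
slow = unit_rates < 3.0                                      # crude "slow-firing" criterion

for name, mask in [("all units", np.ones(n_units, dtype=bool)),
                   ("fast units only", ~slow)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          rates[:, mask], labels, cv=5).mean()
    print(f"{name}: {acc:.2%}")   # at chance here; informative only with real recordings
```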


Sensors, 2021, Vol. 22(1), pp. 299
Author(s): Dafydd Ravenscroft, Ioannis Prattis, Tharun Kandukuri, Yarjan Abdul Samad, Giorgio Mallia, ...

Silent speech recognition is the ability to recognise intended speech without audio information. Useful applications can be found in situations where sound waves are not produced or cannot be heard, for example speakers with physical voice impairments or environments in which audio transfer is not reliable or secure. A device that can detect non-auditory signals and map them to intended phonation could therefore assist in such situations. In this work, we propose a graphene-based strain gauge sensor which can be worn on the throat and detect small muscle movements and vibrations. Machine learning algorithms then decode the non-audio signals and predict the intended speech. The proposed strain gauge sensor is highly wearable, utilising graphene's unique and beneficial properties, including strength, flexibility and high conductivity. A highly flexible and wearable sensor able to pick up small throat movements is fabricated by screen printing graphene onto lycra fabric. A framework for interpreting this information is proposed that explores several machine learning techniques to predict intended words from the signals. A dataset of 15 unique words and four movements, each with 20 repetitions, was developed and used to train the machine learning algorithms. The results demonstrate the ability of such sensors to predict spoken words: we achieved a word accuracy rate of 55% on the word dataset and 85% on the movement dataset. This work is a proof of concept for the viability of combining a highly wearable graphene strain gauge and machine learning methods to automate silent speech recognition.
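
To make the recognition framework concrete, the sketch below classifies fixed-length strain-gauge windows from hand-crafted summary statistics. The window length, the feature set, the SVM classifier, and the random placeholder data are assumptions used for illustration, not details taken from the paper.

```python
# Sketch of a word classifier over throat strain-gauge windows using simple
# hand-crafted features; data and feature choices are illustrative placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def window_features(w):
    """Summary statistics for one fixed-length sensor window."""
    return np.array([w.mean(), w.std(), w.min(), w.max(),
                     np.abs(np.diff(w)).mean()])

rng = np.random.default_rng(2)
windows = rng.normal(size=(300, 500))     # 15 words x 20 repetitions, 500 samples per window
labels = np.repeat(np.arange(15), 20)
X = np.array([window_features(w) for w in windows])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(f"mean CV accuracy: {cross_val_score(clf, X, labels, cv=5).mean():.2%}")
```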


Author(s): Ruidong Zhang, Mingyang Chen, Benjamin Steeper, Yaxuan Li, Zihan Yan, ...

This paper presents SpeeChin, a smart necklace that can recognize 54 English and 44 Chinese silent speech commands. A customized infrared (IR) imaging system mounted on the necklace captures images of the neck and face from under the chin. These images are first pre-processed and then passed to an end-to-end deep convolutional recurrent neural network (CRNN) model to infer the silent speech commands. A user study with 20 participants (10 per language) showed that SpeeChin could recognize the 54 English and 44 Chinese silent speech commands with average cross-session accuracies of 90.5% and 91.6%, respectively. To further investigate the potential of SpeeChin in recognizing other silent speech commands, we conducted another study in which 10 participants distinguished between 72 one-syllable nonwords. Based on the results of these user studies, we discuss the challenges and opportunities of deploying SpeeChin in real-world applications.
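
The sketch below shows the general shape of a convolutional recurrent model for this kind of task: a small CNN extracts per-frame features from the under-chin IR images and a GRU aggregates them over time into a command prediction. Layer sizes, frame counts, and image resolution are illustrative assumptions, not the SpeeChin architecture.

```python
# Illustrative CRNN for silent-speech command recognition from image sequences:
# a per-frame CNN feature extractor followed by a GRU and a classification head.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_commands, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                       # per-frame feature extractor
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.rnn = nn.GRU(32 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_commands)

    def forward(self, frames):                          # frames: (batch, time, 1, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1).reshape(b, t, -1)
        _, h = self.rnn(feats)                          # h: (1, batch, hidden)
        return self.head(h[-1])                         # command logits

logits = CRNN(n_commands=54)(torch.randn(2, 16, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 54])
```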


Author(s): Huihui Cai, Yakun Zhang, Liang Xie, Huijiong Yan, Wei Qin, ...

2021, Vol. E104.D(12), pp. 2209-2217
Author(s): Hongcui WANG, Pierre ROUSSEL, Bruce DENBY

2021
Author(s): Christoph Wagner, Petr Schaffer, Pouriya Amini Digehsara, Michael Bärhold, Dirk Plettemeier, ...

Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it. Radar is a relatively unexplored silent speech sensing modality, even though it has the advantage of being fully non-invasive. We therefore built custom stepped-frequency continuous-wave radar hardware to measure, at a rate of 100 Hz, the changes during speech in the transmission spectra between three antennas located on the two cheeks and the chin. We then recorded a command-word corpus of 40 phonetically balanced, two-syllable German words plus the German digits zero to nine for two individual speakers, and evaluated the speaker-dependent multi-session and inter-session recognition accuracies on this 50-word corpus using a bidirectional long short-term memory (LSTM) network. We obtained recognition accuracies of 99.17% and 88.87% in the speaker-dependent multi-session and inter-session settings, respectively. These results show that the transmission spectra are very well suited to discriminating individual words from one another, even across sessions, which is one of the key challenges for fully non-invasive silent speech interfaces.
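
A minimal sketch of the classification stage follows: a bidirectional LSTM consumes a sequence of transmission-spectrum frames (sampled at 100 Hz) and outputs word logits over a 50-word vocabulary. The number of frequency bins, the sequence length, and the hidden size are placeholders, not values from the paper.

```python
# Minimal bidirectional LSTM word classifier over sequences of radar
# transmission spectra; dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_freq_bins, n_words=50, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_freq_bins, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_words)

    def forward(self, x):              # x: (batch, time, n_freq_bins), frames at 100 Hz
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # logits from the final time step

model = BiLSTMClassifier(n_freq_bins=128)
logits = model(torch.randn(4, 80, 128))   # 4 utterances, 0.8 s at 100 Hz
print(logits.shape)                       # torch.Size([4, 50])
```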


2021
Author(s): Jinghan Wu, Tao Zhao, Yakun Zhang, Liang Xie, Ye Yan, ...
