Silent Speech
Recently Published Documents

TOTAL DOCUMENTS: 200 (FIVE YEARS: 74)
H-INDEX: 17 (FIVE YEARS: 3)

2022, Vol. 12(2), pp. 827
Author(s): Ki-Seung Lee

Moderate performance in terms of intelligibility and naturalness can be obtained using previously established silent speech interface (SSI) methods. Nevertheless, a common problem with SSI has been a deficiency in estimating spectral detail, which results in synthesized speech that sounds rough, harsh, and unclear. In this study, harmonic enhancement (HE) was applied during postprocessing to alleviate this problem by emphasizing the spectral fine structure of the speech signal. To improve the subjective quality of the synthesized speech, the difference between synthesized and actual speech was quantified as a distance in the perceptual domain instead of the conventional mean square error (MSE). Two deep neural networks (DNNs), connected in a cascade, were employed to separately estimate the speech spectra and the HE filter coefficients. The DNNs were trained to incrementally and iteratively minimize both the MSE and the perceptual distance (PD). A feasibility test showed that the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI) measures improved by 17.8% and 2.9%, respectively, compared with previous methods. Subjective listening tests confirmed that the proposed method yielded perceptually preferred results compared with the conventional MSE-based method.
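
To make the training objective concrete, the sketch below shows a cascade of two small networks and a loss that mixes MSE with a perceptual-distance term, as described above. This is a minimal PyTorch illustration, not the authors' implementation; the network sizes, the weighting factor alpha, and the perceptual_distance() surrogate are assumptions.

```python
# Minimal sketch (not the authors' code): two cascaded DNNs and a combined
# MSE + perceptual-distance loss. A differentiable perceptual_distance()
# surrogate is assumed to be supplied by the caller.
import torch
import torch.nn as nn

class SpectrumDNN(nn.Module):
    """Stage 1: estimate the speech spectrum from SSI features."""
    def __init__(self, in_dim, spec_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, spec_dim),
        )

    def forward(self, x):
        return self.net(x)

class HarmonicEnhancerDNN(nn.Module):
    """Stage 2: estimate harmonic-enhancement filter coefficients."""
    def __init__(self, spec_dim, n_coeffs, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(spec_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_coeffs),
        )

    def forward(self, spec):
        return self.net(spec)

def combined_loss(pred, target, perceptual_distance, alpha=0.5):
    """Weighted sum of MSE and a (hypothetical) perceptual-distance term."""
    mse = nn.functional.mse_loss(pred, target)
    return (1 - alpha) * mse + alpha * perceptual_distance(pred, target)

# Cascade: SSI features -> spectrum estimate -> HE filter coefficients.
spec_net = SpectrumDNN(in_dim=40, spec_dim=129)
he_net = HarmonicEnhancerDNN(spec_dim=129, n_coeffs=32)
spectrum = spec_net(torch.randn(8, 40))
he_coeffs = he_net(spectrum)
```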


Sensors, 2022, Vol. 22(2), pp. 649
Author(s): David Ferreira, Samuel Silva, Francisco Curado, António Teixeira

Speech is our most natural and efficient form of communication and offers strong potential to improve how we interact with machines. However, speech communication can be limited by environmental factors (e.g., ambient noise), contextual factors (e.g., the need for privacy), or health conditions (e.g., laryngectomy) that prevent the use of audible speech. In this regard, silent speech interfaces (SSI) have been proposed as an alternative, relying on technologies that do not require the production of an acoustic signal (e.g., electromyography and video). Unfortunately, despite the variety of proposed approaches, many still face limitations that hinder everyday use, e.g., being intrusive or non-portable, or raising technical (e.g., lighting conditions for video) or privacy concerns. To address this need, this article explores contactless continuous-wave radar and assesses its potential for SSI development. A corpus of 13 European Portuguese words was acquired from four speakers, three of whom enrolled in a second acquisition session three months later. For the speaker-dependent models, trained and tested with data from each speaker using 5-fold cross-validation, average accuracies of 84.50% and 88.00% were obtained with Bagging (BAG) and Linear Regression (LR) classifiers, respectively. Additionally, recognition accuracies of 81.79% and 81.80% were achieved for the session-independent and speaker-independent experiments, respectively, establishing a promising basis for further exploring this technology for silent speech recognition.
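
As a concrete illustration of the speaker-dependent evaluation protocol described above (5-fold cross-validation with a Bagging classifier and a linear classifier), the scikit-learn sketch below runs on placeholder features; the feature dimensions, the random data, and the use of LogisticRegression as a stand-in for the paper's "LR" classifier are assumptions.

```python
# Sketch of the 5-fold cross-validation protocol with the two classifier
# families mentioned above. X and y are random placeholders standing in for
# per-utterance radar features and word labels.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression  # stand-in for the paper's "LR" classifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(260, 64))      # e.g., 13 words x 20 repetitions, 64 radar features each
y = np.repeat(np.arange(13), 20)    # word labels

for name, clf in [("BAG", BaggingClassifier(n_estimators=50, random_state=0)),
                  ("LR", LogisticRegression(max_iter=1000))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.2%}")
```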


2022
Author(s): Philip Kennedy, A. Ganesh, A.J. Cervantes

The motivation of someone who is locked-in, that is, paralyzed and mute, is to find relief for their loss of function. The data presented in this report are part of an attempt to restore one of those lost functions, namely speech. An essential feature of the development of a speech prosthetic is optimal decoding of patterns of recorded neural signals during silent or covert speech, that is, speaking 'inside the head' with no audible output due to the paralysis of the articulators. The aim of this paper is to illustrate the importance of both fast- and slow-firing single units recorded from an individual with locked-in syndrome and from an intact participant speaking silently. Long-duration electrodes were implanted in the motor speech cortex for up to 13 years in the locked-in participant. The data herein provide evidence that slow-firing single units are essential for optimal decoding accuracy. Additional evidence indicates that slow-firing single units can be conditioned in the locked-in participant five years after implantation, further supporting their role in decoding.
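
One way to probe the claim that slow-firing units matter for decoding is an ablation-style comparison: decode covert-speech classes from binned firing rates with and without the slow units. The sketch below is purely illustrative and uses random placeholder data (so accuracies sit at chance); the unit count, the 3 spikes/bin "slow" threshold, and the classifier choice are assumptions, not the authors' analysis.

```python
# Illustrative ablation sketch (not the authors' analysis): decode covert-speech
# classes from binned single-unit firing rates, with and without slow-firing units.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_units = 200, 30
unit_rates = rng.uniform(0.5, 12.0, size=n_units)           # per-unit mean rate (spikes/bin)
rates = rng.poisson(lam=unit_rates, size=(n_trials, n_units)).astype(float)
labels = rng.integers(0, 4, size=n_trials)                   # 4 covert-speech classes
slow = unit_rates < 3.0                                      # crude "slow-firing" criterion

for name, mask in [("all units", np.ones(n_units, dtype=bool)),
                   ("fast units only", ~slow)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          rates[:, mask], labels, cv=5).mean()
    print(f"{name}: {acc:.2%}")   # at chance here; informative only with real recordings
```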


Sensors, 2021, Vol. 22(1), pp. 299
Author(s): Dafydd Ravenscroft, Ioannis Prattis, Tharun Kandukuri, Yarjan Abdul Samad, Giorgio Mallia, ...

Silent speech recognition is the ability to recognise intended speech without audio information. Useful applications can be found in situations where sound waves are not produced or cannot be heard, for example speakers with physical voice impairments or environments in which audio transfer is not reliable or secure. A device that can detect non-auditory signals and map them to intended phonation could therefore assist in such situations. In this work, we propose a graphene-based strain gauge sensor which can be worn on the throat and detect small muscle movements and vibrations. Machine learning algorithms then decode the non-audio signals and predict the intended speech. The proposed strain gauge sensor is highly wearable, utilising graphene's unique and beneficial properties, including strength, flexibility and high conductivity. A highly flexible and wearable sensor able to pick up small throat movements is fabricated by screen printing graphene onto lycra fabric. A framework for interpreting this information is proposed that explores several machine learning techniques to predict intended words from the signals. A dataset of 15 unique words and four movements, each with 20 repetitions, was developed and used to train the machine learning algorithms. The results demonstrate the ability of such sensors to predict spoken words: we achieved a word accuracy rate of 55% on the word dataset and 85% on the movement dataset. This work is a proof of concept for the viability of combining a highly wearable graphene strain gauge and machine learning methods to automate silent speech recognition.
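
To make the recognition framework concrete, the sketch below classifies fixed-length strain-gauge windows from hand-crafted summary statistics. The window length, the feature set, the SVM classifier, and the random placeholder data are assumptions used for illustration, not details taken from the paper.

```python
# Sketch of a word classifier over throat strain-gauge windows using simple
# hand-crafted features; data and feature choices are illustrative placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def window_features(w):
    """Summary statistics for one fixed-length sensor window."""
    return np.array([w.mean(), w.std(), w.min(), w.max(),
                     np.abs(np.diff(w)).mean()])

rng = np.random.default_rng(2)
windows = rng.normal(size=(300, 500))     # 15 words x 20 repetitions, 500 samples per window
labels = np.repeat(np.arange(15), 20)
X = np.array([window_features(w) for w in windows])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(f"mean CV accuracy: {cross_val_score(clf, X, labels, cv=5).mean():.2%}")
```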


Author(s): Ruidong Zhang, Mingyang Chen, Benjamin Steeper, Yaxuan Li, Zihan Yan, ...

This paper presents SpeeChin, a smart necklace that can recognize 54 English and 44 Chinese silent speech commands. A customized infrared (IR) imaging system mounted on the necklace captures images of the neck and face from under the chin. These images are first pre-processed and then passed to an end-to-end deep convolutional recurrent neural network (CRNN) model to infer the silent speech commands. A user study with 20 participants (10 per language) showed that SpeeChin could recognize the 54 English and 44 Chinese silent speech commands with average cross-session accuracies of 90.5% and 91.6%, respectively. To further investigate the potential of SpeeChin in recognizing other silent speech commands, we conducted another study in which 10 participants distinguished between 72 one-syllable nonwords. Based on the results of these user studies, we discuss the challenges and opportunities of deploying SpeeChin in real-world applications.
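
The sketch below shows the general shape of a convolutional recurrent model for this kind of task: a small CNN extracts per-frame features from the under-chin IR images and a GRU aggregates them over time into a command prediction. Layer sizes, frame counts, and image resolution are illustrative assumptions, not the SpeeChin architecture.

```python
# Illustrative CRNN for silent-speech command recognition from image sequences:
# a per-frame CNN feature extractor followed by a GRU and a classification head.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_commands, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                       # per-frame feature extractor
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.rnn = nn.GRU(32 * 4 * 4, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_commands)

    def forward(self, frames):                          # frames: (batch, time, 1, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1).reshape(b, t, -1)
        _, h = self.rnn(feats)                          # h: (1, batch, hidden)
        return self.head(h[-1])                         # command logits

logits = CRNN(n_commands=54)(torch.randn(2, 16, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 54])
```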


Author(s): Huihui Cai, Yakun Zhang, Liang Xie, Huijiong Yan, Wei Qin, ...

2021, Vol. E104.D(12), pp. 2209-2217
Author(s): Hongcui WANG, Pierre ROUSSEL, Bruce DENBY

2021
Author(s): Christoph Wagner, Petr Schaffer, Pouriya Amini Digehsara, Michael Bärhold, Dirk Plettemeier, ...

Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it. Radar is a relatively unexplored silent speech sensing modality, even though it has the advantage of being fully non-invasive. We therefore built custom stepped-frequency continuous-wave radar hardware to measure, at a rate of 100 Hz, the changes during speech in the transmission spectra between three antennas located on the two cheeks and the chin. We then recorded a command-word corpus of 40 phonetically balanced, two-syllable German words plus the German digits zero to nine for two individual speakers, and evaluated the speaker-dependent multi-session and inter-session recognition accuracies on this 50-word corpus using a bidirectional long short-term memory (LSTM) network. We obtained recognition accuracies of 99.17% and 88.87% in the speaker-dependent multi-session and inter-session settings, respectively. These results show that the transmission spectra are very well suited to discriminating individual words from one another, even across sessions, which is one of the key challenges for fully non-invasive silent speech interfaces.
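
A minimal sketch of the classification stage follows: a bidirectional LSTM consumes a sequence of transmission-spectrum frames (sampled at 100 Hz) and outputs word logits over a 50-word vocabulary. The number of frequency bins, the sequence length, and the hidden size are placeholders, not values from the paper.

```python
# Minimal bidirectional LSTM word classifier over sequences of radar
# transmission spectra; dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_freq_bins, n_words=50, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_freq_bins, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_words)

    def forward(self, x):              # x: (batch, time, n_freq_bins), frames at 100 Hz
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # logits from the final time step

model = BiLSTMClassifier(n_freq_bins=128)
logits = model(torch.randn(4, 80, 128))   # 4 utterances, 0.8 s at 100 Hz
print(logits.shape)                       # torch.Size([4, 50])
```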


2021
Author(s): Jinghan Wu, Tao Zhao, Yakun Zhang, Liang Xie, Ye Yan, ...
