Speech-to-touch sensory substitution: a 10-decibel improvement in speech-in-noise understanding after a short training

Author(s): Katarzyna Ciesla, T. Wolak, A. Lorens, H. Skarżyński, A. Amedi

Abstract Understanding speech in background noise is challenging, and wearing face masks during the COVID-19 pandemic has made it even harder. We developed a multisensory setup that includes a sensory substitution device (SSD) able to deliver speech simultaneously through audition and as vibrations on the fingertips. After a short training session, 16 of 17 participants significantly improved in speech-in-noise understanding when the added vibrations corresponded to low frequencies extracted from the spoken sentence. Understanding was maintained after training even when the background noise was twice as loud (a mean group improvement of about 10 decibels), indicating that the solution could be very useful for hearing-impaired patients. Notably, the improvement transferred to a post-training condition in which the touch input was removed, suggesting that the setup could also support auditory rehabilitation in cochlear implant users. Future wearable implementations of the SSD could likewise be used in everyday situations, such as talking on the phone or learning a foreign language. We discuss the basic-science implications of these findings, in particular that even in adulthood a new pairing can be established between a neuronal computation (speech processing) and an atypical sensory modality (touch). Speech is indeed a multisensory signal, but one learned from birth in an audio-visual context. Interestingly, adding lip-reading cues to speech in noise provides a benefit of the same or smaller magnitude than the one we report here for adding touch.
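
The abstract does not describe the signal processing used to generate the fingertip vibrations; below is a minimal Python sketch of one plausible chain, assuming the tactile channel is driven by the envelope of a low-pass-filtered copy of the sentence (the 220 Hz cutoff, the 110 Hz carrier and the function name are illustrative assumptions, not the authors' implementation):

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def low_frequency_vibration(speech, fs, cutoff_hz=220.0):
    """Turn speech into a fingertip-vibration signal (illustrative sketch only)."""
    # Keep only the low frequencies, roughly the fundamental-frequency range.
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    low_band = sosfiltfilt(sos, speech)
    # Slowly varying amplitude envelope of that band (Hilbert magnitude).
    envelope = np.abs(hilbert(low_band))
    # Amplitude-modulate a fixed carrier that sits in the tactile sensitivity range.
    t = np.arange(len(envelope)) / fs
    carrier = np.sin(2 * np.pi * 110.0 * t)
    return envelope / (np.max(envelope) + 1e-12) * carrier

# Usage on a synthetic stand-in for a recorded sentence (1 s of noise at 16 kHz).
fs = 16000
speech = np.random.randn(fs)
vibration = low_frequency_vibration(speech, fs)

In a wearable implementation, the returned signal would drive the vibrotactile actuators at the fingertips.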

2019, Vol. 32 (2), pp. 87-109
Author(s): Galit Buchs, Benedetta Heimler, Amir Amedi

Abstract Visual-to-auditory sensory substitution devices (SSDs) are a family of non-invasive devices for visual rehabilitation that aim to convey whole-scene visual information through the intact auditory modality. Although proven effective in laboratory environments, the use of SSDs has yet to be systematically tested in real-life situations. To start filling this gap, in the present work we tested the ability of expert SSD users to filter out irrelevant background noise while focusing on the relevant audio information. Specifically, nine blind expert users of the EyeMusic visual-to-auditory SSD performed a series of identification tasks via the device (shape, color, and the conjunction of the two features). Their performance was compared across two conditions, a silent baseline and the same stimuli presented with irrelevant background sounds from real-life situations, in a pseudo-random balanced design. Although the participants described the background noise as disturbing, no significant performance difference emerged between the noisy and silent conditions for any of the tasks; in the conjunction task (shape and color) there was only a non-significant trend towards a disruptive effect of the noise. These findings suggest that visual-to-auditory SSDs can indeed be used successfully in noisy environments and that users can focus on relevant auditory information while inhibiting irrelevant sounds. Our findings take a step towards the actual use of SSDs in real-life situations, with potential impact on the rehabilitation of sensory-deprived individuals.
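
The abstract reports the silent-versus-noisy comparison only at the level of significance; below is a minimal Python sketch of the kind of paired, within-subject test such a nine-participant design could use (the accuracies are randomly generated placeholders, not the study's data, and the choice of a Wilcoxon signed-rank test is an assumption rather than the analysis actually reported):

import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n_participants = 9

# Placeholder proportion-correct scores per participant in each condition.
acc_silent = rng.uniform(0.7, 1.0, n_participants)
acc_noisy = rng.uniform(0.7, 1.0, n_participants)

# Paired, non-parametric comparison across the same participants.
stat, p = wilcoxon(acc_silent, acc_noisy)
print(f"Wilcoxon W = {stat:.2f}, p = {p:.3f}")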


BJS Open, 2021, Vol. 5 (Supplement_1)
Author(s): Alex Tebbett, Ian Purcell, Shereen Watton, Rathinavel Shanmugham, Alexandra Tebbett

Abstract Introduction During the Covid-19 pandemic many staff members were redeployed to the Intensive Care Unit (ICU) with little opportunity to train in the new skills they would require. One such skill was the transfer of a critically ill, and contagious, patient from the ICU: a risky and complicated procedure that requires planning, preparation, risk assessment, situational awareness and, ideally, experience. To assist our colleagues with this skill, an existing ICU transfer course has been adapted to cover patient transfer during Covid-19 or any similar contagious pandemic. Methods In-situ simulation was chosen as the most realistic way of immersing participants in the ICU environment and of highlighting the real-life complexities and issues they may face. A multidisciplinary training session was devised so that novice anaesthetists, ACCPs and nurses could learn together, reflecting the usual team. Human factors such as communication, team leadership, task management and situational awareness are the focus of the post-simulation debrief, and human factors sheets have been created to guide participants in analysing these skills. Pre- and post-simulation confidence, knowledge and attitudes will be assessed using validated appraisal tools and questionnaires to gather both quantitative and qualitative data about the experience. Discussion Multidisciplinary training is often difficult to arrange because of the different requirements, processes and procedures each department demands. A hidden blessing of Covid-19 is the realisation that this barrier can be broken, for the benefit of our patients and colleagues alike, and that training sessions like this can be implemented.


2021, Vol. 11 (1)
Author(s): Jacques Pesnot Lerousseau, Gabriel Arnold, Malika Auvray

Abstract Sensory substitution devices aim at restoring visual functions by converting visual information into auditory or tactile stimuli. Although these devices show promise in the range of behavioral abilities they allow, the processes underlying their use remain underspecified. In particular, while an initial debate focused on whether sensory substitution is visual or auditory/tactile in nature, over the past decade the idea that it reflects a mixture of both has emerged. To investigate behaviorally the extent to which visual and auditory processes are involved, participants completed a Stroop-like crossmodal interference paradigm before and after being trained with a conversion device that translates visual images into sounds. In addition, participants' auditory abilities and their phenomenology were measured. Our study revealed that, after training, processes shared with vision were involved when participants were asked to identify sounds: their performance in sound identification was influenced by simultaneously presented visual distractors. In addition, participants' performance during training and the associated phenomenology depended on their auditory abilities, revealing that processing also finds its roots in the input sensory modality. Our results pave the way for improving the design and learning of these devices by taking into account inter-individual differences in auditory and visual perceptual strategies.
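
The abstract does not specify how the conversion device encodes images into sounds; visual-to-auditory devices of this family typically sweep the image column by column, mapping vertical position to pitch and brightness to loudness. Below is a minimal Python sketch under that assumption (all parameter values are illustrative, and this is not the specific device used in the study):

import numpy as np

def image_to_sound(image, duration_s=1.0, fs=22050, f_min=200.0, f_max=4000.0):
    """Column-by-column sweep: row index -> frequency, brightness (0-1) -> amplitude."""
    n_rows, n_cols = image.shape
    samples_per_col = int(duration_s * fs / n_cols)
    freqs = np.geomspace(f_max, f_min, n_rows)   # top rows high-pitched, bottom rows low
    t = np.arange(samples_per_col) / fs
    audio = []
    for col in range(n_cols):
        tones = np.sin(2 * np.pi * freqs[:, None] * t)           # (rows, samples)
        audio.append((image[:, col, None] * tones).sum(axis=0))  # weight tones by brightness
    audio = np.concatenate(audio)
    return audio / (np.abs(audio).max() + 1e-12)

# Usage: a bright diagonal in a 16x16 image becomes a one-second descending sweep.
sound = image_to_sound(np.eye(16))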


2021, Vol. 28 (Supplement_1)
Author(s): J Brito, I Aguiar-Ricardo, P Alves Da Silva, B Valente Da Silva, N Cunha, ...

Abstract Funding Acknowledgements Type of funding sources: None. Introduction Despite the established benefits of cardiac rehabilitation (CR), it remains significantly underutilized. Home-based CR (CR-HB) programs should offer the same core components as centre-based programs (CR-CB), but several aspects need to be adapted and communication and supervision must be improved. Although CR-HB has been successfully deployed and is a valuable alternative to CR-CB, there is less structured experience with these non-uniform programs, and further studies are needed to understand which patients (pts) are suited to this type of program. Purpose To investigate pt-perceived facilitators of and barriers to home-based rehabilitation exercise. Methods Prospective cohort study including pts who were participating in a CR-CB program and agreed to join a CR-HB program after the CR-CB program closed due to COVID-19. The CR-HB consisted of a multidisciplinary digital CR program including pt risk evaluation and regular assessment, exercise, and educational and psychological sessions. The online exercise training consisted of recorded videos and real-time online supervised group sessions; participants were advised to exercise 3 times per week, for 60 min per session. A pictorial exercise training guidebook was available to all participants, including instructions regarding safety, clothing and warm-up, and a detailed illustrated description of each exercise session. An e-mail address and telephone number were provided for questions or difficulties regarding the exercises, and once a month a real-time CR exercise session of 60 min was provided. Results 116 cardiovascular disease pts (62.6 ± 8.9 years, 95 males) who had been attending a face-to-face CR program were included in the CR-HB program. The majority had coronary artery disease (89%) and 5% had valvular disease. Regarding risk factors, obesity was the most common (75%), followed by hypertension (60%), family history (42%), dyslipidaemia (38%), diabetes (18%) and smoking (13%). Almost half (47%) of the participants did at least one online exercise training session per week: of these, 58% exercised 2-3 times per week, 27% once per week and 15% more than 4 times per week. Participants who did less than one exercise session per week reported as causes: lack of motivation (38%), preference for a different mode of exercise training such as exercising outdoors (26%), technology barriers such as the impossibility of streaming online videos (11%), fear of performing exercise without supervision (4%), and limited space at home (4%). Conclusions Our real-life results from a CR-HB program show a sub-optimal rate of participation in exercise sessions for various reasons, mainly lack of motivation to exercise alone or a preference for walking outdoors. Knowledge of these barriers will help identify strategies to increase participation and to select the best candidates for this type of program.


2021, Vol. 99 (Supplement_3), pp. 92-92
Author(s): Jennie L Ivey, Lew G Strickland, Justin D Rhinehart

Abstract Developing livestock and equine trainings to empower county Extension agents is challenging, especially when spanning in-person and online delivery modules. Real-life application of training concepts is difficult, particularly when participants have varied backgrounds and experience. Thus, we assessed whether scenario-based training modules were an effective training method across in-person and virtual formats. The same scenario-based training was delivered at three regional in-person trainings (n = 42) and one virtual training (n = 32). The training format consisted of four species-specific lectures addressing various production topics. Small groups then developed recommendations for a specific scenario, followed by a debriefing session consisting of group reactions and specialist recommendations. Topic-area application to county programs, instructor effectiveness, and overall benefit of the training session were evaluated (Qualtrics; in-person n = 26, 62% completion; virtual n = 17, 53% completion). Data were assessed using analysis of variance and mean comparisons (α = 0.05), with Tukey's pairwise post hoc analysis where appropriate (STATA 16). Across all sessions, Likert-scale responses (1 = poor, 5 = excellent, n = 43) indicated the lecture sessions were applicable to county areas of need in terms of material content (mean ± SD, cattle = 4.71 ± 0.57, equine = 4.64 ± 0.50), teaching effectiveness (cattle = 4.77 ± 0.42, equine = 4.75 ± 0.43), and overall quality (cattle = 4.68 ± 0.57, equine = 4.67 ± 0.51). Scenario-based training benefit was not influenced by the number of times an agent had attended in-service training on livestock species, agent appointment (youth vs. adult educator), or training location (P > 0.05). Attendance at previous in-service trainings (cattle P = 0.005; equine P = 0.013) and agent appointment (cattle P = 0.0006; equine P = 0.05) had a significant impact on the number of questions agents reported having received on scenario topics in the previous 12 months; more topic-area questions were reported by agents with adult education responsibilities and previous training attendance. Based upon these results, scenario-based training is an effective in-person and virtual training tool for 4-H and adult Extension agents of varying experience.
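
The analysis of variance with Tukey's post hoc comparisons described above was run in STATA 16; the following is a minimal Python sketch of the same style of analysis on placeholder ratings (the group labels and values are invented for illustration and are not the study's data):

import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)

# Placeholder 1-5 ratings from three hypothetical training locations.
ratings = {
    "region_A": rng.integers(3, 6, 15),
    "region_B": rng.integers(3, 6, 14),
    "region_C": rng.integers(3, 6, 13),
}

# One-way ANOVA across locations (alpha = 0.05).
f_stat, p_val = f_oneway(*ratings.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.3f}")

# Tukey's pairwise post hoc comparisons.
values = np.concatenate(list(ratings.values()))
groups = np.concatenate([[name] * len(v) for name, v in ratings.items()])
print(pairwise_tukeyhsd(values, groups, alpha=0.05))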


2020, Vol. 14
Author(s): Stephanie Haro, Christopher J. Smalt, Gregory A. Ciccarelli, Thomas F. Quatieri

Many individuals struggle to understand speech in listening scenarios that include reverberation and background noise. An individual's ability to understand speech arises from a combination of peripheral auditory function, central auditory function, and general cognitive abilities. The interaction of these factors complicates the prescription of treatment or therapy to improve hearing function. Damage to the auditory periphery can be studied in animals; however, this method alone is not enough to understand the impact of hearing loss on speech perception. Computational auditory models bridge the gap between animal studies and human speech perception. Perturbations to the modeled auditory systems can permit mechanism-based investigations into observed human behavior. In this study, we propose a computational model that accounts for the complex interactions between different hearing damage mechanisms and simulates human speech-in-noise perception. The model performs a digit classification task as a human would, with only acoustic sound pressure as input. Thus, we can use the model's performance as a proxy for human performance. This two-stage model consists of a biophysical cochlear-nerve spike generator followed by a deep neural network (DNN) classifier. We hypothesize that sudden damage to the periphery affects speech perception and that central nervous system adaptation over time may compensate for peripheral hearing damage. Our model achieved human-like performance across signal-to-noise ratios (SNRs) under normal-hearing (NH) cochlear settings, achieving 50% digit recognition accuracy at −20.7 dB SNR. Results were comparable to eight NH participants on the same task who achieved 50% behavioral performance at −22 dB SNR. We also simulated medial olivocochlear reflex (MOCR) and auditory nerve fiber (ANF) loss, which worsened digit-recognition accuracy at lower SNRs compared to higher SNRs. Our simulated performance following ANF loss is consistent with the hypothesis that cochlear synaptopathy impacts communication in background noise more so than in quiet. Following the insult of various cochlear degradations, we implemented extreme and conservative adaptation through the DNN. At the lowest SNRs (<0 dB), both adapted models were unable to fully recover NH performance, even with hundreds of thousands of training samples. This implies a limit on performance recovery following peripheral damage in our human-inspired DNN architecture.
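
A detail the abstract leaves implicit is how a figure such as "50% digit-recognition accuracy at −20.7 dB SNR" is read off a performance curve: accuracy is measured at several SNRs and the threshold is interpolated. Below is a minimal Python sketch with invented accuracy values (not the paper's numbers), plus the standard way of mixing speech and noise at a target SNR:

import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that the speech-to-noise power ratio equals snr_db, then mix."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

def snr_at_accuracy(snrs_db, accuracies, target=0.5):
    """Linearly interpolate the SNR at which accuracy crosses the target (accuracies ascending)."""
    return float(np.interp(target, accuracies, snrs_db))

# Usage with placeholder values for a hypothetical accuracy-vs-SNR curve.
snrs = np.array([-30, -25, -20, -15, -10, -5, 0])
acc = np.array([0.10, 0.22, 0.45, 0.70, 0.88, 0.96, 0.99])
print(snr_at_accuracy(snrs, acc))   # SNR (dB) at which 50% of digits are recognized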


2019
Author(s): Shyanthony R. Synigal, Emily S. Teoh, Edmund C. Lalor

Abstract The human auditory system is adept at extracting information from speech in both single-speaker and multi-speaker situations. This involves neural processing at the rapid temporal scales seen in natural speech. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) signatures of such processing have shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive recordings (electrocorticography [ECoG]) have shown that such rapid processing is even more strongly reflected in the power of neural activity at high frequencies (around 70-150 Hz, known as high gamma). The aim of this study was to determine whether high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio, and whether any such information is complementary to that reflected in well-established low-frequency EEG indices of speech processing. We used linear regression to investigate speech envelope and attention decoding in EEG at low frequencies, in high gamma power, and in both signals combined. While low-frequency speech tracking was evident for almost all subjects, as expected, high gamma power also showed robust speech tracking in a minority of subjects. The same pattern held for attention decoding in a separate group of subjects who undertook a cocktail-party attention experiment. For the subjects who showed speech tracking in high gamma power, the spatiotemporal characteristics of that tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Overall, this indicates that high gamma power in EEG can carry useful information about speech processing and attentional selection in some subjects, and that combining it with low-frequency EEG can improve the mapping between natural speech and the resulting neural responses.
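
The decoding approach is described only as linear regression between the EEG and the speech envelope; a common formulation is a backward (stimulus-reconstruction) model fitted with ridge regression on time-lagged EEG. Below is a minimal Python sketch under that assumption, on random placeholder data (the channel count, lag range and regularization value are illustrative, not the study's settings):

import numpy as np

def lag_matrix(eeg, max_lag):
    """Stack time-lagged copies (0..max_lag samples) of every EEG channel as features."""
    n_samples = eeg.shape[0]
    cols = []
    for lag in range(max_lag + 1):
        shifted = np.zeros_like(eeg)
        shifted[lag:] = eeg[:n_samples - lag]
        cols.append(shifted)
    return np.hstack(cols)

def fit_envelope_decoder(eeg, envelope, max_lag=16, ridge=1.0):
    """Ridge regression mapping lagged EEG to the speech envelope (backward model)."""
    X = lag_matrix(eeg, max_lag)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

# Usage on placeholder data: 16 channels, 1024 samples, random "envelope".
rng = np.random.default_rng(0)
eeg = rng.standard_normal((1024, 16))
envelope = rng.standard_normal(1024)
w = fit_envelope_decoder(eeg, envelope)
reconstruction = lag_matrix(eeg, 16) @ w
print(np.corrcoef(reconstruction, envelope)[0, 1])   # envelope-tracking accuracy as a correlation

The same decoder could be fitted separately on low-frequency EEG, on high gamma power, or on the two combined, which is the comparison the study makes.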


1992, Vol. 2 (3), pp. 181-191
Author(s): Hans Peter Zenner, Günter Reuter, Shi Hong, Ulrike Zimmermann, Alfred H. Gitter

Vestibular hair cells (VHCs), type I and type II, with membrane potentials around -64 mV were prepared from guinea pig ampullar cristae and maculae. In type I cells, current injection, application of voltage steps during membrane patch-clamping, or extracellular alternating current (ac) fields evoked fast length changes of 50-500 nm in the cell "neck". Mechanical responses were determined by computerized video techniques with contrast-enhanced digital image subtraction (DIS) and interpeak pixel counts (IPPC), or by double photodiode measurements. These techniques allowed spatial resolutions of 300 nm, 120 nm, and 50 nm, respectively. In contrast to measurements of high-frequency movements of auditory outer hair cells (OHCs), the mechanical responses of type I VHCs were restricted to low frequencies below 85 Hz. In addition to the recently reported slow motility of VHCs, the present results suggest that fast mechanical VHC responses could significantly influence macular and cupular mechanics. Isometric and isotonic variants are discussed. The observed gap between the frequency maxima of VHCs and OHCs is suggested to contribute to a clear separation of the auditory and vestibular sensory modalities.
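
The motion analysis is described only as contrast-enhanced digital image subtraction; below is a minimal Python sketch of that general idea (reference-frame subtraction followed by a contrast stretch) on synthetic frames, not the authors' measurement code:

import numpy as np

def digital_image_subtraction(frame, reference):
    """Subtract a reference frame and stretch the difference to the full 0-1 range,
    so small displacements of an edge appear as bright pixels."""
    diff = np.abs(frame.astype(float) - reference.astype(float))
    return (diff - diff.min()) / (diff.max() - diff.min() + 1e-12)

# Usage: a synthetic bright bar standing in for the cell "neck", shifted by one pixel.
reference = np.zeros((64, 64)); reference[:, 30:34] = 1.0
frame = np.zeros((64, 64)); frame[:, 31:35] = 1.0
diff = digital_image_subtraction(frame, reference)
print(int(np.count_nonzero(diff > 0.5)))   # pixels at which the bar has moved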


2019, Vol. 23, pp. 233121651984829
Author(s): Ghada BinKhamis, Antonio Elia Forte, Tobias Reichenbach, Martin O’Driscoll, Karolina Kluk

Evaluation of patients who are unable to provide behavioral responses on standard clinical measures is challenging due to the lack of standard objective (non-behavioral) clinical audiological measures that assess the outcome of an intervention (e.g., hearing aids). Brainstem responses to short consonant-vowel stimuli (speech-auditory brainstem responses [speech-ABRs]) have been proposed as a measure of subcortical encoding of speech, speech detection, and speech-in-noise performance in individuals with normal hearing. Here, we investigated the potential application of speech-ABRs as an objective clinical outcome measure of speech detection, speech-in-noise detection and recognition, and self-reported speech understanding in 98 adults with sensorineural hearing loss. We compared aided and unaided speech-ABRs, and speech-ABRs in quiet and in noise. In addition, we evaluated whether speech-ABR F0 encoding (obtained from the complex cross-correlation with the 40 ms [da] fundamental waveform) predicted aided behavioral speech recognition in noise or aided self-reported speech understanding. Results showed that (a) aided speech-ABRs had earlier peak latencies, larger peak amplitudes, and larger F0 encoding amplitudes compared to unaided speech-ABRs; (b) the addition of background noise resulted in later F0 encoding latencies but did not have an effect on peak latencies and amplitudes or on F0 encoding amplitudes; and (c) speech-ABRs were not a significant predictor of any of the behavioral or self-report measures. These results show that speech-ABR F0 encoding is not a good predictor of speech-in-noise recognition or self-reported speech understanding with hearing aids. However, our results suggest that speech-ABRs may have potential for clinical application as an objective measure of speech detection with hearing aids.
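
The F0-encoding measure is described as a complex cross-correlation between the response and the 40 ms [da] fundamental waveform; below is a minimal Python sketch of one way such a measure could be computed, using analytic (Hilbert-transformed) signals on synthetic data (the exact procedure and parameters in the study may differ):

import numpy as np
from scipy.signal import hilbert, correlate, correlation_lags

def f0_encoding(response, fundamental, fs):
    """Cross-correlate analytic response and fundamental; return peak magnitude and lag (ms)."""
    xc = correlate(hilbert(response), hilbert(fundamental), mode="full")
    lags = correlation_lags(len(response), len(fundamental), mode="full")
    k = int(np.argmax(np.abs(xc)))
    return np.abs(xc)[k], lags[k] / fs * 1000.0

# Usage on synthetic signals: a 100 Hz, 40 ms fundamental and a delayed, noisy "response".
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
fundamental = np.sin(2 * np.pi * 100 * t)
delay = int(0.007 * fs)   # 7 ms
response = np.concatenate([np.zeros(delay), fundamental]) + 0.1 * np.random.randn(delay + len(t))
amplitude, latency_ms = f0_encoding(response, fundamental, fs)
print(amplitude, latency_ms)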


2020, Vol. 10 (1)
Author(s): Raphaël Thézé, Mehdi Ali Gadiri, Louis Albert, Antoine Provost, Anne-Lise Giraud, ...

Abstract Natural speech is processed in the brain as a mixture of auditory and visual features. An example of the importance of visual speech is the McGurk effect and related perceptual illusions that result from mismatching auditory and visual syllables. Although the McGurk effect has been widely applied to the exploration of audio-visual speech processing, it relies on isolated syllables, which severely limits the conclusions that can be drawn from the paradigm. In addition, the extreme variability and inconsistent quality of the stimuli usually employed prevent comparability across studies. To overcome these limitations, we present an innovative methodology using 3D virtual characters with realistic lip movements synchronized with computer-synthesized speech. We used commercially accessible and affordable tools to facilitate reproducibility and comparability, and the set-up was validated on 24 participants performing a perception task. Within complete and meaningful French sentences, we paired a labiodental fricative viseme (i.e. /v/) with a bilabial occlusive phoneme (i.e. /b/). This audiovisual mismatch is known to induce the illusion of hearing /v/ in a proportion of trials. We tested the rate of the illusion while varying the magnitude of background noise and audiovisual lag. Overall, the effect was observed in 40% of trials. The proportion rose to about 50% with added background noise and up to 66% when controlling for phonetic features. Our results conclusively demonstrate that computer-generated speech stimuli are a judicious choice, and that they can supplement natural speech while offering greater control over stimulus timing and content.
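
Illusion rates per noise and lag condition can be tabulated directly from trial records; below is a minimal Python sketch with hypothetical fields and values (not the study's data):

from collections import defaultdict

# Hypothetical trial records: (noise condition, audiovisual lag in ms, illusory /v/ heard?)
trials = [
    ("quiet", 0, True), ("quiet", 0, False), ("quiet", 170, True),
    ("noise", 0, True), ("noise", 0, True), ("noise", 170, False),
]

counts = defaultdict(lambda: [0, 0])   # (noise, lag) -> [illusion trials, total trials]
for noise, lag_ms, heard_v in trials:
    counts[(noise, lag_ms)][0] += int(heard_v)
    counts[(noise, lag_ms)][1] += 1

for (noise, lag_ms), (hits, total) in sorted(counts.items()):
    print(f"{noise:>5}, lag {lag_ms:>3} ms: illusion rate {hits / total:.0%}")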

