speech distortion
Recently Published Documents


TOTAL DOCUMENTS

75
(FIVE YEARS 14)

H-INDEX

13
(FIVE YEARS 1)

2021 ◽  
Vol 8 ◽  
Author(s):  
Yuyong Kang ◽  
Nengheng Zheng ◽  
Qinglin Meng

The cochlea plays a key role in the transmission from acoustic vibration to neural stimulation, upon which the brain perceives sound. A cochlear implant (CI) is an auditory prosthesis that replaces the damaged cochlear hair cells to achieve this acoustic-to-neural conversion. However, the CI is a very coarse bionic imitation of the normal cochlea. The highly resolved time-frequency-intensity information transmitted by the normal cochlea, which is vital to high-quality auditory perception such as speech perception in challenging environments, cannot be delivered by CIs. Although CI recipients with state-of-the-art commercial CI devices achieve good speech perception in quiet backgrounds, they usually suffer from poor speech perception in noisy environments. Therefore, noise suppression or speech enhancement (SE) is one of the most important technologies for CIs. In this study, we introduce recent progress in deep learning (DL), mostly neural network (NN)-based, SE front ends for CIs, and discuss how the hearing properties of CI recipients can be exploited to optimize DL-based SE. In particular, different loss functions are introduced to supervise the NN training, and a set of objective and subjective experiments is presented. Results verify that CI recipients are more sensitive to residual noise than to SE-induced speech distortion, consistent with common knowledge in CI research. Furthermore, speech reception threshold (SRT) tests in noise demonstrate that the intelligibility of the denoised speech can be significantly improved when the NN is trained with a loss function biased toward noise suppression rather than one that weights noise residue and speech distortion equally.
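The abstract does not spell out the loss functions used; as a rough illustration of the idea, here is a minimal numpy sketch of a loss that splits the enhancement error into a residual-noise term and a speech-distortion term, with a weight that biases training toward noise suppression. The mask-based formulation and all names below are assumptions for illustration, not the paper's actual loss.

```python
import numpy as np

def weighted_se_loss(mask, speech, noise, alpha=0.7):
    """Split the enhancement error into residual noise (noise that
    leaks through the mask) and speech distortion (speech removed by
    the mask), weighted by alpha. alpha > 0.5 biases training toward
    stronger noise suppression; alpha = 0.5 weights both equally."""
    residual_noise = mask * noise
    speech_distortion = (1.0 - mask) * speech
    return (alpha * np.mean(residual_noise ** 2)
            + (1.0 - alpha) * np.mean(speech_distortion ** 2))

rng = np.random.default_rng(0)
speech = rng.standard_normal(1024)
noise = 0.5 * rng.standard_normal(1024)
mask = np.full(1024, 0.8)  # a fixed gain, standing in for an NN-predicted mask

loss_balanced = weighted_se_loss(mask, speech, noise, alpha=0.5)
loss_biased = weighted_se_loss(mask, speech, noise, alpha=0.9)
```

With alpha above 0.5, the network is penalized more for noise leaking through than for attenuating speech, which matches the reported sensitivity of CI recipients to residual noise.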


Virittäjä ◽  
2021 ◽  
Vol 125 (3) ◽  
Author(s):  
Taina Pitkänen ◽  
Maija Tervola ◽  
Merja Toivonen ◽  
Elise Kosunen

This article examines the listening comprehension of physicians who use Finnish as a second language in patient consultations. The study analyses how linguistic meanings are communicated and draws conclusions about the physicians' ability to understand patients' speech. The research data consists of clinical skills tests that form part of the licensing process for physicians who completed their degrees outside the EU/EEA, i.e. video-recorded patient consultations. The data comprises the test performances of 30 physicians, a total of 87 patient consultations (approx. 40 hours). In addition to the video recordings, the data includes the patient records written for each consultation and a memo by the physician supervising the test. The analysis searched the doctor-patient conversations for passages where information given by the patient either did not reach the doctor at all or was communicated incorrectly. Data triangulation combining the conversation data, the patient records and the evaluation memos was used to reveal comprehension problems that do not show on the surface of the conversation. The study shows that a few of the physicians have recurring problems in the communication of meanings. About one third have hardly any problems, and slightly over one third have only occasional problems. The problems appear as interpretations diverging from the semantic content of the patient's speech (distortion of information) and as the content of the patient's speech being disregarded entirely (loss of information). Some of the problem passages can be linked to a single word or expression; some relate to the syntactic structure of colloquial speech or to indirect expressions. Some appear to stem from misinterpretations based on a broader interpretive frame, such as the structure of the consultation. From the perspective of listening comprehension, the study highlights two key challenges for physicians using Finnish as a second language: the linguistic variation of the patient's speech and the unpredictability of the topics of discussion. Physicians and patients partly use different vocabularies, and a physician may have difficulty understanding, for example, descriptions of symptoms given in non-professional terms. The predictability of the topic facilitates comprehension, whereas new topics raised at an unexpected point may go uncomprehended.

Problems in the communication of linguistic meaning in clinical skills tests of migrant physicians

This article considers the listening comprehension skills in clinical skills tests taken by physicians who use Finnish as a second language. The focus is on communicating linguistic meanings. The aim is to present conclusions about the physicians' ability to understand patients' speech during their consultations. The research data consists of clinical skills tests for the licensing process of physicians who completed their degrees outside the EEA, i.e. video-recorded patient consultations. The data consists of 30 test results, a total of 87 patient consultations (approx. 40 hours). In addition to video recordings, the data includes the patient records of each consultation and reports by the physician supervising the test. The analysis examined situations in doctor-patient consultations where the information provided by the patient was either not communicated to the doctor at all or was miscommunicated. Data triangulation linking the video material, patient records and test evaluation records was used to reveal comprehension problems that remain hidden under the surface of the consultation. The study showed that problems in communicating linguistic meanings were manifested both as associations diverging from the semantic content of the patient's speech (distortion of information) and as disregard for the patient's speech (loss of information). Some problems may be associated with a single word or phrase, some with a syntactic structure or the indirect expressions of colloquial language. Some appeared to consist of misinterpretations based on a broader interpretation framework, such as the structure of the consultation. From the perspective of listening comprehension, the study highlighted two key challenges: the linguistic variation of the patient's speech and the unpredictability of the topics of discussion. Physicians and patients use a different vocabulary, and a physician may have difficulty understanding matters such as symptoms when described in non-professional terms. Comprehension is facilitated by the predictability of discussion topics, whereas new topics arising at an unexpected point can lead to misunderstandings.


Author(s):  
Hillary Lathrop-Marshall ◽  
Mary Morgan B Keyser ◽  
Samantha Jhingree ◽  
Natalie Giduz ◽  
Clare Bocklage ◽  
...  

Summary

Introduction: Patients with dentofacial disharmonies (DFDs) seek orthodontic care and orthognathic surgery to address issues with mastication, esthetics, and speech. Speech distortions are seen 18 times more frequently in Class III DFD patients than in the general population, with unclear causality. We hypothesize that there are significant differences in the spectral properties of stop (/t/ or /k/), fricative (/s/ or /ʃ/), and affricate (/tʃ/) consonants, and that the severity of Class III disharmony correlates with the degree of speech abnormality.

Methods: To understand how jaw disharmonies influence speech, orthodontic records and audio recordings were collected from Class III surgical candidates and reference subjects (n = 102 Class III, 62 controls). A speech pathologist evaluated the subjects, and the recordings were quantitatively analysed by Spectral Moment Analysis for frequency distortions.

Results: A majority of Class III subjects exhibit speech distortions. A significant increase in the centroid frequency (M1) and spectral spread (M2) was seen in several consonants of Class III subjects compared to controls. Using regression analysis, correlations between Class III skeletal severity (assessed by cephalometric measures) and spectral distortion were found for the /t/ and /k/ phones.

Conclusions: Class III DFD patients have a higher prevalence of articulation errors and significant spectral distortions in consonants relative to controls. This is the first demonstration that the severity of malocclusion is quantitatively correlated with the degree of speech distortion for consonants, suggesting causation. These findings offer insight into the complex relationship between craniofacial structures and speech distortions.
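Spectral Moment Analysis summarizes a consonant burst's spectrum by statistics such as the centroid (M1) and the spread about the centroid (M2). A minimal numpy sketch of how these two moments can be computed from a signal segment; the windowing and exact moment conventions used in the study may differ.

```python
import numpy as np

def spectral_moments(signal, fs):
    """First two spectral moments of a signal segment:
    M1 = centroid frequency (Hz), M2 = spread about the centroid (Hz^2)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2           # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    p = spectrum / spectrum.sum()                         # normalise to a distribution
    m1 = np.sum(freqs * p)                                # centroid
    m2 = np.sum(((freqs - m1) ** 2) * p)                  # spread (variance)
    return m1, m2

fs = 16000
t = np.arange(1024) / fs
burst = np.sin(2 * np.pi * 4000 * t)  # a 4 kHz tone standing in for a /s/ burst
m1, m2 = spectral_moments(burst, fs)
```

For this pure tone the centroid lands at the tone frequency and the spread is near zero; a fronted or noisier articulation would shift M1 upward and widen M2, which is the kind of change the study measures.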


2021 ◽  
Vol 263 (3) ◽  
pp. 3584-3594
Author(s):  
Yameizhen Li ◽  
Benjamin Yen ◽  
Yusuke Hioka

Recording speech from unmanned aerial vehicles has been attracting interest due to its broad applications, including filming, search and rescue, and surveillance. One of the challenges in this problem is the quality of the recorded speech, which is contaminated by various interfering noises. In particular, the noise radiated by the unmanned aerial vehicle's rotors significantly impacts the overall quality of the audio recordings. The multi-channel Wiener filter has been a commonly used technique for speech enhancement because of its robustness under practical setups. Existing studies have also utilised such techniques in speech enhancement for unmanned aerial vehicle recordings, such as the well-known beamformer-with-postfiltering framework. However, many variants of the multi-channel Wiener filter have also been developed over recent years, such as the speech distortion weighted multi-channel Wiener filter. To account for these recent advances, in this study we compare the performance of these variants. In particular, we explore the benefits these techniques may bring in the setting of audio recordings from an unmanned aerial vehicle.
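For context, the speech distortion weighted multi-channel Wiener filter (SDW-MWF) mentioned above introduces a trade-off parameter mu into the standard MWF solution, w = (Phi_ss + mu * Phi_nn)^-1 Phi_ss e_ref, where mu = 1 recovers the standard MWF and larger mu trades extra speech distortion for stronger noise reduction. A toy numpy sketch under a rank-1 speech model; the steering vector and covariances are invented for illustration.

```python
import numpy as np

def sdw_mwf(phi_ss, phi_nn, mu=1.0, ref=0):
    """Speech-distortion-weighted multi-channel Wiener filter.
    mu = 1 gives the standard MWF; mu > 1 suppresses more noise
    at the cost of additional speech distortion."""
    num_mics = phi_ss.shape[0]
    e_ref = np.zeros(num_mics)
    e_ref[ref] = 1.0                       # estimate speech at the reference mic
    return np.linalg.solve(phi_ss + mu * phi_nn, phi_ss @ e_ref)

# Toy 3-mic scenario: rank-1 speech covariance, white rotor noise.
d = np.array([1.0, 0.8, 0.6])              # hypothetical steering vector
phi_ss = 2.0 * np.outer(d, d)              # speech spatial covariance
phi_nn = 0.5 * np.eye(3)                   # rotor-noise covariance (white)

w_mwf = sdw_mwf(phi_ss, phi_nn, mu=1.0)
w_aggressive = sdw_mwf(phi_ss, phi_nn, mu=5.0)
```

Increasing mu shrinks the filter's gain on the speech direction, i.e. the aggressive setting attenuates the target more in exchange for lower rotor-noise leakage.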


Author(s):  
Randall Ali ◽  
Toon van Waterschoot ◽  
Marc Moonen

Abstract

An integrated version of the minimum variance distortionless response (MVDR) beamformer for speech enhancement using a microphone array has recently been developed, which merges the benefits of imposing constraints defined from both a relative transfer function (RTF) vector based on a priori knowledge and an RTF vector based on a data-dependent estimate. In this paper, the integrated MVDR beamformer is extended for use with a microphone configuration where a microphone array, local to a speech processing device, has access to the signals from multiple external microphones (XMs) randomly located in the acoustic environment. The integrated MVDR beamformer is reformulated as a quadratically constrained quadratic program (QCQP) with two constraints, one of which is related to the maximum tolerable speech distortion for the imposition of the a priori RTF vector and the other related to the maximum tolerable speech distortion for the imposition of the data-dependent RTF vector. An analysis of how these maximum tolerable speech distortions affect the behaviour of the QCQP is presented, followed by the discussion of a general tuning framework. The integrated MVDR beamformer is then evaluated with audio recordings from behind-the-ear hearing aid microphones and three XMs for a single desired speech source in a noisy environment. In comparison to relying solely on an a priori RTF vector or a data-dependent RTF vector, the results demonstrate that the integrated MVDR beamformer can be tuned to yield different enhanced speech signals, which may be more suitable for improving speech intelligibility despite changes in the desired speech source position and imperfectly estimated spatial correlation matrices.
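As a reference point for the discussion above, the basic (non-integrated) MVDR beamformer with an RTF-vector constraint can be sketched in a few lines of numpy. The RTF vector and noise covariance below are invented for illustration; the paper's integrated formulation replaces this closed form with a QCQP carrying two speech-distortion constraints.

```python
import numpy as np

def mvdr(phi_nn, rtf):
    """MVDR beamformer: minimise output noise power subject to a
    distortionless response along the RTF vector (w^H h = 1)."""
    phi_inv_h = np.linalg.solve(phi_nn, rtf)
    return phi_inv_h / (np.conj(rtf) @ phi_inv_h)

# Hypothetical 4-mic setup: RTF relative to microphone 0,
# spatially correlated noise field.
rtf = np.array([1.0, 0.9, 0.7, 0.5])
phi_nn = 0.3 * np.eye(4) + 0.1 * np.ones((4, 4))

w = mvdr(phi_nn, rtf)
```

By construction the filter passes the target RTF with unit gain while minimising the noise power at the output; the integrated variant relaxes this hard constraint into bounded speech-distortion terms for the two RTF estimates.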


2021 ◽  
Vol 29 ◽  
Author(s):  
Débora do Canto ASSAF ◽  
Jessica Klöckner KNORST ◽  
Angela Ruviaro BUSANELLO-STELLA ◽  
Vilmar Antônio FERRAZZO ◽  
Luana Cristina BERWIG ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6493
Author(s):  
Song-Kyu Park ◽  
Joon-Hyuk Chang

In this paper, we propose a multi-channel cross-tower network with attention mechanisms in the latent domain (Multi-TALK) that suppresses both acoustic echo and background noise. The proposed approach consists of the cross-tower network, a parallel encoder with an auxiliary encoder, and a decoder. For multi-channel processing, the parallel encoder extracts the latent features of each microphone, and the latent features, including the spatial information, are compressed by a 1D convolution operation. In addition, the latent features of the far-end signal are extracted by the auxiliary encoder and are effectively provided to the cross-tower network through an attention mechanism. The cross-tower network iteratively estimates the latent features of the acoustic echo and the background noise in each tower. To improve the performance at each iteration, the outputs of each tower are passed as input to the next iteration of the neighboring tower. Finally, before the decoder estimates the near-end speech, attention mechanisms are further applied to remove the estimated acoustic echo and background noise from the compressed mixture while preventing the speech distortion caused by over-suppression. Compared to conventional algorithms, the proposed algorithm effectively suppresses the acoustic echo and background noise and significantly lowers speech distortion.
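The exact attention layers of Multi-TALK are not specified in this abstract; as a generic illustration of how far-end latent features might be injected via attention, here is a plain scaled dot-product attention sketch in numpy. All shapes and variable names are assumptions for illustration only.

```python
import numpy as np

def attention(query, key, value):
    """Scaled dot-product attention: the near-end latent frames (query)
    attend over far-end latent frames (key/value), one way an
    auxiliary-encoder output could be fused into the towers."""
    scores = query @ key.T / np.sqrt(query.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over far-end frames
    return weights @ value

rng = np.random.default_rng(1)
near_latent = rng.standard_normal((50, 64))  # 50 near-end frames, 64-dim latent
far_latent = rng.standard_normal((50, 64))   # far-end reference latent
fused = attention(near_latent, far_latent, far_latent)
```

Each near-end frame receives a convex combination of far-end frames, letting the network align the echo reference with the microphone mixture without an explicit delay estimate.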

