phoneme detection
Recently Published Documents

TOTAL DOCUMENTS: 16 (five years: 4)
H-INDEX: 5 (five years: 0)



Author(s):  
Charles Clifton ◽  
Amanda Rysling ◽  
Jason Bishop


2021 ◽  
Author(s):  
Metehan Yurt ◽  
Pavan Kantharaju ◽  
Sascha Disch ◽  
Andreas Niedermeier ◽  
Alberto N. Escalante-B ◽  
...  


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1205
Author(s):  
Mohammed Algabri ◽  
Hassan Mathkour ◽  
Mansour M. Alsulaiman ◽  
Mohamed A. Bencherif

This study proposes using object detection techniques to recognize sequences of articulatory features (AFs) from speech utterances by treating the AFs of phonemes as multi-label objects in the speech spectrogram. The proposed system, called AFD-Obj, recognizes sequences of multi-label AFs in the speech signal and localizes them. AFD-Obj consists of two main stages. First, we formulate AF detection as an object detection problem and prepare the data to meet the requirements of object detectors by generating a spectral three-channel image from the speech signal and creating the corresponding annotation for each utterance. Second, we use the annotated images to train the proposed system to detect sequences of AFs and their boundaries. We test the system by feeding it spectrogram images, which it uses to recognize and localize the multi-label AFs. We also investigate using these AFs to detect the phonemes of an utterance. The YOLOv3-tiny detector was selected for its real-time performance and its support for multi-label detection. We test our AFD-Obj system on Arabic and English using the KAPD and TIMIT corpora, respectively. Additionally, we propose using YOLOv3-tiny as an Arabic phoneme detection system (i.e., PD-Obj) to recognize and localize sequences of Arabic phonemes in whole speech utterances. The proposed AFD-Obj and PD-Obj systems achieve excellent results on the Arabic corpus and results comparable to the state-of-the-art method on the English corpus. Moreover, we show that single-scale detection is sufficient for AF detection and phoneme recognition.
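The abstract describes two preprocessing steps but not their details: building a "spectral three-channel image" from the waveform and annotating phoneme or AF spans as objects. The sketch below illustrates one plausible realization; the function names, the grayscale-replication trick for the three channels, and the full-height YOLO boxes are assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np
from scipy.signal import spectrogram

def speech_to_rgb_spectrogram(x, fs=16000, n_fft=512, hop=160):
    """Log-magnitude spectrogram replicated into three channels.

    The paper's exact channel construction is not given in the abstract;
    here the normalized log spectrogram is simply stacked three times, a
    common way to feed a grayscale image to an RGB detector like YOLOv3-tiny.
    """
    f, t, S = spectrogram(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    log_s = 10 * np.log10(S + 1e-10)
    img = (log_s - log_s.min()) / (log_s.max() - log_s.min() + 1e-12)
    img = (img * 255).astype(np.uint8)
    return np.stack([img, img, img], axis=-1)   # (freq, time, 3)

def phoneme_to_yolo_box(t_start, t_end, utt_dur):
    """Normalized YOLO box (cx, cy, w, h) for a phoneme time span.

    A phoneme occupies a time interval but the full frequency range,
    so the box spans the whole image height (cy=0.5, h=1.0).
    """
    cx = (t_start + t_end) / 2 / utt_dur
    w = (t_end - t_start) / utt_dur
    return cx, 0.5, w, 1.0

# 1 s of synthetic noise stands in for a speech utterance
x = np.random.default_rng(0).standard_normal(16000)
img = speech_to_rgb_spectrogram(x)
box = phoneme_to_yolo_box(0.2, 0.4, 1.0)   # a phoneme from 0.2 s to 0.4 s
```

Each annotated utterance then becomes one image plus a list of class-labeled boxes, which is exactly the input format standard object detectors expect.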



2020 ◽  
Vol 31 (08) ◽  
pp. 590-598
Author(s):  
Li Xu ◽  
Solveig C. Voss ◽  
Jing Yang ◽  
Xianhui Wang ◽  
Qian Lu ◽  
...  

Background: Mandarin Chinese has a rich repertoire of high-frequency speech sounds, which may pose a considerable challenge to Mandarin-speaking listeners with sloping high-frequency hearing loss. An adaptive nonlinear frequency compression (adaptive NLFC) algorithm has been implemented in contemporary hearing aids to alleviate this problem. Purpose: The present study examined speech perception and sound-quality ratings in Mandarin-speaking hearing-impaired listeners using hearing aids fitted with adaptive NLFC (i.e., SoundRecover2 or SR2) at different parameter settings. Research Design: Phoneme detection thresholds, speech reception thresholds, and sound-quality ratings were collected with various SR2 settings. Study Sample: The participants were 15 Mandarin-speaking adults aged 32 to 84 years with symmetric sloping severe-to-profound sensorineural hearing loss. Intervention: The participants were fitted bilaterally with Phonak Naida V90-SP hearing aids. Data Collection and Analysis: The outcome measures included phoneme detection threshold using the Mandarin Phonak Phoneme Perception test, speech reception threshold using the Mandarin hearing in noise test (M-HINT), and sound-quality ratings of human speech in quiet and in noise, bird chirps, and music in quiet. For each test, five experimental settings were applied and compared: SR2-off, SR2-weak, SR2-default, SR2-strong 1, and SR2-strong 2. Results: Listeners performed significantly better with the SR2-strong 1 and SR2-strong 2 settings than with the SR2-off or SR2-weak settings on both speech reception threshold and phoneme detection threshold. However, no significant differences in sound-quality ratings were observed among the settings. Conclusions: These preliminary findings suggest that the adaptive NLFC algorithm provides perceptual benefit to Mandarin-speaking people with severe-to-profound hearing loss.



2017 ◽  
Vol 122 (6) ◽  
pp. 476-491 ◽  
Author(s):  
Rachel Sermier Dessemontet ◽  
Anne-Françoise de Chambrier ◽  
Catherine Martinet ◽  
Urs Moser ◽  
Nicole Bayer

Abstract The phonological awareness skills of 7- to 8-year-old children with intellectual disability (ID) were compared to those of 4- to 5-year-old typically developing children who were matched for early reading skills, vocabulary, and gender. Globally, children with ID displayed a marked weakness in phonological awareness. Syllable blending, syllable segmentation, and first phoneme detection appeared to be preserved. In contrast, children with ID showed a marked weakness in rhyme detection and a slight weakness in phoneme blending. Two school years later, these deficits no longer remained. Marked weaknesses appeared in phoneme segmentation and first/last phoneme detection. The findings suggest that children with ID display an atypical pattern in phonological awareness that changes with age. The implications for practice and research are discussed.



2016 ◽  
Vol 27 (05) ◽  
pp. 367-379 ◽  
Author(s):  
Nicola Schmitt ◽  
Alexandra Winkler ◽  
Michael Boretzki ◽  
Inga Holube

Background: Outcomes with hearing aids (HAs) can be assessed using various speech tests, but many tests are not sensitive to changes in high-frequency audibility. Purpose: A Phoneme Perception Test (PPT), designed for the phonemes /s/ and /ʃ/, was developed to investigate whether detection and recognition tasks can measure individual differences in phoneme audibility and recognition across hearing instrument settings. These capabilities were studied using two different fricative stimulus materials. The first set preserves natural low-level sound components in the low- and mid-frequency ranges (LF set); the second attempts to limit audibility to the high-frequency fricative noise (nLF set). To study the effect on phoneme detection and recognition when the auditory representations of /s/ and /ʃ/ are modified, an overly strong nonlinear frequency compression (NLFC) setting was applied. Research Design: A repeated-measures design was used under several different conditions. Study Sample: A total of 31 hearing-impaired individuals participated in this study. Of these, 10 did not own HAs but were provided with them during the study, and 21 owned HAs and were experienced users. All participants had a symmetrical sensorineural hearing loss. Data Collection and Analysis: The present study applied a phoneme detection test and a recognition test with the two stimulus sets under different amplification conditions. The statistical analysis focused on the capability of the PPT to measure the audibility and perception of high-frequency information with and without HAs, and between HAs with two different NLFC settings ("default" and "too strong"). Results: Detection thresholds (DTs) and recognition thresholds (RTs) were compared with the respective free-field audiometric thresholds for all available conditions. Significant differences in thresholds between the LF and nLF stimuli were observed. The thresholds for the nLF stimuli correlated more strongly with the corresponding audiometric thresholds than those for the LF stimuli. The difference between unaided and aided thresholds was larger for the nLF set than for the LF set. Thresholds were also similar in both aided conditions for the LF set, whereas a large difference between amplification settings was observed for the nLF set. When NLFC was set "too strong," DTs and RTs differed significantly for /s/. Conclusions: The findings from this study strongly suggest that measuring DTs and RTs with the nLF stimulus set is useful for quantifying the effects of HAs and NLFC on high-frequency speech cues in detection and recognition tasks. The findings also suggest that both tests are necessary because they assess audibility as well as recognition ability, particularly as these relate to speech modification algorithms. The experiments conducted in this study did not allow for any acclimatization of the participants to increased high-frequency gain or NLFC. Further investigations should therefore examine how DTs and RTs in the PPT change as listeners (re)learn the modified auditory representations of /s/ and /ʃ/ produced by NLFC.



2014 ◽  
Vol 3 (2) ◽  
pp. 199-212 ◽  
Author(s):  
Susan Rosink ◽  
Linda van Heeswijk ◽  
Martin Kroon ◽  
Anja Schüppert

The debate over whether natural fast speech is more intelligible than artificially time-compressed speech has not yet been clearly settled. For Dutch, for instance, a phoneme detection task has shown that time-compressed speech is more intelligible than natural fast speech, while for Danish listeners a dictation task found no difference in intelligibility between natural fast speech and time-compressed speech. This article investigates these conflicting results further by reporting on a dictation task with Dutch listeners. The results suggest that the reported differences are more likely language-related than task-related.



2013 ◽  
Vol 303-306 ◽  
pp. 1030-1034
Author(s):  
Muhetaer Shasike ◽  
Buheliqiguli Wasili ◽  
Xiao Li

In this paper, we present a semantic speech segmentation approach, in particular phoneme segmentation. To obtain phoneme-level information, a novel voiced/unvoiced/silence (VUS) classification method is proposed. Five parameters extracted by short-time analysis methods are used to discriminate phoneme boundaries. Experiments on Uyghur broadcast news indicate that the performance of the proposed algorithm is satisfactory.
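The abstract names the VUS idea but not its five parameters. As a minimal illustration of frame-wise VUS classification, the sketch below uses just two of the most common short-time parameters (energy and zero-crossing rate) with hand-picked thresholds; the parameter choice, the threshold values, and the function name are assumptions, not taken from the paper.

```python
import numpy as np

def classify_vus(x, frame_len=400, hop=160, e_sil=1e-4, zcr_uv=0.25):
    """Label each frame 'V' (voiced), 'U' (unvoiced), or 'S' (silence).

    Silence has near-zero short-time energy; unvoiced fricative noise
    has many zero crossings per sample; voiced speech is periodic with
    few crossings. The thresholds e_sil and zcr_uv are illustrative.
    """
    labels = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy = np.mean(frame ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        if energy < e_sil:
            labels.append('S')   # too quiet: silence
        elif zcr > zcr_uv:
            labels.append('U')   # noisy, many crossings: unvoiced
        else:
            labels.append('V')   # periodic, few crossings: voiced
    return labels

# A 100 Hz tone is periodic with low ZCR, so every frame is voiced.
t = np.arange(16000) / 16000.0
labels = classify_vus(0.5 * np.sin(2 * np.pi * 100 * t))  # all 'V'
```

Runs of identical labels then delimit candidate phoneme boundaries: a V-to-U or U-to-S transition marks a segment edge that finer phoneme-level segmentation can refine.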



2010 ◽  
Vol 53 (3) ◽  
pp. 307-320 ◽  
Author(s):  
Anne Cutler ◽  
Rebecca Treiman ◽  
Brit van Ooijen

