Sentence Context Differentially Modulates Contributions of Fundamental Frequency Contours to Word Recognition in Chinese-Speaking Children With and Without Dyslexia

2020 ◽  
Vol 11 ◽  
Author(s):  
Linjun Zhang ◽  
Yu Li ◽  
Hong Zhou ◽  
Yang Zhang ◽  
Hua Shu

Previous work has shown that children with dyslexia are impaired in speech recognition under adverse listening conditions. Our study further examined how semantic context and fundamental frequency (F0) contours contribute to word recognition against interfering speech in dyslexic and non-dyslexic children. Thirty-two children with dyslexia and 35 chronological-age-matched control children were tested on the recognition of words in normal sentences versus wordlist sentences, with natural versus flat F0 contours, against single-talker interference. The dyslexic children showed overall poorer recognition performance than the non-dyslexic children. Furthermore, semantic context differentially modulated the effect of F0 contours on the recognition performance of the two groups. Specifically, compared with flat F0 contours, natural F0 contours increased the recognition accuracy of dyslexic children less than that of non-dyslexic children in the wordlist condition. By contrast, natural F0 contours increased the recognition accuracy of both groups to a similar extent in the sentence condition. These results indicate that, for dyslexic children, who are more impaired in the use of natural F0 contours when recognizing isolated and unrelated words, access to semantic context strengthens the benefit of natural F0 contours for word recognition under adverse listening conditions. Our findings have practical implications for communication with dyslexic children when listening conditions are unfavorable.
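A flat-F0 version of a natural recording is typically synthesized by replacing the pitch tier with its mean before resynthesis. The abstract does not specify the authors' tooling, so the sketch below uses Praat via the parselmouth package purely as an illustration; the file name and pitch-range settings are assumptions, not details from the study.

```python
# Sketch: flatten the F0 contour of a recorded word (assumed tooling:
# Praat via parselmouth; file name and pitch range are illustrative).
import parselmouth
from parselmouth.praat import call

sound = parselmouth.Sound("word.wav")                     # natural recording
manipulation = call(sound, "To Manipulation", 0.01, 75, 500)

# Replace the natural pitch tier with a single point at the mean F0,
# yielding a flat contour over the whole utterance.
pitch_tier = call(manipulation, "Extract pitch tier")
mean_f0 = call(pitch_tier, "Get mean (points)", sound.xmin, sound.xmax)
call(pitch_tier, "Remove points between", sound.xmin, sound.xmax)
call(pitch_tier, "Add point", sound.duration / 2, mean_f0)
call([pitch_tier, manipulation], "Replace pitch tier")

flat = call(manipulation, "Get resynthesis (overlap-add)")
flat.save("word_flat.wav", "WAV")
```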

2011 ◽  
Vol 268-270 ◽  
pp. 82-87
Author(s):  
Zhi Peng Zhao ◽  
Yi Gang Cen ◽  
Xiao Fang Chen

In this paper, we propose a new noisy-speech recognition method based on compressive sensing theory. Compressive sensing greatly increases the noise robustness of the speech recognition system, which in turn improves recognition accuracy. In our experiments, the proposed method achieved better recognition performance than the traditional isolated word recognition method based on the DTW algorithm.
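For reference, isolated-word recognition of the kind the paper uses as its baseline compares feature sequences (commonly MFCCs) with dynamic time warping. Below is a minimal sketch of the textbook DTW distance, assuming two feature matrices with frames as rows; it illustrates the baseline technique, not the authors' code:

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Textbook DTW between two feature sequences (frames x dims)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])    # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return float(cost[n, m])

# Usage sketch: classify a test word by its nearest reference template
# (templates and test_mfcc are assumed precomputed MFCC matrices).
# templates = {"yes": mfcc_yes, "no": mfcc_no}
# best = min(templates, key=lambda w: dtw_distance(test_mfcc, templates[w]))
```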


2020 ◽  
Vol 5 (2) ◽  
pp. 504
Author(s):  
Matthias Omotayo Oladele ◽  
Temilola Morufat Adepoju ◽  
Olaide Abiodun Olatoke ◽  
Oluwaseun Adewale Ojo

Yorùbá is one of the three main languages spoken in Nigeria. It is a tonal language in which accents are carried on the vowels. The Yorùbá alphabet has twenty-five (25) letters, one of which is a digraph (GB). Because handwritten Yorùbá documents are difficult to type, there is a need for a handwriting recognition system that can convert handwritten text to digital format. This study discusses an offline Yorùbá handwritten word recognition system (OYHWR) that recognizes Yorùbá uppercase letters. Handwritten characters and words were obtained from different writers using the paint application and M708 graphics tablets. The characters were used for training and the words were used for testing. The images were pre-processed, and geometric features were extracted using zoning and gradient-based feature extraction. Geometric features are the different line types that form a particular character, such as vertical, horizontal, and diagonal lines. The geometric features used were the number of horizontal lines, number of vertical lines, number of right diagonal lines, number of left diagonal lines, total length of all horizontal lines, total length of all vertical lines, total length of all right-slanting lines, total length of all left-slanting lines, and the area of the skeleton. Each character was divided into 9 zones, and gradient feature extraction was used to extract the horizontal and vertical components and geometric features in each zone. The words were fed into a support vector machine classifier, and performance was evaluated in terms of recognition accuracy. Because the support vector machine is a binary classifier, a multiclass least squares support vector machine (LSSVM) was used for word recognition, with the one-vs-one strategy and an RBF kernel. The recognition accuracies obtained on the tested words were 66.7%, 83.3%, 85.7%, 87.5%, and 100%. The low recognition rate for some of the words may result from similarity in the extracted features.
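Zoning divides a normalized character image into a grid and computes features per cell. The abstract describes 9 zones with gradient and geometric features; the numpy sketch below illustrates only the zoning step with simple per-zone gradient statistics, under an assumed normalized grayscale input, and is not the authors' implementation:

```python
import numpy as np

def zoning_gradient_features(img: np.ndarray, grid=(3, 3)) -> np.ndarray:
    """Split a normalized grayscale character image into grid zones and
    return the mean horizontal/vertical gradient magnitude per zone."""
    gy, gx = np.gradient(img.astype(float))    # vertical, horizontal components
    rows, cols = grid
    h, w = img.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            rs, re = r * h // rows, (r + 1) * h // rows
            cs, ce = c * w // cols, (c + 1) * w // cols
            feats += [np.abs(gx[rs:re, cs:ce]).mean(),   # horizontal component
                      np.abs(gy[rs:re, cs:ce]).mean()]   # vertical component
    return np.array(feats)   # 3x3 grid -> 9 zones x 2 components = 18 features

# Usage: feats = zoning_gradient_features(binary_char_image)
```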


2020 ◽  
Vol 31 (06) ◽  
pp. 412-441 ◽  
Author(s):  
Richard H. Wilson ◽  
Victoria A. Sanchez

Abstract Background In the 1950s, with monitored live voice testing, the VU meter time constant and the short durations and amplitude modulation characteristics of monosyllabic words necessitated the use of the carrier phrase amplitude to monitor (indirectly) the presentation level of the words. This practice continues with recorded materials. To relieve the carrier phrase of this function, the influence that the carrier phrase has on word recognition performance first needs clarification, which is the topic of this study. Purpose Recordings of Northwestern University Auditory Test No. 6 by two female speakers were used to compare word recognition performances with and without the carrier phrases when the carrier phrase and test word were (1) in the same utterance stream, with the words excised digitally from the carrier (VA-1 speaker), and (2) independent of one another (VA-2 speaker). The 50-msec segment of the vowel in the target word with the largest root mean square amplitude was used to equate the target word amplitudes. Research Design A quasi-experimental, repeated measures design was used. Study Sample Twenty-four young normal-hearing adults (YNH; M = 23.5 years; pure-tone average [PTA] = 1.3-dB HL) and 48 older listeners with hearing loss (OHL; M = 71.4 years; PTA = 21.8-dB HL) participated in two one-hour sessions. Data Collection and Analyses Each listener had 16 listening conditions (2 speakers × 2 carrier phrase conditions × 4 presentation levels) with 100 randomized words, 50 different words by each speaker. Each word was presented 8 times (2 carrier phrase conditions × 4 presentation levels [YNH, 0- to 24-dB SL; OHL, 6- to 30-dB SL]). The 200 recorded words for each condition were randomized as eight 25-word tracks. In both test sessions, one practice track was followed by 16 tracks alternated between speakers and randomized by blocks of the four conditions. Central tendency and repeated measures analyses of variance statistics were used. Results With the VA-1 speaker, the overall mean recognition performances were 6.0% (YNH) and 8.3% (OHL) significantly better with the carrier phrase than without it. These differences were attributed in part to the distortion of some words caused by excising the words from their carrier phrases. With the VA-2 speaker, recognition performances in the with- and without-carrier-phrase conditions were not significantly different for either listener group, except for one condition (YNH listeners at 8-dB SL). The slopes of the mean functions were steeper for the YNH listeners (3.9%/dB to 4.8%/dB) than for the OHL listeners (2.4%/dB to 3.4%/dB) and were <1%/dB steeper for the VA-1 speaker than for the VA-2 speaker. Although the mean results were clear, the variability in performance differences between the two carrier phrase conditions for the individual participants and for the individual words was striking and was considered in detail. Conclusion The current data indicate that word recognition performances with and without the carrier phrase (1) were different when the carrier phrase and target word were produced in the same utterance, with poorer performances when the target words were excised from their respective carrier phrases (VA-1 speaker), and (2) were the same when the carrier phrase and target word were produced as independent utterances (VA-2 speaker).
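Equating target-word amplitudes by the most intense 50-msec segment, as in the study design, amounts to scanning a short RMS window over each word and scaling to a common reference. A minimal numpy sketch under an assumed sample rate and reference level (it scans the whole word rather than just the vowel, and is not the authors' code):

```python
import numpy as np

def peak_rms_50ms(x: np.ndarray, fs: int) -> float:
    """Largest RMS over any 50-ms window of signal x (sampling rate fs)."""
    win = int(0.050 * fs)
    # Running sum of x^2 gives the windowed mean square at every offset.
    csum = np.cumsum(np.concatenate(([0.0], x.astype(float) ** 2)))
    ms = (csum[win:] - csum[:-win]) / win
    return float(np.sqrt(ms.max()))

def equate_level(x: np.ndarray, fs: int, ref_rms: float = 0.05) -> np.ndarray:
    """Scale word x so its peak 50-ms RMS equals ref_rms (assumed reference)."""
    return x * (ref_rms / peak_rms_50ms(x, fs))

# Usage: word_eq = equate_level(word_waveform, fs=44100)
```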


2008 ◽  
Vol 19 (06) ◽  
pp. 496-506 ◽  
Author(s):  
Richard H. Wilson ◽  
Rachel McArdle ◽  
Heidi Roberts

Background: So that portions of the classic Miller, Heise, and Lichten (1951) study could be replicated, new recorded versions of the words and digits were made, because none of the three common monosyllabic word lists (PAL PB-50, CID W-22, and NU–6) contains the 9 monosyllabic digits (1–10, excluding 7) used by Miller et al. It is well established that different psychometric characteristics are observed for different lists, and even for the same materials spoken by different speakers. The decision was made to record four lists of each of the three monosyllabic word sets, the monosyllabic digits not included in the three sets of word lists, and the CID W-1 spondaic words. A professional female speaker with a General American dialect recorded the materials during four recording sessions within a 2-week interval. The recording order of the 582 words was random. Purpose: To determine, on listeners with normal hearing, the psychometric properties of the five speech materials presented in speech-spectrum noise. Research Design: A quasi-experimental, repeated-measures design was used. Study Sample: Twenty-four young adult listeners (M = 23 years) with normal pure-tone thresholds (≤20-dB HL at 250 to 8000 Hz) participated. The participants were university students who were unfamiliar with the test materials. Data Collection and Analysis: The 582 words were presented at four signal-to-noise ratios (SNRs; −7, −2, 3, and 8 dB) in speech-spectrum noise fixed at 72-dB SPL. Although the main metric of interest was the 50% point on the function for each word, established with the Spearman-Kärber equation (Finney, 1952), the percentage correct on each word at each SNR was also evaluated. The psychometric characteristics of the PB-50, CID W-22, and NU–6 monosyllabic word lists were compared with one another, with the CID W-1 spondaic words, and with the 9 monosyllabic digits. Results: Recognition performances on the four lists within each of the three monosyllabic word materials were equivalent, ±0.4 dB. Likewise, word-recognition performances on the PB-50, W-22, and NU–6 word lists were equivalent, ±0.2 dB. The mean 50% point with the 36 W-1 spondaic words was ~6.2 dB lower than the 50% point with the monosyllabic words. Recognition performance on the monosyllabic digits was 1–2 dB better than mean performance on the monosyllabic words. Conclusions: Word-recognition performances on the three sets of materials (PB-50, CID W-22, and NU–6) were equivalent, as were the performances on the four lists that make up each of the three materials. Phonetic/phonemic balance does not appear to be an important consideration in the compilation of word-recognition lists used to evaluate the ability of listeners to understand speech. A companion paper examines the acoustic, phonetic/phonological, and lexical variables that may predict the relative ease or difficulty with which these monosyllabic words were recognized in noise (McArdle and Wilson, this issue).
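The Spearman-Kärber estimate of the 50% point treats the psychometric function as a distribution over the tested SNRs and averages interval midpoints weighted by the rise in proportion correct between them. A small sketch, assuming proportions that are pinned to 0 below the lowest SNR and 1 above the highest (the usual convention); this illustrates the estimator, not the study's analysis scripts:

```python
import numpy as np

def spearman_karber_50(snr_db: np.ndarray, p_correct: np.ndarray) -> float:
    """Spearman-Karber estimate of the SNR at 50% correct.

    Enforces a monotone function, then pads it to 0 below the lowest SNR
    and 1 above the highest (padding step size of 5 dB is an assumption
    matching the study's SNR spacing).
    """
    p = np.clip(np.maximum.accumulate(p_correct), 0.0, 1.0)
    x = np.concatenate(([snr_db[0] - 5.0], snr_db, [snr_db[-1] + 5.0]))
    p = np.concatenate(([0.0], p, [1.0]))
    mid = (x[:-1] + x[1:]) / 2.0          # interval midpoints
    return float(np.sum(np.diff(p) * mid))  # midpoints weighted by rise in p

# Example with the study's four SNRs (-7, -2, 3, 8 dB) and made-up proportions.
print(spearman_karber_50(np.array([-7.0, -2.0, 3.0, 8.0]),
                         np.array([0.10, 0.35, 0.75, 0.95])))
```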


2005 ◽  
Vol 16 (08) ◽  
pp. 622-630 ◽  
Author(s):  
Richard H. Wilson ◽  
Christopher A. Burks ◽  
Deborah G. Weakley

The purpose of this experiment was to determine the relationship between psychometric functions for words presented in multitalker babble using a descending presentation level protocol versus a random presentation level protocol. Forty veterans (mean age = 63.5 years) with mild-to-moderate sensorineural hearing losses were enrolled. Seventy of the Northwestern University Auditory Test No. 6 words spoken by the VA female speaker were presented at seven signal-to-babble ratios from 24 to 0 dB (10 words/step). Although the random procedure required 69 sec longer to administer than the descending protocol, there was no significant difference between the results obtained with the two psychophysical methods. There was almost no relation between the listeners' perceived ability to understand speech in background noise and their measured ability to understand speech in multitalker babble. Likewise, there were only tenuous relations between pure-tone thresholds and performance on the words in babble, and between recognition performance in quiet and performance on the words in babble.
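A psychometric function of the kind compared here is commonly summarized by fitting a sigmoid to proportion correct versus signal-to-babble ratio. A generic scipy sketch with made-up data at the study's seven ratios (illustrative only; the study's own analysis is not specified at this level of detail):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(snr, midpoint, slope):
    """Psychometric function: proportion correct vs. signal-to-babble ratio."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - midpoint)))

# Assumed example data: seven signal-to-babble ratios (24 to 0 dB, 4-dB steps)
# and observed proportion correct at each.
snr = np.array([24, 20, 16, 12, 8, 4, 0], dtype=float)
p = np.array([0.98, 0.95, 0.85, 0.65, 0.40, 0.20, 0.08])

(midpoint, slope), _ = curve_fit(logistic, snr, p, p0=[12.0, 0.3])
print(f"50% point: {midpoint:.1f} dB S/B, slope parameter: {slope:.2f}")
```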


2005 ◽  
Vol 36 (3) ◽  
pp. 219-229 ◽  
Author(s):  
Peggy Nelson ◽  
Kathryn Kohnert ◽  
Sabina Sabur ◽  
Daniel Shaw

Purpose: Two studies were conducted to investigate the effects of classroom noise on attention and speech perception in native Spanish-speaking second graders learning English as their second language (L2) as compared with English-only-speaking (EO) peers. Method: Study 1 measured children's on-task behavior during instructional activities with and without soundfield amplification. Study 2 measured the effects of noise (+10 dB signal-to-noise ratio) using an experimental English word recognition task. Results: Findings from Study 1 revealed no significant condition (pre/post-amplification) or group differences in observed on-task performance. The main finding from Study 2 was that word recognition performance declined significantly for both the L2 and EO groups in the noise condition; however, the impact was disproportionately greater for the L2 group. Clinical Implications: Children learning in their L2 appear to be at a distinct disadvantage when listening in rooms with typical noise and reverberation. Speech-language pathologists and audiologists should collaborate to inform teachers, help reduce classroom noise, increase signal levels, and improve access to spoken language for L2 learners.
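Presenting speech at a +10 dB signal-to-noise ratio, as in Study 2, means scaling the noise so the speech RMS exceeds the noise RMS by 10 dB before mixing. A generic numpy sketch (illustrative stimulus preparation, not the study's actual code; variable names are assumptions):

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix speech and noise at a target SNR in dB (same sampling rate assumed)."""
    noise = noise[:len(speech)]                        # trim noise to speech length
    rms = lambda x: np.sqrt(np.mean(x.astype(float) ** 2))
    # Choose a noise gain so 20*log10(rms(speech) / rms(gain*noise)) == snr_db.
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))
    return speech + gain * noise

# Example: word recognition stimuli at +10 dB SNR, as in Study 2.
# mixed = mix_at_snr(word_waveform, classroom_noise, snr_db=10.0)
```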


2021 ◽  
Vol 13 (10) ◽  
pp. 265
Author(s):  
Jie Chen ◽  
Bing Han ◽  
Xufeng Ma ◽  
Jian Zhang

Underwater target recognition is an important supporting technology for the development of marine resources, and it is mainly limited by the purity of extracted features and the generality of recognition schemes. The low-frequency analysis and recording (LOFAR) spectrum is one of the key features of an underwater target and can be used for feature extraction. However, complex underwater environmental noise and the extremely low signal-to-noise ratio of the target signal lead to breakpoints in the LOFAR spectrum, which seriously hinder underwater target recognition. To overcome this issue and further improve recognition performance, we adopted a deep-learning approach for underwater target recognition and propose a novel LOFAR spectrum enhancement (LSE)-based underwater target-recognition scheme, which consists of preprocessing, offline training, and online testing. In preprocessing, we design a LOFAR spectrum enhancement method based on a multi-step decision algorithm to recover the breakpoints in the LOFAR spectrum. In offline training, the enhanced LOFAR spectrum is adopted as the input to a convolutional neural network (CNN), and a LOFAR-based CNN (LOFAR-CNN) for online recognition is developed. Taking advantage of the powerful feature extraction capability of CNNs, the proposed LOFAR-CNN further improves recognition accuracy. Finally, extensive simulation results demonstrate that the LOFAR-CNN network can achieve a recognition accuracy of 95.22%, outperforming state-of-the-art methods.
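As an illustration of the offline-training stage, a minimal CNN that maps a LOFAR spectrum (treated as a single-channel image) to class scores might look like the PyTorch sketch below; the input size, layer sizes, and class count are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class LofarCNN(nn.Module):
    """Minimal CNN over a single-channel LOFAR spectrum (assumed 1 x 128 x 128)."""
    def __init__(self, num_classes: int = 5):   # class count is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # After two 2x poolings, 128 x 128 -> 32 x 32 with 32 channels.
        self.classifier = nn.Linear(32 * 32 * 32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Shape check with a dummy batch of spectra.
print(LofarCNN()(torch.randn(4, 1, 128, 128)).shape)   # torch.Size([4, 5])
```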


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Li Liu ◽  
Yunfeng Ji ◽  
Yun Gao ◽  
Zhenyu Ping ◽  
Liang Kuang ◽  
...  

Traffic accidents are easily caused by fatigued driving. If the fatigue state of the driver can be identified in time and a corresponding early warning provided, the occurrence of traffic accidents could be avoided to a large extent. At present, research on recognizing fatigue driving states focuses mostly on recognition accuracy. The fatigue state is typically recognized by combining different features, such as facial expressions, electroencephalogram (EEG) signals, yawning, and the percentage of eyelid closure over the pupil over time (PERCLOS). Combining these features increases recognition time and sacrifices real-time performance. In addition, some features introduce error into the recognition result, for example frequent yawning at the onset of a cold or frequent blinking with dry eyes. To preserve recognition accuracy while improving the practical feasibility and real-time performance of fatigue driving state recognition, a fast support vector machine (FSVM) algorithm based on EEGs and electrooculograms (EOGs) is proposed. First, the collected EEG and EOG modal data are preprocessed. Second, multiple features are extracted from the preprocessed EEGs and EOGs. Finally, the FSVM is used to classify the data features to obtain the fatigue state recognition result. Based on the recognition results, this paper designs a fatigue driving early warning system built on Internet of Things (IoT) technology. When the driver shows symptoms of fatigue, the system not only sends a warning signal to the driver but also notifies other nearby vehicles using the system through IoT technology and informs the operations management back end.
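As a rough illustration of the final classification step, an SVM over concatenated EEG/EOG feature vectors can be sketched with scikit-learn; a standard RBF-kernel SVC stands in here for the paper's fast SVM (FSVM) variant, and the feature dimensionality, labels, and data are assumptions:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Assumed stand-in data: 200 windows x 24 EEG/EOG features,
# labels 0 = alert, 1 = fatigued.
X = rng.normal(size=(200, 24))
y = (X[:, :4].mean(axis=1) > 0).astype(int)

# Standardize features, then classify (SVC as a stand-in for FSVM).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:150], y[:150])                       # train split
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```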


2014 ◽  
Vol 2 (2) ◽  
pp. 43-53 ◽  
Author(s):  
S. Rojathai ◽  
M. Venkatesulu

In speech word recognition systems, feature extraction and recognition play the most significant roles, and many feature extraction and recognition methods are available in existing systems. The most recent Tamil speech word recognition system achieved high word recognition performance with PAC-ANFIS compared with earlier Tamil speech word recognition systems. Investigating speech word recognition with various recognition methods is therefore needed to establish their relative performance. This paper presents such an investigation with two well-known artificial intelligence methods: the Feed Forward Back Propagation Neural Network (FFBNN) and the Adaptive Neuro Fuzzy Inference System (ANFIS). The performance of the Tamil speech word recognition system with PAC-FFBNN is analyzed in terms of statistical measures and Word Recognition Rate (WRR) and compared with PAC-ANFIS and other existing Tamil speech word recognition systems.
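A feed-forward back-propagation network of the kind named here is, in modern terms, a multilayer perceptron trained with backpropagation. Below is a compact scikit-learn stand-in operating on assumed acoustic feature vectors; the data, feature count, and network size are illustrative, not the paper's PAC features or configuration:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
# Assumed stand-in data: 300 utterances x 13 features, 10 word classes.
X = rng.normal(size=(300, 13))
y = rng.integers(0, 10, size=300)

# One hidden layer trained with backpropagation (the "FFBNN" of the abstract).
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=1)
net.fit(X[:250], y[:250])
wrr = net.score(X[250:], y[250:])      # word recognition rate on held-out words
print(f"WRR: {wrr:.2%}")
```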

