speech segments
Recently Published Documents


TOTAL DOCUMENTS

161
(FIVE YEARS 40)

H-INDEX

15
(FIVE YEARS 1)

2021 ◽  
Vol 69 (6) ◽  
pp. 468-476
Author(s):  
Qiuying Li ◽  
Tao Zhang ◽  
Yanzhang Geng ◽  
Zhen Gao

Microphone array speech enhancement algorithms use temporal and spatial information to significantly improve the performance of speech noise reduction. By combining a noise estimation algorithm with microphone array speech enhancement, the accuracy of noise estimation is improved and the computational cost is reduced. In traditional noise estimation algorithms, the noise power spectrum is not updated in the presence of speech, which leads to delay and deviation in the noise spectrum estimate. An optimized improved minimum controlled recursive averaging speech enhancement algorithm based on a microphone array is proposed in this paper. It consists of three parts. The first part is preprocessing, divided into two branches: the upper branch enhances the speech signal, and the lower branch estimates the noise. The second part is the optimized improved minimum controlled recursive averaging, in which the noise power spectrum is updated not only in non-speech segments but also in speech segments. Finally, according to the estimated noise power spectrum, the minimum mean-square error log-spectral amplitude algorithm is used to enhance the speech. Testing data are drawn from the TIMIT and Noisex-92 databases. Short-time objective intelligibility and segmental signal-to-noise ratio are chosen as evaluation metrics. Experimental results show that the proposed speech enhancement algorithm improves the segmental signal-to-noise ratio and short-time objective intelligibility for various noise types at different signal-to-noise ratio levels.
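The recursive-averaging step at the heart of this family of noise estimators can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): the per-bin smoothing factor is tied to a speech-presence probability, so the noise power spectrum keeps updating, slowly, even in speech segments instead of freezing.

```python
import numpy as np

def update_noise_psd(noise_psd, noisy_psd, p_speech, alpha=0.9):
    """One minimum-controlled recursive-averaging step (illustrative).

    noise_psd : current noise power spectrum estimate, one value per bin
    noisy_psd : |Y|^2 of the current noisy frame
    p_speech  : per-bin speech-presence probability in [0, 1]

    The effective smoothing factor is pushed toward 1 in bins where
    speech is likely, so those bins update only very slowly rather
    than not at all.
    """
    alpha_t = alpha + (1.0 - alpha) * p_speech  # per-bin smoothing factor
    return alpha_t * noise_psd + (1.0 - alpha_t) * noisy_psd
```

With `p_speech = 0` the estimate is a plain exponential average; with `p_speech = 1` the bin is held fixed for that frame, which is the behavior the optimized algorithm relaxes by keeping `p_speech` below 1 during speech.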


QJM ◽  
2021 ◽  
Vol 114 (Supplement_1) ◽  
Author(s):  
Nahla Abd-ElAziz Rifaie ◽  
Dina Ahmed Elrefaie ◽  
Mona Mosaad Mahmoud

Abstract Background Speech sound disorder is a communication disorder in which children have persistent difficulty saying words or sounds correctly. It refers to any difficulty, or combination of difficulties, with the perception, motor production, or phonological representation of speech sounds and speech segments. Aim of the Work To construct an Arabic auditory bombardment therapy program and measure its effectiveness in the treatment of functional speech sound disorder. Subjects and Methods This study was applied to 60 participants aged 3 to 5 years diagnosed with functional speech sound disorder, with or without language disorders, attending the Phoniatrics outpatient clinic of Ain Shams University Hospitals. The test for identification of phonological processes was applied to all 60 patients, who were divided into 2 groups of 30: group 1 received only conventional therapy, while group 2 received auditory bombardment in addition to conventional therapy for 3 months. The test was then repeated after therapy. Results Group 2 showed highly significant improvement in consonant assimilation, voicing change, final consonant deletion, palatal fronting, gliding, lateralization, and glottal replacement, while group 1 showed highly significant improvement in syllable deletion and partial cluster reduction. Conclusion The present study showed that applying the auditory bombardment therapy program in addition to conventional therapy yields significantly greater improvement than conventional therapy alone.


2021 ◽  
Author(s):  
Shikha Baghel ◽  
Mrinmoy Bhattacharjee ◽  
S.R. Mahadeva Prasanna ◽  
Prithwijit Guha

2021 ◽  
Vol 17 (3) ◽  
pp. 269-277
Author(s):  
Sungmin Lee

Despite the significant contribution of hearing assistive devices, medications, and surgery to restoring the auditory periphery, a large number of people with hearing loss still struggle to understand speech. This has led many studies on speech perception to move toward central auditory function by examining associated brain activity with macroscopic recording tools such as electroencephalography (EEG). Until a few years ago, however, brain scientists attempting to investigate speech perception mechanisms with EEG faced a limitation: only short speech segments could be used to elicit auditory evoked potentials, even though such stimuli were too brief to be considered speech. Today, advances in neural engineering and a better understanding of neural mechanisms allow studies with running streams of continuous speech and have expanded the scope of EEG studies to include comprehension of the more realistic speech envelope. The purpose of this study is to review the literature on neural tracking of the speech envelope and to discuss it from an audiology perspective. This review consists of seven parts: introduction, neural tracking theories, neural tracking measures, signal processing and analysis, a literature review of neural tracking associated with hearing loss, application of neural tracking to audiology, and conclusion. We note that neural tracking has the potential to be used in clinical settings to objectively evaluate speech comprehension in people with hearing loss.
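As a minimal illustration of the envelope-tracking measures this review discusses, the sketch below (an assumption for illustration, not any specific published pipeline) extracts a crude amplitude envelope from the stimulus and correlates it with an EEG channel over a range of lags; a correlation peak at a positive lag reflects the delay of the neural response relative to the stimulus.

```python
import numpy as np

def envelope(x, win=64):
    """Crude amplitude envelope: moving average of the rectified signal."""
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

def tracking_corr(eeg, speech, max_lag):
    """Pearson correlation between EEG and the speech envelope at
    lags 0..max_lag samples (EEG lagging the stimulus)."""
    env = envelope(speech)
    n = len(env)
    lags = np.arange(max_lag + 1)
    corrs = np.array([np.corrcoef(eeg[lag:], env[: n - lag])[0, 1]
                      for lag in lags])
    return lags, corrs
```

Published work typically uses more elaborate measures (temporal response functions, stimulus reconstruction), but the lagged correlation above captures the basic idea of quantifying how faithfully the EEG follows the slow amplitude modulations of speech.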


Author(s):  
Gui-Xin Shi ◽  
Wei-Qiang Zhang ◽  
Guan-Bo Wang ◽  
Jing Zhao ◽  
Shu-Zhou Chai ◽  
...  

Abstract Many end-to-end approaches have been proposed to detect predefined keywords. For multi-keyword scenarios, two bottlenecks remain: (1) the important data containing keyword(s) is sparsely distributed, and (2) the timestamps of the detected keywords are inaccurate. In this paper, to alleviate the first issue and further improve the performance of the end-to-end ASR front-end, we propose a biased loss function that guides the recognizer to pay more attention to speech segments containing the predefined keywords. We address the second issue by modifying the forced alignment applied to the end-to-end ASR front-end: to obtain frame-level alignments, we use a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) based acoustic model (AM) as an auxiliary aligner. The proposed system was evaluated in the OpenSAT20 evaluation held by the National Institute of Standards and Technology (NIST). The performance of our end-to-end KWS system is comparable to that of the conventional hybrid KWS system, and sometimes slightly better. With the fused results of the end-to-end and conventional KWS systems, we won first prize in the KWS track. On the dev dataset (part of the SAFE-T corpus), the system outperforms the baseline by a large margin: our system with the GMM-HMM aligner achieves lower segmentation-aware word error rates (a relative 7.9–19.2% decrease) and higher overall actual term-weighted values (a relative 3.6–11.0% increase), demonstrating the effectiveness of the proposed method. For more precise alignments, a DNN-based AM can be used as the aligner at the cost of more computation.
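The abstract does not give the exact form of the biased loss, but one hedged reading is a frame-level re-weighting of the training loss: frames falling inside keyword regions get a larger weight, so sparse keyword-bearing segments contribute more to the gradient. The function name, mask convention, and weighting rule below are all assumptions for illustration.

```python
import numpy as np

def biased_loss(frame_losses, keyword_mask, bias=3.0):
    """Weighted mean of per-frame losses (illustrative sketch).

    frame_losses : per-frame loss values from the ASR front-end
    keyword_mask : boolean array, True where the frame lies inside a
                   (hypothetical) predefined-keyword region
    bias         : up-weighting factor (> 1) for keyword frames
    """
    frame_losses = np.asarray(frame_losses, float)
    weights = np.where(np.asarray(keyword_mask, bool), bias, 1.0)
    return float(np.sum(weights * frame_losses) / np.sum(weights))
```

With `bias = 1.0` this reduces to the ordinary mean loss, so the biasing is a strict generalization of the unweighted objective.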


2021 ◽  
Vol 82 (1) ◽  
pp. 33-45 ◽  
Author(s):  
Natalia Parjane ◽  
Sunghye Cho ◽  
Sharon Ash ◽  
Katheryn A.Q. Cousins ◽  
Sanjana Shellikeri ◽  
...  

Background: Progressive supranuclear palsy syndrome (PSPS) and corticobasal syndrome (CBS), as well as non-fluent/agrammatic primary progressive aphasia (naPPA), are often associated with misfolded 4-repeat tau pathology, but the diversity of the associated speech features is poorly understood. Objective: To investigate the full range of acoustic and lexical properties of speech and test the hypothesis that PSPS-CBS show a subset of the speech impairments found in naPPA. Methods: Acoustic and lexical measures, extracted from natural, digitized semi-structured speech samples using novel, automated methods, were compared in PSPS-CBS (n = 87), naPPA (n = 25), and healthy controls (HC, n = 41). We related these measures to grammatical performance and speech fluency, core features of naPPA; to neuropsychological measures of naming, executive, memory, and visuoconstructional functioning; and to cerebrospinal fluid (CSF) phosphorylated tau (pTau) levels in patients with available biofluid analytes. Results: Both naPPA and PSPS-CBS speech showed shorter speech segments, longer pauses, higher pause rates, reduced fundamental frequency (f0) pitch ranges, and slower speech rate compared to HC. naPPA speech was distinct from PSPS-CBS, with shorter speech segments, more frequent pauses, slower speech rate, reduced verb production, and higher partial word production. In both groups, acoustic duration measures generally correlated with speech fluency, measured as words per minute, and with grammatical performance. Speech measures did not correlate with standard neuropsychological measures. CSF pTau levels correlated with f0 range in PSPS-CBS and naPPA. Conclusion: The lexical and acoustic speech features of PSPS-CBS overlap with those of naPPA and are related to CSF pTau levels.
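The duration and pitch measures named above (speech-segment length, pause rate, f0 range) can be summarized from a speech/pause segmentation and an f0 track with a few lines of code. This is a generic sketch under assumed input conventions, not the authors' automated extraction pipeline.

```python
import numpy as np

def acoustic_measures(intervals, f0_hz, total_sec):
    """Summarize a speech/pause segmentation and an f0 track.

    intervals : (start_sec, end_sec, label) tuples, label "speech" or "pause"
    f0_hz     : per-frame f0 values; 0 marks unvoiced frames (assumed convention)
    total_sec : total duration of the sample in seconds
    """
    speech = [e - s for s, e, lab in intervals if lab == "speech"]
    pauses = [e - s for s, e, lab in intervals if lab == "pause"]
    voiced = np.asarray(f0_hz, float)
    voiced = voiced[voiced > 0]
    return {
        "mean_speech_seg_sec": float(np.mean(speech)),
        "mean_pause_sec": float(np.mean(pauses)),
        "pause_rate_per_min": 60.0 * len(pauses) / total_sec,
        # log-scale range is the usual way to compare pitch across speakers
        "f0_range_semitones": float(12 * np.log2(voiced.max() / voiced.min())),
    }
```

Shorter mean speech segments, higher pause rates, and a smaller f0 range in these summaries correspond to the group differences reported in the Results.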


2021 ◽  
Author(s):  
Cong Zhang ◽  
Jian Zhu

Generating synthesised singing voice with models trained on speech data has many advantages owing to such models' flexibility and controllability. However, since information about the temporal relationship between segments and beats is lacking in speech training data, the synthesised singing may sound off-beat at times. The availability of information on the temporal relationship between speech segments and music beats is therefore crucial. The current study investigated segment-beat synchronisation in singing data, with hypotheses formed on the basis of the linguistic theories of the P-centre and the sonority hierarchy. A Mandarin corpus and an English corpus of professional singing data were manually annotated and analysed. The results showed that the presence of musical beats depended more on segment duration than on sonority. However, the sonority hierarchy and the P-centre theory were highly related to the location of beats. Mandarin and English demonstrated cross-linguistic variations despite exhibiting common patterns.
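A basic quantity in this kind of segment-beat analysis is the offset between each musical beat and the nearest annotated segment onset. The helper below is a hypothetical sketch of such a measurement (the paper does not specify its analysis code); a systematically negative offset would indicate onsets anticipating the beat, in the spirit of the P-centre theory.

```python
import numpy as np

def beat_to_onset_offsets(beat_times, onset_times):
    """For each beat time (sec), the signed offset to the nearest segment
    onset: positive when the onset precedes the beat, negative when the
    onset follows it."""
    onsets = np.asarray(onset_times, float)
    return np.array([b - onsets[np.argmin(np.abs(onsets - b))]
                     for b in beat_times])
```

Aggregating these offsets per segment type would allow the sonority- and duration-based hypotheses described above to be tested directly.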

