speech segments Latest Research Papers

Microphone array speech enhancement algorithm uses temporal and spatial informa- tion to improve the performance of speech noise reduction significantly. By combining noise estimation algorithm with microphone array speech enhancement, the accuracy of noise estimation is improved, and the computation is reduced. In traditional noise es- timation algorithms, the noise power spectrum is not updated in the presence of speech, which leads to the delay and deviation of noise spectrum estimation. An optimized im- proved minimum controlled recursion average speech enhancement algorithm, based on a microphone matrix is proposed in this paper. It consists of three parts. The first part is the preprocessing, divided into two branches: the upper branch enhances the speech signal, and the lower branch gets the noise. The second part is the optimized improved minimum controlled recursive averaging. The noise power spectrum is updated not only in the non-speech segments but also in the speech segments. Fi- nally, according to the estimated noise power spectrum, the minimum mean-square error log-spectral amplitude algorithm is used to enhance speech. Testing data are from TIMIT and Noisex-92 databases. Short-time objective intelligibility and seg- mental signal-to-noise ratio are chosen as evaluation metrics. Experimental results show that the proposed speech enhancement algorithm can improve the segmental signal-to-noise ratio and short-time objective intelligibility for various noise types at different signal-to-noise ratio levels.

Download Full-text

Construction of auditory bombardment therapy program: a pilot study

QJM ◽

10.1093/qjmed/hcab094.019 ◽

2021 ◽

Vol 114 (Supplement_1) ◽

Author(s):

Nahla Abd-ElAziz Rifaie ◽

Dina Ahmed Elrefaie ◽

Mona Mosaad Mahmoud

Keyword(s):

Conventional Therapy ◽

Speech Sound ◽

University Hospitals ◽

Speech Sound Disorder ◽

Phonological Processes ◽

Therapy Program ◽

Significant Difference ◽

Group 2 ◽

Speech Segments ◽

Group 1

Abstract Background Speech sound disorder is a communication disorder in which children have persistent difficulty saying words or sounds correctly. It refers to any difficulty or combination of difficulties with perception, motor production, or phonological representation of speech sounds and speech segments. Aim of the Work to construct an Arabic auditory bombardment therapy program and measure its effectiveness in treatment of functional speech sound disorder. Subjects and Methods This study was applied on 60 participants divided into 2 groups (30 for each group) with age ranging from 3-5 years diagnosed with functional speech sound disorder with or without language disorders, attending at the Phoniatrics outpatient clinic in Ain Shams University Hospitals. The test for identification of phonological processes was applied on 60 patients with speech sound disorder selected to participate in this study. These were divided in to 2 groups (Group (1) received only the conventional therapy while group (2) received auditory bombardment in addition to the conventional therapy for 3 months) and the test was repeated again after therapy. Results Group (2) showed high significant difference (improvement) in consonant assimilation, voicing change, final consonant deletion, palatal fronting, gliding, lateralization and glottal replacement while group (1) showed high significant difference (improvement) in syllable deletion and partial cluster reduction. Conclusion The present study showed that application of auditory bombardment therapy program in addition to conventional therapy has high significant improvement than conventional therapy alone.

Download Full-text

Identifying Indicators of Vulnerability from Short Speech Segments Using Acoustic and Textual Features

10.21437/interspeech.2021-1525 ◽

2021 ◽

Author(s):

Xia Cui ◽

Amila Gamage ◽

Terry Hanley ◽

Tingting Mu

Keyword(s):

Textual Features ◽

Speech Segments

Download Full-text

Synchronising Speech Segments with Musical Beats in Mandarin and English Singing

10.21437/interspeech.2021-1841 ◽

2021 ◽

Author(s):

Cong Zhang ◽

Jian Zhu

Keyword(s):

Speech Segments

Download Full-text

Automatic Detection of Shouted Speech Segments in Indian News Debates

10.21437/interspeech.2021-1592 ◽

2021 ◽

Author(s):

Shikha Baghel ◽

Mrinmoy Bhattacharjee ◽

S.R. Mahadeva Prasanna ◽

Prithwijit Guha

Keyword(s):

Automatic Detection ◽

Speech Segments

Download Full-text

A Literature Review of Neural Tracking to Speech Envelope from the View of Audiology

Audiology and Speech Research ◽

10.21848/asr.210004 ◽

2021 ◽

Vol 17 (3) ◽

pp. 269-277

Author(s):

Sungmin Lee

Keyword(s):

Hearing Loss ◽

Speech Perception ◽

Literature Review ◽

Neural Mechanism ◽

Auditory Evoked Potential ◽

Speech Comprehension ◽

Auditory Periphery ◽

Speech Envelope ◽

Auditory Functions ◽

Speech Segments

Despite the significant contribution of hearing assistive devices, medications, and surgery to restoring auditory periphery, a large number of people with hearing loss still struggle with understanding speech. This leads many studies on speech perception to move towards the central auditory functions by looking at associated brain activities using macroscopic recording tools such as electroencephalography (EEG). Up until a few years ago, however, limitation has been given to the brain scientists who attempted to investigate speech perception mechanisms using the EEG. In particular, short duration of speech segments has inevitably been used to elicit auditory evoked potential, even though they were too brief to be considered as speech. Today, however, advance in neural engineering and better understanding of neural mechanism have better facilitated brain scientists to perform studies with running stream of continuous speech and expand the scope of EEG studies to include comprehension of more realistic speech envelope. The purpose of this study is to review literatures on neural tracking to speech envelope to discuss it in Audiology perspective. This review article consists of seven subjects including introduction, neural tracking theories, neural tracking measure, signal processing & analysis, literature review on neural tracking associated with hearing loss, application of neural tracking to audiology, and conclusion. We noted that neural tracking has potential to be used in clinical sets to objectively evaluate speech comprehension for people with hearing loss in the future.

Download Full-text

Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection

10.1109/ncc52529.2021.9530078 ◽

2021 ◽

Author(s):

Shikha Baghel ◽

S. R. M. Prasanna ◽

Prithwijit Guha

Keyword(s):

High Energy ◽

Speech Detection ◽

Voiced Speech ◽

Speech Segments

Download Full-text

Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00212-9 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Gui-Xin Shi ◽

Wei-Qiang Zhang ◽

Guan-Bo Wang ◽

Jing Zhao ◽

Shu-Zhou Chai ◽

...

Keyword(s):

Hidden Markov ◽

Gaussian Mixture ◽

Error Rates ◽

Important Data ◽

Large Margin ◽

Front End ◽

End To End ◽

The Cost ◽

Speech Segments ◽

Weighted Values

AbstractMany end-to-end approaches have been proposed to detect predefined keywords. For scenarios of multi-keywords, there are still two bottlenecks that need to be resolved: (1) the distribution of important data that contains keyword(s) is sparse, and (2) the timestamps of the detected keywords are inaccurate. In this paper, to alleviate the first issue and further improve the performance of the end-to-end ASR front-end, we propose the biased loss function for guiding the recognizer to pay more attention to the speech segments containing the predefined keywords. As for the second issue, we solve this problem by modifying the force alignment applied to the end-to-end ASR front-end. To get the frame-level alignment, we utilize a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) based acoustic model (AM) for auxiliary. The proposed system is evaluated in the OpenSAT20 held by the National Institute of Standards and Technology (NIST). The performance of our end-to-end KWS system is comparable to the conventional hybrid KWS system, sometimes even slightly better. With fusion results of the end-to-end and conventional KWS systems, we won the first prize in the KWS track. On the dev dataset (a part of SAFE-T corpus), the system outperforms the baseline by a large margin, i.e., our system with GMM-HMM aligner has a lower segmentation-aware word error rates (relatively 7.9–19.2% decrease) and higher overall Actual term-weighted values (relatively 3.6–11.0% increase), which demonstrates the effectiveness of the proposed method. For more precise alignments, we can use DNN-based AM as alignmentor at the cost of more computation.

Download Full-text

Digital Speech Analysis in Progressive Supranuclear Palsy and Corticobasal Syndromes

Journal of Alzheimer s Disease ◽

10.3233/jad-201132 ◽

2021 ◽

Vol 82 (1) ◽

pp. 33-45 ◽

Cited By ~ 1

Author(s):

Natalia Parjane ◽

Sunghye Cho ◽

Sharon Ash ◽

Katheryn A.Q. Cousins ◽

Sanjana Shellikeri ◽

...

Keyword(s):

Progressive Supranuclear Palsy ◽

Speech Rate ◽

Full Range ◽

Primary Progressive Aphasia ◽

Speech Impairments ◽

Neuropsychological Measures ◽

Speech Fluency ◽

Partial Word ◽

Speech Features ◽

Speech Segments

Background: Progressive supranuclear palsy syndrome (PSPS) and corticobasal syndrome (CBS) as well as non-fluent/agrammatic primary progressive aphasia (naPPA) are often associated with misfolded 4-repeat tau pathology, but the diversity of the associated speech features is poorly understood. Objective: Investigate the full range of acoustic and lexical properties of speech to test the hypothesis that PSPS-CBS show a subset of speech impairments found in naPPA. Methods: Acoustic and lexical measures, extracted from natural, digitized semi-structured speech samples using novel, automated methods, were compared in PSPS-CBS (n = 87), naPPA (n = 25), and healthy controls (HC, n = 41). We related these measures to grammatical performance and speech fluency, core features of naPPA, to neuropsychological measures of naming, executive, memory and visuoconstructional functioning, and to cerebrospinal fluid (CSF) phosphorylated tau (pTau) levels in patients with available biofluid analytes. Results: Both naPPA and PSPS-CBS speech produced shorter speech segments, longer pauses, higher pause rates, reduced fundamental frequency (f0) pitch ranges, and slower speech rate compared to HC. naPPA speech was distinct from PSPS-CBS with shorter speech segments, more frequent pauses, slower speech rate, reduced verb production, and higher partial word production. In both groups, acoustic duration measures generally correlated with speech fluency, measured as words per minute, and grammatical performance. Speech measures did not correlate with standard neuropsychological measures. CSF pTau levels correlated with f0 range in PSPS-CBS and naPPA. Conclusion: Lexical and acoustic speech features of PSPS-CBS overlaps those of naPPA and are related to CSF pTau levels.

Download Full-text

Synchronising speech segments with musical beats in Mandarin and English singing

10.31219/osf.io/3xdjw ◽

2021 ◽

Author(s):

Cong Zhang ◽

Jian Zhu

Keyword(s):

Temporal Relationship ◽

Training Data ◽

Singing Voice ◽

Speech Data ◽

Speech Training ◽

Speech Segments ◽

Segment Duration

Generating synthesised singing voice with models trained on speech data has many advantages due to the models' flexibility and controllability. However, since the information about the temporal relationship between segments and beats are lacking in speech training data, the synthesised singing may sound off-beat at times. Therefore, the availability of the information on the temporal relationship between speech segments and music beats is crucial. The current study investigated the segment-beat synchronisation in singing data, with hypotheses formed based on the linguistics theories of P-centre and sonority hierarchy. A Mandarin corpus and an English corpus of professional singing data were manually annotated and analysed. The results showed that the presence of musical beats was more dependent on segment duration than sonority. However, the sonority hierarchy and the P-centre theory were highly related to the location of beats. Mandarin and English demonstrated cross-linguistic variations despite exhibiting common patterns.

Download Full-text

speech segments
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Microphone array speech enhancement based on optimized IMCRA

Construction of auditory bombardment therapy program: a pilot study

Identifying Indicators of Vulnerability from Short Speech Segments Using Acoustic and Textual Features

Synchronising Speech Segments with Musical Beats in Mandarin and English Singing

Automatic Detection of Shouted Speech Segments in Indian News Debates

A Literature Review of Neural Tracking to Speech Envelope from the View of Audiology

Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection

Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system

Digital Speech Analysis in Progressive Supranuclear Palsy and Corticobasal Syndromes

Synchronising speech segments with musical beats in Mandarin and English singing

Export Citation Format

speech segmentsRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Microphone array speech enhancement based on optimized IMCRA

Construction of auditory bombardment therapy program: a pilot study

Identifying Indicators of Vulnerability from Short Speech Segments Using Acoustic and Textual Features

Synchronising Speech Segments with Musical Beats in Mandarin and English Singing

Automatic Detection of Shouted Speech Segments in Indian News Debates

A Literature Review of Neural Tracking to Speech Envelope from the View of Audiology

Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection

Timestamp-aligning and keyword-biasing end-to-end ASR front-end for a KWS system

Digital Speech Analysis in Progressive Supranuclear Palsy and Corticobasal Syndromes

Synchronising speech segments with musical beats in Mandarin and English singing

speech segments
Recently Published Documents