Speech Transcription
Recently Published Documents

Total documents: 110 (last five years: 21)
H-index: 11 (last five years: 2)

2021 · Vol 11 (11) · pp. 1540
Author(s): Jennifer U. Soriano, Abby Olivieri, Katherine C. Hustad

The Intelligibility in Context Scale (ICS) is a widely used, efficient tool for describing a child’s speech intelligibility. Few studies have explored the relationship between ICS scores and transcription intelligibility scores, which are the gold standard for clinical measurement. This study examined how well ICS composite scores predicted transcription intelligibility scores among children with cerebral palsy (CP), how well individual questions from the ICS differentially predicted transcription intelligibility scores, and how well the ICS composite scores differentiated between children with and without speech motor impairment. Parents of 48 children with CP, who were approximately 13 years of age, completed the ICS. Ninety-six adult naïve listeners provided orthographic transcriptions of children’s speech. Transcription intelligibility scores were regressed on ICS composite scores and individual item scores. Dysarthria status was regressed on ICS composite scores. Results indicated that ICS composite scores were moderately strong predictors of transcription intelligibility scores. One individual ICS item differentially predicted transcription intelligibility scores, and dysarthria severity influenced how well ICS composite scores differentiated between children with and without speech motor impairment. Findings suggest that the ICS has potential clinical utility for children with CP, especially when used with other objective measures of speech intelligibility.
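As an illustration of the regression described above, here is a minimal sketch relating ICS composite scores to transcription intelligibility scores. All data values are invented, not taken from the study, and the study's exact modeling choices are not specified beyond "regressed on."

```python
# Hypothetical illustration: linear regression of transcription
# intelligibility (% words correctly transcribed by naive listeners)
# on ICS composite scores (mean of items rated 1-5). Data invented.
import numpy as np
from scipy import stats

ics_composite = np.array([2.1, 3.4, 4.8, 1.9, 3.0, 4.2, 2.7, 4.9])
intelligibility = np.array([31.0, 55.0, 88.0, 24.0, 49.0, 76.0, 40.0, 92.0])

fit = stats.linregress(ics_composite, intelligibility)
print(f"slope={fit.slope:.1f}, r={fit.rvalue:.2f}, p={fit.pvalue:.4f}")
# A moderately strong r, as the study reports, would support the ICS as a
# screening proxy used alongside direct transcription measures.
```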


2021
Author(s): Courtney Mansfield, Sara Ng, Gina-Anne Levow, Richard A. Wright, Mari Ostendorf

2021
Author(s): Mahdi Namazifar, John Malik, Li Erran Li, Gokhan Tur, Dilek Hakkani Tür

Author(s): Carly B. Fox, Megan Israelsen-Augenstein, Sharad Jones, Sandra Laing Gillam

Purpose: This study examined the accuracy and potential clinical utility of two expedited transcription methods for narrative language samples elicited from school-age children (7;5–11;10 [years;months]) with developmental language disorder. Transcription methods included real-time transcription produced by speech-language pathologists (SLPs) and trained transcribers (TTs), as well as Google Cloud Speech automatic speech recognition.

Method: The accuracy of each transcription method was evaluated against a gold-standard reference corpus. Clinical utility was examined by determining the reliability of scores calculated from the transcripts produced by each method on several language sample analysis (LSA) measures. Participants included seven certified SLPs and seven TTs. Each participant produced a set of six transcripts in real time, out of a total of 42 language samples. The same 42 samples were transcribed using Google Cloud Speech. Transcription accuracy was evaluated through word error rate. Reliability of LSA scores was determined using correlation analysis.

Results: Google Cloud Speech was significantly more accurate than real-time transcription in transcribing narrative samples and was not affected by the speech rate of the narrator. In contrast, SLP and TT transcription accuracy decreased as a function of increasing speech rate. LSA metrics generated from Google Cloud Speech transcripts were also more reliably calculated.

Conclusions: Automatic speech recognition showed greater accuracy and clinical utility as an expedited transcription method than real-time transcription. Though there is room for improvement in the accuracy of speech recognition for the purpose of clinical transcription, it produced highly reliable scores on several commonly used LSA metrics.

Supplemental Material: https://doi.org/10.23641/asha.15167355
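Since transcription accuracy here is scored by word error rate (WER) against a gold-standard reference, a minimal sketch of the standard WER computation may clarify the metric. The example strings are hypothetical; the study's actual scoring pipeline is not described beyond the metric itself.

```python
# Minimal word error rate (WER) sketch: Levenshtein distance over words,
# normalized by reference length. Example strings are hypothetical.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the dog ran to the park", "the dog ran in the park"))  # ~0.167
```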


2021 · Vol 12
Author(s): Maria Jose Alvarez-Alonso, Cristina de-la-Peña, Zaira Ortega, Ricardo Scott

The quality of language comprehension determines performance in a wide range of activities, including academic work. Word processing initially develops in the auditory modality and gradually extends to the visual modality as children learn to read. School failure is strongly related to listening and reading comprehension problems. In this study, we analyzed sex differences in the comprehension of Spanish texts (standardized reading test PROLEC-R) presented in three modalities (visual, auditory, and both simultaneously: dual modality) to 12- to 14-year-old native Spanish-speaking students. We controlled for relevant cognitive variables such as attention (d2), phonological and semantic fluency (FAS), and processing speed (WISC Coding subtest). Girls' comprehension was similar across the three presentation modalities; boys, however, benefited substantially from dual modality compared to boys exposed to visual-only or auditory-only text presentation. With respect to the relationship between text comprehension and school performance, students with low grades in Spanish showed poor auditory comprehension. Interestingly, the visual and dual modalities preserved comprehension levels in these low-skilled students. Our results suggest that visual-text support during auditory language presentation could benefit students with low school performance, especially boys. They also encourage future research to evaluate the classroom implementation of the rapidly developing technology of simultaneous speech transcription, which could additionally benefit non-native students, especially those who have recently entered school or newly arrived in a country from abroad.


2021
Author(s): Aitor Álvarez, Haritz Arzelus, Iván G. Torre, Ander González-Docasal

Electronics · 2021 · Vol 10 (3) · pp. 235
Author(s): Natalia Bogach, Elena Boitsova, Sergey Chernonog, Anton Lamtev, Maria Lesnichaya, et al.

This article contributes to the discourse on how contemporary computer and information technology can improve foreign language learning, not only by supporting better and more flexible workflows and digitizing study materials, but also by creating completely new use cases made possible by advances in signal processing algorithms. We discuss an approach and propose a holistic solution for teaching the phonological phenomena that are crucial for correct pronunciation: the phonemes; the energy and duration of syllables and pauses, which construct the phrasal rhythm; and the tone movement within an utterance, i.e., the phrasal intonation. The working prototype of the StudyIntonation Computer-Assisted Pronunciation Training (CAPT) system is a tool for mobile devices that offers a set of tasks based on a "listen and repeat" approach and gives audio-visual feedback in real time. The present work summarizes the efforts taken to enrich the current version of this CAPT tool with two new functions: phonetic transcription and rhythmic patterns of model and learner speech. Both are built on the third-party automatic speech recognition (ASR) library Kaldi, which was incorporated into the StudyIntonation signal-processing core. We also examine the scope of ASR applicability within the CAPT system workflow and evaluate the Levenshtein distance between transcriptions made by human experts and those obtained automatically by our code. We developed a rhythm-reconstruction algorithm using the ASR acoustic and language models. We also show that even learners with sufficiently correct phoneme production often fail to produce correct phrasal rhythm and intonation; joint training of sounds, rhythm, and intonation within a single learning environment is therefore beneficial. To mitigate recording imperfections, voice activity detection (VAD) is applied to all processed speech recordings. Try-outs showed that StudyIntonation can create transcriptions and process rhythmic patterns, although some specific problems with connected-speech transcription were detected. Learner feedback for pronunciation assessment was also updated: a conventional mechanism based on dynamic time warping (DTW) was combined with a cross-recurrence quantification analysis (CRQA) approach, resulting in better discriminating ability. The CRQA metrics combined with those of DTW were shown to improve the accuracy of learner performance estimation. The major implications for computer-assisted English pronunciation teaching are discussed.
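As a rough illustration of the DTW alignment mechanism the authors combine with CRQA, the sketch below computes a DTW distance between a model pitch contour and a learner's slower rendition. The contour values are invented toy data; the actual StudyIntonation implementation is not shown in the article.

```python
# Minimal dynamic time warping (DTW) sketch: aligns a learner's pitch
# contour with the model contour despite timing differences (toy data).
import numpy as np

def dtw_distance(model: np.ndarray, learner: np.ndarray) -> float:
    n, m = len(model), len(learner)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(model[i - 1] - learner[j - 1])  # per-frame pitch difference
            acc[i, j] = cost + min(acc[i - 1, j],      # insertion
                                   acc[i, j - 1],      # deletion
                                   acc[i - 1, j - 1])  # match
    return float(acc[n, m])

model_f0 = np.array([120.0, 140.0, 180.0, 160.0, 130.0])          # model contour (Hz)
learner_f0 = np.array([118.0, 119.0, 142.0, 175.0, 158.0, 131.0]) # slower learner take
print(dtw_distance(model_f0, learner_f0))
```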


2020 · Vol 4 (2) · pp. 294-305
Author(s): Stefanny Lauwren

Greta Thunberg delivered a speech entitled "You're Acting Like Spoiled, Irresponsible Children" to influential figures in Europe at the "Civil Society for renaissance" event, to which she was personally invited by the president of the organizing body, Luca Jahier. Through her speech, she managed to convince the European Union to pledge to spend billions of euros to combat the climate crisis. This study aims to discover how the interpersonal metafunction is used in the speech and what functions it reveals, using Fairclough's Critical Discourse Analysis and Hallidayan Systemic Functional Grammar. The data, consisting of seventy-one independent clauses, were taken from Thunberg's book of speech transcriptions, "No One Is Too Small to Make a Difference". The research finds that, through the use of mood, modality, and pronouns, Thunberg conveys her view of her relationship with the audience as one of victim and perpetrator, and of who holds responsibility and takes the blame.


2020 · Vol 46 (Supplement_1) · pp. S177-S177
Author(s): Can Kilciksiz, Katrina Brown, Alexandria Vail, Tadas Baltrusaitis, Luciana Pennant, et al.

Background: A major challenge for reliable and effective mental health care is the lack of objective markers of illness. Computational approaches to measuring naturalistic behavior in clinical settings could therefore provide an objective backstop for mental health assessment and disease monitoring. This study aimed to train machine-learning (ML) classifiers to estimate conventional clinical measures of severe mental illness using quantitative metrics derived from computational analysis of facial and vocal behaviors.

Methods: Individuals hospitalized for any active psychotic condition were recruited to participate in up to ten recorded study visits, each comprising three segments. Each visit was captured using two synchronized HD webcams and cardioid microphones to obtain high-quality audiovisual (AV) data from both patient and interviewer. We performed automated facial action coding, vocal analysis, and speech transcription using publicly available software and services (e.g., OpenFace, openSMILE, TranscribeMe).

Results: A total of 34 participants took part in 66 sessions between 2015 and 2018, resulting in over 40 hours of AV recordings. In our visual and vocal analysis, we found that several features derived from face, voice, and use of language (e.g., eyebrow furrowing, eye widening, smile variability, characteristics of vowels) were both robustly measured using our approach and allowed us to accurately estimate multiple symptom domains (e.g., mania, depression, psychosis) (R > 0.7, p < 0.05). In our linguistic analysis, we found that an abundance of power words (e.g., superiority, important) and a lack of contextual language (e.g., yesterday, nearby) were highly indicative of positive psychotic symptoms (R = +0.417, p = 0.002 and R = -0.302, p = 0.028, respectively).

Discussion: Automated analysis of face, voice, and speech provides a number of robust behavioral markers sensitive enough to detect changes in psychopathology within individuals over time. Naturalistic, quantitative assessments can therefore yield objective markers of mood and cognition that can be used to optimize both access to and quality of treatments for a wide range of psychiatric conditions.
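A hedged sketch of the modeling step described above: regressing a clinical symptom rating on per-session audiovisual features with cross-validation. The feature matrix, the symptom scores, and the choice of ridge regression are all illustrative assumptions; the abstract does not specify the estimator or feature set used.

```python
# Illustrative sketch only: assumes per-session feature vectors (e.g.,
# eyebrow furrowing, smile variability, vowel characteristics) have been
# extracted to a matrix X. All data below is synthetic stand-in data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(66, 12))  # 66 sessions x 12 audiovisual features (synthetic)
mania_scores = X @ rng.normal(size=12) + rng.normal(scale=0.5, size=66)  # stand-in ratings

model = Ridge(alpha=1.0)  # ridge regression is an assumed, not reported, choice
r2 = cross_val_score(model, X, mania_scores, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f}")  # the abstract reports R > 0.7 per domain
```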


2020 · pp. 002383092091107
Author(s): Suzanne R. Jongman, Yung Han Khoe, Florian Hintz

Previous research has shown that vocabulary size affects performance on laboratory word production tasks. Individuals who know many words show faster lexical access and retrieve more words belonging to pre-specified categories than individuals who know fewer words. The present study examined the relationship between receptive vocabulary size and speaking skills as assessed in a natural sentence production task. We asked whether measures derived from spontaneous responses to everyday questions correlate with the size of participants’ vocabulary. Moreover, we assessed the suitability of automatic speech recognition (ASR) for the analysis of participants’ responses in complex language production data. We found that vocabulary size predicted indices of spontaneous speech: individuals with a larger vocabulary produced more words and had a higher speech-silence ratio compared to individuals with a smaller vocabulary. Importantly, these relationships were reliably identified using manual and automated transcription methods. Taken together, our results suggest that spontaneous speech elicitation is a useful method to investigate natural language production and that automatic speech recognition can alleviate the burden of labor-intensive speech transcription.
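As a minimal sketch of the reliability comparison described above, one can correlate a speech measure (here, words produced per response) derived from manual transcripts with the same measure derived from ASR transcripts. The counts below are hypothetical, not data from the study.

```python
# Hypothetical manual-vs-ASR reliability check: correlate a per-participant
# measure computed from both transcription methods.
from scipy.stats import pearsonr

manual_word_counts = [112, 85, 143, 97, 160, 74, 128, 101]
asr_word_counts = [108, 88, 139, 95, 155, 70, 131, 99]

r, p = pearsonr(manual_word_counts, asr_word_counts)
print(f"r = {r:.2f}, p = {p:.4f}")  # a high r would justify ASR as a substitute
```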

