Generating synthetic dysarthric speech to overcome dysarthria acoustic data scarcity

Purpose: Dysarthria is a neuromotor speech disorder caused by neuromuscular disturbances that affect one or more articulators, resulting in unintelligible speech. Although inter-phoneme articulatory variations are well captured by formant frequency-based acoustic features, these variations are expected to be much larger for dysarthric speakers than for typical speakers. Such substantial variations can be captured well by placing sensors at appropriate articulatory positions. This study aims to determine a set of articulatory sensors and parameters for assessing articulatory dysfunctions in dysarthric speech.

Design/methodology/approach: The current work determines the significant sensors and their associated parameters using motion path and correlation analyses on the TORGO database of dysarthric speech. Among eight informative sensor channels and six parameters per channel in the positional data, the tongue middle, tongue back, tongue tip, lower lip and upper lip sensors and the parameters (y, z, φ) are found to contribute significantly toward capturing articulatory information. Acoustic and positional data analyses are performed to validate these significant sensors. Furthermore, a convolutional neural network-based classifier is developed for both phone- and word-level classification of dysarthric speech using acoustic and positional data.

Findings: The average phone error rate for positional data is lower, by up to 15.54%, than for acoustic-only data. Further, word-level classification using a combination of acoustic and positional information is performed to show that positional data acquired using the significant sensors boost classification performance even for severe dysarthric speakers.

Originality/value: The proposed work shows that the significant sensors and parameters can be used to assess dysfunctions in dysarthric speech effectively. The articulatory sensor data support better assessment than the acoustic data, even for severe dysarthric speakers.

INES MOVIES : A NEW ACOUSTIC DATA ACQUISITION AND PROCESSING SYSTEM

Le Journal de Physique Colloques ◽

10.1051/jphyscol:19902219 ◽

1990 ◽

Vol 51 (C2) ◽

pp. C2-939-C2-942 ◽

Cited By ~ 1

Author(s):

N. DINER ◽

A. WEILL ◽

J. Y. COAIL ◽

J. M. COUDEVILLE

Keyword(s):

Data Acquisition ◽

Processing System ◽

Acoustic Data ◽

Data Acquisition And Processing

Underwater Acoustic Data Communications for Autonomous Platform Command, Control and Communications

10.21236/ada389546 ◽

2001 ◽

Cited By ~ 1

Author(s):

Michael Shinego ◽

Geoff Edelson ◽

Francine Menas ◽

Michael Richman ◽

Robert Nation

Keyword(s):

Data Communications ◽

Underwater Acoustic ◽

Acoustic Data

Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model

10.21437/interspeech.2016-776 ◽

2016 ◽

Cited By ~ 5

Author(s):

Myungjong Kim ◽

Jun Wang ◽

Hoirin Kim

Keyword(s):

Speech Recognition ◽

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Leibler Divergence ◽

Dysarthric Speech

Dysarthric Speech Recognition Based on Deep Metric Learning

10.21437/interspeech.2020-2267 ◽

2020 ◽

Author(s):

Yuki Takashima ◽

Ryoichi Takashima ◽

Tetsuya Takiguchi ◽

Yasuo Ariki

Keyword(s):

Speech Recognition ◽

Metric Learning ◽

Deep Metric Learning ◽

Dysarthric Speech

Labialization and Prosodic Modulation in Italian Dysarthric Speech by Parkinsonian Speakers: A Preliminary Investigation

10.21437/speechprosody.2020-169 ◽

2020 ◽

Author(s):

Barbara Gili Fivela ◽

Sonia Immacolata d'Apolito ◽

Giorgia Di Prizio

Keyword(s):

Preliminary Investigation ◽

Dysarthric Speech

Acoustically induced cavity cloud generated by air-gun arrays—Comparing video recordings and acoustic data to modeling

The Journal of the Acoustical Society of America ◽

10.1121/1.5040490 ◽

2018 ◽

Vol 143 (6) ◽

pp. 3383-3393 ◽

Cited By ~ 8

Author(s):

Babak Khodabandeloo ◽

Martin Landrø

Keyword(s):

Acoustic Data ◽

Video Recordings ◽

Air Gun

Assamese vowels and vowel harmony

Journal of South Asian Languages and Linguistics ◽

10.1515/jsall-2019-2010 ◽

2020 ◽

Vol 6 (2) ◽

pp. 151-183

Author(s):

Diana B. Archangeli ◽

Jonathan Yip

Keyword(s):

Physical Properties ◽

Principal Components Analysis ◽

Principal Components ◽

Vowel Harmony ◽

Language Community ◽

Acoustic Data ◽

Apparent Loss ◽

Root Position ◽

Components Analysis ◽

Open Question

Abstract: Based on impressionistic and acoustic data, Assamese is described as having a phonological tongue root harmony system, with blocking by certain phonological configurations and over-application in certain morphological contexts. This study explores physical properties of the patterns using ultrasonic imaging to determine whether the impressionistic descriptions match what speakers actually do. Principal components analysis (PCA) determines that most participants produce a contrast in tongue root position in the appropriate contexts, though there is less of an impact on tongue root with greater distance from the triggering vowel. Analysis uses the root mean squared distance (RMSD) calculation to determine whether both blocking and over-application take effect. The blocking results conform to the impressionistic descriptions. With over-application, [e] and [o] are expected; while some speakers clearly produce these vowels, others articulate a vowel that is indeterminate between the expected [e]/[o] and an unexpected [ɛ]/[ɔ]. No speaker consistently showed the expected tongue root position in all contexts, and some speakers appeared to have lost the contrast entirely, yet all are considered to be speakers of the same dialect of Assamese. Whether this (apparent) loss is a consequence of crude research methodologies or accurately reflects what is happening within the language community remains an open question.
