High Predictive Accuracy of Negative Schizotypy With Acoustic Measures

2021
pp. 216770262110178
Author(s): Alex S. Cohen, Christopher R. Cox, Tovah Cowan, Michael D. Masucci, Thanh P. Le, ...

Negative schizotypal traits can potentially be digitally phenotyped using objective vocal analysis. Prior attempts have shown mixed success in this regard, potentially because acoustic analysis has relied on small, constrained feature sets. We employed machine learning to (a) optimize and cross-validate predictive models of self-reported negative schizotypy using a large acoustic feature set, (b) evaluate model performance as a function of sex and speaking task, (c) understand potential mechanisms underlying negative schizotypal traits by evaluating the key acoustic features within these models, and (d) examine how model output converges with clinical symptoms and cognitive functioning. Accuracy was good (>80%) and improved when speaking task and sex were considered. However, the features identified as most predictive of negative schizotypal traits were generally not considered critical to their conceptual definitions. Implications for validating and implementing digital phenotyping to understand and quantify negative schizotypy are discussed.
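
The modeling approach described above can be illustrated with a small, hedged sketch: a cross-validated classifier over a large acoustic feature matrix, fit both overall and separately by sex. The synthetic data, feature count, and the choice of a penalized logistic regression are placeholder assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch: cross-validated prediction of high vs. low negative
# schizotypy from a large acoustic feature matrix (e.g., openSMILE-style output).
# Data, labels, and model choice are assumptions for illustration only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))        # 200 speech samples x 500 acoustic features
y = rng.integers(0, 2, size=200)       # 1 = high negative schizotypy, 0 = low
sex = rng.integers(0, 2, size=200)     # used to fit sex-specific models

model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Overall model vs. sex-specific models (the abstract reports that accuracy
# improved when sex and speaking task were taken into account).
overall = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
by_sex = [cross_val_score(model, X[sex == s], y[sex == s], cv=cv,
                          scoring="accuracy").mean() for s in (0, 1)]
print(f"overall accuracy: {overall:.2f}; by sex: {by_sex}")
```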

2018
Vol 7 (2.16)
pp. 98
Author(s): Mahesh K. Singh, A. K. Singh, Narendra Singh

This paper presents an algorithm based on acoustic analysis of electronically disguised voice. The proposed work provides a comparative analysis of all acoustic features and their statistical coefficients. Acoustic features are computed with the Mel-frequency cepstral coefficient (MFCC) method and compared between normal voice and voice disguised by different semitone shifts. All acoustic features are passed through feature-based classifiers to determine the identification rate for each type of electronically disguised voice. Two classifiers, a support vector machine (SVM) and a decision tree (DT), are compared in terms of their classification efficiency for speaker identification on voice electronically disguised by different semitone shifts.
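
A rough sketch of this kind of pipeline is shown below, using librosa for MFCC extraction and librosa's pitch shifting as a stand-in for the semitone-based electronic disguise. The toy corpus and parameter choices are assumptions, not the paper's data or settings.

```python
# Illustrative MFCC + SVM/decision-tree pipeline for normal vs. pitch-shifted
# ("disguised") speech. Synthetic audio stands in for real recordings.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def mfcc_vector(y, sr, n_mfcc=13):
    """Mean and std of MFCCs over time as a fixed-length feature vector."""
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

sr = 16000
X, labels = [], []
for _ in range(40):                            # toy corpus of synthetic "utterances"
    y = np.random.randn(sr * 2).astype(np.float32)
    for semitones, lab in [(0, 0), (4, 1)]:    # 0 = normal, 1 = disguised by +4 semitones
        shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)
        X.append(mfcc_vector(shifted, sr))
        labels.append(lab)
X, labels = np.array(X), np.array(labels)

for clf in (SVC(kernel="rbf"), DecisionTreeClassifier(max_depth=5)):
    acc = cross_val_score(clf, X, labels, cv=5).mean()
    print(type(clf).__name__, f"identification rate: {acc:.2f}")
```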


1992
Vol 35 (2)
pp. 296-308
Author(s): Beth M. Ansel, Raymond D. Kent

This study evaluated the relationship between specific acoustic features of speech and perceptual judgments of word intelligibility of adults with cerebral palsy-dysarthria. Use of a contrasting word task allowed for intelligibility analysis and correlated acoustic analysis according to specified spectral and temporal features. Selected phonemic contrasts included syllable-initial voicing; syllable-final voicing; stop-nasal; fricative-affricate; front-back, high-low, and tense-lax vowels. Speech materials included a set of CVC stimulus words. Acoustic data are reported on vowel duration, formant frequency locations, voice onset times, amplitude rise times, and frication durations. Listeners’ perceptual assessment of intelligibility of the 16 dysarthric adults by transcription and rating tasks is also presented. All but one acoustic contrast was successfully made as evidenced by measured acoustic differences between contrast pairs. However, the generally successful acoustic contrasts stood in marked contrast to the poorly rated intelligibility scores and high error percentages that were ascribed to the opposite pair members. A second analysis examined the contribution of these acoustic features towards estimates and prediction of intelligibility deficits in speakers with dysarthria. The scaled intelligibility was predicted by multiple regression analysis with 62.6% accuracy by acoustic measures related to one consonant contrast (fricative-affricate) and three vowel contrasts (front-back, high-low, and tense-lax). Other measured contrasts, such as those related to contrast voicing effects and stop-nasal distinctions, did not seem to contribute in a significant way to variability in the intelligibility estimates. These findings are discussed in relation to specific areas of production deficiency that are consistent across different types of dysarthria with cerebral palsy as the etiology.
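
The second analysis, multiple regression of scaled intelligibility on acoustic contrast measures, can be sketched as follows. The variable names and synthetic data are hypothetical placeholders for the study's measurements, not its actual dataset.

```python
# Minimal sketch: regress scaled intelligibility on four acoustic contrast
# measures, analogous to the study's reported model. Data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 16                                            # 16 speakers with dysarthria
acoustic = {
    "fricative_affricate": rng.normal(size=n),    # e.g., frication duration difference
    "front_back": rng.normal(size=n),             # e.g., F2 separation
    "high_low": rng.normal(size=n),               # e.g., F1 separation
    "tense_lax": rng.normal(size=n),              # e.g., vowel duration ratio
}
X = sm.add_constant(np.column_stack(list(acoustic.values())))
intelligibility = rng.normal(loc=50, scale=15, size=n)   # scaled intelligibility (%)

fit = sm.OLS(intelligibility, X).fit()
print(fit.rsquared)   # analogous to the ~62.6% of variance accounted for in the study
```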


2020
Vol 6 (1)
Author(s): Alex S. Cohen, Christopher R. Cox, Thanh P. Le, Tovah Cowan, Michael D. Masucci, ...

Abstract Negative symptoms are a transdiagnostic feature of serious mental illness (SMI) that can be potentially “digitally phenotyped” using objective vocal analysis. In prior studies, vocal measures show low convergence with clinical ratings, potentially because analysis has used small, constrained acoustic feature sets. We sought to evaluate (1) whether clinically rated blunted vocal affect (BvA)/alogia could be accurately modelled using machine learning (ML) with a large feature set from two separate tasks (i.e., a 20-s “picture” and a 60-s “free-recall” task), (2) whether “Predicted” BvA/alogia (computed from the ML model) are associated with demographics, diagnosis, psychiatric symptoms, and cognitive/social functioning, and (3) which key vocal features are central to BvA/Alogia ratings. Accuracy was high (>90%) and was improved when computed separately by speaking task. ML scores were associated with poor cognitive performance and social functioning and were higher in patients with schizophrenia versus depression or mania diagnoses. However, the features identified as most predictive of BvA/Alogia were generally not considered critical to their operational definitions. Implications for validating and implementing digital phenotyping to reduce SMI burden are discussed.
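
Steps (1) and (2) might be sketched as below: fit a model per speaking task, derive cross-validated "Predicted BvA/alogia" scores, and relate them to an external measure such as cognitive performance. The data, task labels, and feature extraction are placeholders, not the authors' dataset or exact ML pipeline.

```python
# Hedged sketch: per-task models and convergence of predicted scores with an
# external cognitive measure. All inputs are synthetic stand-ins.
import numpy as np
from scipy.stats import pearsonr
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 400))          # 300 recordings x 400 acoustic features
y = rng.integers(0, 2, size=300)         # clinically rated BvA/alogia present vs. absent
task = rng.integers(0, 2, size=300)      # 0 = 20-s picture task, 1 = 60-s free recall
cognition = rng.normal(size=300)         # external cognitive composite

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
predicted = np.empty(300)
for t in (0, 1):                         # separate model per speaking task
    idx = task == t
    predicted[idx] = cross_val_predict(model, X[idx], y[idx], cv=5,
                                       method="predict_proba")[:, 1]

r, p = pearsonr(predicted, cognition)    # convergence of predicted BvA with cognition
print(f"r = {r:.2f}, p = {p:.3f}")
```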


2011
Vol 48 (6)
pp. 695-707
Author(s): Youkyung Bae, David P. Kuehn, Charles A. Conway, Bradley P. Sutton

Objective To examine the relationships between acoustic and physiologic aspects of the velopharyngeal mechanism during acoustically nasalized segments of speech in normal individuals by combining fast magnetic resonance imaging (MRI) with simultaneous speech recordings and subsequent acoustic analyses. Design Ten normal Caucasian adult individuals participated in the study. Midsagittal dynamic MRI and simultaneous speech recordings were performed while participants produced repetitions of two rate-controlled nonsense syllables, /zanaza/ and /zunuzu/. Acoustic features of nasalization, represented as the peak amplitude and the bandwidth of the first resonant frequency (F1), were derived from speech at a rate of 30 sets per second. Physiologic information was based on velar and tongue positional changes measured from the dynamic MRI data, which were acquired at a rate of 21.4 images per second and resampled to a corresponding rate of 30 images per second. Each acoustic feature of nasalization was regressed on gender, vowel context, and velar and tongue positional variables. Results Acoustic features of nasalization represented by F1 peak amplitude and bandwidth changes were significantly influenced by the vowel context surrounding the nasal consonant, velar elevated position, and tongue height at the tip. Conclusions Fast MRI combined with acoustic analysis was successfully applied to the investigation of acoustic-physiologic relationships of the velopharyngeal mechanism with the type of speech samples employed in the present study. Future applications are feasible to examine how anatomic and physiologic deviations of the velopharyngeal mechanism would be acoustically manifested in individuals with velopharyngeal incompetence.
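
The core alignment-and-regression step could be sketched as below: resample MRI-derived velar and tongue traces from roughly 21.4 to 30 samples per second to match the acoustic frame rate, then regress an acoustic nasalization measure on the physiologic and categorical predictors. All signals here are synthetic stand-ins for the study's measurements.

```python
# Illustrative sketch: resample positional traces to the acoustic frame rate,
# then fit a regression of F1 bandwidth on velar/tongue position and vowel.
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d
import statsmodels.formula.api as smf

dur = 2.0                                          # seconds of one repetition
t_mri = np.arange(0, dur, 1 / 21.4)                # original MRI sampling instants
t_ac = np.arange(0, dur, 1 / 30.0)                 # acoustic analysis frames (30/s)
velum = np.sin(2 * np.pi * 1.5 * t_mri)            # toy velar position trace
tongue = np.cos(2 * np.pi * 1.5 * t_mri)           # toy tongue-tip height trace

velum30 = interp1d(t_mri, velum, fill_value="extrapolate")(t_ac)
tongue30 = interp1d(t_mri, tongue, fill_value="extrapolate")(t_ac)
f1_bw = 80 - 30 * velum30 + 10 * tongue30 + np.random.normal(0, 5, t_ac.size)

df = pd.DataFrame({"f1_bw": f1_bw, "velum": velum30, "tongue": tongue30,
                   "vowel": np.where(np.arange(t_ac.size) % 2 == 0, "a", "u")})
fit = smf.ols("f1_bw ~ velum + tongue + vowel", data=df).fit()
print(fit.params)
```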


2017
Vol 23 (1)
pp. 1-20
Author(s): Kathy Connaughton, Irena Yanushevskaya

Objective: This study explores the immediate impact of prolonged voice use by professional sports coaches. Method: Speech samples, including sustained phonation of the vowel /a/ and a short read passage, were collected from two professional sports coaches. The audio recordings were made within an hour before and after a coaching session, over three sessions. Perceptual evaluation of voice quality was done using the GRBAS scale. The speech samples were subsequently analyzed using Praat. The acoustic measures included fundamental frequency (f0), jitter, shimmer, harmonics-to-noise ratio (HNR), and cepstral peak prominence (CPP). Main results: The results of perceptual and acoustic analysis suggest a slight shift towards tenser phonation post-coaching session, a likely consequence of laryngeal muscle adaptation to prolonged voice use. This tendency was similar in sustained vowels and connected speech. Conclusion: The acoustic measures used in this study can be useful for capturing voice change after a coaching session. It is desirable, however, that more sophisticated and robust, yet intuitive and easy-to-use, tools for voice assessment and monitoring be made available to clinicians and professional voice users.
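
A minimal sketch of these Praat-based measures, assuming the praat-parselmouth package (a Python interface to Praat) and a hypothetical recording "pre_session.wav" of the sustained /a/, is shown below. CPP can be obtained analogously via Praat's PowerCepstrogram commands; it is omitted here for brevity.

```python
# Sketch of f0, jitter, shimmer, and HNR extraction from one recording,
# assuming praat-parselmouth and a hypothetical file "pre_session.wav".
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("pre_session.wav")

pitch = snd.to_pitch()
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]                                    # drop unvoiced frames
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 600)
jitter = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, point_process], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)

print(f"mean f0: {np.mean(f0):.1f} Hz, jitter: {jitter:.4f}, "
      f"shimmer: {shimmer:.4f}, HNR: {hnr:.1f} dB")
```

Comparing these values before and after a coaching session is the kind of pre/post contrast the study reports.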


2009
Vol 03 (02)
pp. 209-234
Author(s): Yi Yu, Kazuki Joe, Vincent Oria, Fabian Moerchen, J. Stephen Downie, ...

Research on audio-based music retrieval has primarily concentrated on refining audio features to improve search quality, but much less work has been done on improving the time efficiency of music audio searches. Representing music audio documents in an indexable format provides a mechanism for achieving efficiency. To address this issue, this work proposes Exact Locality Sensitive Mapping (ELSM) to join concatenated feature sets and soft hash values. On this basis we propose audio-based music indexing techniques, ELSM and Soft Locality Sensitive Hash (SoftLSH), using an optimized Feature Union (FU) set of extracted audio features. Two contributions are made here. First, the principle of similarity-invariance is applied in summarizing audio feature sequences and utilized in training semantic audio representations based on regression. Second, soft hash values are pre-calculated to help locate the search range more accurately and improve the collision probability among similar features. Our algorithms are implemented in a demonstration system to show how multi-version audio documents are retrieved and evaluated. Experimental evaluation over a real "multi-version" audio dataset confirms the practicality of ELSM and SoftLSH with FU and shows that our algorithms are effective for both multi-version detection (online query, one query vs. multiple objects) and same-content detection (batch queries, multiple queries vs. one object).
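
To make the indexing idea concrete, here is a simplified locality-sensitive hashing index for audio feature vectors. This is a plain random-hyperplane LSH sketch illustrating the general hash-bucket search strategy, not the authors' ELSM or SoftLSH algorithms.

```python
# Generic random-hyperplane LSH over audio feature vectors: hash each vector
# to a bucket, then search only the query's bucket and rank by cosine similarity.
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    def __init__(self, dim, n_bits=8, seed=0):
        self.planes = np.random.default_rng(seed).normal(size=(n_bits, dim))
        self.buckets = defaultdict(list)

    def _hash(self, v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple((self.planes @ v > 0).astype(int))

    def add(self, doc_id, v):
        self.buckets[self._hash(v)].append((doc_id, v))

    def query(self, v, top_k=3):
        cands = self.buckets.get(self._hash(v), [])
        sims = [(doc, float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u) + 1e-12)))
                for doc, u in cands]
        return sorted(sims, key=lambda s: -s[1])[:top_k]

rng = np.random.default_rng(1)
index = HyperplaneLSH(dim=64)
for i in range(1000):                       # 1000 audio documents' feature vectors
    index.add(f"track_{i}", rng.normal(size=64))
print(index.query(rng.normal(size=64)))
```

SoftLSH, as described in the abstract, refines this idea by pre-calculating soft hash values so that near-duplicate ("multi-version") documents are more likely to collide.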


2019
Vol 62 (1)
pp. 60-69
Author(s): Areen Badwal, JoHanna Poertner, Robin A. Samlan, Julie E. Miller

Purpose The zebra finch is used as a model to study the neural circuitry of auditory-guided human vocal production. The terminology of birdsong production and acoustic analysis, however, differs from human voice production, making it difficult for voice researchers of either species to navigate the literature from the other. The purpose of this research note is to identify common terminology and measures to better compare information across species. Method Terminology used in the birdsong literature will be mapped onto terminology used in the human voice production literature. Measures typically used to quantify the percepts of pitch, loudness, and quality will be described. Measures common to the literature in both species will be made from the songs of 3 middle-aged birds using Praat and Song Analysis Pro. Two measures, cepstral peak prominence (CPP) and Wiener entropy (WE), will be compared to determine if they provide similar information. Results Similarities and differences in terminology and acoustic analyses are presented. A core set of measures including frequency, frequency variability within a syllable, intensity, CPP, and WE are proposed for future studies. CPP and WE are related yet provide unique information about the syllable structure. Conclusions Using a core set of measures familiar to both human voice and birdsong researchers, along with both CPP and WE, will allow characterization of similarities and differences among birds. Standard terminology and measures will improve accessibility of the birdsong literature to human voice researchers and vice versa. Supplemental Material https://doi.org/10.23641/asha.7438964
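
A rough numeric sketch of the two compared measures, Wiener entropy (spectral flatness) and a simplified cepstral peak prominence, is given below on a synthetic harmonic "syllable". Real studies would use Praat or Sound Analysis Pro; this only illustrates what each measure captures.

```python
# Simplified Wiener entropy and CPP on a synthetic harmonic stack at 800 Hz.
import numpy as np

sr = 44100
t = np.arange(0, 0.1, 1 / sr)
syllable = sum(np.sin(2 * np.pi * 800 * k * t) / k for k in range(1, 6))

window = np.hanning(syllable.size)
power = np.abs(np.fft.rfft(syllable * window)) ** 2 + 1e-12
# Wiener entropy: log of geometric mean over arithmetic mean of the power
# spectrum (<= 0; approaches 0 for white noise, very negative for pure tones).
wiener_entropy = np.log(np.exp(np.mean(np.log(power))) / np.mean(power))

log_spec = np.log(np.abs(np.fft.rfft(syllable * window)) + 1e-12)
cepstrum = np.abs(np.fft.irfft(log_spec))
quef = np.arange(cepstrum.size) / sr                  # quefrency axis (s)
band = (quef > 1 / 2000) & (quef < 1 / 400)           # plausible f0 range 400-2000 Hz
peak_idx = np.argmax(cepstrum[band])
# Simplified CPP: cepstral peak height above a linear trend fitted in that band.
trend = np.polyval(np.polyfit(quef[band], cepstrum[band], 1), quef[band])
cpp = cepstrum[band][peak_idx] - trend[peak_idx]
print(f"Wiener entropy: {wiener_entropy:.2f}, CPP (arbitrary units): {cpp:.3f}")
```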


2018
Vol 61 (1)
pp. 40-51
Author(s): Dhanshree R. Gunjawate, Rohit Ravi, Rajashekhar Bellur

Purpose Singers are vocal athletes with specific demands on their voice and require special consideration during voice evaluation. Presently, there is a lack of standards for acoustic evaluation in this population. The aim of the present study was to systematically review the available literature on the acoustic analysis of voice in singers. Method A systematic review of studies on acoustic analysis of voice in singers (PubMed/MEDLINE, CINAHL, Scopus, ProQuest, Cochrane, Ovid, Science Direct, and Shodhganga) was carried out. Key words based on PIO (population–investigation–outcome) were used to develop search strings. Titles and abstracts were screened independently, and appropriate studies were read in full for data extraction. Results Of the 895 studies, 26 studies met the inclusion criteria. Great variability was noted in the instruments and tasks used. Different acoustic measures were employed, such as fundamental frequency, perturbation, cepstral, spectral, dysphonia severity index, singing power ratio, and so forth. Conclusion Overall, great heterogeneity was noted regarding population, tasks, instruments, and parameters. There is a lack of standardized criteria for the evaluation of the singing voice. In order to implement acoustic analysis as part of a comprehensive voice evaluation designed exclusively for singers, there is a clear need for methodologically sound studies.


Author(s): Genaro Daza, Luis Gonzalo Sánchez, Franklin A. Sepúlveda, Castellanos D. Germán

The present work analyzes the statistical effectiveness of different acoustic features in the automatic identification of hypernasality. Acoustic features reflect part of the information contained in perceptual analysis, in part because their estimation derives directly or indirectly from vocal fold behavior. Consequently, it is appropriate to apply multivariate analysis techniques to determine the effectiveness of voice features. Effectiveness is studied using multivariate analysis techniques for both feature extraction and feature selection (latent variable models, heuristic search algorithms).
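
An illustrative sketch of these two multivariate strategies is shown below: feature extraction via a latent-variable model (PCA) and feature selection via a simple forward search, each evaluated for hypernasality classification. The data, classifier, and selection method are placeholders, not the authors' implementation.

```python
# Compare PCA-based feature extraction against forward feature selection for
# a toy hypernasality classification task; labels and features are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 40))           # 120 voice samples x 40 acoustic features
y = rng.integers(0, 2, size=120)         # 1 = hypernasal, 0 = control (synthetic)

clf = LogisticRegression(max_iter=1000)
pca_pipe = make_pipeline(StandardScaler(), PCA(n_components=10), clf)
sel_pipe = make_pipeline(StandardScaler(),
                         SequentialFeatureSelector(clf, n_features_to_select=10),
                         clf)

for name, pipe in [("feature extraction (PCA)", pca_pipe),
                   ("feature selection (forward search)", sel_pipe)]:
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```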


2019
Vol 27 (1)
pp. 119-126
Author(s): Lingjiao Zhang, Xiruo Ding, Yanyuan Ma, Naveen Muthu, Imran Ajmal, ...

Abstract Objective Phenotyping patients using electronic health record (EHR) data conventionally requires labeled cases and controls. Assigning labels requires manual medical chart review and therefore is labor intensive. For some phenotypes, identifying gold-standard controls is prohibitive. We developed an accurate EHR phenotyping approach that does not require labeled controls. Materials and Methods Our framework relies on a random subset of cases, which can be specified using an anchor variable that has excellent positive predictive value and sensitivity independent of predictors. We proposed a maximum likelihood approach that efficiently leverages data from the specified cases and unlabeled patients to develop logistic regression phenotyping models, and compared model performance with existing algorithms. Results Our method outperformed the existing algorithms on predictive accuracy in Monte Carlo simulation studies, application to identify hypertension patients with hypokalemia requiring oral supplementation using a simulated anchor, and application to identify primary aldosteronism patients using real-world cases and anchor variables. Our method additionally generated consistent estimates of 2 important parameters, phenotype prevalence and the proportion of true cases that are labeled. Discussion Upon identification of an anchor variable that is scalable and transferable to different practices, our approach should facilitate development of scalable, transferable, and practice-specific phenotyping models. Conclusions Our proposed approach enables accurate semiautomated EHR phenotyping with minimal manual labeling and therefore should greatly facilitate EHR clinical decision support and research.
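
A simplified positive-unlabeled sketch in the spirit of anchor-based phenotyping (Elkan-Noto calibration) is shown below: train a classifier to predict the anchor label, then rescale its scores by the anchor's estimated labeling rate among true cases. This illustrates the general idea of learning from anchored cases plus unlabeled patients; it is not the authors' maximum likelihood estimator.

```python
# Toy anchor-based phenotyping: the anchor has perfect PPV (only true cases are
# anchored) but imperfect sensitivity, and no labeled controls are used.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 5000
X = rng.normal(size=(n, 10))                                   # EHR-derived predictors
true_case = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1).astype(int)
anchor = true_case * rng.binomial(1, 0.3, size=n)              # 30% of cases anchored

X_tr, X_te, a_tr, a_te, y_tr, y_te = train_test_split(
    X, anchor, true_case, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, a_tr)
# c = P(anchor | true case), estimated as the mean anchor score among anchored patients.
c = clf.predict_proba(X_tr[a_tr == 1])[:, 1].mean()
phenotype_prob = np.clip(clf.predict_proba(X_te)[:, 1] / c, 0, 1)

pred = (phenotype_prob > 0.5).astype(int)
print("estimated prevalence:", round(phenotype_prob.mean(), 3),
      "accuracy vs. true phenotype:", round((pred == y_te).mean(), 3))
```

Here the true phenotype labels exist only to check the sketch; in practice they are unknown, which is exactly the setting the abstract addresses.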

