The Effectiveness of the Intelligent Speech Technology Towards Encoding RAN Tasks

2021 ◽  
Vol 168 ◽  
pp. S154
Author(s):  
Huadong Liang ◽  
Xin Li ◽  
Mingming Hu
2021 ◽  
pp. 1-11
Author(s):  
J. N. de Boer ◽  
A. E. Voppel ◽  
S. G. Brederoo ◽  
H. G. Schnack ◽  
K. P. Truong ◽  
...  

Abstract Background: Clinicians routinely use impressions of speech as an element of the mental status examination. In schizophrenia-spectrum disorders, descriptions of speech are used to assess the severity of psychotic symptoms. In the current study, we assessed the diagnostic value of acoustic speech parameters in schizophrenia-spectrum disorders, as well as their value in recognizing positive and negative symptoms. Methods: Speech was obtained from 142 patients with a schizophrenia-spectrum disorder and 142 matched controls during a semi-structured interview on neutral topics. Patients were categorized as having predominantly positive or negative symptoms using the Positive and Negative Syndrome Scale (PANSS). Acoustic parameters were extracted with OpenSMILE, employing the extended Geneva Acoustic Minimalistic Parameter Set, which includes standardized analyses of pitch (F0), speech quality and pauses. Speech parameters were fed into a random forest algorithm with leave-ten-out cross-validation to assess their value for schizophrenia-spectrum diagnosis and PANSS subtype recognition. Results: The machine-learning speech classifier attained an accuracy of 86.2% in classifying patients with a schizophrenia-spectrum disorder and controls on speech parameters alone. Patients with predominantly positive v. negative symptoms could be classified with an accuracy of 74.2%. Conclusions: Our results show that automatically extracted speech parameters can be used to accurately classify patients with a schizophrenia-spectrum disorder and healthy controls, as well as to differentiate between patients with predominantly positive v. negative symptoms. Thus, the field of speech technology has provided a standardized, powerful tool that has high potential for clinical applications in diagnosis and differentiation, given its ease of comparison and replication across samples.
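The evaluation scheme named in the Methods can be illustrated with a minimal sketch. It assumes that "leave-ten-out" means holding out successive groups of ten shuffled samples until every sample has been tested once; the abstract does not spell this out. The function names are hypothetical, and a toy nearest-mean classifier on a single feature stands in for the random forest and the acoustic parameter set, neither of which is reproduced here.

```python
import random


def leave_k_out_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, holding out k samples per round.

    Assumption: samples are shuffled once and then held out in consecutive
    groups of k, so each sample is tested exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, k):
        test_idx = indices[start:start + k]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def fit_nearest_mean(values, labels):
    """Toy stand-in classifier (not a random forest): per-class feature mean."""
    grouped = {}
    for value, label in zip(values, labels):
        grouped.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in grouped.items()}


def predict_nearest_mean(model, value):
    """Return the label whose training mean lies closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


def cross_validated_accuracy(values, labels, fit, predict, k=10, seed=0):
    """Fraction of samples classified correctly while they were held out."""
    correct = 0
    for train_idx, test_idx in leave_k_out_splits(len(labels), k, seed):
        model = fit([values[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        for i in test_idx:
            correct += predict(model, values[i]) == labels[i]
    return correct / len(labels)
```

With 142 patients and 142 controls (284 samples), this splitting scheme would produce 29 rounds: 28 with ten held-out samples and a final round with four.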


2012 ◽  
Vol 54 (2) ◽  
pp. 37-54 ◽  
Author(s):  
Maciej Karpiński

ABSTRACT Maciej Karpiński. The Boundaries of Language: Dealing with Paralinguistic Features. Lingua Posnaniensis, vol. LIV (2)/2012. The Poznań Society for the Advancement of the Arts and Sciences. PL ISSN 0079-4740, ISBN 978-83-7654-252-2, pp. 37-54. The paralinguistic component of communication attracted a great deal of attention from contemporary linguists in the 1960s. The seminal works written then by Trager, Crystal and others had a powerful influence on the concept of paralanguage that lasted for many years. But, with the focus shifting towards the socio-psychological context of communication in the 1970s, the development of spoken corpora and databases and the significant progress in speech technology in the 1980s and 1990s, the need has arisen for a more comprehensive, coherent and formalised - but also flexible - approach to paralinguistic features. This study advances some preliminary proposals for a revised treatment of paralanguage that would meet some of these requirements and provide a conceptual basis for a new system of annotation for paralinguistic features. A range of views on paralinguistic features, which come mostly from the fields of speech prosody and gesture analysis, are briefly discussed. A number of assumptions and postulates are formulated to allow for a more consistent approach to paralinguistic features. The study suggests that there should be more reliance on continua than on binary categorisations of features, that multi-functionality and multimodality should be fully acknowledged and that clear distinctions should be made among the levels of description, and between the properties of speakers and the speech signal itself.


Author(s):  
Tatiana Sineokova ◽  

Disfluency in spontaneous speech is currently studied by specialists in many different fields. Its various external manifestations (hesitation pauses, sound prolongations, pause fillers, articulatory perseverations and lexico-grammatical repetitions, self-corrections, breaks, nonverbal means of information transfer, etc.) are being investigated. They turn out to be a convenient tool for revealing and monitoring the peculiarities of cognitive processes, because they are explicit, clearly registered signals that occur in speech under the influence of extralinguistic factors such as the communicative situation, the type of speech (monologic or dialogic), the language of communication (L1 or L2), the speaker's emotional state, age and social status, diseases impairing speech and mental activity, and others. Further investigation of disfluency makes it possible to address both fundamental problems connected with modeling the cognitive processes of speech encoding and decoding, and applied tasks connected with using research findings in fields such as developmental pedagogy, psychology, medicine, foreign language training, translation and automatic speech recognition. A sufficient number of empirical investigations have now been carried out to provide a basis for working out particular models, which should make it possible, in the long run, to create an overall model of disfluency in spontaneous speech. Conferences and workshops play an important role in uniting the efforts of specialists in this area. One of them is the international workshop “Disfluency in Spontaneous Speech (DiSS)”, first held in 1999.
The current problems discussed by the workshop participants (speech production and perception models, age and clinical factors of disfluency, special difficulties in foreign speech production, including translation, and speech technology) may be a useful reference point for researchers working on the issue.


2015 ◽  
Vol 56 (1) ◽  
pp. 59-83
Author(s):  
Dafydd Gibbon ◽  
Katarzyna Klessa ◽  
Jolanta Bachan

Abstract The study of speech timing, i.e. the duration and speed or tempo of speech events, has increased in importance over the past twenty years, in particular in connection with increased demands for accuracy, intelligibility and naturalness in speech technology, with applications in language teaching and testing, and with the study of speech timing patterns in language typology. However, the methods used in such studies are very diverse, and so far there is no accessible overview of these methods. Since the field is too broad for us to provide an exhaustive account, we have made two choices: first, to provide a framework of paradigmatic (classificatory), syntagmatic (compositional) and functional (discourse-oriented) dimensions for duration analysis; and second, to provide worked examples of a selection of methods associated primarily with these three dimensions. Some of the methods which are covered are established state-of-the-art approaches (e.g. the paradigmatic Classification and Regression Trees, CART, analysis), others are discussed in a critical light (e.g. so-called ‘rhythm metrics’). A set of syntagmatic approaches applies to the tokenisation and tree parsing of duration hierarchies, based on speech annotations, and a functional approach describes duration distributions with sociolinguistic variables. Several of the methods are supported by a new web-based software tool for analysing annotated speech data, the Time Group Analyser.
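As an illustration of what a "rhythm metric" computes, the sketch below implements the normalised Pairwise Variability Index (nPVI), a widely used measure of durational contrast between adjacent intervals. The abstract does not say which metrics the paper examines, so the choice of nPVI and the function name `npvi` are assumptions made for illustration only.

```python
def npvi(durations):
    """Normalised Pairwise Variability Index of a sequence of durations.

    For each adjacent pair (a, b), the absolute difference |a - b| is divided
    by the pair mean (a + b) / 2; the average of these ratios is scaled by 100.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    ratios = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(ratios) / len(ratios)
```

Equal durations give a value of 0, and larger values indicate stronger alternation between long and short intervals. Because only adjacent pairs are compared, the index says nothing about longer-range timing structure.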

