Speech Technology: Recently Published Documents

Total documents: 349 (five years: 57)
H-index: 13 (five years: 3)

Author(s): Kelly Knollman-Porter, Jessica A. Brown, Karen Hux, Sarah E. Wallace, Allison Crittenden

Background: Person-centered approaches promote consistent use of supportive technology and feelings of empowerment for people with disabilities. Feature personalization is an aspect of person-centered approaches that can affect the benefit people with aphasia (PWA) derive from using text-to-speech (TTS) technology as a reading support. Aims: This study's primary purpose was to compare the comprehension and processing time of PWA when performing TTS-supported reading with preferred settings for voice, speech output rate, highlighting type, and highlighting color versus unsupported reading. A secondary aim was to examine initial support and feature preference selections, preference changes following TTS exposure, and anticipated functional reading activities for utilizing TTS technology. Method and Procedure: Twenty PWA read passages either via written text or text combined with TTS output using personally selected supports and features. Participants answered comprehension questions, reevaluated their preference selections, and provided feedback both about feature selections and possible future TTS technology uses. Outcomes and Results: Comprehension accuracy did not vary significantly between reading conditions; however, processing time was significantly less in the TTS-supported condition, thus suggesting TTS support promoted greater reading speed without compromising comprehension. Most participants preferred the TTS condition and several anticipated benefits when reading lengthy and difficult materials. Alterations to initial settings were relatively rare. Conclusions: Personalizing TTS systems is relevant to person-centered interventions. Reading with desired TTS system supports and features promotes improved reading efficiency by PWA compared with reading without TTS support. Attending to client preferences is important when customizing and implementing TTS technology as a reading support.
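The four personalized features studied above (voice, speech output rate, highlighting type, highlighting color) can be modeled as a small per-participant preference profile. The sketch below is purely illustrative; the field names and default values are assumptions, not the study's software:

```python
from dataclasses import dataclass, replace

# A minimal sketch of a per-participant TTS preference profile.
# Field names and defaults are hypothetical, not taken from the study.
@dataclass(frozen=True)
class TTSPreferences:
    voice: str = "female_1"          # synthetic voice identifier
    rate_wpm: int = 150              # speech output rate in words per minute
    highlight_type: str = "word"     # e.g., "word" or "sentence" highlighting
    highlight_color: str = "yellow"  # color of the synchronized highlight

def personalize(defaults: TTSPreferences, **choices) -> TTSPreferences:
    """Return a new profile with the participant's selections applied."""
    return replace(defaults, **choices)

# Example: a participant who prefers slower output and sentence highlighting.
profile = personalize(TTSPreferences(), rate_wpm=120, highlight_type="sentence")
```

Keeping the profile immutable and applying selections over defaults mirrors the study's observation that initial settings were rarely altered after TTS exposure.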


2021, Vol 2021, pp. 1-12
Author(s): Shuli Wang, Xiuchuan Shi

To improve the pronunciation accuracy of spoken English reading, this paper applies artificial intelligence technology to construct a pronunciation-accuracy correction model for AI virtual English reading. The paper analyzes the speech synthesis process used in intelligent speech technology, proposes a statistical parametric speech synthesis method based on hidden Markov models, and improves the system algorithm so that it meets the requirements of the pronunciation-accuracy correction system. Finally, simulation experiments are used to analyze English reading, spoken pronunciation, and pronunciation correction with the intelligent system. The experimental results show that the proposed correction system basically meets the requirements it was designed for.
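The hidden-Markov-model scoring that underlies such statistical parametric approaches can be illustrated with the forward algorithm, which computes how likely an observation sequence is under a trained model. The toy model below (two states, binary observations, made-up parameters) is an illustrative assumption, not the paper's system:

```python
# Forward algorithm for a discrete HMM: computes P(observations | model).
# All model parameters here are toy values for illustration only.
pi = [0.6, 0.4]                    # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]       # state transition probabilities
B = [[0.5, 0.5], [0.1, 0.9]]       # emission probabilities per state

def forward_likelihood(obs):
    """Sum over all state paths that could emit `obs`."""
    alpha = [pi[s] * B[s][obs[0]] for s in range(len(pi))]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * A[p][s] for p in range(len(pi))) * B[s][o]
            for s in range(len(pi))
        ]
    return sum(alpha)

# A pronunciation scorer would compare this likelihood against a threshold,
# or against a competing model trained on the mispronounced variant.
score = forward_likelihood([0, 1])
```

In a pronunciation-correction setting, a low likelihood under the reference model for a phone sequence is what flags a candidate mispronunciation.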


2021, pp. 1-17
Author(s): Sethuram V, Ande Prasad, R. Rajeswara Rao

In speech technology, a pivotal role is played by the speaker diarization mechanism. In general, speaker diarization is the process of partitioning an input audio stream into homogeneous segments according to speaker identity. Speaker diarization can improve the readability of automatic transcription by segmenting the audio stream at speaker turns, and it often provides the true speaker identity. In this research work, a novel speaker diarization approach is introduced with three major phases: feature extraction, speech activity detection (SAD), and speaker segmentation and clustering. Initially, Mel Frequency Cepstral Coefficient (MFCC) features are extracted from the collected input audio stream (Telugu language). Subsequently, in the SAD phase, music and silence signals are removed. The acquired speech signals are then segmented for each individual speaker. Finally, the segmented signals are subjected to speaker clustering, for which an optimized Convolutional Neural Network (CNN) is used. To make the clustering more appropriate, the weights and activation function of the CNN are fine-tuned by a new Self Adaptive Sea Lion Algorithm (SA-SLnO). Finally, a comparative analysis is presented to exhibit the superiority of the proposed speaker diarization work. Accordingly, the accuracy of the proposed method is 0.8073, which is 5.255%, 2.45%, and 0.075% higher than that of the existing works, respectively.
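The SAD phase of such a pipeline can be sketched with a simple frame-energy threshold. This is a deliberately simplified stand-in for the paper's detector (it removes silence but cannot distinguish music from speech), and the frame length and threshold are assumed values:

```python
import numpy as np

def energy_sad(signal, frame_len=160, threshold=0.01):
    """Flag each frame as speech (True) or silence (False) by RMS energy.

    A toy stand-in for a real speech activity detector: it removes
    silence but cannot separate music from speech.
    """
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > threshold

# 0.05 s of silence followed by 0.05 s of a 440 Hz tone at 16 kHz.
t = np.arange(1600) / 16000.0
sig = np.concatenate([np.zeros(800), 0.5 * np.sin(2 * np.pi * 440 * t[:800])])
flags = energy_sad(sig)  # first 5 frames silent, last 5 voiced
```

Only the frames flagged True would be passed on to the segmentation and clustering stages.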


2021, Vol 72 (2), pp. 579-589
Author(s): Róbert Sabo, Štefan Beňuš, Marian Trnka, Marian Ritomský, Milan Rusko, et al.

Abstract The paper describes methodology for creating a Slovak database of speech under stress and pilot observations. While the relationship between stress and speech characteristics can be utilized in a wide domain of speech technology applications, its research suffers from the lack of suitable databases, particularly in conversational speech. We propose a novel procedure to record acted speech in the home of actors and using their own smartphones. We describe both the collection of speech material under three levels of stress and the subsequent annotation of stress levels in this material. First observations suggest a reasonable inter-annotator agreement, as well as interesting avenues for the relationship between the intended stress levels and those perceived in speech.
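Inter-annotator agreement of the kind reported here is commonly quantified with Cohen's kappa, which corrects raw agreement for chance. The labels below are made-up stress annotations (0 = low, 1 = medium, 2 = high stress) for illustration, not data from the database:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical annotators labeling eight utterances with stress levels.
ann1 = [0, 0, 1, 1, 2, 2, 0, 1]
ann2 = [0, 0, 1, 2, 2, 2, 0, 0]
kappa = cohens_kappa(ann1, ann2)
```

Values around 0.6-0.8 are conventionally read as substantial agreement, which is the kind of check the pilot annotation described above would rely on.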


2021, Vol 6 (2), pp. 5135
Author(s): Reed Blaylock

I used Backward Design to scaffold ten weeks of assignments that taught students how to perform sine wave vowel synthesis and a Fourier transformation approximation using just a few fundamental programming concepts. This strategy gave all students, regardless of their previous programming experience, the opportunity to implement algorithms related to core concepts in phonetics and speech technology. Reflecting on the course, it seems that the coding assignments were generally well-received by students and contributed to students programming something complex and meaningful.
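Sine wave vowel synthesis of the kind these assignments teach can be done by summing sinusoids at a vowel's formant frequencies. The sketch below is one plausible version of such an assignment, not the course's actual code; the formant values are approximate textbook figures for /a/, and the amplitudes are assumptions:

```python
import numpy as np

def sine_wave_vowel(formants, duration=0.5, sr=16000, amps=None):
    """Approximate a vowel as a sum of sinusoids at its formant frequencies."""
    t = np.arange(int(duration * sr)) / sr
    amps = amps or [1.0] * len(formants)
    wave = sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(formants, amps))
    return wave / np.max(np.abs(wave))  # normalize to [-1, 1]

# Roughly /a/: F1 ~ 700 Hz, F2 ~ 1220 Hz, F3 ~ 2600 Hz (approximate values).
vowel_a = sine_wave_vowel([700, 1220, 2600], amps=[1.0, 0.5, 0.25])
```

Everything here reduces to loops, lists, and arithmetic over samples, which illustrates how such a synthesis task can be built from just a few fundamental programming concepts.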


2021
Author(s): Karl Wettin

Wikispeech is a free and open text-to-speech (TTS) solution that runs on MediaWiki. Wikispeech will make the Wikimedia projects speak – for anyone, illiterate, blind, or just belonging to the quarter of the world's population who prefer learning from listening rather than reading.  In the true Wikimedia fashion, volunteers will be able to improve the quality of Wikispeech. Errors and flaws can be corrected, and in the long run, new voices and languages can be added.  As part of the project, tools for collecting speech data will be developed. With this data, new voices can be created. And both the tools and the data will of course be released under a free license, so that they can be used in other speech technology projects too.


Author(s): Dávid Sztahó, György Szaszák, András Beke

This paper reviews applied Deep Learning (DL) practices in the field of Speaker Recognition (SR), covering both verification and identification. Speaker recognition has long been a widely studied topic in speech technology. Many research works have been carried out, and considerable progress has been achieved in the past 5-6 years. As Deep Learning techniques advance in most machine learning fields, the former state-of-the-art methods are being replaced by them in speaker recognition as well. Deep Learning now appears to be the state-of-the-art solution for both Speaker Verification (SV) and identification. Standard x-vectors, in addition to i-vectors, are used as baselines in most of the novel works. The increasing amount of gathered data opens up the territory to Deep Learning, where it is most effective.
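Both i-vector and x-vector verification systems commonly score a trial by cosine similarity between the enrollment and test embeddings. A minimal numpy sketch, with made-up low-dimensional vectors and an assumed decision threshold:

```python
import numpy as np

def cosine_score(enroll, test):
    """Cosine similarity between two speaker embeddings (higher = more similar)."""
    return float(np.dot(enroll, test)
                 / (np.linalg.norm(enroll) * np.linalg.norm(test)))

def verify(enroll, test, threshold=0.5):
    """Accept the trial if the embeddings are similar enough (toy threshold)."""
    return cosine_score(enroll, test) >= threshold

# Toy 4-dimensional "embeddings"; real x-vectors are typically 512-dimensional.
same = cosine_score(np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 2.0, 3.0, 4.0]))
diff = cosine_score(np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0]))
```

Production systems usually calibrate the threshold on held-out trials (or use PLDA scoring instead), but the embedding-plus-similarity structure is the same.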


2021, pp. 1-11
Author(s): J. N. de Boer, A. E. Voppel, S. G. Brederoo, H. G. Schnack, K. P. Truong, et al.

Abstract. Background: Clinicians routinely use impressions of speech as an element of mental status examination. In schizophrenia-spectrum disorders, descriptions of speech are used to assess the severity of psychotic symptoms. In the current study, we assessed the diagnostic value of acoustic speech parameters in schizophrenia-spectrum disorders, as well as their value in recognizing positive and negative symptoms. Methods: Speech was obtained from 142 patients with a schizophrenia-spectrum disorder and 142 matched controls during a semi-structured interview on neutral topics. Patients were categorized as having predominantly positive or negative symptoms using the Positive and Negative Syndrome Scale (PANSS). Acoustic parameters were extracted with OpenSMILE, employing the extended Geneva Acoustic Minimalistic Parameter Set, which includes standardized analyses of pitch (F0), speech quality, and pauses. Speech parameters were fed into a random forest algorithm with leave-ten-out cross-validation to assess their value for a schizophrenia-spectrum diagnosis and PANSS subtype recognition. Results: The machine-learning speech classifier attained an accuracy of 86.2% in classifying patients with a schizophrenia-spectrum disorder and controls on speech parameters alone. Patients with predominantly positive v. negative symptoms could be classified with an accuracy of 74.2%. Conclusions: Our results show that automatically extracted speech parameters can be used to accurately classify patients with a schizophrenia-spectrum disorder and healthy controls, as well as differentiate between patients with predominantly positive v. negative symptoms. Thus, the field of speech technology has provided a standardized, powerful tool that has high potential for clinical applications in diagnosis and differentiation, given its ease of comparison and replication across samples.
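The leave-ten-out cross-validation used here holds out ten samples per fold. A pure-Python sketch of the fold generation (shuffling, stratification, and the random forest itself are omitted; the sequential partition is an assumption):

```python
def leave_ten_out_folds(n_samples, held_out=10):
    """Yield (train_indices, test_indices) pairs, holding out 10 samples per fold."""
    indices = list(range(n_samples))
    for start in range(0, n_samples, held_out):
        test = indices[start:start + held_out]
        train = indices[:start] + indices[start + held_out:]
        yield train, test

# 142 patients + 142 controls, as in the study, gives 284 samples in total.
folds = list(leave_ten_out_folds(284))
```

Each fold's classifier is trained on 274 samples and evaluated on the 10 held-out ones; averaging accuracy over all folds gives figures comparable to the 86.2% reported above.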


2021, Vol 2021 (1), pp. 10568
Author(s): Mingang K. Geiger, Mike Horia Teodorescu, Lily Morse
