Text to Speech
Recently Published Documents

TOTAL DOCUMENTS: 1583 (five years: 392)
H-INDEX: 31 (five years: 5)

2022 ◽ Vol 29 (2) ◽ pp. 1-42
Author(s): Maitraye Das ◽ Anne Marie Piper ◽ Darren Gergle

Collaborative writing tools have been used widely in professional and academic organizations for many years. Yet, there has not been much work to improve screen reader access in mainstream collaborative writing tools. This severely affects the way people with vision impairments collaborate in ability-diverse teams. As a step toward addressing this issue, the present article aims to improve screen reader representation of collaborative features such as comments and track changes (i.e., suggested edits). Building on our formative interviews with 20 academics and professionals with vision impairments, we developed auditory representations that indicate comments and edits using non-speech audio (e.g., earcons, tone overlay), multiple text-to-speech voices, and contextual presentation techniques. We then performed a systematic evaluation study with 48 screen reader users, which indicated that non-speech audio, changing voices, and contextual presentation can potentially improve writers’ collaboration awareness. We discuss implications of these results for the design of accessible collaborative systems.
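The study's auditory representations are not specified at the code level, but the core idea of voicing a collaborator's comment in a contrasting synthetic voice can be sketched with an off-the-shelf TTS engine. The snippet below is a minimal illustration assuming the pyttsx3 library; the document text and comment strings are invented for the example.

```python
import pyttsx3

# Minimal sketch: speak body text in one system voice and a collaborator's
# comment in a second voice, so the switch of "speaker" is audible.
engine = pyttsx3.init()
voices = engine.getProperty("voices")  # voices installed on this system

engine.setProperty("voice", voices[0].id)      # default voice for body text
engine.say("Collaborative writing tools are used widely in many organizations.")

if len(voices) > 1:
    engine.setProperty("voice", voices[1].id)  # contrasting voice for comments
engine.setProperty("rate", 190)                # slightly faster delivery for the aside
engine.say("Comment from a collaborator: consider adding a citation here.")

engine.runAndWait()
```

Earcons or tone overlays would require mixing short audio cues into the output stream, which pyttsx3 alone does not do; this sketch covers only the voice-switching part of the technique.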


2022 ◽ Vol 186 ◽ pp. 108439
Author(s): Jesus Monge Alvarez ◽ Holly Francois ◽ Hosang Sung ◽ Seungdo Choi ◽ Jonghoon Jeong ◽ ...

2022 ◽ Vol 12 (1) ◽ pp. 0-0

Listening often makes learning easier and more engaging than reading. An audiobook is software that converts text to speech. Although this sounds promising, the audiobooks available on the market are neither free nor feasible for everyone, and they are generally limited to fictional stories, novels, or comics. A comprehensive review of the available literature shows that little intensive work has been done on image-to-speech conversion. In this paper, we employ several strategies for the entire process. As an initial step, deep learning techniques are used to denoise the images fed to the system. This is followed by text extraction with the help of OCR engines. To further improve the quality of the extracted text, a post-processing spell-check mechanism is incorporated. Our result analysis demonstrates that, with denoising and spell checking, the model achieves an accuracy of 98.11%, compared to 84.02% without any denoising or spell-check mechanism.
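The pipeline described above (denoise, OCR, spell-check, speak) can be approximated with common open-source components. The sketch below is not the authors' system: it substitutes OpenCV's non-local-means filter for their deep learning denoiser and assumes pytesseract, pyspellchecker, and pyttsx3 are installed; the input file name is hypothetical.

```python
import cv2
import pytesseract
import pyttsx3
from spellchecker import SpellChecker

# 1. Denoise the scanned page (the paper uses a deep learning denoiser;
#    non-local means is a simple stand-in).
image = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
denoised = cv2.fastNlMeansDenoising(image, h=15)

# 2. Extract text with an OCR engine.
raw_text = pytesseract.image_to_string(denoised)

# 3. Post-process with a spell checker to repair OCR errors.
spell = SpellChecker()
corrected_words = [spell.correction(w) or w for w in raw_text.split()]
corrected_text = " ".join(corrected_words)

# 4. Speak the corrected text.
engine = pyttsx3.init()
engine.say(corrected_text)
engine.runAndWait()
```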


Author(s): Kartik Tiwari

Abstract: This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit called ESPnet-TTS, an open-source extension of the ESPnet speech processing toolkit. ESPnet-TTS includes several models, such as Tacotron 2, Transformer TTS, and FastSpeech, and provides recipes in the style recommended by the Kaldi speech recognition toolkit (ASR). The recipes are designed to be unified with the ESPnet ASR recipes, which yields high performance. The toolkit also provides pre-trained models and generated samples for all recipes so that users can use them as a baseline. It supports TTS-STT and translation features for several Indic languages, with a strong focus on English, Marathi, and Hindi. The paper also shows that neural sequence-to-sequence models achieve state-of-the-art or near state-of-the-art results on existing datasets. We further analyze some of the key design challenges in building a multilingual business translation system, including processing bilingual business datasets and evaluating multiple translation methods. Test results, obtained on held-out tokens, show that our models achieve performance comparable to the latest toolkits on the LJSpeech data. Keywords: open source, end-to-end, text-to-speech
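ESPnet-TTS exposes its pre-trained models through a Python inference API. The following is a minimal usage sketch assuming the espnet2 and espnet_model_zoo packages; the model tag is illustrative, and any published LJSpeech tag would be used the same way.

```python
import soundfile as sf
from espnet2.bin.tts_inference import Text2Speech

# Load a pre-trained English TTS model from the ESPnet model zoo
# (the tag shown here is illustrative).
tts = Text2Speech.from_pretrained("kan-bayashi/ljspeech_tacotron2")

# Synthesize a sentence and write the waveform to disk.
result = tts("Text to speech with an end-to-end neural model.")
sf.write("output.wav", result["wav"].numpy(), tts.fs)
```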


2021 ◽ Vol 19 (4) ◽ pp. 585-600
Author(s): Саша Курленкова

As conversation analysis shows, all talk is highly collaborative, and meaning is created dialogically and sequentially, via the concerted actions of all the participants involved. In the case of people with communication impairments, the collaborative character of talk is even more manifest. A speaker with dysarthria, for example, may communicate by typing messages into a text-to-speech communication app or device, using a communication (alphabet) board, or gazing at objects. This article focuses on one type of co-construction effort aimed at helping an augmented speaker to communicate, a process that can be called other-initiated repair. Although this practice is a common way of achieving understanding with people who have communication needs, in some cases repair initiation is used to do more than that. In this paper, I conduct conversation analysis of a video recording of naturalistic interactions within a Russian-speaking family involving a 10-year-old girl with dysarthria who communicates with her parents through an eye-tracker-controlled computer interface. In this case, her parents use the structural position of repair initiation on the girl’s words not only to clarify the meaning of her message but also to continue the preceding dispute over the mother’s birthday present. I argue that although this is just one instance of the use of other-repair in playful communication between family members, the potential to offer guesses that align with the guesser’s own interests is present in other repair sequences as well. This can be consequential for the lives of people with communication needs when it occurs in more official settings. Studying similar repair sequences can help better delineate 'good' scaffolding strategies in the co-construction of speech with someone who has communication needs.


Author(s): Kelly Knollman-Porter ◽ Jessica A. Brown ◽ Karen Hux ◽ Sarah E. Wallace ◽ Allison Crittenden

Background: Person-centered approaches promote consistent use of supportive technology and feelings of empowerment for people with disabilities. Feature personalization is an aspect of person-centered approaches that can affect the benefit people with aphasia (PWA) derive from using text-to-speech (TTS) technology as a reading support. Aims: This study's primary purpose was to compare the comprehension and processing time of PWA when performing TTS-supported reading with preferred settings for voice, speech output rate, highlighting type, and highlighting color versus unsupported reading. A secondary aim was to examine initial support and feature preference selections, preference changes following TTS exposure, and anticipated functional reading activities for utilizing TTS technology. Method and Procedure: Twenty PWA read passages either via written text or text combined with TTS output using personally selected supports and features. Participants answered comprehension questions, reevaluated their preference selections, and provided feedback both about feature selections and possible future TTS technology uses. Outcomes and Results: Comprehension accuracy did not vary significantly between reading conditions; however, processing time was significantly less in the TTS-supported condition, thus suggesting TTS support promoted greater reading speed without compromising comprehension. Most participants preferred the TTS condition and several anticipated benefits when reading lengthy and difficult materials. Alterations to initial settings were relatively rare. Conclusions: Personalizing TTS systems is relevant to person-centered interventions. Reading with desired TTS system supports and features promotes improved reading efficiency by PWA compared with reading without TTS support. Attending to client preferences is important when customizing and implementing TTS technology as a reading support.
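The kind of feature personalization the study describes (preferred voice and speech output rate) maps onto settings most TTS engines expose programmatically. Below is a hedged sketch using pyttsx3; the stored preference values are hypothetical, not the ones participants chose.

```python
import pyttsx3

# Hypothetical per-user reading preferences (voice index and words-per-minute rate).
user_prefs = {"voice_index": 1, "rate_wpm": 150}

engine = pyttsx3.init()
voices = engine.getProperty("voices")

# Apply the stored preferences before reading a passage aloud.
engine.setProperty("voice", voices[user_prefs["voice_index"] % len(voices)].id)
engine.setProperty("rate", user_prefs["rate_wpm"])

engine.say("The passage is read aloud with the reader's preferred settings.")
engine.runAndWait()
```

Highlighting type and color would live in the reading application's display layer rather than in the speech engine itself.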


2021 ◽ pp. 126-131
Author(s): Yue He ◽ Walcir Cardoso

This study investigated whether a translation tool (Microsoft Translator – MT) and its built-in speech features (Text-To-Speech synthesis – TTS – and speech recognition) can promote learners’ acquisition of the pronunciation of the English regular past tense -ed in a self-directed manner. Following a pretest/posttest design, we compared 29 participants’ performance on past -ed allomorphy (/t/, /d/, and /id/) by assessing their pronunciation in terms of phonological awareness, phonemic discrimination, and oral production. The findings highlight the affordances of MT regarding its pedagogical use for helping English as a Foreign Language (EFL) learners improve their pronunciation.
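For readers unfamiliar with the target structure: regular past-tense -ed has three allomorphs, /id/ after /t/ or /d/, /t/ after other voiceless sounds, and /d/ elsewhere. The function below is a rough, orthography-based approximation of that rule for illustration only; a real classifier would need phonetic transcription rather than spelling.

```python
def past_ed_allomorph(verb: str) -> str:
    """Rough orthography-based guess at the regular past-tense -ed allomorph."""
    stem = verb.lower().rstrip("e")          # 'bake' -> 'bak', 'want' -> 'want'
    if stem.endswith(("t", "d")):
        return "/id/"                        # wanted, needed
    if stem.endswith(("p", "k", "f", "s", "x", "sh", "ch")):
        return "/t/"                         # walked, missed
    return "/d/"                             # played, cleaned

for verb in ["want", "walk", "miss", "play", "need"]:
    print(f"{verb}ed -> {past_ed_allomorph(verb)}")
```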

