Text to Speech
Recently Published Documents

TOTAL DOCUMENTS: 1583 (five years: 392)
H-INDEX: 31 (five years: 5)

2022 ◽ Vol 29 (2) ◽ pp. 1-42
Author(s): Maitraye Das ◽ Anne Marie Piper ◽ Darren Gergle

Collaborative writing tools have been used widely in professional and academic organizations for many years. Yet, there has not been much work to improve screen reader access in mainstream collaborative writing tools. This severely affects the way people with vision impairments collaborate in ability-diverse teams. As a step toward addressing this issue, the present article aims to improve screen reader representation of collaborative features such as comments and track changes (i.e., suggested edits). Building on our formative interviews with 20 academics and professionals with vision impairments, we developed auditory representations that indicate comments and edits using non-speech audio (e.g., earcons, tone overlay), multiple text-to-speech voices, and contextual presentation techniques. We then performed a systematic evaluation study with 48 screen reader users, which indicated that non-speech audio, changing voices, and contextual presentation can potentially improve writers’ collaboration awareness. We discuss implications of these results for the design of accessible collaborative systems.
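The study's auditory representations are not specified at the code level, but the core idea of voicing a collaborator's comment in a contrasting synthetic voice can be sketched with an off-the-shelf TTS engine. The snippet below is a minimal illustration assuming the pyttsx3 library; the document text and comment strings are invented for the example.

```python
import pyttsx3

# Minimal sketch: speak body text in one system voice and a collaborator's
# comment in a second voice, so the switch of "speaker" is audible.
engine = pyttsx3.init()
voices = engine.getProperty("voices")  # voices installed on this system

engine.setProperty("voice", voices[0].id)      # default voice for body text
engine.say("Collaborative writing tools are used widely in many organizations.")

if len(voices) > 1:
    engine.setProperty("voice", voices[1].id)  # contrasting voice for comments
engine.setProperty("rate", 190)                # slightly faster delivery for the aside
engine.say("Comment from a collaborator: consider adding a citation here.")

engine.runAndWait()
```

Earcons or tone overlays would require mixing short audio cues into the output stream, which pyttsx3 alone does not do; this sketch covers only the voice-switching part of the technique.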


2022 ◽ Vol 186 ◽ pp. 108439
Author(s): Jesus Monge Alvarez ◽ Holly Francois ◽ Hosang Sung ◽ Seungdo Choi ◽ Jonghoon Jeong ◽ ...

2022 ◽ Vol 12 (1) ◽ pp. 0-0

Listening often makes learning easier and more engaging than reading. An audiobook is software that converts text to speech. Although this sounds promising, the audiobooks available on the market are neither free nor feasible for everyone, and they are generally limited to fictional stories, novels, or comics. A comprehensive review of the available literature shows that little intensive work has been done on image-to-speech conversion. In this paper, we employ several strategies for the entire process. As an initial step, deep learning techniques are used to denoise the images fed to the system. This is followed by text extraction with the help of OCR engines. To further improve the quality of the extracted text, a post-processing spell-check mechanism is incorporated. Our result analysis demonstrates that, with denoising and spell checking, the model achieves an accuracy of 98.11%, compared to 84.02% without any denoising or spell-check mechanism.
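The pipeline described above (denoise, OCR, spell-check, speak) can be approximated with common open-source components. The sketch below is not the authors' system: it substitutes OpenCV's non-local-means filter for their deep learning denoiser and assumes pytesseract, pyspellchecker, and pyttsx3 are installed; the input file name is hypothetical.

```python
import cv2
import pytesseract
import pyttsx3
from spellchecker import SpellChecker

# 1. Denoise the scanned page (the paper uses a deep learning denoiser;
#    non-local means is a simple stand-in).
image = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
denoised = cv2.fastNlMeansDenoising(image, h=15)

# 2. Extract text with an OCR engine.
raw_text = pytesseract.image_to_string(denoised)

# 3. Post-process with a spell checker to repair OCR errors.
spell = SpellChecker()
corrected_words = [spell.correction(w) or w for w in raw_text.split()]
corrected_text = " ".join(corrected_words)

# 4. Speak the corrected text.
engine = pyttsx3.init()
engine.say(corrected_text)
engine.runAndWait()
```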


Author(s): Kartik Tiwari

Abstract: This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit called ESPnet-TTS, an open-source extension of the ESPnet speech processing toolkit. ESPnet-TTS includes several models, such as Tacotron 2, Transformer TTS, and FastSpeech, and provides recipes in the style recommended by the Kaldi speech recognition toolkit (ASR). The recipes are designed to be unified with the ESPnet ASR recipes, which yields high performance. The toolkit also provides pre-trained models and generated samples for all recipes so that users can use them as a baseline. It supports TTS-STT and translation features for several Indic languages, with a strong focus on English, Marathi, and Hindi. The paper also shows that neural sequence-to-sequence models achieve state-of-the-art or near state-of-the-art results on existing datasets. We further analyze some of the key design challenges in building a multilingual business translation system, including processing bilingual business datasets and evaluating multiple translation methods. Test results, obtained on held-out tokens, show that our models achieve performance comparable to the latest toolkits on the LJSpeech data. Keywords: open source, end-to-end, text-to-speech
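ESPnet-TTS exposes its pre-trained models through a Python inference API. The following is a minimal usage sketch assuming the espnet2 and espnet_model_zoo packages; the model tag is illustrative, and any published LJSpeech tag would be used the same way.

```python
import soundfile as sf
from espnet2.bin.tts_inference import Text2Speech

# Load a pre-trained English TTS model from the ESPnet model zoo
# (the tag shown here is illustrative).
tts = Text2Speech.from_pretrained("kan-bayashi/ljspeech_tacotron2")

# Synthesize a sentence and write the waveform to disk.
result = tts("Text to speech with an end-to-end neural model.")
sf.write("output.wav", result["wav"].numpy(), tts.fs)
```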


2021 ◽ Vol 19 (4) ◽ pp. 585-600
Author(s): Саша Курленкова

As conversation analysis shows, all talk is highly collaborative, and meaning is created dialogically and sequentially, via the concerted actions of all the participants involved. In the case of people with communication impairments, the collaborative character of talk is even more manifest. A speaker with dysarthria, for example, may communicate by typing messages into a text-to-speech communication app or device, using a communication (alphabet) board, or gazing at objects. This article focuses on one type of co-construction effort aimed at helping an augmented speaker to communicate, a process that can be called other-initiated repair. Although this practice is a common way of achieving understanding with people who have communication needs, in some cases repair initiation is used to do more than that. In this paper, I conduct conversation analysis of a video recording of naturalistic interactions within a Russian-speaking family involving a 10-year-old girl with dysarthria who communicates with her parents through an eye-tracker-controlled computer interface. In this case, her parents use the structural position of repair initiation on the girl’s words not only to clarify the meaning of her message but also to continue the preceding dispute over the mother’s birthday present. I argue that although this is just one instance of the use of other-repair in playful communication between family members, the potential to offer guesses that align with the guesser’s own interests is present in other repair sequences as well. This can be consequential for the lives of people with communication needs when it occurs in more official settings. Studying similar repair sequences can help better delineate 'good' scaffolding strategies in the co-construction of speech with someone who has communication needs.


Author(s): Kelly Knollman-Porter ◽ Jessica A. Brown ◽ Karen Hux ◽ Sarah E. Wallace ◽ Allison Crittenden

Background: Person-centered approaches promote consistent use of supportive technology and feelings of empowerment for people with disabilities. Feature personalization is an aspect of person-centered approaches that can affect the benefit people with aphasia (PWA) derive from using text-to-speech (TTS) technology as a reading support. Aims: This study's primary purpose was to compare the comprehension and processing time of PWA when performing TTS-supported reading with preferred settings for voice, speech output rate, highlighting type, and highlighting color versus unsupported reading. A secondary aim was to examine initial support and feature preference selections, preference changes following TTS exposure, and anticipated functional reading activities for utilizing TTS technology. Method and Procedure: Twenty PWA read passages either via written text or text combined with TTS output using personally selected supports and features. Participants answered comprehension questions, reevaluated their preference selections, and provided feedback both about feature selections and possible future TTS technology uses. Outcomes and Results: Comprehension accuracy did not vary significantly between reading conditions; however, processing time was significantly less in the TTS-supported condition, thus suggesting TTS support promoted greater reading speed without compromising comprehension. Most participants preferred the TTS condition and several anticipated benefits when reading lengthy and difficult materials. Alterations to initial settings were relatively rare. Conclusions: Personalizing TTS systems is relevant to person-centered interventions. Reading with desired TTS system supports and features promotes improved reading efficiency by PWA compared with reading without TTS support. Attending to client preferences is important when customizing and implementing TTS technology as a reading support.
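The kind of feature personalization the study describes (preferred voice and speech output rate) maps onto settings most TTS engines expose programmatically. Below is a hedged sketch using pyttsx3; the stored preference values are hypothetical, not the ones participants chose.

```python
import pyttsx3

# Hypothetical per-user reading preferences (voice index and words-per-minute rate).
user_prefs = {"voice_index": 1, "rate_wpm": 150}

engine = pyttsx3.init()
voices = engine.getProperty("voices")

# Apply the stored preferences before reading a passage aloud.
engine.setProperty("voice", voices[user_prefs["voice_index"] % len(voices)].id)
engine.setProperty("rate", user_prefs["rate_wpm"])

engine.say("The passage is read aloud with the reader's preferred settings.")
engine.runAndWait()
```

Highlighting type and color would live in the reading application's display layer rather than in the speech engine itself.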


2021 ◽ pp. 126-131
Author(s): Yue He ◽ Walcir Cardoso

This study investigated whether a translation tool (Microsoft Translator – MT) and its built-in speech features (Text-To-Speech synthesis – TTS – and speech recognition) can promote learners’ acquisition of the pronunciation of the English regular past tense -ed in a self-directed manner. Following a pretest/posttest design, we compared 29 participants’ performance on past -ed allomorphy (/t/, /d/, and /id/) by assessing their pronunciation in terms of phonological awareness, phonemic discrimination, and oral production. The findings highlight the affordances of MT regarding its pedagogical use for helping English as a Foreign Language (EFL) learners improve their pronunciation.
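For readers unfamiliar with the target structure: regular past-tense -ed has three allomorphs, /id/ after /t/ or /d/, /t/ after other voiceless sounds, and /d/ elsewhere. The function below is a rough, orthography-based approximation of that rule for illustration only; a real classifier would need phonetic transcription rather than spelling.

```python
def past_ed_allomorph(verb: str) -> str:
    """Rough orthography-based guess at the regular past-tense -ed allomorph."""
    stem = verb.lower().rstrip("e")          # 'bake' -> 'bak', 'want' -> 'want'
    if stem.endswith(("t", "d")):
        return "/id/"                        # wanted, needed
    if stem.endswith(("p", "k", "f", "s", "x", "sh", "ch")):
        return "/t/"                         # walked, missed
    return "/d/"                             # played, cleaned

for verb in ["want", "walk", "miss", "play", "need"]:
    print(f"{verb}ed -> {past_ed_allomorph(verb)}")
```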

