synthetic voices
Recently Published Documents


TOTAL DOCUMENTS: 48 (last five years: 13)
H-INDEX: 4 (last five years: 1)

2021 ◽  
Author(s):  
Gareth Morlais

When you're making plans to get people using your language as much and as often as possible, there's a list of things related to Wikipedia that can really help. I'll share our experience with the Welsh language. Supporting the Welsh-language Wikipedia community forms Work Package 15 of 27 in the Welsh Government's Welsh Language Technology Action Plan (https://gov.wales/sites/default/files/publications/2018-12/welsh-language-technology-and-digital-media-action-plan.pdf). We like supporting Welsh-language Wikipedia editing workshops, video workshops and other channels that encourage people to create and publish Welsh-language video, audio, graphic and text content, because we're on a mission to help double the daily use of Welsh by 2050. I'll share developments we're funding in speech, translation and conversational AI. The partners we fund publish what they develop under an open licence, so we can share what we've funded with many companies; we think Microsoft may have used some of it to build their new synthetic voices in Welsh. We're excited by the potential Wikidata offers and will look at using it to populate Welsh-language maps this year. We've already used Wikipedia search data to prioritise the training of a Welsh virtual assistant. Wales may not spend as much as Iceland and Estonia do on language technologies, but we'd like to share what we, as a smaller language community, are learning about the important areas to focus on and how Wikipedia can help.
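The abstract mentions using Wikidata to populate Welsh-language maps. As a minimal sketch of what that could look like, the snippet below queries the public Wikidata Query Service for items in Wales that carry Welsh-language labels; the query shape, property choices and script are illustrative assumptions, not the Welsh Government's actual tooling.

import requests

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service

# Items located (transitively, via P131) in Wales (Q25) with a Welsh (cy) label.
QUERY = """
SELECT ?item ?cyLabel WHERE {
  ?item wdt:P131* wd:Q25 ;
        rdfs:label ?cyLabel .
  FILTER(LANG(?cyLabel) = "cy")
}
LIMIT 20
"""

def welsh_labels():
    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "welsh-maps-sketch/0.1 (illustrative)"},
        timeout=60,
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        yield row["item"]["value"], row["cyLabel"]["value"]

for uri, label in welsh_labels():
    print(uri, label)

Labels fetched this way could then be attached to map features in place of English-only place names.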


Author(s):  
Ben Noah ◽  
Arathi Sethumadhavan ◽  
Josh Lovejoy ◽  
David Mondello

Text-to-Speech (TTS) technologies have provided ways to produce acoustic approximations of human voices. However, recent advancements in machine learning (i.e., neural network TTS) have helped move beyond coarse mimicry towards more natural-sounding speech. With only a small collection of recorded utterances, it is now possible to generate wholly synthetic voices indistinguishable from those of human speakers. While these new approaches to speech synthesis can help facilitate more seamless experiences with artificial agents, they also lower the barrier to entry for those seeking to perpetrate deception. As such, in the development of these technologies it is important to anticipate potential harms and devise strategies to mitigate misuse. This paper presents findings from a 360-person survey that assessed public perceptions of synthetic voices, with a particular focus on how voice type and social scenario impact ratings of trust. The findings have implications for the responsible deployment of synthetic speech technologies.


2021 ◽  
pp. 146144482110241
Author(s):  
Emma Rodero ◽  
Ignacio Lucas

Human voices narrate most audiobooks, but the rapid development of speech synthesis technology has made it possible to use artificial voices instead. This raises the question of whether listeners' cognitive processing is the same when a story is told by a synthetic voice as by a human one. This research compares listeners' perception, creation of mental images, narrative engagement, physiological response, and recognition of information when listening to stories conveyed by human and synthetic voices. The results showed that listeners enjoyed stories narrated by a human voice more than by a synthetic one. They also created more mental images, were more engaged, paid more attention, had a more positive emotional response, and remembered more information. Speech synthesis has made considerable progress, but significant differences from human voices remain, which makes synthetic voices difficult to use for narrating long-form content such as audiobooks.


Author(s):  
Jennifer M. Vojtech ◽  
Michael D. Chan ◽  
Bhawna Shiwani ◽  
Serge H. Roy ◽  
James T. Heaton ◽  
...  

Purpose: This study aimed to evaluate a novel communication system designed to translate surface electromyographic (sEMG) signals from articulatory muscles into speech using a personalized, digital voice. The system was evaluated for word recognition, prosodic classification, and listener perception of synthesized speech.

Method: sEMG signals were recorded from the face and neck as speakers with (n = 4) and without (n = 4) laryngectomy subvocally recited (silently mouthed) a speech corpus comprising 750 phrases (150 phrases with variable phrase-level stress). Corpus tokens were then translated into speech via personalized voice synthesis (n = 8 synthetic voices) and compared against phrases produced by each speaker when using their typical mode of communication (n = 4 natural voices, n = 4 electrolaryngeal [EL] voices). Naïve listeners (n = 12) evaluated synthetic, natural, and EL speech for acceptability and intelligibility in a visual sort-and-rate task, as well as phrasal stress discriminability via a classification mechanism.

Results: Recorded sEMG signals were processed to translate sEMG muscle activity into lexical content and to categorize variations in phrase-level stress, achieving mean accuracies of 96.3% (SD = 3.10%) and 91.2% (SD = 4.46%), respectively. Synthetic speech was significantly higher in acceptability and intelligibility than EL speech, and also led to greater phrasal stress classification accuracy; natural speech was rated as the most acceptable and intelligible, with the greatest phrasal stress classification accuracy.

Conclusion: This proof-of-concept study establishes the feasibility of using subvocal sEMG-based alternative communication not only for lexical recognition but also for prosodic communication, in healthy individuals as well as those living with vocal impairments and residual articulatory function.

Supplemental Material: https://doi.org/10.23641/asha.14558481
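The abstract does not describe the decoding pipeline in detail. Purely as an illustration of the general technique (windowed sEMG features fed to a classifier), here is a minimal sketch; the feature choice, window sizes, and classifier are assumptions, not the authors' method.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def rms_features(emg, win=200, hop=100):
    # Windowed root-mean-square energy per channel.
    # emg: array of shape (n_samples, n_channels); returns a flat vector.
    feats = []
    for start in range(0, emg.shape[0] - win + 1, hop):
        seg = emg[start:start + win]
        feats.append(np.sqrt(np.mean(seg ** 2, axis=0)))
    return np.concatenate(feats)

def train_stress_classifier(recordings, labels):
    # recordings: equal-length multichannel sEMG arrays (a real system
    # would time-normalize); labels: phrase-level stress category per phrase.
    X = np.stack([rms_features(r) for r in recordings])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)
    return clf

A deployed system would more likely decode lexical content with a sequence model rather than a fixed-length classifier; the sketch only shows the stress-classification idea.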


2021 ◽  
Vol 127 ◽  
pp. 43-63
Author(s):  
Iona Gessinger ◽  
Eran Raveh ◽  
Ingmar Steiner ◽  
Bernd Möbius

Author(s):  
Neasa Ní Chiaráin

The ABAIR text-to-speech synthesis system (www.abair.ie) has been under development in the Phonetics and Speech Laboratory at Trinity College Dublin for several years, and synthetic voices are now available in the three major dialects: Munster (female and male), Connacht (male) and Ulster (female). This paper gives an overview of the Irish synthetic voices and focuses on their use in the context of Intelligent Computer-Assisted Language Learning (iCALL), in particular in the development of platforms that let the learner interact personally with the computer, supporting the self-directed learning of Irish. The potential of this technology is demonstrated in the context of a new iCALL platform, An Scéalaí ('the Storyteller'), currently under development.
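The abstract does not document a public API for the ABAIR voices, so the following is a purely hypothetical sketch of how a learning platform like An Scéalaí might request synthesized audio from a TTS service over HTTP; the endpoint URL and parameter names are invented for illustration.

import requests

# Hypothetical endpoint and parameters, invented for illustration;
# ABAIR's real interface may differ.
TTS_URL = "https://example.org/tts/synthesise"

def synthesise(text, dialect="connacht", voice="male"):
    # Request synthesized Irish speech for a learner prompt (sketch only).
    resp = requests.post(
        TTS_URL,
        json={"text": text, "dialect": dialect, "voice": voice},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.content  # assumed to be audio bytes, e.g. WAV

# e.g. audio = synthesise("Dia dhuit, a chara!")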

