synthetic speech
Recently Published Documents


TOTAL DOCUMENTS: 467 (last five years: 59)
H-INDEX: 30 (last five years: 3)

eLife, 2021, Vol 10
Author(s): Agnès Landemard, Célian Bimbard, Charlie Demené, Shihab Shamma, Sam Norman-Haignere, ...

Little is known about how neural representations of natural sounds differ across species. For example, speech and music play a unique role in human hearing, yet it is unclear how auditory representations of speech and music differ between humans and other animals. Using functional ultrasound imaging, we measured responses in ferrets to a set of natural and spectrotemporally matched synthetic sounds previously tested in humans. Ferrets showed similar lower-level frequency and modulation tuning to that observed in humans. But while humans showed substantially larger responses to natural vs. synthetic speech and music in non-primary regions, ferret responses to natural and synthetic sounds were closely matched throughout primary and non-primary auditory cortex, even when tested with ferret vocalizations. This finding reveals that auditory representations in humans and ferrets diverge sharply at late stages of cortical processing, potentially driven by higher-order processing demands in speech and music.
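As a rough illustration of the comparison described above, the sketch below shows one way to quantify, per voxel, how much larger responses to natural sounds are than to their spectrotemporally matched synthetic counterparts. This is a hypothetical Python example, not the authors' analysis code; the array shapes and the normalization are assumptions.

```python
import numpy as np

def natural_vs_synthetic_index(resp_natural, resp_synthetic):
    """Per-voxel preference for natural over matched synthetic sounds.

    resp_natural, resp_synthetic : arrays of shape (n_voxels, n_sounds),
    e.g. trial-averaged functional ultrasound responses (hypothetical layout).
    Values near 0 indicate closely matched responses (as reported for ferret
    auditory cortex); larger positive values indicate stronger responses to
    natural sounds (as reported for human non-primary regions).
    """
    diff = resp_natural.mean(axis=1) - resp_synthetic.mean(axis=1)
    # Normalize by overall response magnitude so voxels with different
    # baseline amplitudes remain comparable.
    scale = (np.abs(resp_natural).mean(axis=1)
             + np.abs(resp_synthetic).mean(axis=1) + 1e-8)
    return diff / scale
```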


2021, Vol 11 (21), pp. 10475
Author(s): Xiao Zhou, Zhenhua Ling, Yajun Hu, Lirong Dai

An encoder–decoder with attention has become a popular method for sequence-to-sequence (Seq2Seq) acoustic modeling in speech synthesis. To improve the robustness of the attention mechanism, methods that exploit the monotonic alignment between phone sequences and acoustic feature sequences have been proposed, such as stepwise monotonic attention (SMA). However, the phone sequences derived by grapheme-to-phoneme (G2P) conversion may not contain the pauses at phrase boundaries in utterances, which challenges the assumption of strictly stepwise alignment in SMA. Therefore, this paper proposes inserting hidden states into phone sequences to handle the situation in which pauses are not provided explicitly, and designs a semi-stepwise monotonic attention (SSMA) mechanism to model these inserted hidden states. In this method, hidden states that absorb the pause segments in utterances are introduced in an unsupervised way. Thus, the attention at each decoding frame has three options: moving forward to the next phone, staying at the same phone, or jumping to a hidden state. Experimental results show that SSMA achieves better naturalness of synthetic speech than SMA when phrase boundaries are not available. Moreover, the pause positions derived from the alignment paths of SSMA match the manually labeled phrase boundaries quite well.
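To make the three-way transition concrete, here is a minimal Python sketch of the alignment recursion implied by SSMA. It is not the authors' implementation; the variable names (`move_p`, `pause_p`) and the exact factorization of the transition probabilities are assumptions. States alternate between phones and their inserted hidden (pause) states, and at each decoder frame the alignment mass may stay on the current phone, advance to the next phone, or detour through the pause state.

```python
import numpy as np

def ssma_alignment(move_p, pause_p):
    """Forward recursion over a semi-stepwise monotonic alignment (sketch).

    move_p[t, i]  : probability of leaving phone i at decoder frame t
    pause_p[t, i] : probability that a move from phone i detours through its
                    inserted hidden (pause) state rather than going straight
                    to phone i + 1
    Both are assumed to come from the attention network; shapes (T_dec, N).
    State 2*i is phone i, state 2*i + 1 is the hidden state after phone i.
    """
    T_dec, N = move_p.shape
    alpha = np.zeros((T_dec, 2 * N))
    alpha[0, 0] = 1.0  # alignment starts on the first phone
    for t in range(1, T_dec):
        for i in range(N):
            ph, pa = 2 * i, 2 * i + 1
            stay = alpha[t - 1, ph] * (1.0 - move_p[t, i])   # stay on phone i
            leave = alpha[t - 1, ph] * move_p[t, i]          # leave phone i
            alpha[t, ph] += stay
            # A move either detours via the pause state or skips it.
            alpha[t, pa] += leave * pause_p[t, i]
            # Mass already in the pause state can also stay or move on.
            alpha[t, pa] += alpha[t - 1, pa] * (1.0 - move_p[t, i])
            if i + 1 < N:
                alpha[t, 2 * (i + 1)] += (leave * (1.0 - pause_p[t, i])
                                          + alpha[t - 1, pa] * move_p[t, i])
            # (Mass that would step past the last phone is dropped here.)
    # Even-indexed entries are the attention weights over phones; odd-indexed
    # entries absorb pauses, mirroring the idea described in the abstract.
    return alpha
```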


2021
Author(s): Xinhui Chen, You Zhang, Ge Zhu, Zhiyao Duan

2021, pp. 103256
Author(s): Jialong Li, Hongxia Wang, Peisong He, Sani M. Abdullahi, Bin Li

Author(s): Ben Noah, Arathi Sethumadhavan, Josh Lovejoy, David Mondello

Text-to-Speech (TTS) technologies have provided ways to produce acoustic approximations of human voices. However, recent advancements in machine learning (i.e., neural network TTS) have helped move beyond coarse mimicry and towards more natural-sounding speech. With only a small collection of recorded utterances, it is now possible to generate wholly synthetic voices indistinguishable from those of human speakers. While these new approaches to speech synthesis can help facilitate more seamless experiences with artificial agents, they also lower the barrier to entry for those seeking to perpetrate deception. As such, in the development of these technologies, it is important to anticipate potential harms and devise strategies to help mitigate misuse. This paper presents findings from a 360-person survey that assessed public perceptions of synthetic voices, with a particular focus on how voice type and social scenario affect ratings of trust. The findings have implications for the responsible deployment of synthetic speech technologies.


2021
Author(s): Sai Sirisha Rallabandi, Abhinav Bharadwaj, Babak Naderi, Sebastian Möller

2021
Author(s): Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng

2021
Author(s): Yahya Aldholmi, Rawan Aldhafyan, Asma Alqahtani

2021
Author(s): Mikey Elmers, Raphael Werner, Beeke Muhlack, Bernd Möbius, Jürgen Trouvain
