human speech
Recently Published Documents


TOTAL DOCUMENTS

599
(FIVE YEARS 149)

H-INDEX

41
(FIVE YEARS 4)

2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Andrey Anikin ◽  
Katarzyna Pisanski ◽  
David Reby

When producing intimidating aggressive vocalizations, humans and other animals often extend their vocal tracts to lower their voice resonance frequencies (formants) and thus sound big. Is acoustic size exaggeration more effective when the vocal tract is extended before, or during, the vocalization, and how do listeners interpret within-call changes in apparent vocal tract length? We compared perceptual effects of static and dynamic formant scaling in aggressive human speech and nonverbal vocalizations. Acoustic manipulations corresponded to elongating or shortening the vocal tract either around (Experiment 1) or from (Experiment 2) its resting position. Gradual formant scaling that preserved average frequencies conveyed the impression of smaller size and greater aggression, regardless of the direction of change. Vocal tract shortening from the original length conveyed smaller size and less aggression, whereas vocal tract elongation conveyed larger size and more aggression, and these effects were stronger for static than for dynamic scaling. Listeners familiarized with the speaker's natural voice were less often ‘fooled’ by formant manipulations when judging speaker size, but paid more attention to formants when judging aggressive intent. Thus, within-call vocal tract scaling conveys emotion, but a better way to sound large and intimidating is to keep the vocal tract consistently extended.
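The inverse relation between apparent vocal-tract length (VTL) and formant frequencies that underlies these manipulations can be sketched numerically. This is an illustrative sketch, not the authors' stimulus-generation code: `scale_formants` is a hypothetical helper, and the formant values are approximate textbook figures for an adult male /a/.

```python
# Illustrative sketch (not the study's code): static formant scaling.
# Formant frequencies scale roughly inversely with apparent vocal-tract
# length (VTL), so extending the VTL lowers all formants ("sounding big")
# and shortening it raises them ("sounding small").

def scale_formants(formants_hz, vtl_scale):
    """Return formants for an apparent VTL of vtl_scale * original length."""
    k = 1.0 / vtl_scale          # formants are proportional to 1 / VTL
    return [f * k for f in formants_hz]

# Approximate first three formants of an adult male /a/.
f123 = [700.0, 1200.0, 2600.0]

bigger  = scale_formants(f123, 1.25)  # VTL extended by 25% -> lower formants
smaller = scale_formants(f123, 0.80)  # VTL shortened by 20% -> higher formants
```

Gradual ("dynamic") scaling in the experiments varies this factor over the course of a call rather than holding it fixed.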


2021 ◽  
Vol 3 (2) ◽  
pp. 1-2
Author(s):  
Ingo R. Titze

In its broadest definition, Vocology is the study of vocalization, much as audiology is the study of hearing. Vocology includes the exploration of the full capability of human and animal sound production, some of which is embedded in human speech. For professional practice, a secondary definition of Vocology is the science and practice of voice habilitation, a concept that has existed for more than two decades. The emphasis is on habilitation rather than rehabilitation, so that the field does not infringe on speech-language pathology. Vocology does, however, include the important area of animal vocalization.


2021 ◽  
Author(s):  
Masooda Modak ◽  
L M Shruti ◽  
Manoj Selvan

Author(s):  
Evgenij F. Tarasov

The article asks whether human speech communication (SC) involves a transfer of information. The functioning of information in speech communication is examined from two perspectives: the informational approach and the systemic-activity approach. The informational approach adequately explains only the direct transfer of information, while the systemic-activity approach is relevant to the sign-mediated speech communication typical of human interaction. The more heuristic thesis is that perceiving the chain of linguistic sign bodies produced in intersubjective space merely initiates the recipient's construction of the perceived message's content. The completeness of the constructed content depends entirely on the recipient, who shares an optimal common consciousness with the speaker. The purpose of speech messages is not the construction of content as such, but the development of the message's personal meaning. In human speech communication, communicants do not transmit information; they use the bodies of verbal signs to actualize images of consciousness that are developed within a single ethnic culture and are therefore common to both. The incentive for communicants to develop a common consciousness is their participation in the joint activities that ensure their earthly existence.


2021 ◽  
Author(s):  
Carolin Juechter ◽  
Rainer Beutelmann ◽  
Georg M. Klump

The present study establishes the Mongolian gerbil (Meriones unguiculatus) as a model for investigating the perception of human speech sounds. We report data on the discrimination of logatomes (CVCs - consonant-vowel-consonant combinations with outer consonants /b/, /d/, /s/ and /t/ and central vowels /a/, /aː/, /ɛ/, /eː/, /ɪ/, /iː/, /ɔ/, /oː/, /ʊ/ and /uː/, VCVs - vowel-consonant-vowel combinations with outer vowels /a/, /ɪ/ and /ʊ/ and central consonants /b/, /d/, /f/, /g/, /k/, /l/, /m/, /n/, /p/, /s/, /t/ and /v/) by young gerbils. Four young gerbils were trained to perform an oddball target detection paradigm in which they were required to discriminate a deviant CVC or VCV in a sequence of CVC or VCV standards, respectively. The experiments were performed with an ICRA-1 noise masker with speech-like spectral properties, and logatomes of multiple speakers were presented at various signal-to-noise ratios. Response latencies were measured to generate perceptual maps employing multidimensional scaling, which visualize the gerbils' internal representations of the sounds. The dimensions of the perceptual maps were correlated with multiple phonetic features of the speech sounds to evaluate which features of vowels and consonants are most important for the discrimination. The perceptual representation of vowels and consonants in gerbils was similar to that of humans, although gerbils needed higher signal-to-noise ratios than humans to discriminate speech sounds. The gerbils' discrimination of vowels depended on differences in the frequencies of the first and second formant, which are determined by tongue height and position. Consonants were discriminated based on differences in combinations of their articulatory features. The similarities in the perception of logatomes by gerbils and humans render the gerbil a suitable model for human speech sound discrimination.
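The mapping from a dissimilarity matrix to a low-dimensional perceptual map can be illustrated with classical (Torgerson) multidimensional scaling. This is a generic textbook MDS sketch, not the study's analysis pipeline; the toy dissimilarities stand in for latency-derived distances (shorter response latency indicating a larger perceptual distance).

```python
import numpy as np

# Illustrative sketch: classical (Torgerson) multidimensional scaling on a
# symmetric dissimilarity matrix, such as one derived from response
# latencies. Not the study's pipeline; a generic MDS implementation.

def classical_mds(d, n_dims=2):
    """Embed a symmetric dissimilarity matrix d into n_dims dimensions."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)            # ascending eigenvalues
    order = np.argsort(vals)[::-1]            # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    return vecs[:, :n_dims] * np.sqrt(np.maximum(vals[:n_dims], 0.0))

# Toy dissimilarities among four sounds: two tight pairs, far apart.
d = np.array([[0., 1., 5., 5.],
              [1., 0., 5., 5.],
              [5., 5., 0., 1.],
              [5., 5., 1., 0.]])
coords = classical_mds(d, n_dims=2)  # one 2-D point per sound
```

Correlating each recovered dimension with phonetic features (e.g. formant frequencies, articulatory features) then indicates what the perceptual axes encode.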


2021 ◽  
Author(s):  
◽  
Steven Van Kuyk

Throughout the last century, models of human speech communication have been proposed by linguists, psychologists, and engineers. Advancements have been made, but a theory of human speech communication that is both comprehensive and quantitative is yet to emerge. This thesis hypothesises that a branch of mathematics known as information theory holds the answer to a more complete theory. Information theory has made fundamental contributions to wireless communications, computer science, statistical inference, cryptography, thermodynamics, and biology. There is no reason that information theory cannot be applied to human speech communication, but thus far, a relatively small effort has been made to do so.

The goal of this research was to develop a quantitative model of speech communication that is consistent with our knowledge of linguistics and that is accurate enough to predict the intelligibility of speech signals. Specifically, this thesis focuses on the following research questions: 1) How does the acoustic information rate of speech compare to the lexical information rate of speech? 2) How can information theory be used to predict the intelligibility of speech-based communication systems? 3) How well do competing models of speech communication predict intelligibility?

To answer the first research question, novel approaches for estimating the information rate of speech communication are proposed. Unlike existing approaches, the methods proposed in this thesis rely on having a chorus of speech signals where each signal in the chorus contains the same linguistic message but is spoken by a different talker. The advantage of this approach is that variability inherent in the production of speech can be accounted for. The approach gives an estimate of about 180 b/s. This is three times larger than estimates based on lexical models, but an order of magnitude smaller than previous estimates that rely on acoustic signals.

To answer the second research question, a novel instrumental intelligibility metric called speech intelligibility in bits (SIIB) and a variant called SIIBGauss are proposed. SIIB is an estimate of the amount of information shared between a talker and a listener in bits per second. Unlike existing intelligibility metrics based on information theory, SIIB accounts for talker variability and statistical dependencies between time-frequency units.

Finally, to answer the third research question, a comprehensive evaluation of intrusive intelligibility metrics is provided. The results show that SIIB and SIIBGauss have state-of-the-art performance, that intelligibility metrics tend to perform poorly on data sets that were not used during their development, and that reducing statistical dependencies between input features is advantageous.
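The flavour of an information-theoretic intelligibility bound can be illustrated with the standard Gaussian-channel result, under which each independent time-frequency unit with linear signal-to-noise ratio snr carries at most 0.5·log2(1 + snr) bits. This is a rough illustration of the general idea only, not the SIIB definition from the thesis; the function name and band/frame layout are assumptions.

```python
import math

# Rough illustration (not SIIB itself): under a memoryless Gaussian channel
# assumption, each independent time-frequency unit with linear SNR s
# carries at most 0.5 * log2(1 + s) bits, so an upper bound on the shared
# information rate is the per-frame sum times the frame rate.

def gaussian_info_rate(band_snrs, frames_per_second):
    """Upper-bound information rate in bits/s for per-band linear SNRs."""
    bits_per_frame = sum(0.5 * math.log2(1.0 + s) for s in band_snrs)
    return bits_per_frame * frames_per_second

# Example: four frequency bands observed 100 times per second.
rate = gaussian_info_rate([3.0, 1.0, 0.5, 0.25], 100)
```

In this framing, accounting for talker variability and for statistical dependencies between time-frequency units (as SIIB does) reduces the naive bound toward the information actually shared between talker and listener.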


2021 ◽  
Vol 25 (2) ◽  
pp. 257-269
Author(s):  
Ádám Fodor ◽  
László Kopácsi ◽  
Zoltán Ádám Milacski ◽  
András Lőrincz

Cloud-based speech services are powerful practical tools, but sending speech over the Internet raises important legal concerns about speaker privacy. We propose a deep neural network solution that removes personal characteristics from human speech by converting it to the voice of a Text-to-Speech (TTS) system before sending the utterance to the cloud. The network learns to transcode sequences of vocoder parameters and their delta and delta-delta features from human speech to those of the TTS engine. We evaluated several TTS systems, vocoders and audio alignment techniques. We measured the performance of our method by (i) comparing the result of speech recognition on the de-identified utterances with the original texts, (ii) computing the Mel-Cepstral Distortion between the aligned TTS and transcoded sequences, and (iii) questioning human participants in A-not-B, 2AFC and 6AFC tasks. Our approach achieves the level of performance required by diverse applications.
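The objective measure in (ii), Mel-Cepstral Distortion (MCD), has a standard textbook form: a dB-scaled Euclidean distance between aligned mel-cepstral coefficient vectors, conventionally excluding the 0th (energy) coefficient. The sketch below implements that standard formula; the frame data are made-up toy values, and this is not the authors' evaluation code.

```python
import math

# Illustrative sketch of Mel-Cepstral Distortion (MCD), the standard
# measure: frame-wise Euclidean distance between mel-cepstral coefficient
# vectors, scaled to dB and averaged over time-aligned frames. The 0th
# coefficient (overall energy) is conventionally excluded.

MCD_SCALE = 10.0 / math.log(10.0) * math.sqrt(2.0)  # ~6.14 dB per unit

def mel_cepstral_distortion(frames_a, frames_b):
    """Mean MCD in dB over pairs of time-aligned mel-cepstrum frames."""
    total = 0.0
    for ca, cb in zip(frames_a, frames_b):
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(ca[1:], cb[1:])))
        total += MCD_SCALE * dist
    return total / len(frames_a)

# Toy frames: these differ only in the excluded 0th coefficient.
frames_a = [[1.0, 0.5, 0.2], [1.0, 0.4, 0.1]]
frames_b = [[9.0, 0.5, 0.2], [1.0, 0.4, 0.1]]
```

Lower MCD between the transcoded output and the aligned TTS target indicates that the conversion landed closer to the target voice.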


2021 ◽  
Author(s):  
◽  
Snehal Poojary

Numerous studies over the past decade have investigated making human animation as realistic as possible, especially facial animation. Consider facial animation for human speech: animating a face to match a speech recording requires a lot of effort. Much of the process has now been automated, making it easier for an artist to create facial animation with lip sync from speech provided by the user. While these systems concentrate on the mouth and tongue, where the articulation of speech takes place, very little effort has gone into understanding and recreating the exact motion of the neck during speech. The neck plays an important role in voice production, and it is therefore essential to study its motion.

The purpose of this research is to study the motion of the neck during speech. The research makes two contributions. First, it predicts the motion of the neck around the strap muscles for a given speech recording. This is achieved by training a model on position data from markers placed on the neck together with the corresponding speech analysis data. Second, it characterises the basic neck motion during speech, which will help an artist understand how the neck should be animated.
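The prediction step described above amounts to learning a map from per-frame speech features to marker displacements. The sketch below uses synthetic data and ordinary least squares purely to show the shape of such a setup; the feature set, the linear model, and every variable name are illustrative assumptions, not the thesis's method.

```python
import numpy as np

# Hypothetical sketch of the prediction step: learn a map from per-frame
# speech features (e.g. energy, pitch, spectral slope) to a neck-marker
# displacement, here with ordinary least squares on synthetic data.

rng = np.random.default_rng(0)
n_frames = 200
speech_feats = rng.normal(size=(n_frames, 3))   # toy per-frame features
true_map = np.array([[0.5], [0.1], [-0.2]])     # unknown in practice
marker_y = speech_feats @ true_map + 0.01 * rng.normal(size=(n_frames, 1))

# Fit: least-squares estimate of the feature-to-displacement map.
w, *_ = np.linalg.lstsq(speech_feats, marker_y, rcond=None)

# Predict marker displacement for (here, the same) frames.
pred = speech_feats @ w
```

In practice a nonlinear model and richer speech features would likely be needed; the point is only the pairing of marker trajectories with frame-synchronous speech analysis data.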

