speaker normalization
Recently Published Documents


Total documents: 125 (five years: 5)
H-index: 14 (five years: 0)

2021
Author(s): Shashi Kumar, Shakti P. Rath, Abhishek Pandey


Author(s): Stephen Grossberg

This far-ranging chapter provides unified explanations of data about audition, speech, and language, and the general cognitive processes that they specialize. The ventral What stream and dorsal Where cortical stream in vision have analogous ventral sound-to-meaning and dorsal sound-to-action streams in audition. Circular reactions for learning to reach using vision are homologous to circular reactions for learning to speak using audition. VITE circuits control arm movement properties of synergy, synchrony, and speed. Volitional basal ganglia GO signals choose which limb to move and how fast it moves. VAM models use a circular reaction to calibrate VITE circuit signals. VITE is joined with the FLETE model to compensate for variable loads, unexpected perturbations, and obstacles. Properties of cells in cortical areas 4 and 5, spinal cord, and cerebellum are quantitatively simulated. Motor equivalent reaching using clamped joints or tools arises from circular reactions that learn representations of space around an actor. Homologous circuits model motor-equivalent speech production, including coarticulation. Stream-shroud resonances play the role for audition that surface-shroud resonances play in vision. They support auditory consciousness and speech production. Strip maps and spectral-pitch resonances cooperate to solve the cocktail party problem whereby humans track voices of speakers in noisy environments with multiple sources. Auditory streaming and speaker normalization use networks with similar designs. Item-Order-Rank working memories and Masking Field networks temporarily store sequences of events while categorizing them into list chunks. Analog numerical representations and place-value number systems emerge from phylogenetically earlier Where and What stream spatial and categorical processes.



Author(s): Elsadig Ali Elsadig Elandeef, Ayman Hamad Elneil Hamdan

This study examines spoken production and speech reception with regard to sentence formation. It presents spoken production models such as Fromkin's Five Stage Model, the Bock and Levelt Model, Parallel-Processing Models, and the Dell Model. It also describes strategies for handling communicative problems and the many types of errors that are relatively common in normal speech production, such as spoonerisms and other speech errors. The study further covers speech perception and how spoken language is perceived through linearity, segmentation, speaker normalization, and the basic unit of speech perception.



2021
pp. 145-176
Author(s): Keith Johnson, Matthias J. Sjerps


Author(s): Tadashi Sakata, Naomitsu Ikeda, Yuichi Ueda, Akira Watanabe


2020
Author(s): Fenglin Ding, Wu Guo, Bin Gu, Zhen-Hua Ling, Jun Du


2020
pp. 002383092092937
Author(s): Wil Rankinen, Kenneth de Jong

This paper explores the relationship between speaker normalization and dialectal identity in sociolinguistic data, examining a database of vowel formants collected from 88 monolingual American English speakers in Michigan’s Upper Peninsula. Audio recordings of Finnish- and Italian-heritage American English speakers reading a passage and a word list were normalized using two normalization procedures. These algorithms are based on different concepts of normalization: Lobanov, which models normalization as based on experience with individual talkers, and Labov ANAE, which models normalization as based on experience with scale-factors inherent in acoustic resonators of all kinds. The two procedures yielded different results: while the Labov ANAE method reveals a cluster shifting of low and back vowels that correlates with heritage, the Lobanov procedure seems to eliminate this sociolinguistic variation. The difference between the two procedures lies in how they treat relations between formant changes, suggesting that dimensions of variation in the vowel space may be treated differently by different normalization procedures and raising the question of how anatomical variation and dialectal variation interact in the real world. The structure of the sociolinguistic effects found in the Labov ANAE normalized data, but not in the Lobanov normalized data, suggests that Lobanov normalization over-normalizes formant measures and removes sociolinguistically relevant information.
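The contrast between the two kinds of procedure can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy formant values, and the choice of population standard deviation are all assumptions made for illustration. `lobanov` z-scores each formant separately within each speaker, while `log_mean_scaling` applies a single per-speaker scale factor derived from log-mean formant values, in the spirit of the Labov ANAE method.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: speaker -> list of (F1, F2) tokens in Hz.
# Speaker "B" is speaker "A" with every formant scaled by 1.2.
DATA = {
    "A": [(300.0, 2300.0), (700.0, 1200.0), (400.0, 900.0)],
    "B": [(360.0, 2760.0), (840.0, 1440.0), (480.0, 1080.0)],
}

def lobanov(formants_by_speaker):
    """Z-score F1 and F2 separately within each speaker."""
    out = {}
    for spk, tokens in formants_by_speaker.items():
        f1s = [t[0] for t in tokens]
        f2s = [t[1] for t in tokens]
        m1, s1 = mean(f1s), pstdev(f1s)  # population SD is an assumption
        m2, s2 = mean(f2s), pstdev(f2s)
        out[spk] = [((f1 - m1) / s1, (f2 - m2) / s2) for f1, f2 in tokens]
    return out

def log_mean_scaling(formants_by_speaker):
    """Multiply all of a speaker's formants by one scale factor.

    The factor is exp(G - S), where G is the mean log formant value
    pooled over all speakers and S is the speaker's own mean log value.
    """
    def log_mean(tokens):
        return mean(math.log(f) for token in tokens for f in token)

    pooled = [t for toks in formants_by_speaker.values() for t in toks]
    grand = log_mean(pooled)
    out = {}
    for spk, tokens in formants_by_speaker.items():
        factor = math.exp(grand - log_mean(tokens))
        out[spk] = [(f1 * factor, f2 * factor) for f1, f2 in tokens]
    return out

if __name__ == "__main__":
    print(lobanov(DATA)["A"])
    print(log_mean_scaling(DATA)["A"])
```

Because the scaling approach uses one factor for every formant, the ratio F1/F2 of each token is unchanged, whereas the z-score approach rescales each formant dimension independently. That difference in how relations between formants are treated is the kind of distinction the abstract points to.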





2019
Vol 15 (12)
pp. 20190555
Author(s): Holly Root-Gutteridge, Victoria F. Ratcliffe, Anna T. Korzeniowska, David Reby

Domesticated animals have been shown to recognize basic phonemic information from human speech sounds and to recognize familiar speakers from their voices. However, whether animals can spontaneously identify words across unfamiliar speakers (speaker normalization) or spontaneously discriminate between unfamiliar speakers across words remains to be investigated. Here, we assessed these abilities in domestic dogs using the habituation–dishabituation paradigm. We found that while dogs habituated to the presentation of a series of different short words from the same unfamiliar speaker, they significantly dishabituated to the presentation of a novel word from a new speaker of the same gender. This suggests that dogs spontaneously categorized the initial speaker across different words. Conversely, dogs who habituated to the same short word produced by different speakers of the same gender significantly dishabituated to a novel word, suggesting that they had spontaneously categorized the word across different speakers. Our results indicate that the ability to spontaneously recognize both the same phonemes across different speakers, and cues to identity across speech utterances from unfamiliar speakers, is present in domestic dogs and thus not a uniquely human trait.


