Prediction and error in early infant speech learning: a speech acquisition model

2021
Author(s):  
Jessie S. Nixon ◽  
Fabian Tomaschek

In the last two decades, statistical clustering models have emerged as a dominant account of how infants learn the sounds of their language. However, recent empirical and computational evidence suggests that purely statistical clustering methods may not be sufficient to explain speech sound acquisition. To model the early development of speech perception, the present study used a two-layer network trained with the Rescorla-Wagner learning equations, an implementation of discriminative, error-driven learning. The model contained no a priori linguistic units, such as phonemes or phonetic features. Instead, expectations about the upcoming acoustic speech signal were learned from the surrounding speech signal, with spectral components extracted from an audio recording of child-directed speech serving as both inputs and outputs of the model. To evaluate model performance, we simulated infant responses in the high-amplitude sucking paradigm using vowel and fricative pairs and continua. The simulations were able to discriminate vowel and consonant pairs and predicted the infant speech perception data. The model also showed the greatest amount of discrimination in the expected spectral frequencies. These results suggest that discriminative error-driven learning may provide a viable approach to modelling early infant speech sound acquisition.
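As a concrete illustration, the sketch below implements the core Rescorla-Wagner update for a two-layer cue-to-outcome network, assuming binarized spectral components as cues and outcomes; the learning rate, vector sizes, and toy data are illustrative and not taken from the paper.

```python
import numpy as np

def rw_update(W, cues, outcomes, alpha=0.01, lam=1.0):
    """One Rescorla-Wagner step: weights from the active cues move the
    summed prediction toward lambda for present outcomes and toward 0
    for absent ones, in proportion to the prediction error."""
    prediction = cues @ W                # summed activation per outcome
    error = lam * outcomes - prediction  # prediction error per outcome
    W += alpha * np.outer(cues, error)   # only active cues (cue == 1) change
    return W

# toy run: 5 spectral cue components predicting the next time slice
rng = np.random.default_rng(0)
W = np.zeros((5, 5))
for _ in range(1000):
    cues = (rng.random(5) > 0.5).astype(float)
    outcomes = np.roll(cues, 1)          # stand-in for the upcoming spectrum
    W = rw_update(W, cues, outcomes)
```

After training, each row of `W` holds the expectations a cue generates about the upcoming signal; discrimination between two stimuli can then be read off from the difference in the activations they produce.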

1975
Vol 18 (1)
pp. 158-167
Author(s):  
Rebecca E. Eilers ◽  
Fred D. Minifie

In three separate experiments using controlled natural stimuli and a high-amplitude sucking paradigm, infants' ability to detect differences between /s/ and /v/, /s/ and /ʃ/, and /s/ and /z/, respectively, was investigated. Evidence for discrimination was obtained for /s/ versus /v/ and /s/ versus /ʃ/ but not for /s/ versus /z/. Implications for a theory of infant speech perception are discussed.


1977
Vol 20 (4)
pp. 766-780
Author(s):  
Rebecca E. Eilers ◽  
Wesley R. Wilson ◽  
John M. Moore

A visually reinforced infant speech discrimination (VRISD) paradigm is described and evaluated. Infants at two ages were tested with the new paradigm on the following speech contrasts: [sa] vs [va], [sa] vs [ʃa], [sa] vs [za], [as] vs [a:z], [a:s] vs [a:z], [at] vs [a:d], [a:t] vs [a:d], [at] vs [a:t], [fa] vs [θa], and [fi] vs [θi]. The data reported are compared with data on the same speech contrasts obtained from three-month-olds in a high-amplitude sucking paradigm. Evidence suggesting developmental changes in speech-sound discriminatory ability is reported. Results are interpreted in light of the salience of available acoustic cues and in terms of new methodological advances.


2020
pp. 65-72
Author(s):  
V. V. Savchenko ◽  
A. V. Savchenko

This paper addresses distortions in a speech signal transmitted over a communication channel to a biometric system during voice-based remote identification. We propose to pre-correct the frequency spectrum of the received signal based on the pre-distortion principle. To account for a priori uncertainty, we propose a new information indicator of speech signal distortions and a method for measuring it from small samples of observations. An example of a fast practical implementation of the method based on a parametric spectral analysis algorithm is considered. Experimental results for our approach are provided for three different versions of the communication channel. It is shown that the proposed method makes it possible to bring the initially distorted speech signal into compliance with the registered voice template according to an acceptable information discrimination criterion. We demonstrate that our approach may be used in existing biometric systems and speaker identification technologies.
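The abstract does not specify the algorithm in detail, but the general shape of a parametric pre-correction can be sketched as follows: estimate AR (LPC) spectral envelopes of the received signal and of the registered template, then re-weight the received spectrum by their ratio. The AR order, FFT size, and single-frame processing here are assumptions, and the paper's information indicator of distortion is not reproduced.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a

def ar_envelope(x, order=16, nfft=512):
    """AR power spectral envelope (up to a gain factor)."""
    A = np.fft.rfft(lpc(x, order), nfft)
    return 1.0 / np.abs(A) ** 2

def precorrect(received, template, order=16, nfft=512):
    """Re-weight one frame of the received signal toward the template."""
    gain = np.sqrt(ar_envelope(template, order, nfft) /
                   ar_envelope(received, order, nfft))
    X = np.fft.rfft(received, nfft)
    return np.fft.irfft(X * gain, nfft)[:len(received)]
```

In practice such a correction would run frame by frame, and the gain could be regularized where the received envelope is near zero.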


Author(s):  
Michael Withnall ◽  
Edvard Lindelöf ◽  
Ola Engkvist ◽  
Hongming Chen

We introduce Attention and Edge Memory schemes to the existing Message Passing Neural Network framework for graph convolution, and benchmark our approaches against eight different physical-chemical and bioactivity datasets from the literature. We remove the need to introduce a priori knowledge of the task and chemical descriptor calculation by using only fundamental graph-derived properties. Our models consistently perform on par with other state-of-the-art machine learning approaches, and set a new standard on sparse multi-task virtual screening targets. We also investigate model performance as a function of dataset preprocessing, and make some suggestions regarding hyperparameter selection.
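As an illustration of the general idea, not the paper's exact architecture, the sketch below shows one attention-weighted message-passing step over a molecular graph; the weight matrices, attention vector, and toy chain graph are invented for the example.

```python
import numpy as np

def attention_mp_layer(H, adj, W_msg, w_att):
    """One graph-convolution step: each node aggregates its neighbours'
    candidate messages, weighted by a softmax over attention scores."""
    M = H @ W_msg                        # candidate message per node
    scores = M @ w_att                   # scalar attention score per node
    H_new = np.empty_like(M)
    for v in range(H.shape[0]):
        nbrs = np.flatnonzero(adj[v])
        e = np.exp(scores[nbrs] - scores[nbrs].max())  # stable softmax
        H_new[v] = (e / e.sum()) @ M[nbrs]
    return np.tanh(H_new)

# toy molecule: 4 atoms in a chain, 8-dimensional node features
rng = np.random.default_rng(1)
H = rng.normal(size=(4, 8))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
H = attention_mp_layer(H, adj, rng.normal(size=(8, 8)), rng.normal(size=8))
```

Stacking several such layers and pooling the node states into a single vector yields the per-molecule representation used for property prediction.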


1998
Vol 21 (2)
pp. 241-259
Author(s):  
Harvey M. Sussman ◽  
David Fruchter ◽  
Jon Hilbert ◽  
Joseph Sirosh

Neuroethological investigations of mammalian and avian auditory systems have documented species-specific specializations for processing complex acoustic signals that could, if viewed in abstract terms, have an intriguing and striking relevance for human speech sound categorization and representation. Each species forms biologically relevant categories based on combinatorial analysis of information-bearing parameters within the complex input signal. This target article uses known neural models from the mustached bat and barn owl to develop, by analogy, a conceptualization of human processing of consonant plus vowel sequences that offers a partial solution to the noninvariance dilemma – the nontransparent relationship between the acoustic waveform and the phonetic segment. Critical input sound parameters used to establish species-specific categories in the mustached bat and barn owl exhibit high correlation and linearity due to physical laws. A cue long known to be relevant to the perception of stop place of articulation is the second formant (F2) transition. This article describes an empirical phenomenon – the locus equations – that describes the relationship between the F2 of a vowel and the F2 measured at the onset of a consonant-vowel (CV) transition. These variables, F2 onset and F2 vowel within a given place category, are consistently and robustly linearly correlated across diverse speakers and languages, and even under perturbation conditions as imposed by bite blocks. A functional role for this category-level extreme correlation and linearity (the “orderly output constraint”) is hypothesized based on the notion of an evolutionarily conserved auditory-processing strategy. High correlation and linearity between critical parameters in the speech signal that help to cue place of articulation categories might have evolved to satisfy a preadaptation by mammalian auditory systems for representing tightly correlated, linearly related components of acoustic signals.
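In code, fitting a locus equation for one place-of-articulation category is a simple linear regression of F2 onset on F2 vowel; the values below are made-up illustrative numbers in Hz, not data from the article.

```python
import numpy as np

# hypothetical F2 measurements (Hz) for CV tokens of a single stop place
f2_vowel = np.array([800, 1200, 1600, 2000, 2400])   # F2 at vowel midpoint
f2_onset = np.array([1150, 1420, 1700, 1990, 2260])  # F2 at CV transition onset

k, c = np.polyfit(f2_vowel, f2_onset, deg=1)         # slope and intercept
r = np.corrcoef(f2_vowel, f2_onset)[0, 1]
print(f"F2_onset = {k:.2f} * F2_vowel + {c:.0f} Hz (r = {r:.3f})")
```

The slope and intercept characterize the place category; the near-perfect linearity (high r) across speakers and languages is the regularity the orderly output constraint is meant to explain.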


2007
Author(s):  
H. Timothy Bunnell ◽  
N. Carolyn Schanen ◽  
Linda D. Vallino ◽  
Thierry G. Morlet ◽  
James B. Polikoff ◽  
...  
