LDL-AURIS: Error-driven Learning in Modeling Spoken Word Recognition

2020 ◽  
Author(s):  
Elnaz Shafaei-Bajestan ◽  
Masoumeh Moradipour-Tari ◽  
Peter Uhrig ◽  
R. H. Baayen

A computational model for auditory word recognition is presented that enhances the model of Arnold et al. (2017). Real-valued features are extracted from the speech signal instead of discrete features. One-hot encoding of words’ meanings is replaced by real-valued semantic vectors, with a small amount of noise added to safeguard discriminability. Instead of learning with Rescorla-Wagner updating, we use multivariate multiple regression, which captures discrimination learning at the limit of experience. These new design features substantially improve prediction accuracy for words extracted from spontaneous conversations. They also provide enhanced temporal granularity, enabling the modeling of cohort-like effects. Clustering with t-SNE shows that the acoustic form space captures phone-like similarities and differences. Thus, wide learning with high-dimensional vectors, no hidden layers, and no abstract mediating phone-like representations is not only possible but achieves excellent performance, approximating the lower bound of human accuracy on the challenging task of isolated word recognition.
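The core mapping described above can be illustrated with a minimal sketch. All dimensions, matrices, and the nearest-neighbor decision rule below are illustrative assumptions, not the paper's actual data or implementation: acoustic cues and semantic vectors are random, noise is added to the semantic outcomes for discriminability, and the cue-to-semantics mapping is estimated by ordinary least squares, i.e. multivariate multiple regression at the limit of experience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, not from the paper): 20 word types,
# 50-dimensional real-valued acoustic cue vectors, 10-dimensional
# real-valued semantic vectors.
n_words, n_acoustic, n_semantic = 20, 50, 10

# Acoustic cue matrix C: one row of real-valued features per word.
C = rng.normal(size=(n_words, n_acoustic))

# Semantic outcome matrix S, with a small amount of Gaussian noise
# added to keep the word vectors discriminable.
S = rng.normal(size=(n_words, n_semantic))
S += 0.01 * rng.normal(size=S.shape)

# Multivariate multiple regression: estimate the linear mapping F
# from cues to semantics by least squares (no hidden layers).
F, *_ = np.linalg.lstsq(C, S, rcond=None)

def recognize(c):
    """Map an acoustic input to predicted semantics, then select the
    word whose semantic vector is most similar (cosine) to it."""
    s_hat = c @ F
    sims = (S @ s_hat) / (np.linalg.norm(S, axis=1) * np.linalg.norm(s_hat))
    return int(np.argmax(sims))

accuracy = np.mean([recognize(C[i]) == i for i in range(n_words)])
```

Because the toy cue matrix has more columns than rows, least squares reproduces the training outcomes essentially exactly, so the sketch recognizes every training item; the substantive modeling work in the paper lies in the real-valued features extracted from spontaneous speech, which this toy setup does not attempt.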

2010 ◽  
Vol 21 (03) ◽  
pp. 163-168 ◽  
Author(s):  
Edward T. Auer

Background: The visual speech signal can provide sufficient information to support successful communication. However, individual differences in the ability to appreciate that information are large, and relatively little is known about their sources. Purpose: Here a body of research is reviewed regarding the development of a theoretical framework in which to study speechreading and individual differences in that ability. Based on the hypothesis that visual speech is processed via the same perceptual-cognitive machinery as auditory speech, a theoretical framework was developed by adapting a theoretical framework originally developed for auditory spoken word recognition. Conclusion: The evidence to date is consistent with the conclusion that visual spoken word recognition is achieved via a process similar to auditory word recognition provided differences in perceptual similarity are taken into account. Words perceptually similar to many other words and that occur infrequently in the input stream are at a distinct disadvantage within this process. The results to date are also consistent with the conclusion that deaf individuals, regardless of speechreading ability, recognize spoken words via a process similar to individuals with hearing.


2004 ◽  
Vol 16 (4) ◽  
pp. 541-552 ◽  
Author(s):  
Claudia K. Friedrich ◽  
Sonja A. Kotz ◽  
Angela D. Friederici ◽  
Thomas C. Gunter

Behavioral evidence suggests that spoken word recognition involves the temporary activation of multiple entries in a listener's mental lexicon. This phenomenon can be demonstrated in cross-modal word fragment priming (CMWP). In CMWP, an auditory word fragment (prime) is immediately followed by a visual word or pseudoword (target). Experiment 1 investigated ERPs for targets presented in this paradigm. Half of the targets were congruent with the prime (e.g., in the prime-target pair AM-AMBOSS [anvil]), half were not (e.g., AM-PENSUM [workload]). Lexical entries of the congruent targets should receive activation from the prime, so lexical identification of these targets should be facilitated. An ERP effect named P350, two frontal negative ERP deflections, and the N400 were sensitive to prime-target congruency. In Experiment 2, the relation of the previously observed ERP effects to processes in a modality-independent mental lexicon was investigated by presenting primes visually. Only the P350 effect could be replicated across different fragment lengths. Therefore, the P350 is discussed as a correlate of lexical identification in a modality-independent mental lexicon.


2014 ◽  
Vol 31 (1) ◽  
pp. 29-52 ◽  
Author(s):  
Satsuki Nakai ◽  
Shane Lindsay ◽  
Mitsuhiko Ota

When both members of a phonemic contrast in L2 (second language) are perceptually mapped to a single phoneme in one's L1 (first language), L2 words containing a member of that contrast can spuriously activate other L2 words in spoken-word recognition. For example, upon hearing cattle, Dutch speakers of English are reported to experience activation of kettle, as L1 Dutch speakers perceptually map the vowels of the two English words to a single vowel phoneme in their L1. In an auditory word-learning experiment with Greek and Japanese speakers of English, we asked whether such cross-lexical activation in L2 spoken-word recognition necessarily involves inaccurate perception by the L2 listeners, or can also arise from interference from L1 phonology at an abstract level, independent of the listeners' phonetic processing abilities. Results suggest that spurious activation of L2 words containing L2-specific contrasts in spoken-word recognition is contingent on the L2 listeners' inadequate phonetic processing abilities.


1997 ◽  
Author(s):  
Paul D. Allopenna ◽  
James S. Magnuson ◽  
Michael K. Tanenhaus
