LDL-AURIS: Error-driven Learning in Modeling Spoken Word Recognition
A computational model for auditory word recognition is presented that enhances the model of Arnold et al. (2017). Real-valued features are extracted from the speech signal instead of discrete features. One-hot encoding of words’ meanings is replaced by real-valued semantic vectors, with a small amount of noise added to safeguard discriminability. Instead of learning with Rescorla-Wagner updating, we use multivariate multiple regression, which captures discrimination learning at the limit of experience. These new design features substantially improve prediction accuracy for words extracted from spontaneous conversations. They also provide enhanced temporal granularity, enabling the modeling of cohort-like effects. Clustering with t-SNE shows that the acoustic form space captures phone-like similarities and differences. Thus, wide learning with high-dimensional vectors, no hidden layers, and no abstract mediating phone-like representations is not only possible but also achieves excellent performance, approximating the lower bound of human accuracy on the challenging task of isolated word recognition.
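The core computation described above — mapping real-valued acoustic cue vectors onto noisy real-valued semantic vectors with multivariate multiple regression — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the matrix dimensions, the noise level, and the use of random Gaussian data are assumptions made purely for demonstration, and recognition is scored here by cosine similarity to the target semantic vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's): 50 word tokens,
# 200 real-valued acoustic cues, 30 semantic dimensions.
n_words, n_cues, n_sem = 50, 200, 30

C = rng.normal(size=(n_words, n_cues))  # acoustic cue matrix (one row per token)
S = rng.normal(size=(n_words, n_sem))   # real-valued semantic vectors
S += 0.01 * rng.normal(size=S.shape)    # small noise to safeguard discriminability

# Multivariate multiple regression: solve C @ F ≈ S in closed form
# (least squares), i.e., discrimination learning at the limit of experience.
F, *_ = np.linalg.lstsq(C, S, rcond=None)

# Recognition: map each token's acoustic cues into semantic space and
# select the word whose semantic vector is closest by cosine similarity.
S_hat = C @ F
cos = (S_hat @ S.T) / (
    np.linalg.norm(S_hat, axis=1, keepdims=True) * np.linalg.norm(S, axis=1)
)
pred = np.argmax(cos, axis=1)
accuracy = np.mean(pred == np.arange(n_words))
```

Because the mapping `F` is a single linear transformation estimated in one step, there are no hidden layers and no intermediate phone-like representations, which is the sense in which the model is "wide" rather than deep.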