Interaction of Audition and Vision in the Recognition of Oral Speech Stimuli

1969 ◽  
Vol 12 (2) ◽  
pp. 423-425 ◽  
Author(s):  
Norman P. Erber

Audio-visual observation of spoken spondaic words was found to be superior to recognition via audition-only under a wide range of S/N conditions. Data from five subjects supported the notion that observers rely increasingly on visual cues for speech information as S/N ratio is degraded. Audition-only performance was found to be less variable among subjects than was audio-visual recognition. Increased variability in audio-visual scores at poorer S/N ratios was attributed to differences in lip-reading skill among untrained subjects. Speech levels so low that recognition by audition-only approximated chance behavior were found, nevertheless, to systematically improve observers' audio-visual scores as a function of increasing S/N ratio.

1974 ◽  
Vol 17 (2) ◽  
pp. 270-278 ◽  
Author(s):  
Brian E. Walden ◽  
Robert A. Prosek ◽  
Don W. Worthington

The redundancy between the auditory and visual recognition of consonants was studied in 100 hearing-impaired subjects who demonstrated a wide range of speech-discrimination abilities. Twenty English consonants, recorded in CV combination with the vowel /a/, were presented to the subjects for auditory, visual, and audiovisual identification. There was relatively little variation among subjects in the visual recognition of consonants. A measure of the expected degree of redundancy between an observer’s auditory and visual confusions among consonants was used in an effort to predict audiovisual consonant recognition ability. This redundancy measure was based on an information analysis of an observer’s auditory confusions among consonants and expressed the degree to which his auditory confusions fell within categories of visually homophenous consonants. The measure was found to have moderate predictive value in estimating an observer’s audiovisual consonant recognition score. These results suggest that the degree of redundancy between an observer’s auditory and visual confusions of speech elements is a determinant in the benefit that visual cues offer to that observer.
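
As a rough illustration of the idea behind such a redundancy measure, the sketch below computes the fraction of a listener's auditory errors that fall within the same visually homophenous category. This is a simplified proportion, not the information-analysis measure used in the study; the function name, the category groupings, and the confusion counts are all hypothetical.

```python
def within_category_error_fraction(confusions, category_of):
    """Fraction of auditory errors that stay inside one visual category.

    confusions: dict mapping (spoken, heard) -> count (hypothetical data).
    category_of: dict mapping consonant -> visual category label (assumed grouping).
    """
    errors = 0
    within = 0
    for (spoken, heard), count in confusions.items():
        if spoken == heard:
            continue  # correct responses are not confusions
        errors += count
        if category_of[spoken] == category_of[heard]:
            within += count
    return within / errors if errors else 0.0


# Hypothetical visual categories and auditory confusion counts.
category_of = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
}
confusions = {
    ("p", "p"): 10, ("p", "b"): 6, ("p", "f"): 2,
    ("f", "f"): 8, ("f", "v"): 3, ("f", "m"): 1,
}

redundancy = within_category_error_fraction(confusions, category_of)
print(redundancy)
```

Under these assumptions, a value near 1 would mean that most auditory confusions are between consonants that also look alike, so visual cues would be expected to resolve few of them; a value near 0 would suggest that vision could disambiguate most auditory errors.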


2020 ◽  
Vol 81 (3) ◽  
pp. 46-51
Author(s):  
I. V. Prishchepova

The article discusses the mechanisms of various kinds of disorthography (conditioned by the underdevelopment of the morphological, phonemic and graphical bases of orthographic activity) in schoolchildren with general speech underdevelopment. It offers basic methodology and techniques for correcting disorthography conditioned by children’s inadequate acquisition of the phonemic and traditional principles of orthography and the principles of graphics. Systematic work on developing the psychological and language components of this type of learning activity, coupled with overcoming oral speech disorders, helps such children meet program requirements. The following methods were used: practical (exercises, modelling, construction, schematization, games), visual (observation, image study, demonstration of images and of the results of practical activity, demonstration of stimulus material), and verbal (conversation, narration, language analysis and synthesis, solving grammar and orthographic tasks). The article covers the results of many years of positive experience in correcting various types of disorthography in primary schoolchildren with general speech underdevelopment. The formation of grammatical and orthographic activity, the basics of speech and language competences, and improved academic performance in Russian are prerequisites for the development of linguistic personality and for child self-development. The practical importance of the research lies in the development and testing of methods for correcting disorthography in children with general speech underdevelopment. The given methodology helps to improve their spelling skills and allows a purposeful and controlled formation of spelling activity. The results can be used in the work of speech therapy centers and of educational establishments that provide inclusive education.


1997 ◽  
Vol 40 (2) ◽  
pp. 432-443 ◽  
Author(s):  
Karen S. Helfer

Research has shown that speaking in a deliberately clear manner can improve the accuracy of auditory speech recognition. Allowing listeners access to visual speech cues also enhances speech understanding. Whether the nature of information provided by speaking clearly and by using visual speech cues is redundant has not been determined. This study examined how speaking mode (clear vs. conversational) and presentation mode (auditory vs. auditory-visual) influenced the perception of words within nonsense sentences. In Experiment 1, 30 young listeners with normal hearing responded to videotaped stimuli presented audiovisually in the presence of background noise at one of three signal-to-noise ratios. In Experiment 2, 9 participants returned for an additional assessment using auditory-only presentation. Results of these experiments showed significant effects of speaking mode (clear speech was easier to understand than was conversational speech) and presentation mode (auditory-visual presentation led to better performance than did auditory-only presentation). The benefit of clear speech was greater for words occurring in the middle of sentences than for words at either the beginning or end of sentences for both auditory-only and auditory-visual presentation, whereas the greatest benefit from supplying visual cues was for words at the end of sentences spoken both clearly and conversationally. The total benefit from speaking clearly and supplying visual cues was equal to the sum of each of these effects. Overall, the results suggest that speaking clearly and providing visual speech information provide complementary (rather than redundant) information.


2021 ◽  
Vol 64 (10) ◽  
pp. 4014-4029
Author(s):  
Kathy R. Vander Werff ◽  
Christopher E. Niemczak ◽  
Kenneth Morse

Purpose: Background noise has been categorized as energetic masking due to spectrotemporal overlap of the target and masker on the auditory periphery or informational masking due to cognitive-level interference from relevant content such as speech. The effects of masking on cortical and sensory auditory processing can be objectively studied with the cortical auditory evoked potential (CAEP). However, whether effects on neural response morphology are due to energetic spectrotemporal differences or informational content is not fully understood. The current multi-experiment series was designed to assess the effects of speech versus nonspeech maskers on the neural encoding of speech information in the central auditory system, specifically in terms of the effects of speech babble noise maskers varying by talker number.

Method: CAEPs were recorded from normal-hearing young adults in response to speech syllables in the presence of energetic maskers (white or speech-shaped noise) and varying amounts of informational maskers (speech babble maskers). The primary manipulation of informational masking was the number of talkers in speech babble, and results on CAEPs were compared to those of nonspeech maskers with different temporal and spectral characteristics.

Results: Even when nonspeech noise maskers were spectrally shaped and temporally modulated to speech babble maskers, notable changes in the typical morphology of the CAEP in response to speech stimuli were identified in the presence of primarily energetic maskers and speech babble maskers with varying numbers of talkers.

Conclusions: While differences in CAEP outcomes did not reach significance by number of talkers, neural components were significantly affected by speech babble maskers compared to nonspeech maskers. These results suggest an informational masking influence on neural encoding of speech information at the sensory cortical level of auditory processing, even without active participation on the part of the listener.


2020 ◽  
Vol 28 (4) ◽  
pp. 317-324
Author(s):  
Elena Zakirovna Kireeva ◽  

A review of «Dictionary of response remarks in Russian dialogical speech» by V. T. Bondarenko. The dictionary is based on a concept that is developed from the idea of dialogism of human consciousness. The object of study is response remarks, i.e. words and phraseological units whose illocutionary purpose is to respond to a word or phrase of another participant of the dialogue. They are characterized by stability in language and reproducibility in speech. Responses are defined as performative signs: they are used to express the psychological state (reaction) of the speaker, caused by an initiative phrase or “hook” word. The paper describes the macro- and microstructure of the dictionary, characterizes the semantic and syntactic aspects of the response remarks, and enumerates their functions. The author of the review shows a number of ways to use the dictionary. Responses are linked to typical and everyday situations of communication (meeting, acquaintance, addressing, attracting attention, etc.) and to conversation topics, and are therefore of interest to researchers dealing with genres of oral speech. Since the responses are connected with the stereotypes of thinking, behavior and mental reactions of Russians, their research is important for ethnolinguists. The dictionary data can enrich linguistic and cultural studies of cultural concepts. Due to the playful (humorous) function inherent to responses, they may be of interest when studying the essence of the comic. The dictionary materials give a systematic idea of the expression of the comic in the Russian language. The open evaluability of response remarks makes them a unique research material for studying the categories of axiology, evaluability, and textual modality. The analysis of the context of responses and of the system of marks and illustrations is valuable for researchers of speech culture and speech etiquette. It will be fruitful for psycholinguists developing a theory of reactivity. The dictionary has a wide range of response variations, so it is of great importance for phraseologists who study the variation of set phrases. Studying the response remarks will be useful to researchers of children’s speech, as the vocabulary, syntax, and rhythm of response remarks, and the images in them, are organic to the child’s perception and can be easily reproduced. For gender studies of language, the research of these units is important because they provide information about gender characteristics, while the marks and illustrations make it possible to compare the tactics of speech behavior of men and women. The dictionary has great educational value for any person, because its non-standard and unusual material allows everyone to enrich their speech.


Author(s):  
Jun-Li Xu ◽  
Cecilia Riccioli ◽  
Ana Herrero-Langreo ◽  
Aoife Gowen

Deep learning (DL) has recently achieved considerable successes in a wide range of applications, such as speech recognition, machine translation and visual recognition. This tutorial provides guidelines and useful strategies to apply DL techniques to address pixel-wise classification of spectral images. A one-dimensional convolutional neural network (1-D CNN) is used to extract features from the spectral domain, which are subsequently used for classification. In contrast to conventional classification methods for spectral images that examine primarily the spectral context, a three-dimensional (3-D) CNN is applied to simultaneously extract spatial and spectral features to enhance classification accuracy. This tutorial paper explains, in a stepwise manner, how to develop 1-D CNN and 3-D CNN models to discriminate spectral imaging data in a food authenticity context. The example image data provided consists of three varieties of puffed cereals imaged in the NIR range (943–1643 nm). The tutorial is presented in the MATLAB environment, and the scripts and dataset used are provided. Starting from spectral image pre-processing (background removal and spectral pre-treatment), the typical steps encountered in development of CNN models are presented. The example dataset provided demonstrates that deep learning approaches can increase classification accuracy compared to conventional approaches, increasing the accuracy of the model tested on an independent image from 92.33 % using partial least squares-discriminant analysis to 99.4 % using the 3-D CNN model at pixel level. The paper concludes with a discussion on the challenges and suggestions in the application of DL techniques for spectral image classification.


2010 ◽  
Vol 21 (7) ◽  
pp. 914-919 ◽  
Author(s):  
Elizabeth J. Meinz ◽  
David Z. Hambrick

Deliberate practice—that is, engagement in activities specifically designed to improve performance in a domain—is strongly predictive of performance in domains such as music and sports. It has even been suggested that deliberate practice is sufficient to account for expert performance. Less clear is whether basic abilities, such as working memory capacity (WMC), add to the prediction of expert performance, above and beyond deliberate practice. In evaluating participants having a wide range of piano-playing skill (novice to expert), we found that deliberate practice accounted for nearly half of the total variance in piano sight-reading performance. However, there was an incremental positive effect of WMC, and there was no evidence that deliberate practice reduced this effect. Evidence indicates that WMC is highly general, stable, and heritable, and thus our results call into question the view that expert performance is solely a reflection of deliberate practice.


Languages ◽  
2018 ◽  
Vol 3 (4) ◽  
pp. 38 ◽  
Author(s):  
Arzu Yordamlı ◽  
Doğu Erdener

This study aimed to investigate how individuals with bipolar disorder integrate auditory and visual speech information compared to healthy individuals. Furthermore, we wanted to see whether there were any differences between manic and depressive episode bipolar disorder patients with respect to auditory and visual speech integration. It was hypothesized that the bipolar group’s auditory–visual speech integration would be weaker than that of the control group. Further, it was predicted that those in the manic phase of bipolar disorder would integrate visual speech information more robustly than their depressive phase counterparts. To examine these predictions, a McGurk effect paradigm with an identification task was used with typical auditory–visual (AV) speech stimuli. Additionally, auditory-only (AO) and visual-only (VO, lip-reading) speech perceptions were also tested. The dependent variable for the AV stimuli was the amount of visual speech influence. The dependent variables for AO and VO stimuli were accurate modality-based responses. Results showed that the disordered and control groups did not differ in AV speech integration and AO speech perception. However, there was a striking difference in favour of the healthy group with respect to the VO stimuli. The results suggest the need for further research whereby both behavioural and physiological data are collected simultaneously. This will help us understand the full dynamics of how auditory and visual speech information are integrated in people with bipolar disorder.


Author(s):  
Yong Zhang ◽  
Baosheng Jin ◽  
Wenqi Zhong

Fluidization, mixing and segregation of a biomass-sand mixture in a 3D gas-fluidized bed have been investigated by means of visual observation, pressure fluctuation analysis and the bed-frozen method. Three types of mixtures are considered, in which biomass is a thin long stalk, and sand belongs to the Geldart B category. Experiments are carried out in a segmented fluidized bed equipped with multiple pressure transducers. Three initial packing conditions and two experiment procedures are used. The fluidization velocity varies to cover a wide range. Results show that in the local fluidization region, the mixing and segregation patterns are sensitive to the initial packing condition. In the case of a fully segregated state with biomass at the bottom, the bed inversion can be significantly observed due to the great segregation tendency of biomass. Further analyses indicate that the mixing ratio exerts a subtle influence on the competition between mixing and segregation by disturbing the coalescence and break-up of the bubble. In addition, the pressure fluctuation signal proves to be helpful in understanding the dynamic features of the phenomenology.


2008 ◽  
Vol 275 (1646) ◽  
pp. 2049-2054 ◽  
Author(s):  
Christelle Jozet-Alves ◽  
Julien Modéran ◽  
Ludovic Dickel

Evidence of sex differences in spatial cognition has been reported in a wide range of vertebrate species. Several evolutionary hypotheses have been proposed to explain these differences. The one best supported is the range size hypothesis that links spatial ability to range size. Our study aimed to determine whether male cuttlefish (Sepia officinalis; cephalopod mollusc) range over a larger area than females and whether this difference is associated with a cognitive dimorphism in orientation abilities. First, we assessed the distance travelled by sexually immature and mature cuttlefish of both sexes when placed in an open field (test 1). Second, cuttlefish were trained to solve a spatial task in a T-maze, and the spatial strategy preferentially used (right/left turn or visual cues) was determined (test 2). Our results showed that sexually mature males travelled a longer distance in test 1, and were more likely to use visual cues to orient in test 2, compared with the other three groups. This paper demonstrates for the first time a cognitive dimorphism between sexes in an invertebrate. The data conform to the predictions of the range size hypothesis. Comparative studies with other invertebrate species might lead to a better understanding of the evolution of cognitive dimorphism.

