Dichotic presentation of speech signal with critical band filtering for improving speech perception

Author(s):  
D.S. Chaudhari ◽  
P.C. Pandey


1963 ◽  
Vol 6 (3) ◽  
pp. 207-222 ◽  
Author(s):  
J. M. Pickett ◽  
B. Horenstein Pickett

Tests of tactual speech perception were conducted using a special frequency-analyzing vocoder. The vocoder presented a running frequency analysis of speech mapped into a spatial array of tactual vibrations which were applied to the fingers of the receiving subject. Ten vibrators were used, one for each finger. The position of a vibrator represented a given frequency region of speech energy; the total range covered was 210 to 7 700 cps; all the vibrations had a frequency of 300 cps; the vibration amplitudes represented the energy distribution over the various frequencies. Discrimination and identification tests were performed with various sets of test vowels; consonant discrimination tests were performed with certain consonants including those that might be difficult to lipread. Performance with vowels appeared to be related to formant structure and duration as measured on the test vowels, and to tactual masking effects. Consonant discrimination was good between stops and continuants; consonant features of nasality, voicing, and affrication were also discriminated to some extent. It is concluded that the skin offers certain capacities for transmitting speech information which may be used to complement speech communication where only an impoverished speech signal is normally received. This research was conducted at the Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, Sweden.
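The band-to-finger mapping described above can be sketched as follows. The geometric spacing of the ten bands across 210 to 7 700 cps is an assumption for illustration; the abstract gives only the overall range, the fixed 300-cps carrier, and the rule that vibration amplitude encodes per-band energy.

```python
def band_edges(f_lo=210.0, f_hi=7700.0, n_bands=10):
    """Split [f_lo, f_hi] into n_bands geometrically spaced bands,
    one per finger (assumed spacing; the abstract states only the range)."""
    ratio = (f_hi / f_lo) ** (1.0 / n_bands)
    return [(f_lo * ratio ** i, f_lo * ratio ** (i + 1)) for i in range(n_bands)]

def band_energies_to_amplitudes(energies, e_max):
    """Map per-band speech energy to vibration amplitude on each finger.
    Every vibrator runs at the fixed 300-cps carrier; only amplitude varies."""
    return [e / e_max for e in energies]

edges = band_edges()
print(len(edges))           # ten bands, one vibrator per finger
print(round(edges[0][0]))   # 210
print(round(edges[-1][1]))  # 7700
```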


1998 ◽  
Vol 21 (2) ◽  
pp. 275-275 ◽  
Author(s):  
Dominic W. Massaro

Sussman et al. describe an ecological property of the speech signal that is putatively functional in perception. An important issue, however, is whether their putative cue is an emerging feature or whether the second formant (F2) onset and the F2 vowel actually provide independent cues to perceptual categorization. Regardless of the outcome of this issue, an important goal of speech research is to understand how multiple cues are evaluated and integrated to achieve categorization.
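One concrete account of how multiple independent cues could be evaluated and integrated is Massaro's own fuzzy logical model of perception (FLMP), in which each cue contributes a truth value in [0, 1] and categorization follows a relative-goodness rule. A minimal sketch; the cue values below are hypothetical:

```python
def flmp_integrate(cue_a, cue_b):
    """FLMP-style integration of two independent cues, each a truth value
    in [0, 1] supporting one category: multiply supports and normalize
    against the support for the alternative category."""
    support = cue_a * cue_b
    counter = (1.0 - cue_a) * (1.0 - cue_b)
    return support / (support + counter)

# Hypothetical values: F2 onset weakly supports the category (0.6),
# the F2 vowel strongly supports it (0.9).
p = flmp_integrate(0.6, 0.9)
print(round(p, 3))  # integrated support exceeds either cue alone
```

Equal, ambiguous cues (0.5, 0.5) yield 0.5, while two cues pointing the same way reinforce each other, which is the signature prediction that lets the model be tested against independent-cue alternatives.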


Author(s):  
Nele Salveste

Categorical perception, or the hypothesis of how we perceive linguistic units. The acoustic signal of everyday speech is highly variable, but this variability seldom disrupts normal speech communication. This motivates the hypothesis that speech perception has developed a special mechanism for extracting phonemes from a highly variable speech signal. This mechanism extracts phonemes so efficiently and quickly that we are usually unaware of it. We would like to call this mechanism "categorical perception of speech", but since perceptual processes are only indirectly accessible to investigation, the term refers rather to a theoretical model or an experimental method for investigating our perceptual ability to distinguish phonemes in the speech signal (Schouten et al. 2003). This paper discusses categorical perception as a model and experimental method whose theoretical assumptions have underpinned the design and conclusions of perception experiments conducted in Estonian as well as in other languages.
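In the categorical-perception paradigm discussed above, listeners label stimuli drawn from an acoustic continuum, and the identification function shows an abrupt category boundary rather than a gradual change. A minimal sketch with a logistic identification function; the continuum steps, slope, and boundary location are hypothetical:

```python
import math

def identification(step, boundary=5.0, slope=2.0):
    """P(category A) for a stimulus at a given continuum step.
    A steep logistic produces the step-like identification curve
    characteristic of categorical perception."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))

# Labels stay near 1.0, drop abruptly around the boundary, then stay near 0.0.
curve = [round(identification(s), 2) for s in range(1, 10)]
print(curve)
```

In a full experiment this identification curve is paired with a discrimination task: discrimination should peak near the boundary if perception is truly categorical.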


2011 ◽  
Vol 403-408 ◽  
pp. 970-975
Author(s):  
P.A. Dhulekar ◽  
S.L. Nalbalwar ◽  
J.J. Chopade

Simultaneous masking occurs when a sound is rendered inaudible by a masker: a noise or unwanted sound of the same duration as the original sound. An innovative approach to speech processing in the cochlea is investigated. Differently from traditional filter-bank spectral analysis strategies, the proposed method analyses the speech signal by means of wavelet packets. Splitting the speech signal by filtering and down-sampling at each decomposition level, using wavelet packets with different wavelet functions, helps to reduce the effect of simultaneous masking. The performance of the proposed method is experimentally evaluated with vowel-consonant-vowel syllables for fifteen English consonants. The dichotic presentation of the processed speech signals effectively reduces simultaneous masking and thereby improves auditory perception.
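The filtering-and-downsampling step at each wavelet-packet decomposition level can be sketched with the simplest case: a one-level Haar split into low-frequency (approximation) and high-frequency (detail) half-bands. The Haar wavelet is used here only for illustration; the paper evaluates several different wavelet functions.

```python
def haar_split(signal):
    """One wavelet-packet decomposition level with the orthonormal Haar
    filters: low-pass (pairwise sum) and high-pass (pairwise difference),
    each down-sampled by 2."""
    s = 2 ** -0.5  # orthonormal Haar scaling factor
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

# In a wavelet-packet tree, BOTH half-bands are split again at the next
# level, unlike the plain wavelet transform which splits only the approximation.
a, d = haar_split([4.0, 2.0, 1.0, 3.0])
```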


2020 ◽  
Author(s):  
Emmanuel Biau ◽  
Benjamin G. Schultz ◽  
Thomas C. Gunter ◽  
Sonja A. Kotz

ABSTRACT
During multimodal speech perception, slow delta oscillations (~1 - 3 Hz) in the listener's brain synchronize with the speech signal, likely reflecting signal decomposition in the service of comprehension. In particular, fluctuations imposed onto the speech amplitude envelope by a speaker's prosody seem to temporally align with articulatory and body gestures, thus providing two complementary cues to the speech signal's temporal structure. Further, endogenous delta oscillations in the left motor cortex align with the beat of speech and music, suggesting a role in the temporal integration of (quasi-)rhythmic stimulation. We propose that delta activity facilitates the temporal alignment of a listener's oscillatory activity with the prosodic fluctuations in a speaker's speech during multimodal speech perception. We recorded EEG responses in an audiovisual synchrony detection task while participants watched videos of a speaker. To test the temporal alignment of visual and auditory prosodic features, we filtered the speech signal to remove verbal content. Results confirm that (i) participants accurately detected audiovisual synchrony; (ii) audiovisual asynchrony elicited greater delta power in left frontal motor regions, an effect that correlated with behavioural performance; and (iii) delta-beta coupling in the left frontal motor regions decreased when listeners could not accurately integrate visual and auditory prosodies. Together, these findings suggest that endogenous delta oscillations align fluctuating prosodic information conveyed by distinct sensory modalities onto a common temporal organisation in multimodal speech perception.
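Removing verbal content while keeping the slow prosodic fluctuations amounts to low-pass filtering the speech amplitude envelope down into the delta range. A minimal sketch using full-wave rectification and a moving-average smoother; the window length and the implied sampling rate are hypothetical, and the study's actual filtering procedure is not specified in this abstract:

```python
def prosodic_envelope(samples, win=50):
    """Full-wave rectify, then moving-average smooth: a crude low-pass
    envelope follower that retains only slow (delta-range) amplitude
    fluctuations and discards the fast fine structure carrying verbal content."""
    rect = [abs(x) for x in samples]
    out = []
    for i in range(len(rect)):
        lo = max(0, i - win // 2)
        hi = min(len(rect), i + win // 2 + 1)
        out.append(sum(rect[lo:hi]) / (hi - lo))
    return out

# A constant-amplitude alternating signal yields a flat envelope:
# the fast alternation is removed, the slow amplitude level survives.
env = prosodic_envelope([1.0, -1.0] * 100)
```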


1985 ◽  
Vol 50 (1) ◽  
pp. 60-65 ◽  
Author(s):  
John Greer Clark

Past investigations of alaryngeal speech intelligibility have focused on comparative intelligibility as perceived by young normally hearing adults. However, the spouses and social companions of laryngectomees may have significantly different auditory capabilities compared to young listeners. This report presents a comparison of alaryngeal and laryngeal speech identification performance for a group of young normally hearing listeners and a group of older adult listeners representative of the age of the laryngectomee's social companions. The speech signals investigated included normal laryngeal speech, artificial larynx speech, traditional esophageal speech, and tracheoesophageal speech. The results obtained reveal not only differences in speech signals but also a difference in the proficiency of speech perception for the two groups, favoring the younger listeners. The results of the speech identification measures in the presence of auditory competition revealed greatest intelligibility for the artificial larynx speech signal and poorest for the tracheoesophageal speech signal.

