Kõnetaju kategoriaalsus ehk hüpotees sellest, kuidas me keelelisi üksusi tajume (Categorical perception, or the hypothesis of how we perceive linguistic units)

Author(s):  
Nele Salveste

The acoustic signal of everyday speech is highly variable, yet this variability seldom hinders normal speech communication. This suggests that speech perception has developed a mechanism for extracting phonemes from a highly variable speech signal. The mechanism works so efficiently and quickly that we are usually unaware of it. It could be called categorical perception of speech, but because perceptual processes are only indirectly accessible to investigation, the term refers rather to a theoretical model or an experimental method for studying the perceptual ability to distinguish phonemes in the speech signal (Schouten et al. 2003). This article discusses categorical perception as a model and an experimental method whose theoretical assumptions have shaped the design and conclusions of perception experiments in other languages as well as in Estonian.

1963
Vol 6 (3)
pp. 207-222
Author(s):
J. M. Pickett
B. Horenstein Pickett

Tests of tactual speech perception were conducted using a special frequency-analyzing vocoder. The vocoder presented a running frequency analysis of speech mapped into a spatial array of tactual vibrations which were applied to the fingers of the receiving subject. Ten vibrators were used, one for each finger. The position of a vibrator represented a given frequency region of speech energy; the total range covered was 210 to 7 700 cps; all the vibrations had a frequency of 300 cps; the vibration amplitudes represented the energy distribution over the various frequencies. Discrimination and identification tests were performed with various sets of test vowels; consonant discrimination tests were performed with certain consonants including those that might be difficult to lipread. Performance with vowels appeared to be related to formant structure and duration as measured on the test vowels, and to tactual masking effects. Consonant discrimination was good between stops and continuants; consonant features of nasality, voicing, and affrication were also discriminated to some extent. It is concluded that the skin offers certain capacities for transmitting speech information which may be used to complement speech communication where only an impoverished speech signal is normally received. This research was conducted at the Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, Sweden.


2014
Vol 44 (3)
pp. 283-296
Author(s):
Michael Ashby
Joanna Przedlacka

The autocorrelation function, a measure of regularity in the speech signal, is applied in demarcating the seemingly diffuse intervals of glottalization which accompany or replace voiceless oral stops in elicited recordings from 22 young speakers of Southern British English. It is shown that a local minimum in autocorrelation characterizes almost all instances heard as intervocalic glottal stops; an annotation procedure is developed and used to gather data on glottalization gestures, including duration, f0, energy and autocorrelation. The same measure is used to assess regularity of vocal fold vibration in an interval just prior to the formation of the total closure for instances of syllable-final /t/, and confirms significantly lower autocorrelation in a group auditorily judged ‘pre-glottalized’. Implications are considered both for normal speech perception and for expert phonetic judgments.
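
The idea of autocorrelation as a regularity measure can be illustrated with a minimal sketch. It is not the annotation procedure of the study above: the frame length, lag range, and test signals are assumed values chosen for illustration, and the helper names are hypothetical. The sketch computes the peak normalized autocorrelation of a frame, which is close to 1 for regular (periodic) vibration and much lower for irregular signals.

```python
import math
import random


def normalized_autocorrelation(frame, lag):
    """Normalized autocorrelation of `frame` at `lag` (about -1 to 1)."""
    n = len(frame) - lag
    if lag < 1 or n < 1:
        raise ValueError("lag must be between 1 and len(frame) - 1")
    head, tail = frame[:n], frame[lag:]
    numerator = sum(a * b for a, b in zip(head, tail))
    denominator = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return numerator / denominator if denominator > 0 else 0.0


def peak_autocorrelation(frame, min_lag, max_lag):
    """Largest normalized autocorrelation over the candidate lags."""
    return max(normalized_autocorrelation(frame, lag)
               for lag in range(min_lag, max_lag + 1))


# Hypothetical test frames: a perfectly periodic tone and an irregular signal.
PERIOD = 80  # samples per cycle (assumed value)
periodic = [math.sin(2 * math.pi * i / PERIOD) for i in range(400)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
irregular = [rng.uniform(-1.0, 1.0) for _ in range(400)]

regular_score = peak_autocorrelation(periodic, 40, 160)
irregular_score = peak_autocorrelation(irregular, 40, 160)
```

Tracking such a score frame by frame would show a dip wherever vibration becomes irregular, which is the kind of local minimum the abstract associates with glottal stops.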


Author(s):  
Shibanee Dash
Mihir Narayan Mohanty

Modern wireless communication has advanced considerably, and speech communication remains a major research focus in its applications. This work considers a communication system based on OFDM modulation, which is important on both licensed and unlicensed wireless platforms. The voice signal is passed through the proposed model and recovered at the receiver end, where it may arrive partially corrupted. To obtain a better received signal, the authors use a radial basis function network (RBFN) model. The parameters chosen for the RBFN model are the energy, zero-crossing rate (ZCR), autocorrelation function (ACF), and fundamental frequency of the speech signal. On the one hand, these parameters can themselves help to remove part of the noise; on the other, the RBFN model using them proves effective for noisy speech signals transmitted over a noisy (Gaussian) channel. The efficiency of the OFDM model is verified in terms of symbol error rate, and the transmitted speech signal is evaluated in terms of SNR, which shows a reduction in noise. For visual inspection, samples of the original, noisy, and received signals are also shown. The experiment is performed at noise levels of 5 dB, 10 dB, and 15 dB. The results support the performance of the RBFN model as a filter. Performance is measured on the voice heard by the listener in each condition, and the results show that, for speech in a noisy environment, the proposed technique improves intelligibility and speech quality.
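
Two of the parameters named in this abstract, short-time energy and zero-crossing rate, have simple textbook definitions that can be sketched as follows. This is only an illustration under assumptions: the frame length, hop size, and function names are hypothetical, and the abstract does not say how the authors computed or normalized these features before feeding them to the RBFN.

```python
def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(signal, frame_length, hop):
    """List of (energy, zcr) pairs, one per complete frame."""
    features = []
    for start in range(0, len(signal) - frame_length + 1, hop):
        frame = signal[start:start + frame_length]
        features.append((short_time_energy(frame), zero_crossing_rate(frame)))
    return features
```

Noise-like frames tend to show a high zero-crossing rate and voiced speech a low one, which is one common reason these two values are used together as input features.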


Author(s):  
Sehchang Hah

The objective of this experiment was to quantify and localize the effects of wearing the nuclear, biological, and chemical (NBC) M40 protective mask and hood on speech production and perception. A designated speaker's vocalizations of 192 monosyllables while wearing an M40 mask with hood were digitized and used as speech stimuli. Another set of speech stimuli was produced by recording the same individual vocalizing the same monosyllables without the mask and hood. Participants listened to one set of stimuli during two sessions, one while wearing an M40 mask with hood and the other without the mask and hood. The results showed that wearing the mask with hood was most detrimental to the sustention dimension acoustically, for both speech perception and production. The results also showed that wearing it was detrimental to vocalizing and listening to fricatives and unvoiced stops. These results may be due to the muffling effect of the voicemitter in speech production and the filtering effects of the voicemitter and the hood material on high-frequency components during both speech production and perception. This information will be useful for designing better masks and hoods. The methodology can also be used to evaluate other speech communication systems.


Author(s):  
Rajinder Koul
James Dembowski

The purpose of this chapter is to review research conducted over the past two decades on the perception of synthetic speech by individuals with intellectual, language, and hearing impairments. Many individuals with little or no functional speech as a result of intellectual, language, physical, or multiple disabilities rely on non-speech communication systems to augment or replace natural speech. These systems include Speech Generating Devices (SGDs) that produce synthetic speech upon activation. Two main conclusions emerge from this review. The first is that persons with intellectual and/or language impairment demonstrate greater difficulties in processing synthetic speech than their typical matched peers. The second is that repeated exposure to synthetic speech allows individuals with intellectual and/or language disabilities to identify synthetic speech with increased accuracy and speed. This finding is of clinical significance, as it indicates that individuals who use SGDs become more proficient at understanding synthetic speech over time.


1998
Vol 21 (2)
pp. 275-275
Author(s):  
Dominic W. Massaro

Sussman et al. describe an ecological property of the speech signal that is putatively functional in perception. An important issue, however, is whether their putative cue is an emerging feature or whether the second formant (F2) onset and the F2 vowel actually provide independent cues to perceptual categorization. Regardless of the outcome of this issue, an important goal of speech research is to understand how multiple cues are evaluated and integrated to achieve categorization.


2013
Vol 284-287
pp. 2867-2871
Author(s):
Jui Feng Yeh
Min Da Kuo
Zhong Hua Hsu

Packet loss is one of the most important problems in speech communication. In voice over IP (VoIP), it causes loss of information and discomfort for listeners. This investigation proposes an approach based on a waveform similarity measure using the overlap-and-add algorithm. The waveform similarity overlap-and-add (WSOLA) technique is an effective algorithm for packet loss concealment (PLC). In real-time communication, WSOLA is widely used for length adaptation and packet loss concealment of the speech signal. Time-scale modification of audio signals is one of the most important research topics in data communication, especially in VoIP. Here, we propose the dual-side WSOLA, which is derived from standard WSOLA. Instead of exploiting speech data from one direction only, the proposed method reconstructs the lost voice data from both the preceding and the following voice data; that is, it uses both the past and the future speech waveform to reconstruct the waveform of the lost packet. The evaluations show that the quality of the speech signal reconstructed by the dual-side WSOLA is higher than that of standard WSOLA and GWSOLA at different packet loss rates and lengths, using the metrics PESQ and MOS. The significant improvement is obtained from the dual-side information used in the proposed method. The proposed dual-side waveform similarity overlap-and-add (DSWSOLA) outperforms the traditional approaches.
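
The waveform-similarity search at the core of WSOLA-style concealment can be sketched as below. This is a simplified, past-only illustration, not the authors' DSWSOLA: the function names, template length, and packet length are hypothetical, the similarity measure is assumed to be normalized cross-correlation, and the overlap-add smoothing and the use of future data described in the abstract are omitted.

```python
import math


def similarity(a, b):
    """Normalized cross-correlation of two equal-length segments."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator if denominator > 0 else 0.0


def best_match_offset(history, template):
    """Start index in `history` of the segment most similar to `template`."""
    m = len(template)
    if m == 0 or len(history) < m:
        raise ValueError("history must be at least as long as the template")
    best_index, best_score = 0, -math.inf
    for start in range(len(history) - m + 1):
        score = similarity(history[start:start + m], template)
        if score > best_score:
            best_index, best_score = start, score
    return best_index


def conceal_from_past(past, template_len, packet_len):
    """Fill a lost packet with the samples that followed the best past match."""
    if packet_len > template_len or len(past) < 2 * template_len:
        raise ValueError("need packet_len <= template_len and enough history")
    template = past[-template_len:]        # last samples before the loss
    search_region = past[:-template_len]   # earlier, fully received samples
    offset = best_match_offset(search_region, template)
    start = offset + template_len          # what came after the match
    return past[start:start + packet_len]
```

For a strictly periodic input this copy reproduces the missing samples almost exactly; real speech is only quasi-periodic, which is why practical schemes cross-fade the filler with the neighbouring packets, and why a dual-side variant can also draw on the data received after the gap.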


Author(s):  
Xiaolong Li
Renyang He
Tao Meng
Shimin Zhang

Caliper pigs are widely used for measuring pipeline internal geometry and detecting anomalies such as dents, wrinkles, and flat spots. Owing to their excellent transport capacity, caliper pigs are more widely used in subsea gas pipelines than smart pigs such as Magnetic Flux Leakage (MFL) or Ultrasonic Test (UT) pigs. In this article, the bouncing of a caliper pig's detection arm as it crosses a convex defect is studied. The influence on the bounce of the collision between the detection arm and the deformation defect, as well as of the defect shape, is then discussed. On that basis, a theoretical bouncing model of the detection arm is developed for analyzing the bouncing phenomenon. Furthermore, the bouncing of the detection arm as it moves across the convex defect at different velocities and with different initial spring forces is studied experimentally. The values calculated by the bouncing model are approximately equal to the experimental values, which supports the validity of the model. The bouncing model is of considerable significance for evaluating pipeline convex defects with caliper pigs.

