spectral cues
Recently Published Documents

TOTAL DOCUMENTS: 192 (five years: 25)
H-INDEX: 32 (five years: 2)

2021 ◽ pp. 002383092098682 ◽ Author(s): Vladimir Kulikov

The current study investigates multiple acoustic cues associated with the phonological contrasts of voicing and emphasis in the production of Arabic coronal stops: voice onset time (VOT), spectral center of gravity (SCG) of the burst, pitch (F0), and the frequencies of the first (F1) and second (F2) formants at vowel onset. The analysis of acoustic data collected from eight native speakers of the Qatari dialect showed that the three stops form three distinct modes on the VOT scale: [d] is (pre)voiced, voiceless [t] is aspirated, and emphatic [ṭ] is voiceless unaspirated. The contrast is also maintained in the spectral cues. Each cue influences the production of coronal stops, although their relevance to the phonological contrasts varies: VOT was most relevant for voicing, whereas F2 was most strongly associated with emphasis. The perception experiment revealed that listeners were able to categorize ambiguous tokens correctly and to compensate for the phonological contrasts. The listeners’ results were used to evaluate three categorization models that predict the intended category of a coronal stop: a model with unweighted and unadjusted cues, a model with weighted cues compensating for phonetic context, and a model with weighted cues compensating for the voicing and emphasis contrasts. The findings suggest that the model with phonological compensation performed most similarly to human listeners in terms of both accuracy rate and error pattern.
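
As a rough illustration of the weighted-cue categorization idea this abstract describes, the sketch below scores a token against per-category cue prototypes under a weighted distance rule. The cue weights and prototypes are invented placeholders, not the paper's fitted values; this is an assumption-laden sketch rather than the authors' model.

```python
# Minimal sketch of weighted-cue categorization for Arabic coronal stops.
# Weights and category prototypes are illustrative placeholders only.
import numpy as np

CUES = ["VOT", "SCG", "F0", "F1", "F2"]             # cue order used below
WEIGHTS = np.array([0.40, 0.15, 0.15, 0.10, 0.20])  # hypothetical cue weights

# Hypothetical category prototypes in z-scored cue space.
PROTOTYPES = {
    "d": np.array([-1.2,  0.0, -0.3, 0.0,  0.5]),   # (pre)voiced
    "t": np.array([ 1.0,  0.4,  0.5, 0.0,  0.4]),   # voiceless aspirated
    "ṭ": np.array([ 0.1, -0.5, -0.2, 0.2, -1.0]),   # emphatic, voiceless unaspirated
}

def classify(token_z):
    """Return the category whose prototype is closest under weighted squared distance."""
    token_z = np.asarray(token_z, dtype=float)
    dists = {cat: np.sum(WEIGHTS * (token_z - proto) ** 2)
             for cat, proto in PROTOTYPES.items()}
    return min(dists, key=dists.get)

print(classify([-1.0, 0.1, -0.2, 0.0, 0.6]))  # -> 'd' for this made-up token
```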


Sensors ◽ 2021 ◽ Vol 21 (1) ◽ pp. 227 ◽ Author(s): Te Meng Ting, Nur Syazreen Ahmad, Patrick Goh, Junita Mohamad-Saleh

In this work, a binaural model resembling the human auditory system was built using a pair of three-dimensional (3D)-printed ears to localize a sound source in both the vertical and horizontal directions. An analysis of the proposed model was first conducted to study the correlations between the spatial auditory cues and the 3D polar coordinates of the source. Apart from estimation techniques based on interaural and spectral cues, a property of the combined direct and reverberant energy decay curve is also introduced as part of the localization strategy. The preliminary analysis reveals that the latter provides much more accurate distance estimation than approximations based on the sound pressure level, but on its own it is not sufficient to resolve front-rear confusions. For vertical localization, it is also shown that the elevation angle can be robustly encoded through spectral notches. By analysing the strengths and shortcomings of each estimation method, a new algorithm is formulated to localize the sound source, which is further improved by cross-correlating the interaural and spectral cues. The proposed technique was validated in a series of experiments in which the sound source was randomly placed at 30 different locations in an outdoor environment at distances of up to 19 m. Based on the experimental and numerical evaluations, localization performance is significantly improved, with an average distance-estimation error of 0.5 m and a reduction of the total ambiguous points to 3.3%.
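
One ingredient of such a localization pipeline can be sketched compactly: estimating the interaural time difference by cross-correlating the left- and right-ear signals and restricting the search to physically plausible lags. The sample rate, the maximum-lag bound, and the synthetic test signal below are assumptions for illustration, not the authors' setup.

```python
# Minimal sketch: interaural time difference (ITD) via cross-correlation.
import numpy as np

FS = 48_000          # sample rate (Hz), assumed
MAX_ITD_S = 7.5e-4   # ~0.75 ms, rough bound for a head-sized microphone baseline

def estimate_itd(left, right, fs=FS):
    """Return the lag (s) of `right` relative to `left` that maximizes their
    cross-correlation, restricted to physically plausible lags."""
    n = len(left)
    corr = np.correlate(right, left, mode="full")   # lags -(n-1)..(n-1)
    lags = np.arange(-(n - 1), n)
    keep = np.abs(lags) <= int(MAX_ITD_S * fs)
    return lags[keep][np.argmax(corr[keep])] / fs

# Synthetic check: delay a noise burst by 10 samples between the ears.
rng = np.random.default_rng(0)
sig = rng.standard_normal(FS // 10)
print(estimate_itd(sig, np.roll(sig, 10)))   # ≈ 10 / 48000 s
```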


Acta Acustica ◽ 2021 ◽ Vol 5 ◽ pp. 59 ◽ Author(s): Robert Baumgartner, Piotr Majdak

Under natural conditions, listeners perceptually attribute sounds to external objects in their environment. This core function of perceptual inference is often distorted when sounds are produced via hearing devices such as headphones or hearing aids, resulting in sources being perceived as unrealistically close or even inside the head. Psychoacoustic studies suggest a mixed role of various monaural and interaural cues in the externalization process. We developed a model framework for perceptual externalization able to probe the contribution of cue-specific expectation errors and to contrast dynamic versus static strategies for combining those errors within static listening environments. Effects of reverberation and visual information were not considered. The model was applied to the acoustic distortions tested under spatially static conditions in five previous experiments. The most accurate predictions were obtained for the combination of monaural and interaural spectral cues with a fixed relative weighting (approximately 60% monaural and 40% interaural). That model version was able to reproduce the externalization ratings of the five experiments with an average error of 12% (relative to the full rating scale). Further, our results suggest that auditory externalization in spatially static listening situations relies on a fixed weighting of monaural and interaural spectral cues, rather than on a dynamic selection of those auditory cues.
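
A minimal sketch of the fixed-weighting idea, assuming a simple exponential mapping from the combined expectation error to a normalized externalization rating; only the roughly 60/40 weighting comes from the abstract, while the slope and the mapping itself are hypothetical.

```python
# Combine monaural and interaural spectral expectation errors with fixed weights
# and map the result to a rating in [0, 1]: 1 = fully external, 0 = inside the head.
import numpy as np

W_MONAURAL, W_INTERAURAL = 0.6, 0.4   # fixed relative weighting from the abstract
SLOPE = 4.0                           # hypothetical error-to-rating slope

def externalization_rating(monaural_error, interaural_error):
    """Larger cue-specific expectation errors yield lower predicted ratings."""
    combined = W_MONAURAL * monaural_error + W_INTERAURAL * interaural_error
    return float(np.exp(-SLOPE * combined))   # 1 when errors vanish, -> 0 as they grow

print(externalization_rating(0.05, 0.10))  # mild distortion, high rating
print(externalization_rating(0.60, 0.80))  # strong distortion, low rating
```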


2020 ◽ Vol 123 ◽ pp. 10-25 ◽ Author(s): Arun Baby, Jeena J. Prakash, Aswin Shanmugam Subramanian, Hema A. Murthy

eLife ◽ 2020 ◽ Vol 9 ◽ Author(s): Neal P Fox, Matthew Leonard, Matthias J Sjerps, Edward F Chang

In speech, listeners extract continuously varying spectrotemporal cues from the acoustic signal to perceive discrete phonetic categories. Spectral cues are spatially encoded in the amplitude of responses in phonetically tuned neural populations in auditory cortex. It remains unknown whether similar neurophysiological mechanisms encode temporal cues like voice-onset time (VOT), which distinguishes sounds like /b/ and /p/. We used direct brain recordings in humans to investigate the neural encoding of temporal speech cues with a VOT continuum from /ba/ to /pa/. We found that distinct neural populations respond preferentially to VOTs from one phonetic category and are also sensitive to sub-phonetic VOT differences within a population’s preferred category. In a simple neural network model, simulated populations tuned to detect either temporal gaps or coincidences between spectral cues captured the encoding patterns observed in real neural data. These results demonstrate that a spatial/amplitude neural code underlies the cortical representation of both spectral and temporal speech cues.
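
The gap-versus-coincidence intuition can be sketched with two toy populations whose response amplitudes vary gradually along a VOT continuum. The category boundary and tuning slope below are made-up values; this is not the authors' network model.

```python
# Two simulated populations: one prefers long VOTs (a "gap" detector, /pa/-like),
# the other prefers short VOTs (a "coincidence" detector, /ba/-like). Responses
# are graded, so they also carry sub-phonetic VOT information within a category.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def population_responses(vot_ms, boundary_ms=25.0, slope=0.5):
    """Return (coincidence, gap) response amplitudes; boundary and slope are illustrative."""
    gap_pop = sigmoid(slope * (vot_ms - boundary_ms))   # prefers long, /pa/-like VOTs
    coincidence_pop = 1.0 - gap_pop                     # prefers short, /ba/-like VOTs
    return coincidence_pop, gap_pop

for vot in (0, 10, 20, 30, 50):   # a /ba/-to-/pa/ VOT continuum (ms)
    ba, pa = population_responses(vot)
    print(f"VOT {vot:2d} ms  ->  /ba/-pop {ba:.2f}, /pa/-pop {pa:.2f}")
```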


2020 ◽ Vol 148 (2) ◽ pp. 614-626 ◽ Author(s): Brian K. Branstetter, Kaitlin R. Van Alstyne, Madelyn G. Strahan, Megan N. Tormey, Teri Wu, ...

Author(s): Sovon Dhara, Indranil Chatterjee, Himansu Kumar, Susmi Pani

Background: Speech recognition in a modulating noise background can be facilitated by a process attributable to comodulation masking release (CMR). CMR is usually assumed to depend on comparisons of the outputs of different auditory filters. It is therefore important to examine the CMR effect in children with and without dyslexia.

Methods: The study measured the CMR effect in children with and without dyslexia. The research was carried out in five steps: preparation of auditory attention task stimuli, an auditory performance test, preparation of CMR stimuli, the CMR task, and statistical analysis. All data were tabulated and analysed statistically using SPSS version 16.

Results: An independent t-test was used for comparisons between groups, and a paired t-test for comparisons within groups, at a 95% confidence interval. The amount of CMR was greater in children with dyslexia, although the difference between the two groups was not statistically significant; the CMR effect also did not differ significantly between the ears in children with or without dyslexia.

Conclusions: The present study indicates that children with dyslexia have a selective inability to use the temporal and spectral cues necessary for signal extraction in CMR.
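
A minimal sketch of the reported statistical comparisons, with entirely made-up CMR values: an independent-samples t-test between the two groups and a paired t-test between ears within a group (here using SciPy rather than SPSS).

```python
# Hypothetical CMR values (dB); the numbers are invented for illustration only.
from scipy import stats

cmr_dyslexia = [8.2, 9.1, 7.5, 10.0, 8.8, 9.4]
cmr_control  = [6.9, 7.8, 7.2,  8.1, 7.5, 7.0]
t_between, p_between = stats.ttest_ind(cmr_dyslexia, cmr_control)   # between groups

left_ear  = [8.0, 9.0, 7.6,  9.8, 8.5, 9.2]   # hypothetical within-group ear data
right_ear = [8.3, 9.2, 7.4, 10.1, 9.0, 9.5]
t_within, p_within = stats.ttest_rel(left_ear, right_ear)            # within group, paired

print(f"between groups: t={t_between:.2f}, p={p_between:.3f}")
print(f"between ears:   t={t_within:.2f}, p={p_within:.3f}")
```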


2020 ◽ Author(s): Andrew Francl, Josh H. McDermott

Mammals localize sounds using information from their two ears. Localization in real-world conditions is challenging, as echoes provide erroneous information and noises mask parts of target sounds. To better understand real-world localization, we equipped a deep neural network with human ears and trained it to localize sounds in a virtual environment. The resulting model localized accurately in realistic conditions with noise and reverberation, outperforming alternative systems that lacked human ears. In simulated experiments, the network exhibited many features of human spatial hearing: sensitivity to monaural spectral cues and to interaural time and level differences, integration across frequency, and biases for sound onsets. However, when the network was trained in unnatural environments lacking reverberation, noise, or natural sounds, these performance characteristics deviated from those of humans. The results show how biological hearing is adapted to the challenges of real-world environments and illustrate how artificial neural networks can extend traditional ideal observer models to real-world domains.
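
The general setup can be sketched as a small convolutional network mapping a two-channel (left/right ear) time-frequency input to a grid of candidate source locations; the layer sizes, location grid, and input shape below are arbitrary assumptions, not the authors' architecture.

```python
# Sketch of a binaural localizer: two-channel cochleagram-like input -> location bins.
import torch
import torch.nn as nn

N_AZIMUTHS, N_ELEVATIONS = 72, 10          # hypothetical location grid

class BinauralLocalizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=5, stride=2, padding=2),  # 2 ear channels in
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, N_AZIMUTHS * N_ELEVATIONS)

    def forward(self, x):                   # x: (batch, 2, freq, time)
        h = self.features(x).flatten(1)
        return self.classifier(h)           # logits over location bins

model = BinauralLocalizer()
dummy = torch.randn(1, 2, 64, 200)          # one simulated binaural excerpt
print(model(dummy).shape)                   # torch.Size([1, 720])
```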

