spectral masking
Recently Published Documents


TOTAL DOCUMENTS: 43 (FIVE YEARS: 5)
H-INDEX: 8 (FIVE YEARS: 0)

2021 ◽  
Vol 12 ◽  
Author(s):  
Michel Bürgel ◽  
Lorenzo Picinali ◽  
Kai Siedenburg

Listeners can attend to and track instruments or singing voices in complex musical mixtures, even though the acoustical energy of sounds from individual instruments may overlap in time and frequency. In popular music, lead vocals are often accompanied by sound mixtures from a variety of instruments, such as drums, bass, keyboards, and guitars. However, little is known about how the perceptual organization of such musical scenes is affected by selective attention, and which acoustic features play the most important role. To investigate these questions, we explored the role of auditory attention in a realistic musical scenario. We conducted three online experiments in which participants detected single cued instruments or voices in multi-track musical mixtures. Stimuli consisted of 2-s multi-track excerpts of popular music. In one condition, the target cue preceded the mixture, allowing listeners to selectively attend to the target. In another condition, the target was presented after the mixture, requiring a more “global” mode of listening. Performance differences between these two conditions were interpreted as effects of selective attention. In Experiment 1, detection performance generally depended on the target’s instrument category, but listeners were more accurate when the target was presented before the mixture than after it. Lead vocals were nearly unaffected by this change in presentation order and achieved the highest accuracy of all instruments, suggesting a particular salience of vocal signals in musical mixtures. In Experiment 2, filtering was used to avoid potential spectral masking of target sounds. Although detection accuracy increased for all instruments, a similar pattern of instrument-specific differences between presentation orders was observed.
In Experiment 3, adjusting the sound-level differences between the targets reduced the effect of presentation order, but did not affect the differences between instruments. While both acoustic manipulations facilitated the detection of targets, vocal signals remained particularly salient, which suggests that the manipulated features did not contribute to vocal salience. These findings demonstrate that lead vocals serve as robust attractor points of auditory attention regardless of the manipulation of low-level acoustical cues.
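The filtering manipulation in Experiment 2 rests on a simple idea: a masker can only spectrally mask a target where their energy overlaps in frequency, so removing the masker's energy in the target's band eliminates that overlap. A minimal NumPy sketch of the concept is below; the signals, band edges, and FFT-based notch are illustrative stand-ins, not the stimuli or filters used in the study.

```python
import numpy as np

fs = 16000                      # sample rate (Hz), illustrative
t = np.arange(fs) / fs          # 1 s of time samples

# Hypothetical stand-ins: a "vocal" target at 1 kHz and an
# "accompaniment" masker with energy overlapping that band.
target = np.sin(2 * np.pi * 1000 * t)
masker = np.sin(2 * np.pi * 1000 * t + 1.3) + np.sin(2 * np.pi * 250 * t)

def band_energy(x, lo, hi, fs):
    """Energy of x inside the [lo, hi] Hz band, measured via the FFT."""
    spec = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    sel = (f >= lo) & (f <= hi)
    return float(np.sum(np.abs(spec[sel]) ** 2))

def notch(x, lo, hi, fs):
    """Zero out the [lo, hi] Hz band (crude FFT filtering, standing in for
    the filtering used to free the target's band from masker energy)."""
    spec = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    spec[(f >= lo) & (f <= hi)] = 0
    return np.fft.irfft(spec, n=len(x))

# Masker energy in the target's band, before and after filtering
before = band_energy(masker, 950, 1050, fs)
after = band_energy(notch(masker, 950, 1050, fs), 950, 1050, fs)
```

After the notch, the masker carries essentially no energy in the target's band, so the target's 1 kHz component is no longer spectrally overlapped in the mixture.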


2021 ◽  
Vol 7 (2) ◽  
pp. 37-40
Author(s):  
Stephan Göb ◽  
Theresa Ida Götz ◽  
Thomas Wittenberg

Abstract Multispectral imaging devices incorporating up to 256 different spectral channels have recently become available for various healthcare applications, such as laparoscopy, gastroscopy, dermatology, and perfusion imaging for wound analysis. Currently, the use of such devices is limited by very high investment costs and slow capture times. To compensate for these shortcomings, single sensors with spectral masking at the pixel level have been proposed, which in turn require adequate spectral reconstruction methods. In this work, two deep convolutional neural network (DCNN) architectures for spectral image reconstruction from single sensors are compared with each other. Training of the networks is based on a large collection of multispectral image (MSI) stacks, which were subsampled to simulate 16-channel single sensors with spectral masking. We define a training, validation, and test set (‘HITgoC’) comprising 351 training images (631,128 sub-images), 99 validation images (163,272 sub-images), and 51 test images. For application in the field of neurosurgery, an additional test set of 36 image stacks from the Nimbus data collection is used, depicting MSI brain data during open surgery. The two DCNN architectures were compared to bilinear interpolation (BI) and an intensity-difference (ID) algorithm. The DCNNs were trained on HITgoC and consist of a preprocessing step using BI or ID followed by a refinement stage using a ResNet structure (ResNet-Shinoda and ResNet-ID, respectively). The similarity measures used were PSNR, SSIM, and MSE between predicted and reference images. We calculated these measures for the HITgoC and Nimbus data and determined the differences in mean similarity values between ResNet-ID and the baseline algorithms (BI and ResNet-Shinoda). The proposed method achieved better results than BI in SSIM (.0644 vs. .0252), PSNR (15.3 dB vs. 9.1 dB), and 1-MSE*100 (.0855 vs. .0273), and better results than ResNet-Shinoda in SSIM (.0103 vs. .0074), PSNR (3.8 dB vs. 3.6 dB), and 1-MSE*100 (.0075 vs. .0047) for HITgoC/Nimbus. In this study, significantly better results for spectral reconstruction of MSI images in open neurosurgery were achieved using a combination of ID interpolation and a ResNet structure, compared to standard methods.
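The capture model the abstract describes can be sketched compactly: a spectrally masked single sensor records, at each pixel, exactly one of the 16 channels according to a fixed mosaic pattern, and reconstruction must fill in the missing channels. The NumPy sketch below assumes a hypothetical 4×4 mosaic layout and uses a crude per-channel mean fill as a stand-in for bilinear interpolation; the paper's actual mosaic, BI/ID interpolation, and ResNet refinement are not reproduced. The PSNR/MSE metrics match those named in the abstract.

```python
import numpy as np

def mosaic_mask(h, w, channels=16):
    """Assumed 4x4 spectral mosaic: each pixel samples one of 16 channels."""
    pattern = np.arange(channels).reshape(4, 4)
    return np.tile(pattern, (h // 4 + 1, w // 4 + 1))[:h, :w]

def subsample(stack, mask):
    """Simulate single-sensor capture: keep only the mosaicked channel per pixel."""
    return np.take_along_axis(stack, mask[..., None], axis=2)[..., 0]

def reconstruct_mean(sensor, mask, channels=16):
    """Crude baseline: fill each channel with the mean of its sampled pixels
    (a stand-in for bilinear interpolation, which weights nearby samples),
    while keeping the directly measured samples exact."""
    h, w = sensor.shape
    out = np.zeros((h, w, channels))
    for ch in range(channels):
        sel = mask == ch
        out[..., ch] = sensor[sel].mean()  # flat fill everywhere ...
        out[sel, ch] = sensor[sel]         # ... but keep the true samples
    return out

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def psnr(a, b, peak=1.0):
    m = mse(a, b)
    return float(10 * np.log10(peak ** 2 / m)) if m > 0 else float("inf")

# Demo on a synthetic, piecewise-constant 64x64x16 stack in [0, 1]
rng = np.random.default_rng(0)
stack = np.kron(rng.random((8, 8, 16)), np.ones((8, 8, 1)))
mask = mosaic_mask(64, 64)
sensor = subsample(stack, mask)
recon = reconstruct_mean(sensor, mask)
print(f"PSNR: {psnr(recon, stack):.1f} dB, MSE: {mse(recon, stack):.4f}")
```

The mean-fill baseline is deliberately weak; the point is the data flow (full stack → mosaicked sensor → reconstructed stack → similarity measures), which is the pipeline the learned refinement networks improve upon.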


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 160581-160595
Author(s):  
Nasir Saleem ◽  
Muhammad Irfan Khattak ◽  
Muath Al-Hasan ◽  
Abdul Baseer Qazi

Author(s):  
Timo Gerkmann ◽  
Emmanuel Vincent

Icarus ◽  
2016 ◽  
Vol 271 ◽  
pp. 387-399 ◽  
Author(s):  
Selby Cull-Hearth ◽  
Alexis van Venrooy ◽  
M. Caroline Clark ◽  
Adriana Cvitkovic
