Audio Information
Recently Published Documents

Total documents: 236 (last five years: 58)
H-index: 10 (last five years: 2)

Author(s):  
Elke B. Lange ◽  
Jens Fünderich ◽  
Hartmut Grimm

Abstract: We investigated how visual and auditory information contributes to emotion communication during singing. Classically trained singers applied two different facial expressions (expressive/suppressed) to pieces from their song and opera repertoire. Recordings of the singers were evaluated by laypersons or experts, presented to them in three different modes: auditory, visual, and audio–visual. A manipulation check confirmed that the singers succeeded in manipulating the face while keeping the sound highly expressive. Analyses focused on whether the visual difference or the auditory concordance between the two versions determined perception of the audio–visual stimuli. When evaluating expressive intensity or emotional content, a clear effect of visual dominance emerged. Experts made more use of the visual cues than laypersons. Consistency measures between uni-modal and multimodal presentations did not explain the visual dominance. The evaluation of seriousness was applied as a control. The uni-modal stimuli were rated as expected, but multisensory evaluations converged without visual dominance. Our study demonstrates that long-term knowledge and task context affect multisensory integration. Even though singers’ orofacial movements are dominated by sound production, their facial expressions can communicate emotions composed into the music, and observers do not rely on audio information instead. Studies such as ours are important for understanding multisensory integration in applied settings.


2022 ◽  
Vol 70 (1) ◽  
pp. 169-206
Author(s):  
Slađan Svrzić ◽  
Julijan Bojanov

Introduction/purpose: To specify the practical application of the ECMA-355 and ECMA-336 Standards for QSIG tunneling and the implementation of mapping functions over the existing IP (Internet Protocol) network of the Serbian Armed Forces (Intranet SAF), in the Private Automatic Telephone Network SAF (PATN SAF), as the main part of the Private telecommunication-information network of integrated services SAF (PISN SAF). Methods: Description of the implemented solution and analysis of the software parameters of the established SIP transmission route, with a presentation of the results obtained in mitigating jitter and echo in the network. Results: With this solution, participants from the peripheral parts of the PISN SAF, which operate on the principle of TDM (Time Division Multiplexing) transmission and circuit switching, can connect with each other via the newly established central IP network SAF (Core network), which operates on the principle of packet transmission and switching with SIP (Session Initiation Protocol), without losing the QSIG functionality inherited from the ISDN (Integrated Services Digital Network) framework. Conclusion: The article deals with the modern IP PINX (Private Integrated Services Network Exchange) manufactured by Mitel, type MX-ONE Service Node 6.0, which is implemented at the transit level of the PATN SAF and which successfully implements QSIG tunneling through the IP network, together with the functions needed for mapping the transmission of tunneled QSIG messages and mapping voice (and other audio) information onto VoIP (Voice over IP) communication media streams through that network. The basic elements of its software configuration for introducing a new SIP route, with a capacity of 30 IP trunks carried over 100 Mb/s Ethernet, are also given, and the handling of jitter and echo in the network is described. Finally, the paper presents experience-based parameter values for reducing the influence of jitter and suppressing echo.
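The experience-based jitter and echo parameters above are specific to the Mitel MX-ONE configuration and are not reproduced here. As background only, the following minimal Python sketch shows the standard RFC 3550 interarrival-jitter estimate that VoIP endpoints commonly report and against which jitter-buffer tuning of this kind is usually judged; the packet timings are invented for illustration.

```python
def update_jitter(jitter, prev_transit, transit):
    """RFC 3550 interarrival-jitter estimator (running average,
    same units as the transit times, here milliseconds)."""
    d = abs(transit - prev_transit)
    return jitter + (d - jitter) / 16.0

# Invented example: (arrival time, RTP timestamp) pairs in ms for one stream
packets = [(0.0, 0.0), (21.0, 20.0), (43.0, 40.0), (61.0, 60.0)]
jitter = 0.0
prev_transit = packets[0][0] - packets[0][1]
for arrival, ts in packets[1:]:
    transit = arrival - ts          # network transit time (plus a constant offset)
    jitter = update_jitter(jitter, prev_transit, transit)
    prev_transit = transit

print(f"running jitter estimate: {jitter:.2f} ms")
```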


2021 ◽  
Author(s):  
Marie Tahon ◽  
Manon Macary ◽  
Yannick Estève ◽  
Daniel Luzzati

The goal of our research is to automatically retrieve satisfaction and frustration in real-life call-center conversations. This study focuses on an industrial application in which customer satisfaction is continuously tracked in order to improve customer services. To compensate for the lack of large annotated emotional databases, we explore the use of pre-trained speech representations as a form of transfer learning towards the AlloSat corpus. Moreover, several studies have pointed out that emotion can be detected not only in speech but also in facial traits, biological responses, or textual information. In the context of telephone conversations, we can break down the audio information into acoustic and linguistic components by using the speech signal and its transcription. Our experiments confirm the large gain in performance obtained with the use of pre-trained features. Surprisingly, we found that the linguistic content is clearly the major contributor to the prediction of satisfaction and best generalizes to unseen data. Our experiments conclude that there is a definite advantage to using CamemBERT representations; however, the benefit of fusing the acoustic and linguistic modalities is not as obvious. With models learnt on individual annotations, we found that fusion approaches are more robust to the subjectivity of the annotation task. This study also tackles the problem of performance variability and aims to estimate this variability from different views: weight initialization, confidence intervals, and annotation subjectivity. A deep analysis of the linguistic content investigates interpretable factors able to explain the high contribution of the linguistic modality for this task.
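The abstract names CamemBERT for the linguistic modality but does not detail the downstream model. As a minimal, hypothetical sketch of what a linguistic-only satisfaction predictor could look like (mean-pooled camembert-base embeddings feeding a simple ridge regressor standing in for the authors' architecture; the utterances and satisfaction scores below are invented, and the real AlloSat pipeline is not reproduced):

```python
# pip install torch transformers scikit-learn
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
encoder = AutoModel.from_pretrained("camembert-base")

def embed(utterances):
    """Mean-pooled CamemBERT embeddings for a list of French utterances."""
    batch = tokenizer(utterances, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state          # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()    # (B, 768)

# Invented transcripts with continuous satisfaction labels in [0, 1]
train_texts = ["je suis très satisfait du service", "cela fait trois fois que j'appelle"]
train_labels = [0.9, 0.2]

regressor = Ridge(alpha=1.0).fit(embed(train_texts), train_labels)
print(regressor.predict(embed(["merci beaucoup, c'est parfait"])))
```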


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2794
Author(s):  
Mohammadreza Mirzaei ◽  
Peter Kán ◽  
Hannes Kaufmann

Sound source localization is important for spatial awareness and immersive Virtual Reality (VR) experiences. Deaf and Hard-of-Hearing (DHH) persons have limitations in completing sound-related VR tasks efficiently because they perceive audio information differently. This paper presents and evaluates a special haptic VR suit that helps DHH persons efficiently complete sound-related VR tasks. Our proposed VR suit receives sound information from the VR environment wirelessly and indicates the direction of the sound source to the DHH user by using vibrotactile feedback. Our study suggests that using different setups of the VR suit can significantly improve VR task completion times compared to not using a VR suit. Additionally, the results of mounting haptic devices on different positions of users’ bodies indicate that DHH users can complete a VR task significantly faster when two vibro-motors are mounted on their arms and ears compared to their thighs. Our quantitative and qualitative analysis demonstrates that DHH persons prefer using the system without the VR suit and prefer mounting vibro-motors in their ears. In an additional study, we did not find a significant difference in task completion time when using four vibro-motors with the VR suit compared to using only two vibro-motors in users’ ears without the VR suit.
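The paper does not publish the exact direction-to-vibration mapping used by the suit; purely as an illustration of the idea, a simple left/right panning law could look like the following sketch (the azimuth convention and two-motor layout are assumptions, not the authors' implementation):

```python
import math

def vibro_intensities(azimuth_deg):
    """Map a horizontal sound-source direction (0 deg = front, +90 = right,
    -90 = left) to left/right vibro-motor intensities in [0, 1]."""
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))   # clamp to the frontal plane
    pan = math.sin(math.radians(azimuth_deg))          # -1 (left) .. +1 (right)
    return {"left": (1.0 - pan) / 2.0, "right": (1.0 + pan) / 2.0}

print(vibro_intensities(-45))  # source on the left -> stronger left motor
print(vibro_intensities(60))   # source on the right -> stronger right motor
```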


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2599
Author(s):  
Gabriela Santiago ◽  
Marvin Jiménez ◽  
Jose Aguilar ◽  
Edwin Montoya

Occupancy and activity estimation are fields that have been widely researched in the past few years. However, the techniques used so far either rely on a mixture of atmospheric features such as humidity and temperature, require many devices such as cameras and audio sensors, or are limited to speech recognition. In this work, it is proposed that occupancy and activity can be estimated from audio information alone, using an automatic audio feature engineering approach to extract, analyze, and select descriptors/variables. This scheme of audio descriptor extraction is used to determine occupancy and activity in specific smart environments, such that our approach can differentiate between academic, administrative, or commercial environments. Our audio feature engineering approach is compared to previous similar work on occupancy and/or activity estimation in smart buildings (most of which includes other features, such as atmospheric and visual data). In general, the results obtained are very encouraging compared to previous studies.
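The abstract does not specify the automatic feature engineering pipeline. The following is a rough sketch, under assumed descriptors (MFCCs, spectral centroid, zero-crossing rate, RMS) and an assumed univariate selection step, of how audio-only environment classification of this kind is often set up; the file names and labels are invented:

```python
# pip install librosa scikit-learn numpy
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

def audio_descriptors(wav_path):
    """Frame-level descriptors summarized by their mean and std over the clip."""
    y, sr = librosa.load(wav_path, sr=22050)
    frames = np.vstack([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.zero_crossing_rate(y),
        librosa.feature.rms(y=y),
    ])
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])

# Invented labelled clips from different smart environments
paths = ["office_01.wav", "office_02.wav", "class_01.wav",
         "class_02.wav", "shop_01.wav", "shop_02.wav"]
labels = ["administrative", "administrative", "academic",
          "academic", "commercial", "commercial"]
X = np.array([audio_descriptors(p) for p in paths])

# Select the most discriminative descriptors, then classify the environment
model = make_pipeline(SelectKBest(f_classif, k=10),
                      RandomForestClassifier(random_state=0))
model.fit(X, labels)
print(model.predict(X))
```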


2021 ◽  
Vol 24 (7) ◽  
pp. 10-20
Author(s):  
Marco Olivieri ◽  
Raffaele Malvermi ◽  
Mirco Pezzoli ◽  
Massimiliano Zanoni ◽  
Sebastian Gonzalez ◽  
...  

Author(s):  
Jeena Augustine

Abstract: Emotion recognition from speech is one of the most important subdomains in the field of signal processing. In this work, our system is a two-stage approach, namely feature extraction and a classification engine. First, two feature sets are investigated: 39 Mel-Frequency Cepstral Coefficients (MFCC) and 65 MFCC features extracted based on the work of [20]. Second, we use the Support Vector Machine (SVM) as the main classifier engine, since it is the most common technique in the field of speech recognition. In addition, we investigate the importance of recent advances in machine learning, including deep kernel learning, as well as various types of auto-encoders (the basic auto-encoder and the stacked auto-encoder). A large set of experiments is conducted on the SAVEE audio database. The experimental results show that the DSVM technique outperforms the standard SVM, with classification rates of 69.84% and 68.25%, respectively, using the 39 MFCC features. In addition, the auto-encoder technique outperforms the standard SVM, yielding a classification rate of 73.01%. Keywords: Emotion recognition, MFCC, SVM, Deep Support Vector Machine, Basic auto-encoder, Stacked auto-encoder
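As a rough illustration of the feature extraction and classifier stages described above (not the authors' exact 39/65-dimensional feature sets or their deep SVM), a conventional 39-dimensional MFCC + delta + delta-delta front end with a standard SVM could be sketched as follows; the file names and labels are placeholders for SAVEE-style utterances:

```python
# pip install librosa scikit-learn numpy
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def mfcc_features(wav_path, n_mfcc=13):
    """13 MFCCs + deltas + delta-deltas, averaged over time -> 39-dim vector."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    feats = np.vstack([mfcc,
                       librosa.feature.delta(mfcc),
                       librosa.feature.delta(mfcc, order=2)])
    return feats.mean(axis=1)

# Placeholder utterances and emotion labels (SAVEE-style)
paths = ["DC_a01.wav", "DC_sa01.wav", "DC_h01.wav"]
labels = ["anger", "sadness", "happiness"]
X = np.array([mfcc_features(p) for p in paths])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, labels)
print(clf.predict(X))
```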


2021 ◽  
pp. 147807712110417
Author(s):  
Eray Şahbaz

In architectural education, experience is generally the most lasting way to learn professional skills. However, perhaps as a result of modern education, it is not possible to learn everything about a building through experience. The SimYA project aims to support the development of a new generation of construction studio based on learning by doing and experiencing in architectural education, using virtual reality (VR) technologies. Within the scope of the project, an interactive VR-based computer simulation (SimYA) was developed to demonstrate basic construction elements such as foundations, walls, and roofs. The effectiveness of SimYA against the traditional method was tested with a scientific experiment. A total of 32 volunteer architecture students participated in the experiment. The participants were divided into two equal groups. These groups were named the SimYA group and the control group, according to the teaching method they received. First, a pre-test was applied in order to evaluate the groups' existing knowledge. After that, the first group was taught with the VR-supported SimYA program and the second group was taught with the traditional method. According to the test results, the success rate of the SimYA group increased from 5.63% to 74.86%, and the success rate of the control group increased from 3.12% to 57.95%. These results indicate that the SimYA project achieved the targeted success. It is thought that placing the building elements and materials in the virtual environment with the students' own hands and receiving visual and audio information from the interactive building elements are important factors in this success.


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5817
Author(s):  
Cecilia Provenzale ◽  
Nicola Di Stefano ◽  
Alessia Noccaro ◽  
Fabrizio Taffoni

Bowing is the fundamental motor action responsible for sound production in violin playing. Considerable effort is required to control such a complex technique, especially at the beginning of violin training, partly due to a lack of quantitative assessments of bowing movements. Here, we present an interface based on magneto-inertial measurement units (MIMUs) and optical sensors for the real-time monitoring of the fundamental parameters of bowing. Two MIMUs and a sound recorder were used to estimate the bow orientation and acquire sounds. An optical motion capture system was used as the gold standard for comparison. Four optical sensors positioned on the bow stick measured the stick–hair distance. During a pilot test, a musician was asked to perform strokes using different sections of the bow at different paces. Distance data were used to train two classifiers, a linear discriminant (LD) classifier and a decision tree (DT) classifier, to estimate the bow section used. The DT classifier reached the best classification accuracy (94.2%). A larger data analysis on nine violin beginners showed that the orientation error was less than 2°; the bow tilt correlated with the audio information (r(134) = −0.973, 95% CI [−0.981, −0.962], p < 0.001). The results confirmed that the interface provides reliable information on the bowing technique that might improve the learning performance of violin beginners.
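The classifier inputs are the four stick–hair distance readings. As an illustrative sketch only (the distance values and section labels below are invented, not the paper's data), a decision-tree classifier over such features could be set up like this:

```python
# pip install scikit-learn numpy
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Invented training strokes: four optical stick-hair distances (mm) per stroke,
# labelled with the bow section used during the stroke.
X = np.array([
    [1.2, 1.4, 2.8, 3.9],   # stroke played near the frog
    [1.1, 1.3, 2.9, 3.7],
    [3.8, 3.1, 1.5, 1.1],   # stroke played near the tip
    [3.6, 3.2, 1.6, 1.0],
    [2.4, 2.2, 2.3, 2.5],   # stroke played in the middle
    [2.5, 2.3, 2.2, 2.4],
])
y = ["frog", "frog", "tip", "tip", "middle", "middle"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[1.3, 1.5, 2.7, 3.6]]))  # expected: "frog"
```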

