Audio-Visual Tensor Fusion Network for Piano Player Posture Classification

2020 ◽  
Vol 10 (19) ◽  
pp. 6857
Author(s):  
So-Hyun Park ◽  
Young-Ho Park

Playing the piano in the correct position is important because the correct position helps to produce good sound and prevents injuries. Many studies have been conducted in the field of piano playing posture recognition, combining various techniques. Most of these techniques are based on analyzing visual information. However, in the piano education field, it is essential to utilize audio information in addition to visual information due to the deep relationship between posture and sound. In this paper, we propose an audio-visual tensor fusion network (AV-TFN) for piano performance posture classification. Unlike existing studies that used only visual information, the proposed method uses audio information to improve the accuracy in classifying the postures of professional and amateur pianists. For this, we first introduce a dataset called C3Pap (Classic piano performance postures of amateurs and professionals) that contains actual piano performance videos recorded in diverse environments. Furthermore, we propose a data structure that represents audio-visual information. The proposed data structure represents audio information on a color scale and visual information on a black-and-white scale, so as to represent the relationship between them. We call this data structure an audio-visual tensor. Finally, we compare the performance of the proposed method with state-of-the-art approaches: VN (Visual Network), AN (Audio Network), and AVN (Audio-Visual Network) with concatenation and attention techniques. The experimental results demonstrate that AV-TFN outperforms existing approaches and, thus, can be effectively used in the classification of piano playing postures.
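The audio-visual tensor described in the abstract (audio information on a color scale, visual information on a black-and-white scale, combined in one structure) can be sketched roughly as follows. The channel layout, the red-to-blue colormap, and the spatial size are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def build_av_tensor(frame_gray, audio_spec, size=(96, 96)):
    """Stack a grayscale posture frame with a color-mapped audio
    spectrogram into one multi-channel audio-visual tensor.

    frame_gray : 2-D array in [0, 1] (the visual part, black-and-white scale)
    audio_spec : 2-D array of spectrogram magnitudes (the audio part)
    """
    h, w = size
    # Visual information stays on the black-and-white scale: one channel.
    visual = np.asarray(frame_gray, dtype=np.float32)[:h, :w].reshape(h, w, 1)

    # Audio information is mapped onto a color scale. A simple red-to-blue
    # ramp stands in for whatever colormap the paper actually uses.
    spec = np.asarray(audio_spec, dtype=np.float32)
    spec = (spec - spec.min()) / (np.ptp(spec) + 1e-8)   # normalize to [0, 1]
    spec = spec[:h, :w]
    audio_rgb = np.stack([spec, np.zeros_like(spec), 1.0 - spec], axis=-1)

    # Concatenate along the channel axis: (H, W, 1 + 3) per time step.
    return np.concatenate([visual, audio_rgb], axis=-1)
```

A classifier would then consume this (H, W, 4) tensor exactly as it would an ordinary image, which is what lets standard convolutional backbones exploit both modalities at once.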

Author(s):  
Weiyu Zhang ◽  
Se-Hoon Jeong ◽  
Martin Fishbein†

This study investigates how multitasking interacts with levels of sexually explicit content to influence an individual’s ability to recognize TV content. A 2 (multitasking vs. nonmultitasking) by 3 (low, medium, and high sexual content) between-subjects experiment was conducted. The analyses revealed that multitasking not only impaired task performance, but also decreased TV recognition. An inverted-U relationship between degree of sexually explicit content and recognition of TV content was found, but only when subjects were multitasking. In addition, multitasking interfered with subjects’ ability to recognize audio information more than their ability to recognize visual information.


2008 ◽  
Vol 2 (2) ◽  
Author(s):  
Glenn Nordehn ◽  
Spencer Strunic ◽  
Tom Soldner ◽  
Nicholas Karlisch ◽  
Ian Kramer ◽  
...  

Introduction: Cardiac auscultation accuracy is poor: 20% to 40%. Audio-only training with 500 heart sound cycles over a short time period significantly improved auscultation scores. Hypothesis: adding visual information to an audio-only format significantly (p<.05) improves short- and long-term accuracy. Methods: Twenty-two first- and second-year medical student participants took an audio-only pre-test. Seven students, comprising our audio-only training cohort, heard audio-only recordings of 500 heart sound repetitions. Fifteen students, comprising our paired visual-with-audio cohort, heard the heart sounds while simultaneously watching video spectrograms of them. Immediately after training, both cohorts took audio-only post-tests; the visual-with-audio cohort also took a visual-with-audio post-test, a test providing audio with simultaneous video spectrograms. All tests were repeated six months later. Results: All tests given immediately after training showed significant improvement, with no significant difference between the cohorts. Six months later, neither cohort maintained significant improvement on the audio-only post-tests. Six months later, the visual-with-audio cohort maintained significant improvement (p<.05) on the visual-with-audio post-test. Conclusions: Retention of heart sound recognition is not maintained on audio-only testing, whether training used audio-only or visual with audio. Providing visual with audio in both training and testing allows retention of auscultation accuracy. Devices providing visual information during auscultation could prove beneficial.


2019 ◽  
Vol 32 (2) ◽  
pp. 87-109 ◽  
Author(s):  
Galit Buchs ◽  
Benedetta Heimler ◽  
Amir Amedi

Visual-to-auditory Sensory Substitution Devices (SSDs) are a family of non-invasive devices for visual rehabilitation aiming at conveying whole-scene visual information through the intact auditory modality. Although proven effective in lab environments, the use of SSDs has yet to be systematically tested in real-life situations. To start filling this gap, in the present work we tested the ability of expert SSD users to filter out irrelevant background noise while focusing on the relevant audio information. Specifically, nine blind expert users of the EyeMusic visual-to-auditory SSD performed a series of identification tasks via SSDs (i.e., shape, color, and conjunction of the two features). Their performance was compared in two separate conditions: a silent baseline, and with irrelevant background sounds from real-life situations, using the same stimuli in a pseudo-random balanced design. Although the participants described the background noise as disturbing, no significant performance differences emerged between the two conditions (i.e., noisy; silent) for any of the tasks. In the conjunction task (shape and color) we found a non-significant trend for a disturbing effect of the background noise on performance. These findings suggest that visual-to-auditory SSDs can indeed be successfully used in noisy environments and that users can still focus on relevant auditory information while inhibiting irrelevant sounds. Our findings take a step towards the actual use of SSDs in real-life situations while potentially impacting rehabilitation of sensory-deprived individuals.


2020 ◽  
Vol 15 (1) ◽  
pp. 112-126
Author(s):  
Hakan Bagci

The primary problem of this study is to determine whether there is a significant relationship between students' attitudes towards harmony courses and their piano playing habits. In this study, a correlational survey model was employed. The population consisted of students studying at music departments in Turkey during the 2019–2020 academic year, and the sample included 248 students from nine different universities and four different music-related departments (Music Education, Performance, Musicology and Turkish Music). For data collection purposes, three instruments developed by the researcher were used: a scale of attitudes towards harmony courses, a scale of piano playing habits, and a questionnaire to determine the variables affecting students' habits and attitudes. No significant difference was found between the students' departments and their piano playing habits. The study revealed that students' piano playing habits varied according to their personal instruments. Keywords: Attitudes, harmony education, music education, music theory, piano education.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Ying Zhu

Piano performance is an art with rich artistic elements and unpredictable performance skills, and it is an important carrier of beautiful piano sound. The generation of musical tension and expressiveness in piano performance is a vivid display of performance skill, so attention should be paid to the cultivation and flexible application of those skills. To ensure the richness and artistry of piano performance, this work builds fully on the artistic characteristics of piano performance: the principle of the hidden Markov model is analyzed in depth and applied to the multimedia recognition of piano music. During template acquisition, the fundamental frequency of the played music varies greatly, which leads to a low recognition rate during performance; to address this problem, this paper proposes a multimedia recognition method for piano music. Finally, analysis of the experimental results shows that the proposed method achieves a recognition rate 16% higher than the traditional method and has value for the multimedia recognition of piano music.
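The abstract gives no model details, but the HMM template-matching idea it rests on can be illustrated with a toy discrete-observation forward pass: each template is an HMM, and a performance is assigned to the template whose model gives the highest likelihood. The two-state parameters below are purely illustrative:

```python
import numpy as np

def forward_log_likelihood(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space.

    obs    : sequence of observation symbol indices
    log_pi : (N,)   log initial state probabilities
    log_A  : (N, N) log transition matrix
    log_B  : (N, M) log emission matrix
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # log-sum-exp over previous states, then emit the next symbol
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_A)) + log_B[:, o]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())
```

Given a bank of such models, one per template, classification is simply the argmax of `forward_log_likelihood` over the bank; a real system would use continuous emissions over pitch features rather than discrete symbols.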


2020 ◽  
Vol 12 (1) ◽  
pp. 51-59
Author(s):  
A. A. Moskvin ◽  
A.G. Shishkin

Human emotions play a significant role in everyday life, and automatic emotion recognition has many applications in medicine, e-learning, monitoring, marketing, etc. In this paper, a method and neural network architecture for real-time human emotion recognition from audio-visual data are proposed. To classify one of seven emotions, deep neural networks, namely convolutional and recurrent neural networks, are used. Visual information is represented by a sequence of 16 frames of 96 × 96 pixels, and audio information by 140 features for each of a sequence of 37 temporal windows. To reduce the number of audio features, an autoencoder was used. Audio information in conjunction with visual information is shown to increase recognition accuracy by up to 12%. The developed system is undemanding in computing resources and is dynamic in terms of parameter selection, reducing or increasing the number of emotion classes, and the ability to easily add, accumulate and use information from other external devices for further improvement of classification accuracy.
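A shape-level sketch of the pipeline the abstract describes may help: 16 frames of 96 × 96 visual input, 37 windows of 140 audio features compressed by an autoencoder, and late fusion into 7 emotion classes. The bottleneck size (32), the random stand-in weights, and the mean-pooling in place of the real convolutional/recurrent branches are all assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Input shapes from the abstract: 16 frames of 96x96 pixels, and
# 140 audio features for each of 37 temporal windows.
visual = rng.random((16, 96, 96))
audio = rng.random((37, 140))

# Autoencoder bottleneck (hypothetical size 32) reducing the 140 audio
# features per window; in the paper this is learned, here it is random.
W_enc = rng.standard_normal((140, 32)) * 0.1
audio_compressed = np.tanh(audio @ W_enc)            # (37, 32)

# Stand-ins for the CNN/RNN branches: pool each stream to a fixed-size
# embedding (one value per frame for video, mean over windows for audio).
visual_emb = visual.mean(axis=(1, 2))                # (16,)
audio_emb = audio_compressed.mean(axis=0)            # (32,)

# Late fusion by concatenation, then a linear head over 7 emotions.
fused = np.concatenate([visual_emb, audio_emb])      # (48,)
W_out = rng.standard_normal((fused.size, 7)) * 0.1
logits = fused @ W_out
probs = np.exp(logits) / np.exp(logits).sum()        # softmax over 7 classes
```

The point of the sketch is the data flow, not the numbers: each modality is reduced independently, and only the compact embeddings are fused before classification, which keeps the computing-resource demands modest.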


2021 ◽  
Vol 7 (8) ◽  
pp. 135
Author(s):  
Davide Dal Cortivo ◽  
Sara Mandelli ◽  
Paolo Bestagini ◽  
Stefano Tubaro

Identifying the source camera of images and videos has gained significant importance in multimedia forensics. It allows tracing data back to their creator, thus enabling investigators to solve copyright infringement cases and expose the authors of heinous crimes. In this paper, we focus on the problem of camera model identification for video sequences, that is, given a video under analysis, detecting the camera model used for its acquisition. To this purpose, we develop two different CNN-based camera model identification methods, working in a novel multi-modal scenario. Differently from mono-modal methods, which use only the visual or audio information from the investigated video to tackle the identification task, the proposed multi-modal methods jointly exploit audio and visual information. We test our proposed methodologies on the well-known Vision dataset, which collects almost 2000 video sequences belonging to different devices. Experiments are performed considering native videos directly acquired by their acquisition devices and videos uploaded on social media platforms, such as YouTube and WhatsApp. The achieved results show that the proposed multi-modal approaches significantly outperform their mono-modal counterparts, representing a valuable strategy for the tackled problem and opening future research to even more challenging scenarios.


Author(s):  
Meichen Liu ◽  
Jieru Huang

In recent years, with the rise of piano teaching, many people have begun to learn to play the piano. However, the high cost of piano lessons and the one-to-one teacher-student teaching model have caused a shortage of piano education resources, making piano learning a luxury activity. Using computer multimedia software for piano teaching has become a feasible way to alleviate this contradiction. This paper proposes the design of an intelligent piano playing teaching system based on a neural network and studies how to realize such a teaching system. To address a key difficulty of computer-based piano teaching, namely that it is one-way knowledge transfer without interaction, it presents a method of evaluating piano playing using a neural network model. In addition, this paper simulates a teacher guiding students through playing practice, which is of great significance for piano teaching.


2016 ◽  
Vol 28 (1) ◽  
pp. 41-54 ◽  
Author(s):  
Roberta Bianco ◽  
Giacomo Novembre ◽  
Peter E. Keller ◽  
Florian Scharf ◽  
Angela D. Friederici ◽  
...  

Complex human behavior is hierarchically organized. Whether or not syntax plays a role in this organization is currently under debate. The present ERP study uses piano performance to isolate syntactic operations in action planning and to demonstrate their priority over nonsyntactic levels of movement selection. Expert pianists were asked to execute chord progressions on a mute keyboard by copying the posture of a performing model hand shown in sequences of photos. We manipulated the final chord of each sequence in terms of Syntax (congruent/incongruent keys) and Manner (conventional/unconventional fingering), as well as the strength of its predictability by varying the length of the Context (five-chord/two-chord progressions). The production of syntactically incongruent compared to congruent chords showed a response delay that was larger in the long compared to the short context. This behavioral effect was accompanied by a centroparietal negativity in the long but not in the short context, suggesting that a syntax-based motor plan was prepared ahead. Conversely, the execution of the unconventional manner was not delayed as a function of Context and elicited an opposite electrophysiological pattern (a posterior positivity). The current data support the hypothesis that motor plans operate at the level of musical syntax and are incrementally translated to lower levels of movement selection.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Haoze Chen ◽  
Zhijie Zhang

Because the audio signatures of different vehicle types are distinct, vehicle information can be identified accurately from the audio signal alone. In real life, to determine the type of a vehicle we do not need to obtain visual information; the audio information suffices. In this paper, we extract and stitch together features from different aspects: Mel frequency cepstrum coefficients among perceptual characteristics, the pitch class profile among psychoacoustic characteristics, and short-term energy among acoustic characteristics. In addition, we improve the neural network classifier by fusing LSTM units into the convolutional neural network. Finally, we feed the novel feature into the hybrid neural network to recognize different vehicles. The results suggest that the novel feature proposed in this paper increases the recognition rate by 7%; that randomly perturbing the training data by superimposing different kinds of noise improves the anti-noise ability of our identification system; and that LSTM, which has great advantages in modeling time series, improves the recognition rate by 3.39% when added to the network.
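The feature "stitching" the abstract describes is frame-wise concatenation of the three streams. A minimal sketch, in which the short-term energy is computed for real while the MFCC and pitch-class-profile extractors are replaced by zero-valued placeholders (their dimensions, 13 and 12, and the frame/hop sizes are assumptions, not values from the paper):

```python
import numpy as np

def short_term_energy(signal, frame_len=400, hop=160):
    """Per-frame short-term energy, one of the three stitched features."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)        # 1 s test tone

energy = short_term_energy(signal)          # (n_frames,)
n_frames = len(energy)
# Placeholders standing in for real extractors: 13 MFCCs (perceptual)
# and a 12-bin pitch class profile (psychoacoustic), one row per frame.
mfcc = np.zeros((n_frames, 13))
pcp = np.zeros((n_frames, 12))

# "Stitching": concatenate the three streams frame by frame, giving a
# 13 + 12 + 1 = 26-dimensional feature vector per frame.
stitched = np.hstack([mfcc, pcp, energy[:, None]])
```

The resulting (n_frames, 26) matrix is the kind of time-ordered feature sequence a CNN-LSTM hybrid can consume directly.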

