Towards end-to-end polyphonic music transcription: Transforming music audio directly to a score

Author(s): Ralf Gunter Correa Carvalho, Paris Smaragdis
Author(s): Yuta Ojima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii

This paper describes automatic music transcription with chord estimation for music audio signals. We focus on the fact that concurrent combinations of musical notes, such as chords, form the basis of harmony and are taken into account in music composition. Since chords and musical notes are closely linked, we propose joint pitch and chord estimation based on a Bayesian hierarchical model that consists of an acoustic model representing the generative process of a spectrogram and a language model representing the generative process of a piano roll. The acoustic model is formulated as a variant of non-negative matrix factorization (NMF) that has binary variables indicating a piano roll. The language model is formulated as a hidden Markov model (HMM) that has chord labels as latent variables and emits a piano roll, so that the sequential dependency of a piano roll can be represented. The two models are integrated through the shared piano roll in a hierarchical Bayesian manner. All latent variables and parameters are estimated using Gibbs sampling. The experimental results showed the great potential of the proposed method for unified music transcription and grammar induction.
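The abstract only outlines the model, so the Python below is a minimal, hypothetical sketch of what one Gibbs sweep could look like in a model of this shape; it is not the authors' implementation. It assumes a Poisson likelihood (the one commonly paired with KL-divergence NMF), fixed basis spectra `W` and gains `H`, a binary piano roll `S` whose entries have a chord-dependent Bernoulli prior `emit[c][k]` (values strictly between 0 and 1), and a first-order chord transition matrix `trans` with initial distribution `init`. All function and variable names are illustrative, and the paper's actual priors, parameterization and sampling schedule may differ.

```python
import math
import random

# Hypothetical sketch, NOT the authors' code. Assumed model:
#   chords c_t follow a first-order Markov chain (init, trans)      [language model]
#   S[k][t] ~ Bernoulli(emit[c_t][k]), a binary piano roll          [language model]
#   X[f][t] ~ Poisson(sum_k W[f][k] * H[k][t] * S[k][t] + eps)      [acoustic model]


def log_poisson(x, lam):
    """Log-probability of a count x under Poisson(lam), lam > 0."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def sigmoid_from_logs(log_p0, log_p1):
    """Return P(value = 1) from unnormalized log-probabilities, without overflow."""
    diff = log_p1 - log_p0
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def gibbs_update_piano_roll(X, W, H, S, chords, emit, rng, eps=1e-9):
    """One Gibbs sweep over every binary piano-roll entry S[k][t] (updated in place)."""
    F, K, T = len(X), len(S), len(X[0])
    for t in range(T):
        for k in range(K):
            log_p = []
            for value in (0, 1):
                S[k][t] = value
                # Acoustic term: Poisson likelihood of frame t under the masked NMF model.
                ll = 0.0
                for f in range(F):
                    lam = sum(W[f][j] * H[j][t] * S[j][t] for j in range(K)) + eps
                    ll += log_poisson(X[f][t], lam)
                # Language-model term: chord-dependent Bernoulli prior on note k.
                p_on = emit[chords[t]][k]
                ll += math.log(p_on if value == 1 else 1.0 - p_on)
                log_p.append(ll)
            p1 = sigmoid_from_logs(log_p[0], log_p[1])
            S[k][t] = 1 if rng.random() < p1 else 0
    return S


def gibbs_update_chords(S, chords, init, trans, emit, rng):
    """One single-site Gibbs sweep over chord labels c_t (updated in place).

    P(c_t = c | rest) is proportional to
    P(c | c_{t-1}) * P(c_{t+1} | c) * prod_k Bernoulli(S[k][t]; emit[c][k]).
    """
    K, T, C = len(S), len(S[0]), len(init)
    for t in range(T):
        log_w = []
        for c in range(C):
            lw = math.log(init[c] if t == 0 else trans[chords[t - 1]][c])
            if t + 1 < T:
                lw += math.log(trans[c][chords[t + 1]])
            for k in range(K):
                p_on = emit[c][k]
                lw += math.log(p_on if S[k][t] == 1 else 1.0 - p_on)
            log_w.append(lw)
        # Subtract the maximum before exponentiating to avoid underflow.
        m = max(log_w)
        weights = [math.exp(w - m) for w in log_w]
        chords[t] = rng.choices(range(C), weights=weights)[0]
    return chords
```

In a full sampler these two updates would be alternated for many iterations, together with updates of `W`, `H`, `emit` and `trans`, which are omitted here. With two notes whose basis spectra do not overlap and strongly chord-specific emission probabilities, a single sweep of each update should already recover the active notes and the chord that favours them.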
