A Divide and Conquer Approach to Automatic Music Transcription Using Neural Networks

Information Research ◽

Music Research ◽

Music Information

A very significant task for music research is to estimate instants when meaningful events begin (onset) and when they end (offset). Onset detection is widely applied in many fields: electrocardiograms, seismographic data, stock market results and many Music Information Research(MIR) tasks, such as Automatic Music Transcription, Rhythm Detection, Speech Recognition, etc. Automatic Onset Detection(AOD) received, recently, a huge contribution coming from Artificial Intelligence (AI) methods, mainly Machine Learning and Deep Learning. In this work, the use of Convolutional Neural Networks (CNN) is explored by adapting its original architecture in order to apply the approach to automatic onset detection on audio musical signals. We used a CNN network for onset detection on a very general dataset, well acknowledged by the MIR community, and examined the accuracy of the method by comparison to ground truth data published by the dataset. The results are promising and outperform another methods of musical onset detection.

A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription

Electronics ◽

10.3390/electronics10070810 ◽

2021 ◽

Vol 10 (7) ◽

pp. 810

Author(s):

Carlos Hernandez-Olivan ◽

Ignacio Zay Pinilla ◽

Carlos Hernandez-Lopez ◽

Jose R. Beltran

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

High Impact ◽

Critical Problem ◽

Music Transcription ◽

Music Information ◽

Method Show

Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is faced with deep neural networks, the variety of timbres of different instruments can be an issue that has not been studied in depth yet. The goal of this work is to address AMT transcription by analyzing how timbre affect monophonic transcription in a first approach based on the CREPE neural network and then to improve the results by performing polyphonic music transcription with different timbres with a second approach based on the Deep Salience model that performs polyphonic transcription based on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results and the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT on piano sounds such as Google Magenta Onset and Frames (OaF). Our polyphonic transcription model for non-piano instruments outperforms the state-of-the-art model, such as for bass instruments, which has an F-score of 0.9516 versus 0.7102. In our latest experiment we also show how adding an onset detector to our model can outperform the results given in this work.

Automatic Music Transcription with Convolutional Neural Networks using Intuitive Filter Shapes

10.15368/theses.2017.95 ◽

2017 ◽

Author(s):

Jonathan Sleep

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Music Transcription ◽

A Divide-and-Conquer Approach for Language Identification Based on Recurrent Neural Networks

10.21437/interspeech.2016-180 ◽

2016 ◽

Cited By ~ 11

Author(s):

G. Gelly ◽

Jean-Luc Gauvain ◽

V.B. Le ◽

A. Messaoudi

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Language Identification ◽

Divide And Conquer

Automatic Music Transcription: An Overview

IEEE Signal Processing Magazine ◽

10.1109/msp.2018.2869928 ◽

2019 ◽

Vol 36 (1) ◽

pp. 20-30 ◽

Cited By ~ 19

Author(s):

Emmanouil Benetos ◽

Simon Dixon ◽

Zhiyao Duan ◽

Sebastian Ewert

Keyword(s):

Music Transcription ◽

Chord-aware automatic music transcription based on hierarchical Bayesian integration of acoustic and language models

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2018.17 ◽

2018 ◽

Vol 7 ◽

Author(s):

Yuta Ojima ◽

Eita Nakamura ◽

Katsutoshi Itoyama ◽

Kazuyoshi Yoshii

Keyword(s):

Latent Variables ◽

Language Model ◽

Language Models ◽

Sequential Dependency ◽

Acoustic Model ◽

Hierarchical Bayesian ◽

Generative Process ◽

Music Transcription ◽

Music Audio

This paper describes automatic music transcription with chord estimation for music audio signals. We focus on the fact that concurrent structures of musical notes such as chords form the basis of harmony and are considered for music composition. Since chords and musical notes are deeply linked with each other, we propose joint pitch and chord estimation based on a Bayesian hierarchical model that consists of an acoustic model representing the generative process of a spectrogram and a language model representing the generative process of a piano roll. The acoustic model is formulated as a variant of non-negative matrix factorization that has binary variables indicating a piano roll. The language model is formulated as a hidden Markov model that has chord labels as the latent variables and emits a piano roll. The sequential dependency of a piano roll can be represented in the language model. Both models are integrated through a piano roll in a hierarchical Bayesian manner. All the latent variables and parameters are estimated using Gibbs sampling. The experimental results showed the great potential of the proposed method for unified music transcription and grammar induction.

Automatic music transcription supporting different instruments

Proceedings Third International Conference on WEB Delivering of Music ◽

10.1109/wdm.2003.1233871 ◽

2004 ◽

Cited By ~ 1

Author(s):

I. Bruno ◽

S.L. Monni ◽

P. Nesi

Keyword(s):

Music Transcription ◽

Multi-Instrument Automatic Music Transcription With Self-Attention-Based Instance Segmentation

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/taslp.2020.3030482 ◽

2020 ◽

Vol 28 ◽

pp. 2796-2809

Author(s):

Yu-Te Wu ◽

Berlin Chen ◽

Li Su

Keyword(s):

Music Transcription ◽

Instance Segmentation

Automatic music transcription based on convolutional neural network, constant Q transform and MFCC

Journal of Physics Conference Series ◽

10.1088/1742-6596/1651/1/012192 ◽

2020 ◽

Vol 1651 ◽

pp. 012192

Author(s):

Zhihang Meng ◽

Wencheng Chen

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Music Transcription ◽

A Shift-Invariant Latent Variable Model for Automatic Music Transcription

Computer Music Journal ◽

10.1162/comj_a_00146 ◽

2012 ◽

Vol 36 (4) ◽

pp. 81-94 ◽

Cited By ~ 36

Author(s):

Emmanouil Benetos ◽

Simon Dixon

Keyword(s):

Latent Variable ◽

Markov Models ◽

Variable Model ◽

Data Set ◽

Music Transcription ◽

Transcription System ◽

Error Metrics ◽

Frequency Modulations ◽

Time Varying Pitch

In this work, a probabilistic model for multiple-instrument automatic music transcription is proposed. The model extends the shift-invariant probabilistic latent component analysis method, which is used for spectrogram factorization. Proposed extensions support the use of multiple spectral templates per pitch and per instrument source, as well as a time-varying pitch contribution for each source. Thus, this method can effectively be used for multiple-instrument automatic transcription. In addition, the shift-invariant aspect of the method can be exploited for detecting tuning changes and frequency modulations, as well as for visualizing pitch content. For note tracking and smoothing, pitch-wise hidden Markov models are used. For training, pitch templates from eight orchestral instruments were extracted, covering their complete note range. The transcription system was tested on multiple-instrument polyphonic recordings from the RWC database, a Disklavier data set, and the MIREX 2007 multi-F0 data set. Results demonstrate that the proposed method outperforms leading approaches from the transcription literature, using several error metrics.