Automatic music transcription based on convolutional neural network, constant Q transform and MFCC

Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When deep neural networks are applied to AMT, the variety of timbres across different instruments is an issue that has not yet been studied in depth. The goal of this work is to address AMT in two stages. The first approach, based on the CREPE neural network, analyzes how timbre affects monophonic transcription. The second approach, based on the Deep Salience model, which performs polyphonic transcription from the Constant-Q Transform, aims to improve on these results by transcribing polyphonic music played with different timbres. The results of the first method show that timbre and onset envelope have a strong impact on AMT results. The second method shows that the developed model depends less on onset strength than other state-of-the-art AMT models for piano sounds, such as Google Magenta Onsets and Frames (OaF). For non-piano instruments, our polyphonic transcription model outperforms the state-of-the-art model; for bass instruments, for example, it achieves an F-score of 0.9516 versus 0.7102. In a final experiment, we also show that adding an onset detector to our model further improves the results reported in this work.
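The Constant-Q Transform mentioned in this abstract spaces its frequency bins geometrically, so every bin has the same ratio of center frequency to bandwidth (the constant Q). The following is a minimal sketch of that bin layout, assuming the textbook definitions f_k = f_min * 2^(k/B) and Q = 1 / (2^(1/B) - 1) for B bins per octave. The helper names and the parameter values in the usage note are hypothetical and illustrative. They are not the configuration used by the paper or by Deep Salience.

```python
import math


def cqt_center_frequencies(f_min: float, n_bins: int, bins_per_octave: int) -> list[float]:
    """Center frequency of each CQT bin: f_k = f_min * 2**(k / B)."""
    # Geometric spacing: each bin sits a factor 2**(1/B) above the previous one,
    # so B consecutive bins always span exactly one octave.
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_quality_factor(bins_per_octave: int) -> float:
    """Q = f_k / bandwidth_k, identical for every bin (hence 'constant Q')."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def cqt_window_lengths(
    f_min: float, n_bins: int, bins_per_octave: int, sample_rate: float
) -> list[int]:
    """Analysis window length in samples per bin: N_k = ceil(Q * sr / f_k).

    Low-frequency bins get long windows (fine frequency resolution) and
    high-frequency bins get short windows (fine time resolution).
    """
    q = cqt_quality_factor(bins_per_octave)
    return [
        math.ceil(q * sample_rate / f)
        for f in cqt_center_frequencies(f_min, n_bins, bins_per_octave)
    ]


if __name__ == "__main__":
    # Illustrative values only: 3 octaves from A1 (55 Hz) at 12 bins per octave.
    freqs = cqt_center_frequencies(55.0, 37, 12)
    lengths = cqt_window_lengths(55.0, 37, 12, 22050)
    for f, n in zip(freqs[::12], lengths[::12]):
        print(f"{f:8.2f} Hz -> window of {n} samples")
```

Because the window length shrinks in proportion to frequency, the same number of periods of each bin's center frequency fits in its window. This pitch-aligned, log-frequency resolution is why a CQT representation suits note-level transcription.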

A Parallel Fusion Approach to Piano Music Transcription Based on Convolutional Neural Network

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8461794 ◽

2018 ◽

Cited By ~ 1

Author(s):

Fu'ze Cong ◽

Shuchang Liu ◽

Li Guo ◽

Geraint A. Wiggins

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Piano Music ◽

Music Transcription ◽

Fusion Approach

CONVOLUTIONAL NEURAL NETWORK BASED ALGORITHM FOR CECUM ACHIEVEMENT CONFIRMATION

10.1055/s-0040-1705059 ◽

2020 ◽

Author(s):

S Kashin ◽

D Zavyalov ◽

A Rusakov ◽

V Khryashchev ◽

A Lebedev

Keyword(s):

Neural Network ◽

Convolutional Neural Network

No-Reference Utility Estimation with a Convolutional Neural Network

Electronic Imaging ◽

10.2352/issn.2470-1173.2018.09.iriacv-202 ◽

2018 ◽

Vol 2018 (9) ◽

pp. 202-1-202-6 ◽

Cited By ~ 2

Author(s):

Edward T. Scott ◽

Sheila S. Hemami

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Utility Estimation

Non-Blind Image Deconvolution Based on “Ringing” Removal Using Convolutional Neural Network

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.10.ipas-180 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 181-1-181-7

Author(s):

Takahiro Kudo ◽

Takanori Fujisawa ◽

Takuro Yamaguchi ◽

Masaaki Ikehara

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Network Architecture ◽

Large Scale ◽

Blind Deconvolution ◽

Training Dataset ◽

Image Deconvolution ◽

Classic Problem ◽

Key Points ◽

Blind Image

Image deconvolution has recently become an important problem. There are two kinds of approaches: non-blind and blind. Non-blind deconvolution is a classic image-deblurring problem that assumes the point spread function (PSF) is known and spatially invariant. Convolutional neural networks (CNNs) have recently been applied to non-blind deconvolution. Although CNNs can handle complex variations across unknown images, some conventional CNN-based methods can handle only small PSFs and do not consider the large PSFs encountered in the real world. In this paper, we propose a CNN-based non-blind deconvolution framework that removes large-scale ringing from a deblurred image. Our method has three key points. First, our network architecture preserves both large and small features in the image. Second, the training dataset is constructed to preserve image details. Third, we extend the images to minimize the effect of large ringing at the image borders. In our experiments, we used three kinds of large PSFs and obtained high-precision results, both quantitatively and qualitatively.
