Multimodal image and audio music transcription

AbstractOptical Music Recognition (OMR) and Automatic Music Transcription (AMT) stand for the research fields that aim at obtaining a structured digital representation from sheet music images and acoustic recordings, respectively. While these fields have traditionally evolved independently, the fact that both tasks may share the same output representation poses the question of whether they could be combined in a synergistic manner to exploit the individual transcription advantages depicted by each modality. To evaluate this hypothesis, this paper presents a multimodal framework that combines the predictions from two neural end-to-end OMR and AMT systems by considering a local alignment approach. We assess several experimental scenarios with monophonic music pieces to evaluate our approach under different conditions of the individual transcription systems. In general, the multimodal framework clearly outperforms the single recognition modalities, attaining a relative improvement close to $$40\%$$ 40 % in the best case. Our initial premise is, therefore, validated, thus opening avenues for further research in multimodal OMR-AMT transcription.

Download Full-text

Optical music recognition for traditional Thai sheet music

2014 International Computer Science and Engineering Conference (ICSEC) ◽

10.1109/icsec.2014.6978187 ◽

2014 ◽

Cited By ~ 2

Author(s):

Worapan Kusakunniran ◽

Attapol Prempanichnukul ◽

Arthid Maneesutham ◽

Kullachut Chocksawud ◽

Suparus Tongsamui ◽

...

Keyword(s):

Sheet Music ◽

Optical Music Recognition ◽

Music Recognition

Download Full-text

White Mensural Manual Encoding: from Humdrum to MEI

Cuadernos de Investigación Musical ◽

10.18239/invesmusic.v0i6.1953 ◽

2019 ◽

pp. 373

Author(s):

David Rizo Valero ◽

Nieves Pascual León ◽

Craig Stuart Sapp

Keyword(s):

Computer Code ◽

Sheet Music ◽

Optical Music Recognition ◽

Music Recognition ◽

Automated Tools ◽

Musical Heritage

<p><span lang="EN-US">The recovery of musical heritage currently necessarily involves its digitalization, not only by scanning images, but also by the encoding in computer-readable formats of the musical content described in the original manuscripts. In general, this encoding can be done using automated tools based with what is named Optical Music Recognition (OMR), or manually writing directly the corresponding computer code. The OMR technology is not mature enough yet to extract the musical content of sheet music images with enough quality, and even less from handwritten sources, so in many cases it is more efficient to encode the works manually. However, being currently MEI (Music Encoding Initiative) the most appropriate format to store the encoding, it is a totally tedious code to be manually written. Therefore, we propose a new format named **mens allowing a quick manual encoding, from which both the MEI format itself and other common representations such as Lilypond or the transcription in MusicXML can be generated. By using this approach, the antiphony Salve Regina for eight-voice choir written by Jerónimo de la Torre (1607–1673) has been successfully encoded and transcribed.</span></p>

Download Full-text

Pengantar dan Survey Tentang Optical Music Recognition

Jurnal ULTIMATICS ◽

10.31937/ti.v6i1.331 ◽

2014 ◽

Vol 6 (1) ◽

pp. 36-39

Author(s):

Kevin Purwito

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Optical Music Recognition ◽

Optical Character ◽

Digital Format ◽

Music Recognition ◽

Index Terms ◽

The Many ◽

Further Development ◽

Music Symbol

This paper describes about one of the many extension of Optical Character Recognition (OCR), that is Optical Music Recognition (OMR). OMR is used to recognize musical sheets into digital format, such as MIDI or MusicXML. There are many musical symbols that usually used in musical sheets and therefore needs to be recognized by OMR, such as staff; treble, bass, alto and tenor clef; sharp, flat and natural; beams, staccato, staccatissimo, dynamic, tenuto, marcato, stopped note, harmonic and fermata; notes; rests; ties and slurs; and also mordent and turn. OMR usually has four main processes, namely Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction and Final Representation Construction. Each of those four main processes uses different methods and algorithms and each of those processes still needs further development and research. There are already many application that uses OMR to date, but none gives the perfect result. Therefore, besides the development and research for each OMR process, there is also a need to a development and research for combined recognizer, that combines the results from different OMR application to increase the final result’s accuracy. Index Terms—Music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer

Download Full-text

Understanding Optical Music Recognition

ACM Computing Surveys ◽

10.1145/3397499 ◽

2020 ◽

Vol 53 (4) ◽

pp. 1-35 ◽

Cited By ~ 1

Author(s):

Jorge Calvo-Zaragoza ◽

Jan Hajič Jr. ◽

Alexander Pacha

Keyword(s):

Optical Music Recognition ◽

Music Recognition

Download Full-text

Automatic Music Transcription: An Overview

IEEE Signal Processing Magazine ◽

10.1109/msp.2018.2869928 ◽

2019 ◽

Vol 36 (1) ◽

pp. 20-30 ◽

Cited By ~ 19

Author(s):

Emmanouil Benetos ◽

Simon Dixon ◽

Zhiyao Duan ◽

Sebastian Ewert

Keyword(s):

Music Transcription ◽

Automatic Music Transcription

Download Full-text

Chord-aware automatic music transcription based on hierarchical Bayesian integration of acoustic and language models

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2018.17 ◽

2018 ◽

Vol 7 ◽

Author(s):

Yuta Ojima ◽

Eita Nakamura ◽

Katsutoshi Itoyama ◽

Kazuyoshi Yoshii

Keyword(s):

Latent Variables ◽

Language Model ◽

Language Models ◽

Sequential Dependency ◽

Acoustic Model ◽

Hierarchical Bayesian ◽

Generative Process ◽

Music Transcription ◽

Automatic Music Transcription ◽

Music Audio

This paper describes automatic music transcription with chord estimation for music audio signals. We focus on the fact that concurrent structures of musical notes such as chords form the basis of harmony and are considered for music composition. Since chords and musical notes are deeply linked with each other, we propose joint pitch and chord estimation based on a Bayesian hierarchical model that consists of an acoustic model representing the generative process of a spectrogram and a language model representing the generative process of a piano roll. The acoustic model is formulated as a variant of non-negative matrix factorization that has binary variables indicating a piano roll. The language model is formulated as a hidden Markov model that has chord labels as the latent variables and emits a piano roll. The sequential dependency of a piano roll can be represented in the language model. Both models are integrated through a piano roll in a hierarchical Bayesian manner. All the latent variables and parameters are estimated using Gibbs sampling. The experimental results showed the great potential of the proposed method for unified music transcription and grammar induction.

Download Full-text

Automatic music transcription supporting different instruments

Proceedings Third International Conference on WEB Delivering of Music ◽

10.1109/wdm.2003.1233871 ◽

2004 ◽

Cited By ~ 1

Author(s):

I. Bruno ◽

S.L. Monni ◽

P. Nesi

Keyword(s):

Music Transcription ◽

Automatic Music Transcription

Download Full-text

Discussion Group Summary: Optical Music Recognition

Lecture Notes in Computer Science - Graphics Recognition. Current Trends and Evolutions ◽

10.1007/978-3-030-02284-6_12 ◽

2018 ◽

pp. 152-157 ◽

Cited By ~ 1

Author(s):

Jorge Calvo-Zaragoza ◽

Jan Hajič ◽

Alexander Pacha

Keyword(s):

Discussion Group ◽

Optical Music Recognition ◽

Music Recognition

Download Full-text

A Divide and Conquer Approach to Automatic Music Transcription Using Neural Networks

Progress in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-030-30244-3_19 ◽

2019 ◽

pp. 220-231

Author(s):

André Gil ◽

Carlos Grilo ◽

Gustavo Reis ◽

Patrício Domingues

Keyword(s):

Neural Networks ◽

Divide And Conquer ◽

Music Transcription ◽

Automatic Music Transcription

Download Full-text

Nostalgic transmediation: A not-so-final fantasy? Ichigo’s Sheet Music online platform as an object network of creative practice

Australasian Journal of Popular Culture ◽

10.1386/ajpc_00026_1 ◽

2020 ◽

Vol 9 (2) ◽

pp. 179-193

Author(s):

Ruth Barratt-Peacock ◽

Sophia Staite

Keyword(s):

Popular Culture ◽

Social Practice ◽

Sheet Music ◽

Online Platform ◽

Creative Practice ◽

Music Transcription ◽

The Past ◽

Social Nature

Using the music of the Final Fantasy game series as our case study, we follow the music through processes of transmediation in two very different contexts: the Netflix series Dad of Light and music transcription forum Ichigo’s Sheet Music. We argue that these examples reveal transmediation acting as a process of ‘emptying’, allowing the music to carry its nostalgic cargo of affect into new relationships and contexts. This study’s theoretical combination of transmediation with Bainbridge’s object networks of social practice frame challenges normative definitions of nostalgia. The phenomenon of ‘emptying’ we identify reveals a function of popular culture nostalgia that differs from the dominant understanding as a triggering of generalized emotional longing for (or the desire to return to) the past. Instead, this article uncovers a nostalgia that is defined by personal and communal creative engagement and highlights the active and social nature of transmediated popular culture nostalgia.

Download Full-text