PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network

Author(s):  
Bryan Wang ◽  
Yi-Hsuan Yang

Music creation typically consists of two parts: composing the musical score, and then performing the score with instruments to make sound. While recent work has made much progress in automatic music generation in the symbolic domain, few attempts have been made to build an AI model that can render realistic music audio from musical scores. Directly synthesizing audio with sound sample libraries often leads to mechanical and deadpan results, since musical scores do not contain performance-level information, such as subtle changes in timing and dynamics. Moreover, while the task may sound like a text-to-speech synthesis problem, there are fundamental differences since music audio has rich polyphonic sounds. To build such an AI performer, we propose in this paper a deep convolutional model that learns in an end-to-end manner the score-to-audio mapping between a symbolic representation of music called the pianoroll and an audio representation of music called the spectrogram. The model consists of two subnets: the ContourNet, which uses a U-Net structure to learn the correspondence between pianorolls and spectrograms and to give an initial result; and the TextureNet, which further uses a multi-band residual network to refine the result by adding the spectral texture of overtones and timbre. We train the model to generate music clips of the violin, cello, and flute, with a dataset of moderate size. We also present the results of a user study showing that our model achieves a higher mean opinion score (MOS) in naturalness and emotional expressivity than a WaveNet-based model and two off-the-shelf synthesizers. Our source code is available at https://github.com/bwang514/PerformanceNet.
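For readers who want a concrete picture of the two-subnet design, a minimal PyTorch-style sketch is given below. The layer sizes, kernel sizes, and band splitting are illustrative assumptions, not the authors' exact PerformanceNet architecture; the intent is only to show a pianoroll passing through a coarse score-to-spectrogram network followed by a per-band residual refiner.

```python
import torch
import torch.nn as nn

class ContourNet(nn.Module):
    """Toy encoder-decoder (stand-in for the paper's U-Net) mapping a pianoroll
    (128 pitch channels) to a coarse spectrogram (here 513 frequency bins)."""
    def __init__(self, n_pitch=128, n_freq=513):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_pitch, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 512, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Conv1d(512, n_freq, kernel_size=5, padding=2)

    def forward(self, pianoroll):                       # (batch, 128, time)
        return self.decoder(self.encoder(pianoroll))    # (batch, 513, time)

class TextureNet(nn.Module):
    """Toy multi-band residual refiner: the frequency axis is split into bands
    and each band gets its own small residual convolution stack."""
    def __init__(self, n_freq=513, n_bands=3):
        super().__init__()
        # band boundaries along the frequency axis
        self.edges = [round(b * n_freq / n_bands) for b in range(n_bands + 1)]
        self.refiners = nn.ModuleList()
        for b in range(n_bands):
            width = self.edges[b + 1] - self.edges[b]
            self.refiners.append(nn.Sequential(
                nn.Conv1d(width, width, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv1d(width, width, kernel_size=3, padding=1),
            ))

    def forward(self, coarse):                          # (batch, n_freq, time)
        bands = []
        for b, refine in enumerate(self.refiners):
            chunk = coarse[:, self.edges[b]:self.edges[b + 1], :]
            bands.append(chunk + refine(chunk))         # residual refinement per band
        return torch.cat(bands, dim=1)

if __name__ == "__main__":
    pianoroll = torch.rand(1, 128, 200)                 # 1 clip, 128 pitches, 200 time frames
    spectrogram = TextureNet()(ContourNet()(pianoroll))
    print(spectrogram.shape)                            # torch.Size([1, 513, 200])
```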

2019 ◽  
Vol 7 (3) ◽  
pp. 80-82
Author(s):  
Lawakesh Patel ◽  
Nidhi Singh ◽  
Rizwan Khan

Author(s):  
Robert J Marks II

The Fourier transform is not particularly conducive to illustrating the evolution of frequency with respect to time. A representation of the temporal evolution of the spectral content of a signal is referred to as a time-frequency representation (TFR). The TFR, in essence, attempts to measure the instantaneous spectrum of a dynamic signal at each point in time. Musical scores, in their most fundamental interpretation, are TFRs. The fundamental frequency of a note is represented by its vertical location on the staff. Time progresses as we read notes from left to right. The musical score shown in Figure 9.1 is an example. Temporal assignment is given by the note types. The 120 next to the quarter note indicates the piece should be played at 120 beats per minute. Thus, the duration of a quarter note is one half second. The frequency of the A above middle C is, by international standards, 440 Hertz. Adjacent notes have a frequency ratio of 2^(1/12). The note A#, for example, has a frequency of 440 × 2^(1/12) ≈ 466.1637615 Hertz. Middle C, nine half tones (a.k.a. semitones or chromatic steps) below A, has a frequency of 440 × 2^(-9/12) ≈ 261.6255653 Hertz. The interval of an octave doubles the frequency. The note an octave above A lies twelve half tones higher, at 440 × 2^(12/12) = 880 Hertz. The frequency spacings in the time-frequency representation of musical scores such as Figure 9.1 are thus logarithmic. This is made clearer in the alternate representation of the musical score in Figure 9.2, where time is on the horizontal axis and frequency on the vertical. At every point in time where there is no rest, a frequency is assigned. To form chords, numerous frequencies can be assigned to a point in time. Further discussion of the technical theory of western harmony is in Section 13.1.
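The equal-temperament arithmetic in this passage is easy to verify with a few lines of Python; the helper below assumes the A4 = 440 Hz reference stated above.

```python
# Equal-tempered frequencies relative to the A above middle C (A4 = 440 Hz):
# adjacent semitones differ by a factor of 2**(1/12), as stated in the passage.
A4 = 440.0

def note_freq(semitones_from_a4: int) -> float:
    """Frequency of the note this many half tones above (+) or below (-) A4."""
    return A4 * 2 ** (semitones_from_a4 / 12)

print(note_freq(1))     # A#4: ~466.1637615 Hz
print(note_freq(-9))    # middle C: ~261.6255653 Hz
print(note_freq(12))    # A5, one octave up: 880.0 Hz
print(60 / 120)         # quarter-note duration in seconds at 120 beats per minute
```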


Author(s):  
Abigail Wiafe ◽  
Pasi Fränti

Affective algorithmic composition systems are emotionally intelligent automatic music generation systems that explore the current emotions or mood of a listener and compose affective music to alter the person's mood to a predetermined one. The fusion of affective algorithmic composition systems and smart spaces has been identified as beneficial. For instance, studies have shown that they can be used for therapeutic purposes. Amidst these benefits, research on the related security and ethical issues is lacking. This chapter therefore seeks to provoke discussion on the security and ethical implications of using affective algorithmic composition systems in smart spaces. It presents issues such as impersonation, eavesdropping, data tampering, malicious code, and denial-of-service attacks associated with affective algorithmic composition systems. It also discusses some ethical implications relating to intentions, harm, and possible conflicts that users of such systems may experience.


Author(s):  
YUNG-SHENG CHEN ◽  
FENG-SHENG CHEN ◽  
CHIN-HUNG TENG

Optical Music Recognition (OMR) is a technique for converting printed musical documents into computer-readable formats. In this paper, we present a simple OMR system that performs well for ordinary musical documents such as ballads and pop music. The system is built on fundamental image processing and pattern recognition techniques, and is thus easy to implement. Moreover, it has a strong capability in skew restoration and inverted musical score detection. In a series of experiments, the error of our skew restoration is below 0.2° for any possible document rotation, and the accuracy of inverted musical score detection is up to 98.89%. The overall recognition accuracy of our OMR system reaches nearly 97%, a figure comparable with current commercial OMR software. However, when image skew is taken into consideration, our system is superior to commercial software in terms of recognition accuracy.
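The abstract does not detail how skew restoration is performed, so the sketch below is only a generic, hypothetical skew estimator (a projection-profile search, not the authors' method): it rotates a binarized score page over candidate angles and keeps the angle at which the row-wise ink profile of the staff lines is sharpest.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary_page: np.ndarray, max_angle: float = 10.0,
                  step: float = 0.1) -> float:
    """Hypothetical skew estimator for a binarized score page (ink = 1).

    Staff lines give sharp peaks in the row-wise ink profile when the page is
    upright, so we search for the candidate angle whose horizontal projection
    has the largest variance. Generic illustration only, not the paper's method.
    """
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(-max_angle, max_angle + step, step):
        candidate = rotate(binary_page, angle, reshape=False, order=0)
        profile = candidate.sum(axis=1)        # ink per row
        score = profile.var()                  # peaky profile => well-aligned staves
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle                          # rotating the page by this angle deskews it
```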


2018 ◽  
Vol 11 (3) ◽  
pp. 50 ◽  
Author(s):  
Yongjie Huang ◽  
Xiaofeng Huang ◽  
Qiakai Cai

In this paper, we propose a model that combines a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) for music generation. We first convert a MIDI-format music file into a musical score matrix, and then establish convolution layers to extract features from the musical score matrix. Finally, the output of the convolution layers is split along the time axis and fed into the LSTM, so as to achieve the purpose of music generation. The model was evaluated through comparisons of accuracy, time-domain analysis, frequency-domain analysis and human auditory evaluation. The results show that the Convolution-LSTM performs better in music generation than the LSTM alone, with more pronounced undulations and a clearer melody.
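As a rough illustration of the described pipeline, the following PyTorch sketch applies 2-D convolutions to a score matrix, splits the resulting feature map along the time axis, and feeds the per-step feature vectors into an LSTM. All layer sizes are assumptions; the paper's exact configuration is not given in the abstract.

```python
import torch
import torch.nn as nn

class ConvLSTM(nn.Module):
    """Toy version of the described pipeline: 2-D convolutions over a pianoroll-style
    score matrix, then the feature map is sliced along the time axis and fed to an
    LSTM that predicts the next time step. Layer sizes are illustrative guesses."""
    def __init__(self, n_pitch=128, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=32 * n_pitch, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_pitch)       # next-step pitch activations

    def forward(self, score):                        # score: (batch, 1, time, n_pitch)
        feat = self.conv(score)                      # (batch, 32, time, n_pitch)
        batch, ch, time, pitch = feat.shape
        # split along the time axis: one feature vector per time step
        seq = feat.permute(0, 2, 1, 3).reshape(batch, time, ch * pitch)
        out, _ = self.lstm(seq)
        return torch.sigmoid(self.head(out))         # (batch, time, n_pitch)

if __name__ == "__main__":
    roll = torch.rand(2, 1, 64, 128)                 # 64 time steps, 128 MIDI pitches
    print(ConvLSTM()(roll).shape)                    # torch.Size([2, 64, 128])
```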


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 387
Author(s):  
Shuyu Li ◽  
Yunsick Sung

Deep learning has made significant progress in the field of automatic music generation. At present, research on music generation via deep learning can be divided into two categories: predictive models and generative models. However, both categories share problems that need to be resolved. First, the length of the music must be determined artificially prior to generation. Second, although the convolutional neural network (CNN) is unexpectedly superior to the recurrent neural network (RNN), the CNN still has several disadvantages. This paper proposes a conditional generative adversarial network approach using an inception model (INCO-GAN), which enables complete, variable-length music to be generated automatically. By adding a time-distribution layer that handles sequential data, the CNN considers temporal relationships in a manner similar to the RNN. In addition, the inception model obtains richer features, which improves the quality of the generated music. In the experiments conducted, music generated by the proposed method was compared with music created by human composers. A high cosine similarity of up to 0.987 was achieved between their frequency vectors, indicating that the music generated by the proposed method is very similar to that created by a human composer.
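The abstract does not specify how the frequency vectors are constructed, so the snippet below assumes a simple pitch-usage histogram purely to illustrate the cosine-similarity comparison between generated and human-composed music.

```python
import numpy as np

def pitch_histogram(midi_pitches):
    """Assumed 'frequency vector': how often each of the 128 MIDI pitches occurs."""
    hist = np.zeros(128)
    for p in midi_pitches:
        hist[p] += 1
    return hist

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

generated = pitch_histogram([60, 62, 64, 65, 67, 67, 69])   # toy generated melody
human = pitch_histogram([60, 62, 64, 65, 67, 69, 71])       # toy human-composed melody
print(round(cosine_similarity(generated, human), 3))        # similarity in [0, 1]
```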


2020 ◽  
Author(s):  
Jiyanbo Cao ◽  
Jinan Fiaidhi ◽  
Maolin Qi

This paper reviews the deep learning techniques used in music generation. The research is based on Sageev Oore's LSTM-based recurrent neural network, Performance RNN. We study the history of automatic music generation and apply state-of-the-art techniques to this task. We describe the process of converting a MIDI file into the input structure of Performance RNN, as well as the structure of the network itself.
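As a reading aid, the sketch below shows a simplified Performance RNN-style event encoding of a single MIDI note (note-on/note-off events, quantized time shifts, and velocity bins, following Oore et al.'s description); the bin sizes and helper function are illustrative assumptions, not the paper's code.

```python
# Simplified Performance RNN-style event vocabulary: note-on/note-off events,
# quantized time shifts, and velocity bins. Details here are illustrative.
NOTE_ON, NOTE_OFF, TIME_SHIFT, VELOCITY = range(4)

def encode_note(pitch: int, velocity: int, start: float, end: float, time_step=0.01):
    """Turn one MIDI note into a list of (event_type, value) tokens."""
    events = [(VELOCITY, velocity // 4),          # 128 MIDI velocities -> 32 bins
              (NOTE_ON, pitch)]
    # quantized time shifts covering the note duration (10 ms steps, max 1 s each)
    remaining = round((end - start) / time_step)
    while remaining > 0:
        shift = min(remaining, 100)
        events.append((TIME_SHIFT, shift))
        remaining -= shift
    events.append((NOTE_OFF, pitch))
    return events

print(encode_note(pitch=60, velocity=80, start=0.0, end=0.55))
```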

