metadata

tags:
  - gpt2
  - text-generation
  - music-modeling
  - music-generation
widget:
  - text: PIECE_START
  - text: >-
      PIECE_START STYLE=JSFAKES GENRE=JSFAKES TRACK_START INST=48 BAR_START
      NOTE_ON=60
  - text: >-
      PIECE_START STYLE=JSFAKES GENRE=JSFAKES TRACK_START INST=48 BAR_START
      NOTE_ON=58

GPT-2 for Music

Language Models such as GPT-2 can be used for Music Generation. The idea is to represent pieces of music as texts, effectively reducing the task to Language Generation.

This model is a rather small instance of GPT-2 trained on TristanBehrens/js-fakes-4bars. The model generates 4 bars at a time of Bach-like chorales with four voices (soprano, alto, tenor, bass).

If you are contribute, if you want to say hello, if you want to know more, find me on LinkedIn

Model description

The model is GPT-2 with 6 decoders and 8 attention-heads each. The context length is 512. The embedding dimensions are 512 as well. The vocabulary size is 119.

Intended uses & limitations

This model is just a proof of concept. It shows that HuggingFace can be used to compose music.

How to use

You can immediately start generating music running these lines of code:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TristanBehrens/js-fakes-4bars")
model = AutoModelForCausalLM.from_pretrained("TristanBehrens/js-fakes-4bars")

input_ids = tokenizer.encode("PIECE_START", return_tensors="pt")
print(input_ids)

generated_ids = model.generate(input_ids, max_length=500)
generated_sequence = tokenizer.decode(generated_ids[0])
print(generated_sequence)

Note that this just generates music as a text. In order to actually listen to the generated music, you can use this notebook.

Limitations and bias

Since this model has been trained on a very small corpus of music, it is overfitting heavily.

Training data

The model has been trained on Omar Peracha's JS Fake Chorales dataset, which is a fine collection of 500 Bach-like chorales.