---
license: mit
pipeline_tag: audio-to-audio
---
## MusicMaker - Transformer Model for Music Generation
#### Overview:
MusicMaker is a transformer-based model trained to generate novel musical compositions in the MIDI format. By learning from a dataset of piano MIDI files, the model can capture the intricate patterns and structures present in music and generate coherent and creative melodies.
#### Key Features:
- Generation of novel musical compositions in MIDI format
- Trained on a dataset of piano MIDI files
- Based on transformer architecture for capturing long-range dependencies
- Tokenizer trained specifically on MIDI data with the miditok library (see the sketch after this list)
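For context, a miditok tokenizer is typically built by training a vocabulary directly on the MIDI corpus. The sketch below is illustrative only: the tokenization scheme (REMI), the configuration options, and the file paths are assumptions, not details taken from this model's actual training code.

```py
from pathlib import Path
from miditok import REMI, TokenizerConfig

# Assumed scheme: REMI with chord and tempo tokens (the card does not
# specify which miditok tokenizer was used).
tokenizer = REMI(TokenizerConfig(use_chords=True, use_tempos=True))

# Train the vocabulary on a folder of MIDI files; 12,000 matches the
# vocabulary size listed under "Model Details".
midi_paths = list(Path("adl-piano-midi").glob("**/*.mid"))
tokenizer.train(vocab_size=12_000, files_paths=midi_paths)
tokenizer.save_pretrained("music_maker_tokenizer")
```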
#### Training Data:
The model was trained on a dataset of ~11,000 piano MIDI files from the "adl-piano-midi" collection.
#### Model Details:
- Architecture: GPT-style transformer
- Number of layers: 12
- Hidden size: 512
- Attention heads: 8
- Tokenizer vocabulary size: 12,000 (see the configuration sketch below)
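The exact architecture ships as custom code in the repository (loaded with `trust_remote_code` below), so the following is only a rough equivalent of the listed hyperparameters expressed as a standard transformers config; the GPT2Config mapping is an assumption, not the model's actual implementation.

```py
from transformers import GPT2Config

# Hypothetical GPT-2-style equivalent of the hyperparameters above;
# the real model uses its own code loaded via trust_remote_code.
config = GPT2Config(
    n_layer=12,         # number of layers
    n_embd=512,         # hidden size
    n_head=8,           # attention heads
    vocab_size=12_000,  # tokenizer vocabulary size
)
```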
#### Usage:
```py
import torch
from miditok import MusicTokenizer
from transformers import AutoModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the tokenizer and the custom model code from the Hub
tokenizer = MusicTokenizer.from_pretrained('shikhr/music_maker')
model = AutoModel.from_pretrained('shikhr/music_maker', trust_remote_code=True)
model.to(device)

# Generate some music, seeding the model with a single start token (ID 1)
out = model.generate(
    torch.tensor([[1]]).to(device), max_new_tokens=400, temperature=1.0, top_k=None
)

# Decode the generated token IDs into a score and save it as a MIDI file
tokenizer(out[0].tolist()).dump_midi("generated.mid")
```
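The sampling arguments shown above can be tuned to trade coherence for variety; for instance (illustrative values only):

```py
# Higher temperature flattens the output distribution, while top_k limits
# sampling to the k most likely tokens at each step.
out = model.generate(
    torch.tensor([[1]]).to(device),
    max_new_tokens=1024,
    temperature=1.2,
    top_k=50,
)
tokenizer(out[0].tolist()).dump_midi("generated_long.mid")
```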
#### Limitations and Bias:
- The model has only been trained on piano MIDI data, so its ability to generalize to other instruments may be limited.
- The generated music may exhibit some repetitive or unnatural patterns.
- The training data itself may contain biases or stylistic patterns reflective of its sources.