---
tags:
- generated_from_keras_callback
- music
model-index:
- name: juancopi81/mutopia_guitar_mmm
  results: []
datasets:
- juancopi81/mutopia_guitar_dataset
widget:
- text: "PIECE_START TIME_SIGNATURE=4_4 BPM=90 TRACK_START INST=0 DENSITY=2 BAR_START NOTE_ON=43"
  example_title: "Time signature 4/4, BPM=90, NOTE=G2"
---

# juancopi81/mutopia_guitar_mmm

Music generation can be approached in much the same way as language generation: there are many ways to represent music as text, and a language model trained on that text can then generate new music. For encoding MIDI files as text, I am using the excellent [implementation](https://github.com/AI-Guru/MMM-JSB) by Dr. Tristan Behrens of the paper [MMM: Exploring Conditional Multi-Track Music Generation with the Transformer](https://arxiv.org/abs/2008.06048).

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the [Mutopia Guitar Dataset](https://huggingface.co/datasets/juancopi81/mutopia_guitar_dataset). Use the widget to generate your piece, and then use [this notebook](https://colab.research.google.com/drive/14vlJwCvDmNH6SFfVuYY0Y18qTbaHEJCY?usp=sharing) to listen to the results (work in progress). 
I created the notebook as an adaptation of [the one created by Dr. Tristan Behrens](https://huggingface.co/TristanBehrens/js-fakes-4bars).

It achieves the following results on the training and evaluation sets:
- Train Loss: 0.5365
- Validation Loss: 1.5482

## Model description

The model is GPT-2 loaded with the GPT2LMHeadModel architecture from Hugging Face Transformers. The context size is 256 tokens, and the vocabulary size is 588. The model uses a 
`WhitespaceSplit` pre-tokenizer. The [tokenizer](https://huggingface.co/juancopi81/mutopia_guitar_dataset_tokenizer) is also available on the Hugging Face Hub.
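
For reference, here is a minimal sketch of loading the model and sampling a continuation with the Transformers API. It assumes the tokenizer can be loaded with `AutoTokenizer` from the tokenizer repository linked above and uses the TensorFlow class (the model was trained with Keras); the prompt follows the widget example:

```python
from transformers import AutoTokenizer, TFGPT2LMHeadModel

# Tokenizer lives in its own repository (see link above); model weights are TF/Keras.
tokenizer = AutoTokenizer.from_pretrained("juancopi81/mutopia_guitar_dataset_tokenizer")
model = TFGPT2LMHeadModel.from_pretrained("juancopi81/mutopia_guitar_mmm")

prompt = "PIECE_START TIME_SIGNATURE=4_4 BPM=90 TRACK_START INST=0 DENSITY=2 BAR_START NOTE_ON=43"
inputs = tokenizer(prompt, return_tensors="tf")

# Sample up to the model's 256-token context size.
output = model.generate(inputs["input_ids"], max_length=256, do_sample=True, temperature=0.9)
print(tokenizer.decode(output[0]))
```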

## Intended uses & limitations

I built this model to learn more about how to use Hugging Face. I am implementing some parts of the [Hugging Face course](https://huggingface.co/course/chapter1/1) with a project that I find interesting. 
The main intention of this model is educational. I am creating a [series of notebooks](https://github.com/juancopi81/MMM_Mutopia_Guitar) where I show every step of the process:
- Collecting the data
- Pre-processing the data
- Training a tokenizer from scratch
- Fine-tuning a GPT-2 model
- Building a Gradio app for the model

I trained the model using the free version of Colab with a small dataset, and right now it is heavily overfitting. My idea is to build a more extensive dataset of guitar music from Latin America and, with more GPU resources, train a new model similar to this Mutopia Guitar model.

## Training and evaluation data

I am training the model with the [Mutopia Guitar Dataset](https://huggingface.co/datasets/juancopi81/mutopia_guitar_dataset), which consists of the solo guitar pieces of the [Mutopia Project](https://www.mutopiaproject.org/). 
The dataset mainly contains guitar music by Western classical composers such as Sor, Aguado, Carcassi, and Giuliani.
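
For exploration, the dataset can be loaded directly with 🤗 Datasets; this is a minimal sketch (split and column names are assumptions, so check the dataset card):

```python
from datasets import load_dataset

# Load the Mutopia Guitar Dataset from the Hub
# (split and column names are assumptions; check the dataset card).
ds = load_dataset("juancopi81/mutopia_guitar_dataset")
print(ds)
```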

For the first epochs of training, I augmented the data by transposing the notes up and down through each of the twelve semitones of an octave. Later, I trained the model without transposition so that the generated music better reflects the range of a real guitar piece.
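
As an illustration of this augmentation, here is a minimal sketch of transposing a token-encoded piece. It assumes NOTE_ON/NOTE_OFF tokens carry MIDI pitch numbers, as in the widget example; the actual pre-processing lives in the notebooks linked above:

```python
import random

def transpose(token_string: str, semitones: int) -> str:
    """Shift every NOTE_ON/NOTE_OFF token by a number of semitones."""
    out = []
    for token in token_string.split():
        if token.startswith(("NOTE_ON=", "NOTE_OFF=")):
            name, pitch = token.split("=")
            out.append(f"{name}={int(pitch) + semitones}")
        else:
            out.append(token)  # structural tokens (BAR_START, INST=0, ...) pass through unchanged
    return " ".join(out)

# One augmented copy of a piece, shifted by up to an octave in either direction.
example = "PIECE_START TIME_SIGNATURE=4_4 BPM=90 TRACK_START INST=0 DENSITY=2 BAR_START NOTE_ON=43"
print(transpose(example, random.randint(-12, 12)))
```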

### Training hyperparameters

The following hyperparameters were used during training (with transposition):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 5726, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition - first round):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition - second round): 
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - third round):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - fourth round):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - fifth round):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}

The following hyperparameters were used during training (without transposition, new tokenizer - sixth round):
- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 350, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}


- training_precision: mixed_float16
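
In plain terms, every round uses AdamWeightDecay (weight decay 0.01) with a linear warm-up over 1,000 steps to a peak learning rate of 5e-7, followed by a linear (polynomial, power 1.0) decay to 0 over the listed `decay_steps`. Here is a minimal sketch of rebuilding an equivalent optimizer with `transformers.create_optimizer` (whether this exact helper was used in training is an assumption; the numbers come from the serialized configs above):

```python
from transformers import create_optimizer

# Equivalent to the first (with-transposition) config: warmup_steps=1000, decay_steps=5726.
# create_optimizer decays over (num_train_steps - num_warmup_steps) steps,
# so num_train_steps = 1000 + 5726 here.
optimizer, lr_schedule = create_optimizer(
    init_lr=5e-7,               # peak learning rate reached after warm-up
    num_warmup_steps=1000,
    num_train_steps=1000 + 5726,
    weight_decay_rate=0.01,     # AdamWeightDecay
)
# model.compile(optimizer=optimizer)  # then model.fit(...) with mixed_float16 precision
```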

### Training results
Using transposition:
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 1.0705     | 1.3590          | 0     |
| 0.8889     | 1.3702          | 1     |
| 0.7588     | 1.3974          | 2     |
| 0.7294     | 1.4813          | 3     |
| 0.6263     | 1.5263          | 4     |
| 0.5841     | 1.5263          | 5     |
| 0.5844     | 1.5263          | 6     |
| 0.5837     | 1.5346          | 7     |
| 0.5798     | 1.5411          | 8     |
| 0.5773     | 1.5440          | 9     |

Without transposition (first round):
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 0.5503     | 1.5436          | 0     |
| 0.5503     | 1.5425          | 1     |
| 0.5476     | 1.5425          | 2     |
| 0.5467     | 1.5425          | 3     |
| 0.5447     | 1.5431          | 4     |
| 0.5418     | 1.5447          | 5     |
| 0.5418     | 1.5451          | 6     |
| 0.5401     | 1.5472          | 7     |
| 0.5386     | 1.5479          | 8     |
| 0.5365     | 1.5482          | 9     |

Without transposition (second round):
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 0.5368     | 1.5482          | 0     |
| 0.5355     | 1.5480          | 1     |
| 0.5326     | 1.5488          | 2     |
| 0.5363     | 1.5493          | 3     |
| 0.5346     | 1.5488          | 4     |
| 0.5329     | 1.5502          | 5     |
| 0.5329     | 1.5514          | 6     |
| 0.5308     | 1.5514          | 7     |
| 0.5292     | 1.5536          | 8     |
| 0.5272     | 1.5543          | 9     |

Without transposition (third round - new tokenizer):
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 6.1361 | 6.4569 | 0 |
| 5.6383 | 5.8249 | 1 |
| 4.9125 | 4.8956 | 2 |
| 4.2013 | 4.2778 | 3 |
| 3.8665 | 4.0330 | 4 |
| 3.7106 | 3.8956 | 5 |
| 3.6041 | 3.7995 | 6 |
| 3.5301 | 3.7485 | 7 |
| 3.4973 | 3.7323 | 8 |
| 3.4909 | 3.7323 | 9 |

Without transposition (fourth round - new tokenizer):
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 3.4879 | 3.7206 | 0 |
| 3.4667 | 3.6874 | 1 |
| 3.4229 | 3.6373 | 2 |
| 3.3680 | 3.5751 | 3 |
| 3.2998 | 3.5026 | 4 |
| 3.2208 | 3.4240 | 5 |
| 3.1385 | 3.3397 | 6 |
| 3.0580 | 3.2587 | 7 |
| 2.9949 | 3.2118 | 8 |
| 2.9646 | 3.1958 | 9 |

Without transposition (fifth round - new tokenizer):
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 2.9562 | 3.1902 | 0 |
| 2.9457 | 3.1751 | 1 |
| 2.9266 | 3.1512 | 2 |
| 2.9039 | 3.1176 | 3 |
| 2.8705 | 3.0775 | 4 |
| 2.8291 | 3.0295 | 5 |
| 2.7872 | 2.9811 | 6 |
| 2.7394 | 2.9321 | 7 |
| 2.6996 | 2.9023 | 8 |
| 2.6819 | 2.8927 | 9 |

Without transposition (sixth round - new tokenizer):
| Train Loss | Validation Loss | Epoch |
|:----------:|:---------------:|:-----:|
| 2.6769 | 2.8894 | 0 |
| 2.6719 | 2.8791 | 1 |
| 2.6612 | 2.8638 | 2 |
| 2.6465 | 2.8439 | 3 |
| 2.6242 | 2.8174 | 4 |
| 2.6006 | 2.7877 | 5 |
| 2.5679 | 2.7554 | 6 |
| 2.5387 | 2.7223 | 7 |
| 2.5115 | 2.7029 | 8 |
| 2.5011 | 2.6970 | 9 |

### Framework versions
- Transformers 4.22.1
- TensorFlow 2.8.2
- Datasets 2.5.1
- Tokenizers 0.12.1