juancopi81 committed
Commit 5065808 (parent: 4097cfc)
Add expand, collapse options to README.md
README.md CHANGED

@@ -49,6 +49,9 @@ For the first epochs of training, I transposed the notes by raising and lowering
 
 ### Training hyperparameters
 
+<details>
+<summary>Click to expand</summary>
+
 The following hyperparameters were used during training (with transposition):
 - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 5e-07, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 5e-07, 'decay_steps': 5726, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
 
@@ -73,10 +76,13 @@ The following hyperparameters were used during training (without transposition,
 The following hyperparameters were used during training (without transposition, new tokenizer - seventh round):
 - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0005, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0005, 'decay_steps': 1025, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 1000, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
 
-
 - training_precision: mixed_float16
+</details>
 
 ### Training results
+
+<details>
+<summary>Click to expand</summary>
 Using transposition:
 | Train Loss | Validation Loss | Epoch |
 |:----------:|:---------------:|:-----:|
@@ -193,7 +199,7 @@ Without transposition (seventh round - new tokenizer):
 | 0.3223 | 1.7940 | 12 |
 | 0.2158 | 1.9032 | 13 |
 | 0.1448 | 1.9892 | 14 |
-
+</details>
 
 ### Framework versions
 - Transformers 4.22.1
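The serialized optimizer dicts in the diff above are the Keras export of the `AdamWeightDecay` optimizer and `WarmUp`/`PolynomialDecay` schedule from the TensorFlow side of `transformers`. As a minimal sketch (the commit does not include the training script, so the exact call is an assumption), `transformers.create_optimizer` can rebuild the first configuration: a 1000-step warmup into a linear decay to zero over 5726 steps.

```python
# Sketch only: rebuilds the first serialized optimizer config from the README.
# The author's actual training code is not part of this commit.
from transformers import create_optimizer

optimizer, lr_schedule = create_optimizer(
    init_lr=5e-7,            # 'initial_learning_rate'
    num_train_steps=5726,    # 'decay_steps' of the PolynomialDecay
    num_warmup_steps=1000,   # 'warmup_steps' of the WarmUp wrapper
    weight_decay_rate=0.01,  # 'weight_decay_rate' of AdamWeightDecay
    adam_beta1=0.9,          # 'beta_1'
    adam_beta2=0.999,        # 'beta_2'
    adam_epsilon=1e-8,       # 'epsilon'
    power=1.0,               # power=1.0 makes the polynomial decay linear
)
```

For the seventh-round configuration, the same call would use `init_lr=0.0005` and `num_train_steps=1025`.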
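The `training_precision: mixed_float16` entry maps to Keras's global mixed-precision policy; again as an assumption rather than code from the commit, enabling it before building the model looks like this:

```python
# Sketch only: 'mixed_float16' runs compute in float16 while keeping
# variables in float32. Set the policy before constructing the model.
import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")
```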