MartialTerran committed
Commit: 2160f60
1 Parent(s): 91876ef
Update: With one layer (n_layer = 1), n_embd = 4 is a failure, but n_embd = 6 is a marginal success.
CHANGED
@@ -2,7 +2,7 @@ At n_embd': 4, 'n_layer': 1, no coherence in response was obtained.
 Upon adding a second layer, (n_embd': 4, 'n_layer': 2) [Epoch 53525/100000, Loss: 0.1281] and [Epoch 100000/100000, Loss: 0.1256], thus:
 "Response: four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that rather not long should endure the lives above will world a a great civil people full long resolve altogether as battle new fitting"
 
-Four floats of embeddings is apparently
+Four embedding floats are apparently sufficient to support some sequencing, but not quite enough information to sequence so many different (and repeated) words and punctuation marks. (Microsoft researchers recently found that, in other LLMs, entire attention heads were focused on punctuation.)
 
 At n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, the Toy Gettysburg GPT-2 model got a good start with "four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in" before some glitches. But resumed another whole part of the Gettysburg speech: "that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure "
 
@@ -10,7 +10,7 @@ Adding a second layer to the 6-float model (n_embd': 6, 'n_layer': 2, 'n_head':
 
 The resulting model_checkpoint_early_stop_Gettysburg_GPT2_v1.4.2.1.py_2024-11-29_01-41-39.pth has a size on disk of only 0.99 MB (1,040,384 bytes)
 
-A Loss BELOW 0.01 is usually sufficient to obtain a Complete Recital of the entire Gettysburg Address. But, I pushed
+A Loss BELOW 0.01 is usually sufficient to obtain a Complete Recital of the entire Gettysburg Address. But I pushed the (n_embd': 6, 'n_layer': 2) epoch loss down to 0.001, whatever that means.
 
 Epoch 22361/100000, Loss: 0.0054
 LOSS IS BELOW 0.01
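For readers unfamiliar with these hyperparameter names, the sketch below shows what a configuration like 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64 corresponds to when expressed with the Hugging Face transformers GPT2Config. This is an illustration only: the repository's own script (Gettysburg_GPT2_v1.4.2.1.py) builds its model its own way, and the vocab_size and n_positions values here are assumptions for a small word-level Gettysburg tokenizer, not values taken from that script.

```python
# Illustrative sketch only -- not the repository's training script.
# vocab_size and n_positions are assumed values for a tiny word-level tokenizer.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=160,   # assumption: roughly the unique word/punctuation tokens in the speech
    n_positions=512,  # assumption: maximum context length
    n_embd=6,         # 6 floats per token embedding (4 failed; 6 was a marginal success)
    n_layer=1,        # a single transformer block
    n_head=1,         # one attention head (n_embd must be divisible by n_head)
    n_inner=64,       # hidden size of the feed-forward (MLP) sublayer
)
model = GPT2LMHeadModel(config)
print(f"Trainable parameters: {model.num_parameters():,}")
```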
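The epoch log lines and the timestamped checkpoint name suggest an early-stop-on-loss pattern. The following is a minimal sketch of that pattern in PyTorch, assuming a caller-supplied train_one_epoch(model) callable that returns the epoch's average loss; it is not the code from Gettysburg_GPT2_v1.4.2.1.py, just one way the "LOSS IS BELOW 0.01" message and the early-stop .pth file could be produced.

```python
# Minimal sketch of an early-stop-and-checkpoint loop -- an assumption, not the
# repository's actual code. `model` and `train_one_epoch` are caller-supplied
# placeholders; train_one_epoch(model) must return the epoch's average loss.
from datetime import datetime
import torch

def train_with_early_stop(model, train_one_epoch, max_epochs=100_000, loss_threshold=0.01):
    for epoch in range(1, max_epochs + 1):
        loss = train_one_epoch(model)
        print(f"Epoch {epoch}/{max_epochs}, Loss: {loss:.4f}")
        if loss < loss_threshold:
            print(f"LOSS IS BELOW {loss_threshold}")
            stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
            path = f"model_checkpoint_early_stop_Gettysburg_GPT2_v1.4.2.1.py_{stamp}.pth"
            torch.save(model.state_dict(), path)  # tiny state_dict; the notes report 0.99 MB on disk
            return path
    return None
```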