MartialTerran committed
Commit 73745d6
Parent: 6a120f9

Update: with one layer ('n_layer': 1), 'n_embd': 4 is a failure, but 'n_embd': 6 is a marginal success.

At 'n_embd': 4 (with 'n_layer': 1), no coherent response was obtained.
 
At 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, the Toy Gettysburg GPT-2 model got off to a good start with "four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in" before making some mistakes, but then resumed with another whole passage of the Gettysburg Address: "that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure "
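
For context, a minimal sketch of what a configuration at this scale might look like, assuming the Hugging Face transformers GPT2Config naming convention (the toy script may build its model differently; vocab_size and n_positions below are illustrative guesses, not values from the notes):

    from transformers import GPT2Config, GPT2LMHeadModel

    # n_embd, n_layer, n_head, n_inner are the values from the notes;
    # vocab_size and n_positions are hypothetical placeholders for a
    # tiny, single-document corpus.
    config = GPT2Config(
        vocab_size=128,
        n_positions=64,
        n_embd=6,
        n_layer=1,
        n_head=1,
        n_inner=64,
    )
    model = GPT2LMHeadModel(config)
    print(sum(p.numel() for p in model.parameters()), "parameters")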

Adding a second layer to the 6-float model ('n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64), with no other modifications, did solve the glitch after almost 60,000 epochs (and an expertly timed, gradually receding learning rate):
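
The notes do not say which schedule was used; as one sketch of a gradually receding learning rate in PyTorch (ReduceLROnPlateau is an illustrative choice here, and train_one_epoch is a hypothetical helper, not the script's actual code):

    import torch

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    # Illustrative schedule: halve the learning rate whenever the
    # training loss stops improving for 500 epochs.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=500)

    for epoch in range(1, 100001):
        loss = train_one_epoch(model, optimizer)  # hypothetical helper
        scheduler.step(loss)  # recede the LR when loss stops improving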

The resulting checkpoint, model_checkpoint_early_stop_Gettysburg_GPT2_v1.4.2.1.py_2024-11-29_01-41-39.pth, has a size on disk of only 0.99 MB (1,040,384 bytes).
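
That size is consistent with roughly 260,000 float32 values (1,040,384 bytes / 4 bytes per value), i.e. well under a million parameters. A quick way to confirm, assuming the .pth file holds a plain state_dict of float32 tensors (a hypothetical check, not part of the script):

    import os
    import torch

    path = "model_checkpoint_early_stop_Gettysburg_GPT2_v1.4.2.1.py_2024-11-29_01-41-39.pth"
    print(os.path.getsize(path), "bytes on disk")  # expected: ~1,040,384

    state = torch.load(path, map_location="cpu")
    n_values = sum(t.numel() for t in state.values() if torch.is_tensor(t))
    print(n_values, "stored values =", n_values * 4, "bytes of float32 data")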

A loss below 0.01 is usually sufficient to obtain a complete recital of the entire Gettysburg Address, but I pushed the epoch loss down to 0.001, whatever that means.

Epoch 22361/100000, Loss: 0.0054
LOSS IS BELOW 0.01
Epoch 22362/100000, Loss: 0.0033
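
For reference, a loop of roughly this shape would produce the log lines above and the early-stop checkpoint; the helper and path names are hypothetical, not taken from the actual script:

    best_loss = float("inf")
    for epoch in range(1, 100001):
        loss = train_one_epoch(model, optimizer)  # hypothetical helper
        print(f"Epoch {epoch}/100000, Loss: {loss:.4f}")
        if loss < 0.01:
            print("LOSS IS BELOW 0.01")
        if loss < best_loss:
            best_loss = loss
            torch.save(model.state_dict(), "model_checkpoint_early_stop.pth")
        if loss < 0.001:  # the stopping point mentioned above
            break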