Toy_GPTs_LLMs_for_CPU_Educational / With one layer, n_layer 1, n_embd 4 is failure. but n_embd 6 is marginal success.
At 'n_embd': 4, 'n_layer': 1, the response showed no coherence at all.
Adding a second layer ('n_embd': 4, 'n_layer': 2) reached [Epoch 53525/100000, Loss: 0.1281] and [Epoch 100000/100000, Loss: 0.1256], and produced this:
"Response: four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that rather not long should endure the lives above will world a a great civil people full long resolve altogether as battle new fitting"
Four floats of embedding are apparently not enough information to sequence so many different (and repeated) words and punctuation marks. (Microsoft researchers recently found that in other LLMs entire attention heads were focused on punctuation.)
At 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, the Toy Gettysburg GPT-2 model got a good start with "four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in" before some glitches. It then resumed with another whole part of the Gettysburg speech: "that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure "
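One rough way to put a number on "a good start before some glitches" is to find the first token where a response departs from the reference text. The sketch below is not part of the training script; the function name `first_divergence` is made up for illustration, and it assumes tokens are simply whitespace-separated, as the printed responses suggest.

```python
def first_divergence(reference: str, response: str) -> int:
    """Return the index of the first token where response departs from reference.

    Returns -1 if no mismatch is found over the overlapping length.
    Assumption: tokens are whitespace-separated words and punctuation marks.
    """
    ref_tokens = reference.split()
    out_tokens = response.split()
    for i, (ref_tok, out_tok) in enumerate(zip(ref_tokens, out_tokens)):
        if ref_tok != out_tok:
            return i
    return -1


# Opening of the correct recital versus the opening of the one-layer, 6-float response.
reference = ("four score and seven years ago our fathers brought forth on this "
             "continent , a new nation , conceived in liberty , and dedicated")
response = ("four score and seven years ago our fathers brought forth on this "
            "continent , a new nation , conceived in nation - to , it gave")

idx = first_divergence(reference, response)
print(idx, reference.split()[idx], response.split()[idx])  # 20 liberty nation
```

The one-layer model gets 20 tokens right before substituting "nation" for "liberty".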
Adding a second layer to the 6-float model ('n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64), with no other modifications, did solve the glitch after almost 60,000 epochs (and an expertly timed, gradually receding learning rate):
The resulting model_checkpoint_early_stop_Gettysburg_GPT2_v1.4.2.1.py_2024-11-29_01-41-39.pth has Size on Disk of only 0.99 MB (1,040,384 bytes)
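For a sense of how small these models are, here is a back-of-the-envelope parameter count. It is only a sketch: it assumes the standard GPT-2 layout (learned position embeddings, a fused q/k/v projection, two LayerNorms per block plus a final one, and an output head that shares weights with the token embedding). The actual script may differ, and `gpt2_param_count` is a hypothetical helper, not something from the script.

```python
def gpt2_param_count(vocab_size, n_embd, n_layer, n_inner, max_sequence_len):
    """Count trainable parameters for an assumed standard GPT-2 layout.

    Assumption: the output head is tied to the token embedding, so it adds nothing.
    """
    token_emb = vocab_size * n_embd
    pos_emb = max_sequence_len * n_embd
    layer_norm = 2 * n_embd                       # scale and shift
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd   # fused q, k, v weights + biases
    attn_out = n_embd * n_embd + n_embd           # attention output projection
    mlp_in = n_embd * n_inner + n_inner           # expand to n_inner
    mlp_out = n_inner * n_embd + n_embd           # project back to n_embd
    per_layer = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    return token_emb + pos_emb + n_layer * per_layer + layer_norm  # + final norm


# The four configurations discussed in these notes.
for n_embd, n_layer in [(4, 1), (4, 2), (6, 1), (6, 2)]:
    count = gpt2_param_count(170, n_embd, n_layer, 64, 340)
    print(f"n_embd={n_embd} n_layer={n_layer}: {count} parameters")
```

Under those assumptions the four configurations come to 2,724, 3,400, 4,102 and 5,132 parameters respectively.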
A loss below 0.01 is usually sufficient to obtain a complete recital of the entire Gettysburg Address, but I pushed the epoch loss down to 0.001, whatever that means.
Epoch 22361/100000, Loss: 0.0054
LOSS IS BELOW 0.01
Epoch 22362/100000, Loss: 0.0033
LOSS IS BELOW 0.01
Epoch 22363/100000, Loss: 0.0044
LOSS IS BELOW 0.01
Epoch 22364/100000, Loss: 0.0032
Epoch 26651/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 26652/100000, Loss: 0.0039
LOSS IS BELOW 0.01
Epoch 26653/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 26654/100000, Loss: 0.0034
LOSS IS BELOW 0.01
Epoch 35255/100000, Loss: 0.0017
LOSS IS BELOW 0.01
Epoch 35256/100000, Loss: 0.0018
LOSS IS BELOW 0.01
Epoch 35257/100000, Loss: 0.0015
LOSS IS BELOW 0.01
Epoch 35258/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 35259/100000, Loss: 0.0021
LOSS IS BELOW 0.01
Epoch 35260/100000, Loss: 0.0042
LOSS IS BELOW 0.01
Epoch 44408/100000, Loss: 0.0015
LOSS IS BELOW 0.01
Learning rate reduced to 0.000034
Epoch 44408/100000, Loss: 0.0015, Learning Rate: 0.000034
Epoch 44409/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 44410/100000, Loss: 0.0065
LOSS IS BELOW 0.01
Epoch 44411/100000, Loss: 0.0028
Epoch 55978/100000, Loss: 0.0016
LOSS IS BELOW 0.01
Epoch 55979/100000, Loss: 0.0020
LOSS IS BELOW 0.01
Learning rate reduced to 0.000011
Epoch 55979/100000, Loss: 0.0020, Learning Rate: 0.000011
Epoch 55980/100000, Loss: 0.0016
LOSS IS BELOW 0.01
Epoch 55981/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58992/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58993/100000, Loss: 0.0030
LOSS IS BELOW 0.01
Epoch 58994/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58995/100000, Loss: 0.0010
LOSS IS BELOW 0.01
LOSS IS BELOW 0.001
Early stopping: Average loss 0.0010 is below the threshold (0.001).
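The messages in the log above (the 0.01 notice, the learning-rate reductions, and the early stop at 0.001) could come from logic along the following lines. This is a guess at the general shape, not the script's actual code: the class name `LossMonitor`, the plateau rule, and the `patience` and `factor` values are all assumptions, and the real reduction schedule is not recoverable from the log.

```python
class LossMonitor:
    """Track epoch losses, lower the learning rate on a plateau, signal early stop.

    All names and defaults here are illustrative assumptions.
    """

    def __init__(self, lr, report_below=0.01, stop_below=0.001,
                 patience=1000, factor=0.33, min_lr=1e-6):
        self.lr = lr
        self.report_below = report_below
        self.stop_below = stop_below
        self.patience = patience      # epochs without improvement before reducing lr
        self.factor = factor          # multiplicative lr reduction (assumed value)
        self.min_lr = min_lr
        self.best = float("inf")
        self.stale_epochs = 0

    def update(self, loss):
        """Record one epoch's average loss; return True when training should stop."""
        if loss < self.report_below:
            print(f"LOSS IS BELOW {self.report_below}")
        if loss < self.stop_below:
            print(f"Early stopping: Average loss {loss:.4f} is below "
                  f"the threshold ({self.stop_below}).")
            return True
        if loss < self.best:
            self.best = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.stale_epochs = 0
                print(f"Learning rate reduced to {self.lr:.6f}")
        return False


# Tiny synthetic run (made-up losses) with a short patience so the reduction fires.
monitor = LossMonitor(lr=0.001, patience=2, factor=0.5)
stopped_at = None
for epoch, loss in enumerate([0.5, 0.6, 0.7, 0.005, 0.0009]):
    if monitor.update(loss):
        stopped_at = epoch
        break
```

In a PyTorch training loop the new `monitor.lr` would still have to be written into the optimizer's parameter groups; PyTorch's built-in `torch.optim.lr_scheduler.ReduceLROnPlateau` handles that part itself.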
# --- Inference Examples --- at script line 431
# Example 1: Recite the Gettysburg Address at script line 435
Prompt: four score
Response:
four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and proper that we should do this . but , in a larger sense , we can not dedicate - we can not consecrate - we can not hallow - this ground . the brave men , living and dead , who struggled here , have consecrated it , far above our poor power to add or detract . the world will little note , nor long remember what we say here , but it can never forget what they did here . it is for us the living , rather , to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced . it is rather for us to be here dedicated to the great task remaining before us - that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion - that we here highly resolve that these dead shall not have died in vain - that this nation , under god , shall have a new birth of freedom - and that government of the people , by the people , for the people , shall not perish from the earth . apple blossom cantaloupe durian elderberry fig guava honeydew iguana iguana iguana iguana iguana iguana iguana iguana iguana measure god apple . we we we we we we we
# Example 2: Free text generation after encountering <FreetheLLM> at script line 445
Prompt: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <FreetheLLM>
Freestyle Generation:
we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <pad> <pad> <pad> vain to to men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and proper that we should do this . but , in a larger sense , we can not
HyperParamters = {'vocab_size': 170, 'special_tokens': ['<FreetheLLM>', '<cr>', '<pad>'], 'n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000, 'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}
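The responses above read like word-level tokens (lowercase words, with punctuation as separate tokens), and the vocabulary is only 170 entries including three special tokens. Below is a minimal sketch of what such a tokenizer could look like. The helper names `build_vocab` and `encode`, the id ordering, and the right-padding are assumptions for illustration; the script's real tokenizer is not shown in these notes.

```python
SPECIAL_TOKENS = ["<FreetheLLM>", "<cr>", "<pad>"]  # taken from the hyperparameter dump


def build_vocab(text):
    """Give ids to the special tokens first, then to each new word in first-seen order."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, max_len):
    """Turn text into ids, truncating or right-padding with <pad> up to max_len.

    Raises KeyError for a word that is not in the vocabulary.
    """
    ids = [vocab[word] for word in text.split()][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


corpus = ("four score and seven years ago our fathers brought forth "
          "on this continent , a new nation")
vocab = build_vocab(corpus)
ids = encode("four score and seven <FreetheLLM>", vocab, max_len=8)
print(len(vocab), ids)  # 20 [3, 4, 5, 6, 0, 2, 2, 2]
```

With a word-level scheme like this, each of the 170 entries gets one row of 'n_embd' floats in the embedding table, which is why the embedding width matters so much at this scale.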
#################################### 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64 ###############################################
Epoch 99983/100000, Loss: 0.0474
Epoch 99984/100000, Loss: 0.1334
Epoch 99985/100000, Loss: 0.0775
Epoch 99986/100000, Loss: 0.0629
Epoch 99987/100000, Loss: 0.1047
Epoch 99988/100000, Loss: 0.0988
Epoch 99989/100000, Loss: 0.0666
Epoch 99990/100000, Loss: 0.0633
Epoch 99991/100000, Loss: 0.1468
Epoch 99992/100000, Loss: 0.0667
Epoch 99993/100000, Loss: 0.1081
Epoch 99994/100000, Loss: 0.0680
Epoch 99995/100000, Loss: 0.0754
Epoch 99996/100000, Loss: 0.0507
Epoch 99997/100000, Loss: 0.1052
Epoch 99998/100000, Loss: 0.0613
Epoch 99999/100000, Loss: 0.2482
Epoch 100000/100000, Loss: 0.0892
# --- Inference Examples --- at script line 431
# Example 1: Recite the Gettysburg Address at script line 435
Prompt: four score
Response:
four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in nation - to , it gave by that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure - by civil . these that that that we should do this . but , in a larger sense , we can not dedicate - we can not consecrate - we can not hallow - this ground . the brave men , living and dead , who struggled here , have consecrated it , far above our poor power to add or detract . the world will little note , nor long remember what we say here , but it can never forget what they did here . it is for us the living , rather , to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced . it is rather for us to be here dedicated to the great task brave for dedicate rather who for these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion - that we here highly resolve that these dead shall not have died in vain - that this nation , under god , shall have a new birth of freedom - and that government of the people , by the people , for the people , shall not perish from the earth . apple blossom cantaloupe durian elderberry fig guava honeydew iguana god not gave highly war task detract task task detract larger which detract task detract task detract task which
# Example 2: Free text generation after encountering <FreetheLLM> at script line 445
Prompt: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <FreetheLLM>
Freestyle Generation:
we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <pad> <pad> <pad> it it gave portion fought apple rather not it it fitting us a to that that can not score to that nation , or any nation so conceived and so dedicated , can long forth do the but elderberry not not so highly war civil above freedom ground gave for gave final portion . not so to that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and come to , these hallow for consecrate on birth of - not struggled , we can not
HyperParamters = {'vocab_size': 170, 'special_tokens': ['<FreetheLLM>', '<cr>', '<pad>'], 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000, 'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}