Toy_GPTs_LLMs_for_CPU_Educational
With one layer (n_layer 1), n_embd 4 is a failure, but n_embd 6 is a marginal success.
At 'n_embd': 4, 'n_layer': 1, no coherent response was obtained.
Adding a second layer ('n_embd': 4, 'n_layer': 2) brought the loss to 0.1281 at epoch 53525/100000 and 0.1256 at epoch 100000/100000, and produced:
"Response: four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that rather not long should endure the lives above will world a a great civil people full long resolve altogether as battle new fitting"
Four floats per embedding are apparently sufficient to support some sequencing, but not quite enough to sequence so many different and repeated words and punctuation marks. (Microsoft researchers recently found that, in other LLMs, entire attention heads were focused on punctuation.)
See https://medium.com/@thethoughtpalette/are-tiny-transformers-the-future-of-scaling-626594655c48
Quote: "4. Overfitting: Due to their small size, tiny transformers are prone to overfitting on limited datasets. This leads to reduced generalizability, making them less effective when faced with new or varied data inputs. ... Ongoing research continues to focus on refining the mechanisms of these models, striving to enhance their performance while minimizing their footprint."
At 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, the Toy Gettysburg GPT-2 model got off to a good start with "four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in" before some glitches. It then resumed with another whole part of the Gettysburg Address: "that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure"
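One rough way to put a number on "a good start before some glitches" is to count how many leading tokens of a response match the reference text. The sketch below is not part of the training script; the helper name `matching_prefix_len` is hypothetical, and it assumes tokens are separated by single spaces, as they appear in the responses quoted here.

```python
def matching_prefix_len(reference: str, generated: str) -> int:
    """Count how many leading whitespace-separated tokens agree."""
    count = 0
    for ref_tok, gen_tok in zip(reference.split(), generated.split()):
        if ref_tok != gen_tok:
            break  # first divergence: stop counting
        count += 1
    return count


# Opening of the address as recited correctly by the two-layer model.
reference = (
    "four score and seven years ago our fathers brought forth on this "
    "continent , a new nation , conceived in liberty , and dedicated to "
    "the proposition that all men are created equal ."
)
# Opening as produced by the one-layer, 6-float model.
generated = (
    "four score and seven years ago our fathers brought forth on this "
    "continent , a new nation , conceived in nation - to , it gave by "
    "that all men are created equal ."
)

matched = matching_prefix_len(reference, generated)
print(f"{matched} tokens matched before the first divergence")
```

A prefix count only captures the first glitch; a model that recovers afterwards, as this one does, would need an alignment-based comparison to get credit for the later passages.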
Adding a second layer to the 6-float model ('n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64), with no other modifications, did solve the glitch after almost 60,000 epochs (and an expertly timed, gradually receding learning rate):
The resulting model_checkpoint_early_stop_Gettysburg_GPT2_v1.4.2.1.py_2024-11-29_01-41-39.pth has a size on disk of only 0.99 MB (1,040,384 bytes).
A loss below 0.01 is usually sufficient to obtain a complete recital of the entire Gettysburg Address. But I pushed the ('n_embd': 6, 'n_layer': 2) epoch loss down to 0.001, whatever that means.
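The two mechanisms visible in the training log (a learning rate that shrinks over time and a stop once the loss falls below 0.001) can be sketched in plain Python. This is a minimal illustration, not the script's actual logic: the function name `run_schedule`, the plateau rule, and the `patience` and `factor` values are all assumptions, since the log does not show how the reductions were timed.

```python
def run_schedule(epoch_losses, lr=0.001, stop_threshold=0.001,
                 patience=3, factor=0.5):
    """Replay per-epoch losses through two simple rules.

    Returns (epoch at which training stopped, or None; final learning rate).
    """
    best = float("inf")  # lowest epoch loss seen so far
    stale = 0            # epochs since the last improvement

    for epoch, loss in enumerate(epoch_losses, start=1):
        # Rule 1 (assumed): shrink the learning rate when the loss plateaus.
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor
                stale = 0
                print(f"Learning rate reduced to {lr:.6f}")

        # Rule 2: stop as soon as the epoch loss drops below the threshold.
        if loss < stop_threshold:
            print(f"Early stopping at epoch {epoch}: loss {loss:.4f}")
            return epoch, lr

    return None, lr


# Synthetic losses, invented purely for illustration.
stopped_at, final_lr = run_schedule(
    [0.01, 0.005, 0.005, 0.005, 0.005, 0.002, 0.0009]
)
```

In a PyTorch training loop the first rule is commonly delegated to `torch.optim.lr_scheduler.ReduceLROnPlateau`; whether the script does that is not stated here.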
Epoch 22361/100000, Loss: 0.0054
LOSS IS BELOW 0.01
Epoch 22362/100000, Loss: 0.0033
LOSS IS BELOW 0.01
Epoch 22363/100000, Loss: 0.0044
LOSS IS BELOW 0.01
Epoch 22364/100000, Loss: 0.0032
Epoch 26651/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 26652/100000, Loss: 0.0039
LOSS IS BELOW 0.01
Epoch 26653/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 26654/100000, Loss: 0.0034
LOSS IS BELOW 0.01
Epoch 35255/100000, Loss: 0.0017
LOSS IS BELOW 0.01
Epoch 35256/100000, Loss: 0.0018
LOSS IS BELOW 0.01
Epoch 35257/100000, Loss: 0.0015
LOSS IS BELOW 0.01
Epoch 35258/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 35259/100000, Loss: 0.0021
LOSS IS BELOW 0.01
Epoch 35260/100000, Loss: 0.0042
LOSS IS BELOW 0.01
Epoch 44408/100000, Loss: 0.0015
LOSS IS BELOW 0.01
Learning rate reduced to 0.000034
Epoch 44408/100000, Loss: 0.0015, Learning Rate: 0.000034
Epoch 44409/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 44410/100000, Loss: 0.0065
LOSS IS BELOW 0.01
Epoch 44411/100000, Loss: 0.0028
Epoch 55978/100000, Loss: 0.0016
LOSS IS BELOW 0.01
Epoch 55979/100000, Loss: 0.0020
LOSS IS BELOW 0.01
Learning rate reduced to 0.000011
Epoch 55979/100000, Loss: 0.0020, Learning Rate: 0.000011
Epoch 55980/100000, Loss: 0.0016
LOSS IS BELOW 0.01
Epoch 55981/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58992/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58993/100000, Loss: 0.0030
LOSS IS BELOW 0.01
Epoch 58994/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58995/100000, Loss: 0.0010
LOSS IS BELOW 0.01
LOSS IS BELOW 0.001
Early stopping: Average loss 0.0010 is below the threshold (0.001).
# --- Inference Examples --- at script line 431
# Example 1: Recite the Gettysburg Address at script line 435
Prompt: four score
Response:
four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and proper that we should do this . but , in a larger sense , we can not dedicate - we can not consecrate - we can not hallow - this ground . the brave men , living and dead , who struggled here , have consecrated it , far above our poor power to add or detract . the world will little note , nor long remember what we say here , but it can never forget what they did here . it is for us the living , rather , to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced . it is rather for us to be here dedicated to the great task remaining before us - that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion - that we here highly resolve that these dead shall not have died in vain - that this nation , under god , shall have a new birth of freedom - and that government of the people , by the people , for the people , shall not perish from the earth . apple blossom cantaloupe durian elderberry fig guava honeydew iguana iguana iguana iguana iguana iguana iguana iguana iguana measure god apple . we we we we we we we
# Example 2: Free text generation after encountering <FreetheLLM> at script line 445
Prompt: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <FreetheLLM>
Freestyle Generation:
we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <pad> <pad> <pad> vain to to men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and proper that we should do this . but , in a larger sense , we can not
HyperParamters = {'vocab_size': 170, 'special_tokens': ['<FreetheLLM>', '<cr>', '<pad>'], 'n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000, 'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}
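To give a feel for how small these configurations are, the sketch below estimates the number of trainable weights from the hyperparameters above. It assumes a standard GPT-2 block layout (fused query/key/value projection, two layer norms per block, a final layer norm, and an output head tied to the token embedding) and assumes the position table has `max_sequence_len` rows; neither assumption is confirmed by the logs, and the function name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(vocab_size, n_embd, n_layer, n_inner, n_positions):
    """Trainable weights of a GPT-2 style decoder with a tied output head."""
    token_emb = vocab_size * n_embd               # token embedding table
    pos_emb = n_positions * n_embd                # learned position table
    layer_norm = 2 * n_embd                       # scale and shift
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd   # fused Q/K/V projection + bias
    attn_out = n_embd * n_embd + n_embd           # attention output projection
    mlp_in = n_embd * n_inner + n_inner           # feed-forward expansion
    mlp_out = n_inner * n_embd + n_embd           # feed-forward contraction
    per_block = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    # Embeddings, all blocks, and the final layer norm.
    return token_emb + pos_emb + n_layer * per_block + layer_norm


two_layer = gpt2_param_count(vocab_size=170, n_embd=6, n_layer=2,
                             n_inner=64, n_positions=340)
one_layer = gpt2_param_count(vocab_size=170, n_embd=6, n_layer=1,
                             n_inner=64, n_positions=340)
print(f"n_layer=2: {two_layer} weights, n_layer=1: {one_layer} weights")
```

Under these assumptions the two-layer model has 5,132 weights and the one-layer model 4,102. The estimate covers weights only and is not meant to explain the checkpoint's size on disk, which may also include optimizer state and non-trainable buffers.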
#################################### 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64 ###############################################
Epoch 99983/100000, Loss: 0.0474
Epoch 99984/100000, Loss: 0.1334
Epoch 99985/100000, Loss: 0.0775
Epoch 99986/100000, Loss: 0.0629
Epoch 99987/100000, Loss: 0.1047
Epoch 99988/100000, Loss: 0.0988
Epoch 99989/100000, Loss: 0.0666
Epoch 99990/100000, Loss: 0.0633
Epoch 99991/100000, Loss: 0.1468
Epoch 99992/100000, Loss: 0.0667
Epoch 99993/100000, Loss: 0.1081
Epoch 99994/100000, Loss: 0.0680
Epoch 99995/100000, Loss: 0.0754
Epoch 99996/100000, Loss: 0.0507
Epoch 99997/100000, Loss: 0.1052
Epoch 99998/100000, Loss: 0.0613
Epoch 99999/100000, Loss: 0.2482
Epoch 100000/100000, Loss: 0.0892
# --- Inference Examples --- at script line 431
# Example 1: Recite the Gettysburg Address at script line 435
Prompt: four score
Response:
four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in nation - to , it gave by that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure - by civil . these that that that we should do this . but , in a larger sense , we can not dedicate - we can not consecrate - we can not hallow - this ground . the brave men , living and dead , who struggled here , have consecrated it , far above our poor power to add or detract . the world will little note , nor long remember what we say here , but it can never forget what they did here . it is for us the living , rather , to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced . it is rather for us to be here dedicated to the great task brave for dedicate rather who for these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion - that we here highly resolve that these dead shall not have died in vain - that this nation , under god , shall have a new birth of freedom - and that government of the people , by the people , for the people , shall not perish from the earth . apple blossom cantaloupe durian elderberry fig guava honeydew iguana god not gave highly war task detract task task detract larger which detract task detract task detract task which
# Example 2: Free text generation after encountering <FreetheLLM> at script line 445
Prompt: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <FreetheLLM>
Freestyle Generation:
we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <pad> <pad> <pad> it it gave portion fought apple rather not it it fitting us a to that that can not score to that nation , or any nation so conceived and so dedicated , can long forth do the but elderberry not not so highly war civil above freedom ground gave for gave final portion . not so to that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and come to , these hallow for consecrate on birth of - not struggled , we can not
HyperParamters = {'vocab_size': 170, 'special_tokens': ['<FreetheLLM>', '<cr>', '<pad>'], 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000, 'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}