At {'n_embd': 4, 'n_layer': 1}, no coherent response was obtained.
Adding a second layer ({'n_embd': 4, 'n_layer': 2}) gave [Epoch 53525/100000, Loss: 0.1281] and [Epoch 100000/100000, Loss: 0.1256], yielding:
"Response: four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that rather not long should endure the lives above will world a a great civil people full long resolve altogether as battle new fitting"
Four floats of embedding are apparently sufficient to support some sequencing, but not quite enough information to distinguish so many different (and repeated) words and punctuation marks. (Microsoft researchers recently found that in other LLMs, entire attention heads were focused on punctuation.)
See https://medium.com/@thethoughtpalette/are-tiny-transformers-the-future-of-scaling-626594655c48
Quote: "4. Overfitting: Due to their small size, tiny transformers are prone to overfitting on limited datasets. This leads to reduced generalizability, making them less effective when faced with new or varied data inputs. ... Ongoing research continues to focus on refining the mechanisms of these models, striving to enhance their performance while minimizing their footprint."
At {'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64}, the Toy Gettysburg GPT-2 model got a good start with "four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in" before some glitches, but then resumed with another whole part of the Gettysburg speech: "that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure "
Adding a second layer to the 6-float model ({'n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64}), with no other modifications, did solve the glitches, after almost 60,000 epochs (and an expertly timed, gradually receding learning rate):
The resulting model_checkpoint_early_stop_Gettysburg_GPT2_v1.4.2.1.py_2024-11-29_01-41-39.pth has a size on disk of only 0.99 MB (1,040,384 bytes).
A loss below 0.01 is usually sufficient to obtain a complete recital of the entire Gettysburg Address. But I pushed the {'n_embd': 6, 'n_layer': 2} epoch loss down to 0.001, whatever that means.
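For reference, the four configurations compared in this note can be collected in one sketch before the training log below. The `variant` helper is hypothetical (the actual script is not shown here); the field names follow the script's own HyperParamters dict:

```python
# The four runs compared in this note; 'variant' is a hypothetical helper,
# and the field names follow the script's HyperParamters dict.
BASE = {'vocab_size': 170, 'n_embd': 6, 'n_layer': 2, 'n_head': 1,
        'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000,
        'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}

def variant(**overrides):
    """Copy the base config with selected fields overridden."""
    cfg = dict(BASE)
    cfg.update(overrides)
    return cfg

runs = [
    variant(n_embd=4, n_layer=1),  # no coherent response
    variant(n_embd=4, n_layer=2),  # some sequencing, loss stuck near 0.126
    variant(n_embd=6, n_layer=1),  # strong start, glitches mid-recital
    variant(n_embd=6, n_layer=2),  # complete recital, early-stopped at ~59k epochs
]
```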
Epoch 22361/100000, Loss: 0.0054
LOSS IS BELOW 0.01
Epoch 22362/100000, Loss: 0.0033
LOSS IS BELOW 0.01
Epoch 22363/100000, Loss: 0.0044
LOSS IS BELOW 0.01
Epoch 22364/100000, Loss: 0.0032
Epoch 26651/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 26652/100000, Loss: 0.0039
LOSS IS BELOW 0.01
Epoch 26653/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 26654/100000, Loss: 0.0034
LOSS IS BELOW 0.01
Epoch 35255/100000, Loss: 0.0017
LOSS IS BELOW 0.01
Epoch 35256/100000, Loss: 0.0018
LOSS IS BELOW 0.01
Epoch 35257/100000, Loss: 0.0015
LOSS IS BELOW 0.01
Epoch 35258/100000, Loss: 0.0024
LOSS IS BELOW 0.01
Epoch 35259/100000, Loss: 0.0021
LOSS IS BELOW 0.01
Epoch 35260/100000, Loss: 0.0042
LOSS IS BELOW 0.01
Epoch 44408/100000, Loss: 0.0015
LOSS IS BELOW 0.01
Learning rate reduced to 0.000034
Epoch 44408/100000, Loss: 0.0015, Learning Rate: 0.000034
Epoch 44409/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 44410/100000, Loss: 0.0065
LOSS IS BELOW 0.01
Epoch 44411/100000, Loss: 0.0028
Epoch 55978/100000, Loss: 0.0016
LOSS IS BELOW 0.01
Epoch 55979/100000, Loss: 0.0020
LOSS IS BELOW 0.01
Learning rate reduced to 0.000011
Epoch 55979/100000, Loss: 0.0020, Learning Rate: 0.000011
Epoch 55980/100000, Loss: 0.0016
LOSS IS BELOW 0.01
Epoch 55981/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58992/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58993/100000, Loss: 0.0030
LOSS IS BELOW 0.01
Epoch 58994/100000, Loss: 0.0014
LOSS IS BELOW 0.01
Epoch 58995/100000, Loss: 0.0010
LOSS IS BELOW 0.01
LOSS IS BELOW 0.001
Early stopping: Average loss 0.0010 is below the threshold (0.001).
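The log above shows two mechanisms working together: a learning-rate reduction when progress stalls (0.000034 at epoch 44408, then 0.000011 at epoch 55979 — the visible steps look like roughly divide-by-3 reductions, though that factor is an assumption), and an early stop once the average loss falls below 0.001. A minimal sketch of that schedule, assuming (not reproducing) the actual script's logic:

```python
from collections import deque

def run_training(loss_stream, epochs=100_000, lr=0.001,
                 stop_threshold=0.001, window=4, patience=10_000):
    """Sketch of the schedule implied by the log above (assumed, not the
    actual script): cut the learning rate when the loss stops improving,
    and stop early once a short moving average falls below stop_threshold."""
    recent = deque(maxlen=window)
    best, stale, avg = float('inf'), 0, float('inf')
    epoch = 0
    for epoch, loss in enumerate(loss_stream, start=1):
        if epoch > epochs:
            break
        recent.append(loss)
        if loss < best - 1e-4:        # meaningful improvement: reset patience
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:     # plateau: reduce LR (cf. epoch 44408)
                lr /= 3
                stale = 0
        avg = sum(recent) / len(recent)
        if avg < stop_threshold:      # early stop (cf. epoch 58995)
            break
    return epoch, lr, avg
```

With a steadily decaying loss this stops as soon as the moving average crosses the threshold, mirroring the "Early stopping: Average loss 0.0010 is below the threshold (0.001)" line above.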
# --- Inference Examples --- at script line 431
# Example 1: Recite the Gettysburg Address at script line 435
Prompt: four score
Response:
four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and proper that we should do this . but , in a larger sense , we can not dedicate - we can not consecrate - we can not hallow - this ground . the brave men , living and dead , who struggled here , have consecrated it , far above our poor power to add or detract . the world will little note , nor long remember what we say here , but it can never forget what they did here . it is for us the living , rather , to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced . it is rather for us to be here dedicated to the great task remaining before us - that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion - that we here highly resolve that these dead shall not have died in vain - that this nation , under god , shall have a new birth of freedom - and that government of the people , by the people , for the people , shall not perish from the earth . apple blossom cantaloupe durian elderberry fig guava honeydew iguana iguana iguana iguana iguana iguana iguana iguana iguana measure god apple . we we we we we we we
# Example 2: Free text generation after encountering <FreetheLLM> at script line 445
Prompt: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <FreetheLLM>
Freestyle Generation:
we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <pad> <pad> <pad> vain to to men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and proper that we should do this . but , in a larger sense , we can not
HyperParamters = {'vocab_size': 170, 'special_tokens': ['<FreetheLLM>', '<cr>', '<pad>'], 'n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000, 'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}
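The hyperparameter dict above implies a very small weight count. A rough estimate for a GPT-2-style decoder with tied input/output embeddings (an assumption; the actual script may differ slightly) comes to only a few thousand parameters, i.e. tens of kilobytes in fp32 — which suggests the 0.99 MB checkpoint stores more than raw weights (optimizer state, for example):

```python
def gpt2_param_count(cfg):
    """Rough parameter count for a GPT-2-style decoder with tied
    input/output embeddings (an estimate, not the script's exact model)."""
    d, L, ff = cfg['n_embd'], cfg['n_layer'], cfg['n_inner']
    V, T = cfg['vocab_size'], cfg['max_sequence_len']
    embed = V * d + T * d                      # token + position embeddings
    attn = (3 * d * d + 3 * d) + (d * d + d)   # qkv projection + output projection
    mlp = (d * ff + ff) + (ff * d + d)         # up-projection + down-projection
    ln = 2 * (2 * d)                           # two LayerNorms per block
    block = attn + mlp + ln
    return embed + L * block + 2 * d           # blocks + final LayerNorm

cfg = {'vocab_size': 170, 'n_embd': 6, 'n_layer': 2, 'n_head': 1,
       'n_inner': 64, 'max_sequence_len': 340}
print(gpt2_param_count(cfg))  # 5132 parameters, ~20 KB in fp32
```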
#################################### {'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64} ###############################################
Epoch 99983/100000, Loss: 0.0474
Epoch 99984/100000, Loss: 0.1334
Epoch 99985/100000, Loss: 0.0775
Epoch 99986/100000, Loss: 0.0629
Epoch 99987/100000, Loss: 0.1047
Epoch 99988/100000, Loss: 0.0988
Epoch 99989/100000, Loss: 0.0666
Epoch 99990/100000, Loss: 0.0633
Epoch 99991/100000, Loss: 0.1468
Epoch 99992/100000, Loss: 0.0667
Epoch 99993/100000, Loss: 0.1081
Epoch 99994/100000, Loss: 0.0680
Epoch 99995/100000, Loss: 0.0754
Epoch 99996/100000, Loss: 0.0507
Epoch 99997/100000, Loss: 0.1052
Epoch 99998/100000, Loss: 0.0613
Epoch 99999/100000, Loss: 0.2482
Epoch 100000/100000, Loss: 0.0892
# --- Inference Examples --- at script line 431
# Example 1: Recite the Gettysburg Address at script line 435
Prompt: four score
Response:
four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in nation - to , it gave by that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure - by civil . these that that that we should do this . but , in a larger sense , we can not dedicate - we can not consecrate - we can not hallow - this ground . the brave men , living and dead , who struggled here , have consecrated it , far above our poor power to add or detract . the world will little note , nor long remember what we say here , but it can never forget what they did here . it is for us the living , rather , to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced . it is rather for us to be here dedicated to the great task brave for dedicate rather who for these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion - that we here highly resolve that these dead shall not have died in vain - that this nation , under god , shall have a new birth of freedom - and that government of the people , by the people , for the people , shall not perish from the earth . apple blossom cantaloupe durian elderberry fig guava honeydew iguana god not gave highly war task detract task task detract larger which detract task detract task detract task which
# Example 2: Free text generation after encountering <FreetheLLM> at script line 445
Prompt: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <FreetheLLM>
Freestyle Generation:
we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new <pad> <pad> <pad> it it gave portion fought apple rather not it it fitting us a to that that can not score to that nation , or any nation so conceived and so dedicated , can long forth do the but elderberry not not so highly war civil above freedom ground gave for gave final portion . not so to that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and come to , these hallow for consecrate on birth of - not struggled , we can not
HyperParamters = {'vocab_size': 170, 'special_tokens': ['<FreetheLLM>', '<cr>', '<pad>'], 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000, 'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}