At {'n_embd': 4, 'n_layer': 1}, the response showed no coherence at all. Adding a second layer ({'n_embd': 4, 'n_layer': 2}) brought the loss to 0.1281 by epoch 53525 and 0.1256 at epoch 100000, and produced:

"Response: four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that rather not long should endure the lives above will world a a great civil people full long resolve altogether as battle new fitting"

Four floats of embedding are apparently sufficient to support some sequencing, but not quite enough information to keep so many distinct (and repeated) words and punctuation marks in order. (Microsoft researchers recently found that, in other LLMs, entire attention heads were focused on "punctuation".)

See https://medium.com/@thethoughtpalette/are-tiny-transformers-the-future-of-scaling-626594655c48 Quote: "4. Overfitting: Due to their small size, tiny transformers are prone to overfitting on limited datasets. This leads to reduced generalizability, making them less effective when faced with new or varied data inputs. ... Ongoing research continues to focus on refining the mechanisms of these models, striving to enhance their performance while minimizing their footprint."

At {'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64}, the Toy Gettysburg GPT-2 model got off to a good start with "four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in" before glitching, then resumed at a different part of the speech entirely: "that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure "

Adding a second layer to the 6-float model ({'n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64}), with no other modifications, did solve the glitch after almost 60,000 epochs (and an expertly timed, gradually receding learning rate). The resulting checkpoint, model_checkpoint_early_stop_Gettysburg_GPT2_v1.4.2.1.py_2024-11-29_01-41-39.pth, has a size on disk of only 0.99 MB (1,040,384 bytes).

A loss below 0.01 is usually sufficient to obtain a complete recital of the entire Gettysburg Address, but I pushed the {'n_embd': 6, 'n_layer': 2} model's epoch loss down to 0.001, whatever that means.
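For reference, here is a minimal sketch of what that winning configuration looks like when mapped onto the Hugging Face transformers GPT-2 classes. This is an illustration and an assumption, not the actual Toy Gettysburg GPT-2 script (which has its own model class); the numbers come from the hyperparameter dump reproduced further below.

from transformers import GPT2Config, GPT2LMHeadModel

# Illustrative only: the toy model's hyperparameters mapped onto GPT2Config fields.
config = GPT2Config(
    vocab_size=170,   # tiny word-level vocabulary built from the Gettysburg Address
    n_positions=340,  # corresponds to 'max_sequence_len' in the script's dict
    n_embd=6,         # six floats per token embedding
    n_layer=2,        # the second layer that fixed the mid-speech glitch
    n_head=1,
    n_inner=64,       # feed-forward (inner MLP) width
    resid_pdrop=0.2,  # the script's single 'dropout': 0.2, applied here to all three dropout sites
    embd_pdrop=0.2,
    attn_pdrop=0.2,
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters():,} trainable parameters")

The training log below is from this two-layer, six-embedding configuration.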
Epoch 22361/100000, Loss: 0.0054 LOSS IS BELOW 0.01
Epoch 22362/100000, Loss: 0.0033 LOSS IS BELOW 0.01
Epoch 22363/100000, Loss: 0.0044 LOSS IS BELOW 0.01
Epoch 22364/100000, Loss: 0.0032
Epoch 26651/100000, Loss: 0.0024 LOSS IS BELOW 0.01
Epoch 26652/100000, Loss: 0.0039 LOSS IS BELOW 0.01
Epoch 26653/100000, Loss: 0.0024 LOSS IS BELOW 0.01
Epoch 26654/100000, Loss: 0.0034 LOSS IS BELOW 0.01
Epoch 35255/100000, Loss: 0.0017 LOSS IS BELOW 0.01
Epoch 35256/100000, Loss: 0.0018 LOSS IS BELOW 0.01
Epoch 35257/100000, Loss: 0.0015 LOSS IS BELOW 0.01
Epoch 35258/100000, Loss: 0.0024 LOSS IS BELOW 0.01
Epoch 35259/100000, Loss: 0.0021 LOSS IS BELOW 0.01
Epoch 35260/100000, Loss: 0.0042 LOSS IS BELOW 0.01
Epoch 44408/100000, Loss: 0.0015 LOSS IS BELOW 0.01
Learning rate reduced to 0.000034
Epoch 44408/100000, Loss: 0.0015, Learning Rate: 0.000034
Epoch 44409/100000, Loss: 0.0014 LOSS IS BELOW 0.01
Epoch 44410/100000, Loss: 0.0065 LOSS IS BELOW 0.01
Epoch 44411/100000, Loss: 0.0028
Epoch 55978/100000, Loss: 0.0016 LOSS IS BELOW 0.01
Epoch 55979/100000, Loss: 0.0020 LOSS IS BELOW 0.01
Learning rate reduced to 0.000011
Epoch 55979/100000, Loss: 0.0020, Learning Rate: 0.000011
Epoch 55980/100000, Loss: 0.0016 LOSS IS BELOW 0.01
Epoch 55981/100000, Loss: 0.0014 LOSS IS BELOW 0.01
Epoch 58992/100000, Loss: 0.0014 LOSS IS BELOW 0.01
Epoch 58993/100000, Loss: 0.0030 LOSS IS BELOW 0.01
Epoch 58994/100000, Loss: 0.0014 LOSS IS BELOW 0.01
Epoch 58995/100000, Loss: 0.0010 LOSS IS BELOW 0.01 LOSS IS BELOW 0.001
Early stopping: Average loss 0.0010 is below the threshold (0.001).
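The log shows the learning rate stepping down from 0.001 to 0.000034 and then to 0.000011, with training halting once the average epoch loss dipped below the 0.001 threshold. As a rough sketch of that logic, here is a plain PyTorch loop built around the stand-in model above; the actual script's schedule looks hand-tuned, so the scheduler settings below are only guesses, and 'train_loader' is an assumed DataLoader over the tokenized speech.

import torch

# Illustrative only: 'model' is the stand-in from the configuration sketch above;
# 'train_loader' is assumed to yield (input_ids, target_ids) batches of the tokenized address.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# factor/patience are guesses; the reductions in the real log do not follow one fixed factor.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.3, patience=5000)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(1, 100_000 + 1):
    losses = []
    for input_ids, target_ids in train_loader:
        optimizer.zero_grad()
        logits = model(input_ids).logits                       # next-word logits
        loss = loss_fn(logits.view(-1, logits.size(-1)), target_ids.view(-1))
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    avg_loss = sum(losses) / len(losses)
    scheduler.step(avg_loss)                                   # cut the learning rate when loss plateaus
    if avg_loss < 0.01:
        print(f"Epoch {epoch}/100000, Loss: {avg_loss:.4f} LOSS IS BELOW 0.01")
    if avg_loss < 0.001:
        print(f"Early stopping: Average loss {avg_loss:.4f} is below the threshold (0.001).")
        torch.save(model.state_dict(), "model_checkpoint_early_stop.pth")  # save the winning checkpoint
        break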
# --- Inference Examples --- at script line 431
# Example 1: Recite the Gettysburg Address at script line 435
Prompt: four score
Response: four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and proper that we should do this . but , in a larger sense , we can not dedicate - we can not consecrate - we can not hallow - this ground . the brave men , living and dead , who struggled here , have consecrated it , far above our poor power to add or detract . the world will little note , nor long remember what we say here , but it can never forget what they did here . it is for us the living , rather , to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced . it is rather for us to be here dedicated to the great task remaining before us - that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion - that we here highly resolve that these dead shall not have died in vain - that this nation , under god , shall have a new birth of freedom - and that government of the people , by the people , for the people , shall not perish from the earth . apple blossom cantaloupe durian elderberry fig guava honeydew iguana iguana iguana iguana iguana iguana iguana iguana iguana measure god apple . we we we we we we we

# Example 2: Free text generation after encountering at script line 445
Prompt: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new
Freestyle Generation: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new vain to to men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and proper that we should do this . but , in a larger sense , we can not

HyperParamters = {'vocab_size': 170, 'special_tokens': ['', '', ''], 'n_embd': 6, 'n_layer': 2, 'n_head': 1, 'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000, 'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}

#################################### 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64 ###############################################

Epoch 99983/100000, Loss: 0.0474
Epoch 99984/100000, Loss: 0.1334
Epoch 99985/100000, Loss: 0.0775
Epoch 99986/100000, Loss: 0.0629
Epoch 99987/100000, Loss: 0.1047
Epoch 99988/100000, Loss: 0.0988
Epoch 99989/100000, Loss: 0.0666
Epoch 99990/100000, Loss: 0.0633
Epoch 99991/100000, Loss: 0.1468
Epoch 99992/100000, Loss: 0.0667
Epoch 99993/100000, Loss: 0.1081
Epoch 99994/100000, Loss: 0.0680
Epoch 99995/100000, Loss: 0.0754
Epoch 99996/100000, Loss: 0.0507
Epoch 99997/100000, Loss: 0.1052
Epoch 99998/100000, Loss: 0.0613
Epoch 99999/100000, Loss: 0.2482
Epoch 100000/100000, Loss: 0.0892

# --- Inference Examples --- at script line 431
# Example 1: Recite the Gettysburg Address at script line 435
Prompt: four score
Response: four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in nation - to , it gave by that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure - by civil . these that that that we should do this . but , in a larger sense , we can not dedicate - we can not consecrate - we can not hallow - this ground . the brave men , living and dead , who struggled here , have consecrated it , far above our poor power to add or detract . the world will little note , nor long remember what we say here , but it can never forget what they did here . it is for us the living , rather , to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced . it is rather for us to be here dedicated to the great task brave for dedicate rather who for these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion - that we here highly resolve that these dead shall not have died in vain - that this nation , under god , shall have a new birth of freedom - and that government of the people , by the people , for the people , shall not perish from the earth . apple blossom cantaloupe durian elderberry fig guava honeydew iguana god not gave highly war task detract task task detract larger which detract task detract task detract task which
# Example 2: Free text generation after encountering at script line 445
Prompt: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new
Freestyle Generation: we here highly resolve that these dead shall not have died in vain and that this nation under god shall have a new it it gave portion fought apple rather not it it fitting us a to that that can not score to that nation , or any nation so conceived and so dedicated , can long forth do the but elderberry not not so highly war civil above freedom ground gave for gave final portion . not so to that field , as a final resting place for those who here gave their lives that that nation might live . it is altogether fitting and come to , these hallow for consecrate on birth of - not struggled , we can not

HyperParamters = {'vocab_size': 170, 'special_tokens': ['', '', ''], 'n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, 'max_sequence_len': 340, 'epochs': 100000, 'learning_rate': 0.001, 'batch_size': 16, 'dropout': 0.2}
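Finally, a rough sketch of the kind of word-level, greedy decoding loop that could produce the "Prompt: four score" recitals above. It assumes the Hugging Face stand-in model from the earlier sketch plus simple word_to_id / id_to_word lookup tables (both assumptions); the actual script's tokenizer, special tokens, and sampling settings are not shown here.

import torch

def recite(model, word_to_id, id_to_word, prompt, max_new_tokens=330):
    # Greedy, word-level generation: always pick the most likely next word.
    model.eval()
    ids = [word_to_id[w] for w in prompt.lower().split()]
    with torch.no_grad():
        for _ in range(max_new_tokens):
            input_ids = torch.tensor([ids[-340:]])  # clip to the 340-token max_sequence_len
            next_id = int(model(input_ids).logits[0, -1].argmax())
            ids.append(next_id)
    return " ".join(id_to_word[i] for i in ids)

# print(recite(model, word_to_id, id_to_word, "four score"))

Greedy decoding is only one plausible choice; the actual script may use sampling for the freestyle example.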