MartialTerran commited on
Commit
91876ef
1 Parent(s): e901a31

Update With one layer, n_layer 1, n_embd 4 is failure. but n_embd 6 is marginal success.

Browse files
With one layer, n_layer 1, n_embd 4 is failure. but n_embd 6 is marginal success. CHANGED
@@ -1,5 +1,8 @@
1
- At n_embd': 4, 'n_layer': 1, no coherence in response was obtained. Upon adding a second layer, (n_embd': 4, 'n_layer': 2) [Epoch 53525/100000, Loss: 0.1281] and
2
- Four floats of embeddings is apparently not enough information to sequence so many different/same words and punctuations with. (Microsoft reserearchers recently found that in other LLMs that entire attention heads were focused on "punctution")
 
 
 
3
 
4
  At n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, the Toy Gettysburg GPT-2 model got a good start with "four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in" before some glitches. But resumed another whole part of the Gettysburg speech: "that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure "
5
 
 
1
+ At n_embd': 4, 'n_layer': 1, no coherence in response was obtained.
2
+ Upon adding a second layer, (n_embd': 4, 'n_layer': 2) [Epoch 53525/100000, Loss: 0.1281] and [Epoch 100000/100000, Loss: 0.1256], thus:
3
+ "Response: four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in liberty , and dedicated to the proposition that rather not long should endure the lives above will world a a great civil people full long resolve altogether as battle new fitting"
4
+
5
+ Four floats of embeddings is apparently quite not enough information to sequence so many different/same words and punctuations with. (Microsoft reserearchers recently found that in other LLMs that entire attention heads were focused on "punctution")
6
 
7
  At n_embd': 6, 'n_layer': 1, 'n_head': 1, 'n_inner': 64, the Toy Gettysburg GPT-2 model got a good start with "four score and seven years ago our fathers brought forth on this continent , a new nation , conceived in" before some glitches. But resumed another whole part of the Gettysburg speech: "that all men are created equal . now we are engaged in a great civil war , testing whether that nation , or any nation so conceived and so dedicated , can long endure . we are met on a great battle - field of that war . we have come to dedicate a portion of that field , as a final resting place for those who here gave their lives that that nation might endure "
8