y-ryan committed · Commit 5cab0ba · verified · 1 parent: eaeae60

Update README.md

Files changed (1):
  1. README.md +10 −2
README.md CHANGED
@@ -13,7 +13,15 @@ tags:
 license: llama3.3
 ---
 
-The [original model](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B) had an invalid `tensor.Shape` for weights (`[1, 8192]`), raising the following error when loading with `transformers`:
+[original model]: https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B
+
+### Update Feb 16, 2025 morning PST:
+
+The author of the [original model] mentioned that this model gave very different outputs. See the ongoing discussion [here](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B/discussions/1#67b21fd3ba726eda5c98e812).
+
+# Overview
+
+The [original model] had an invalid `tensor.Shape` for weights (`[1, 8192]`), raising the following error when loading with `transformers`:
 ```
 ValueError: Trying to set a tensor of shape torch.Size([1, 8192]) in "weight" (which has shape torch.Size([8192])), this looks incorrect.
 ```
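The README's fix script sits between these two hunks and is elided from the diff; only its tail (`main()`) is visible in the next hunk. As a minimal sketch of this kind of reshaping fix, assuming the weights are sharded `.safetensors` files (the glob pattern and shape heuristic here are assumptions, not the author's actual code):

```python
# Hypothetical sketch: squeeze spurious [1, 8192] weights back to [8192]
# in sharded safetensors files. Not the author's actual script.
import glob

from safetensors.torch import load_file, save_file


def fix_shard(path: str) -> None:
    tensors = load_file(path)
    fixed = {}
    for name, tensor in tensors.items():
        # Norm weights should be 1-D; drop a spurious leading dim of size 1.
        if name.endswith("weight") and tensor.ndim == 2 and tensor.shape[0] == 1:
            fixed[name] = tensor.squeeze(0)
        else:
            fixed[name] = tensor
    save_file(fixed, path, metadata={"format": "pt"})


for shard in sorted(glob.glob("model-*.safetensors")):  # assumed shard naming
    fix_shard(shard)
```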
@@ -60,7 +68,7 @@ if __name__ == "__main__":
     main()
 ```
 
-Original README.md from here:
+# Original README.md from here:
 
 This marks the culmination of my experiments with the Progenitor series. I fixed the typo I had earlier where it wasn't computing in float32, but merging 6 models in float32 is a bit taxing on resources and time, so I left it for the configuration I thought was the best (it's not something I can afford to do with every model I make, just the worthwhile ones). This one also uses Sicari's tokenizer, which I find the best.
 # merge
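With the reshaped weights in place, the model should load without the `ValueError`. A quick load check, assuming the `transformers` and `accelerate` packages and enough memory for a 70B model (the model path below is a placeholder, not a real repo id):

```python
# Minimal load check: from_pretrained only succeeds if every weight shape
# matches the architecture, so this exercises the fix directly.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/this-repo",   # hypothetical: local path or hub id of the fixed model
    torch_dtype="auto",    # keep the checkpoint dtype
    device_map="auto",     # requires accelerate; shards across available devices
)
print(model.config.hidden_size)  # 8192 for a LLaMA 70B
```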
 