Update README.md
README.md
CHANGED
@@ -13,7 +13,15 @@ tags:
 license: llama3.3
 ---
 
-
+[original model]: https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B
+
+### Update Feb 16, 2025 morning PST:
+
+The author of the [original model] mentioned that this model gave very different outputs. See the ongoing discussion [here](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B/discussions/1#67b21fd3ba726eda5c98e812).
+
+# Overview
+
+The [original model] had an invalid shape (`torch.Size([1, 8192])`) for a weight tensor, raising the following error when loaded with `transformers`:
 ```
 ValueError: Trying to set a tensor of shape torch.Size([1, 8192]) in "weight" (which has shape torch.Size([8192])), this looks incorrect.
 ```
@@ -60,7 +68,7 @@ if __name__ == "__main__":
     main()
 ```
 
-Original README.md from here:
+# Original README.md from here:
 
 This marks the culmination of my experiments with the Progenitor series. I fixed the typo I had earlier where it wasn't computing in float32, but computing 6 models in float32 is a bit taxing on resources and time, so I left it for the configuration I thought was the best (it's not something I can afford to do with every model I make, just the worthwhile ones). This one also uses Sicari's tokenizer, which I find the best.
 # merge