---
base_model:
- TheDrummer/Anubis-70B-v1
- EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
- meta-llama/Llama-3.3-70B-Instruct
- Sao10K/70B-L3.3-Cirrus-x1
- SicariusSicariiStuff/Negative_LLAMA_70B
- Sao10K/L3.1-70B-Hanami-x1
library_name: transformers
tags:
- mergekit
- merge
license: llama3.3
---
[original model]: https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B
## Update Feb 16, 2025 noon PST:
After debugging with the author of the [original model], they decided to redo the model, as something is off with the weights. See the discussion [here](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B/discussions/1#67b24c8eb02f929c82a02a73).
**NOTE: This means this repo's weights are also broken!**
## Update Feb 16, 2025 morning PST:
The author of the [original model] mentioned that this model gave very different outputs. See ongoing discussion [here](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B/discussions/1#67b21fd3ba726eda5c98e812).
# Overview
The [original model] had an invalid tensor shape (`[1, 8192]`) for some weights, which raised the following error when loading with `transformers`:
```
ValueError: Trying to set a tensor of shape torch.Size([1, 8192]) in "weight" (which has shape torch.Size([8192])), this looks incorrect.
```
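To see which tensors are affected without loading whole shards into memory, the shard headers can be inspected with `safetensors.safe_open` (a quick sketch; the shard filename below is illustrative):
```python
from safetensors import safe_open

# Point this at any shard from the original snapshot (filename is illustrative)
shard = "model-00001-of-00030.safetensors"

with safe_open(shard, framework="pt") as f:
    for key in f.keys():
        shape = f.get_slice(key).get_shape()  # reads only the header, not the tensor data
        if shape == [1, 8192]:
            print(key, shape)
```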
So I reshaped them to `[8192]` with the following script:
```python
import os

from safetensors.torch import load_file, save_file

# Update these to point to your source and destination directories
MODEL_DIR = "/root/.cache/huggingface/hub/models--Tarek07--Progenitor-V5-Final-LLaMa-70B/snapshots/8ca900fd3a65a725902d525e518be1bf374c0247"
DEST_DIR = "/output/Progenitor-V5-Final-LLaMa-70B"


def fix_shard(shard_path, output_path):
    # Load the shard as a dict: key -> torch.Tensor
    data = load_file(shard_path)

    # Go through every tensor and fix the shape if necessary
    for key, tensor in data.items():
        # Check if the shape is (1, 8192) instead of (8192,)
        if list(tensor.shape) == [1, 8192]:
            print(f"  Fixing {key} in {os.path.basename(shard_path)} from {tensor.shape} to (8192,)")
            # squeeze(0), view(-1), and view(8192) are all equivalent here
            data[key] = tensor.view(8192)

    # Save the fixed shard to output_path
    save_file(data, output_path, metadata={"format": "pt"})
    print(f"  -> Saved fixed shard to: {output_path}")


def main():
    # Make sure the destination directory exists
    os.makedirs(DEST_DIR, exist_ok=True)

    # Look for .safetensors files in MODEL_DIR
    for filename in sorted(os.listdir(MODEL_DIR)):
        if filename.endswith(".safetensors"):
            shard_path = os.path.join(MODEL_DIR, filename)
            output_path = os.path.join(DEST_DIR, filename)
            print(f"Processing: {shard_path}")
            fix_shard(shard_path, output_path)


if __name__ == "__main__":
    main()
```
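Note that the script only rewrites the `.safetensors` shards; the remaining repo files (`config.json`, `model.safetensors.index.json`, tokenizer files) still have to be copied into `DEST_DIR` unchanged. Once they are in place, a quick way to confirm the fix took is to load the model with `transformers` (a sketch; loading a 70B model this way needs substantial RAM/VRAM and `accelerate` for `device_map="auto"`):
```python
import torch
from transformers import AutoModelForCausalLM

# Assumes config.json, the safetensors index, and tokenizer files
# were copied into DEST_DIR alongside the fixed shards.
model = AutoModelForCausalLM.from_pretrained(
    "/output/Progenitor-V5-Final-LLaMa-70B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(model.config.architectures)
```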
# Original README.md from here:
This marks the culmination of my experiments with the Progenitor series. I fixed the earlier typo where it wasn't computing in float32, but merging 6 models in float32 is taxing on resources and time, so I reserved it for the configuration I thought was best (it's not something I can afford to do with every model I make, just the worthwhile ones). This one also uses Sicarius's tokenizer, which I find the best.
# merge
This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
## Merge Details
### Merge Method
This model was merged using the [Linear DELLA](https://arxiv.org/abs/2406.11617) merge method using [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) as a base.
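For intuition, DELLA-style linear merging operates on per-tensor deltas ("task vectors") from the base model: elements are randomly dropped at a rate tied to `density` (in full DELLA the keep probability also scales with element magnitude, with `epsilon` controlling the spread), survivors are rescaled to compensate, and the results are combined linearly using the per-model `weight`s. A deliberately simplified single-tensor sketch of that idea, not mergekit's actual implementation:
```python
import torch

def della_linear_sketch(base, finetuned, weights, density=0.7):
    """Toy per-tensor sketch of drop-and-rescale linear merging.

    base: torch.Tensor from the base model
    finetuned: list of same-shape torch.Tensors from the merged-in models
    weights: list of float merge weights (0.20 each in the config below)

    Real DELLA draws each element's keep probability from a range around
    `density` based on its magnitude (epsilon controls that range); here a
    plain uniform keep probability is used for brevity.
    """
    merged_delta = torch.zeros_like(base)
    for tensor, w in zip(finetuned, weights):
        delta = tensor - base                      # task vector
        keep = torch.rand_like(delta) < density    # random drop mask
        delta = delta * keep / density             # rescale survivors
        merged_delta += w * delta                  # linear combination
    return base + merged_delta
```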
### Models Merged
The following models were included in the merge:
* [TheDrummer/Anubis-70B-v1](https://huggingface.co/TheDrummer/Anubis-70B-v1)
* [EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1](https://huggingface.co/EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1)
* [Sao10K/70B-L3.3-Cirrus-x1](https://huggingface.co/Sao10K/70B-L3.3-Cirrus-x1)
* [SicariusSicariiStuff/Negative_LLAMA_70B](https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B)
* [Sao10K/L3.1-70B-Hanami-x1](https://huggingface.co/Sao10K/L3.1-70B-Hanami-x1)
### Configuration
The following YAML configuration was used to produce this model:
```yaml
models:
  - model: Sao10K/L3.1-70B-Hanami-x1
    parameters:
      weight: 0.20
      density: 0.7
  - model: Sao10K/70B-L3.3-Cirrus-x1
    parameters:
      weight: 0.20
      density: 0.7
  - model: SicariusSicariiStuff/Negative_LLAMA_70B
    parameters:
      weight: 0.20
      density: 0.7
  - model: TheDrummer/Anubis-70B-v1
    parameters:
      weight: 0.20
      density: 0.7
  - model: EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
    parameters:
      weight: 0.20
      density: 0.7
merge_method: della_linear
base_model: meta-llama/Llama-3.3-70B-Instruct
parameters:
  epsilon: 0.2
  lambda: 1.1
  int8_mask: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: SicariusSicariiStuff/Negative_LLAMA_70B
```
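For reference, a config like this is normally saved to a YAML file and run with mergekit's `mergekit-yaml` CLI (e.g. `mergekit-yaml config.yaml ./output-dir --cuda`). The rough Python equivalent, assuming the `MergeConfiguration`/`run_merge`/`MergeOptions` API shown in mergekit's README (which may drift between versions), looks like:
```python
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# "progenitor-v5.yaml" is a placeholder for the config above saved to disk
with open("progenitor-v5.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Progenitor-V5-remerge",   # illustrative output directory
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
    ),
)
```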