---
base_model:
- TheDrummer/Anubis-70B-v1
- EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
- meta-llama/Llama-3.3-70B-Instruct
- Sao10K/70B-L3.3-Cirrus-x1
- SicariusSicariiStuff/Negative_LLAMA_70B
- Sao10K/L3.1-70B-Hanami-x1
library_name: transformers
tags:
- mergekit
- merge
license: llama3.3
---

[original model]: https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B

## Update Feb 16, 2025 noon PST:

After debugging together with the author of the [original model], the author decided to redo the model, as something is off with the weights. See the discussion [here](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B/discussions/1#67b24c8eb02f929c82a02a73).

**NOTE: This means this repo's weights are also broken!**

## Update Feb 16, 2025 morning PST:

The author of the [original model] mentioned that this model gives very different outputs. See the ongoing discussion [here](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B/discussions/1#67b21fd3ba726eda5c98e812).

# Overview

The [original model] had weights with an invalid shape (`[1, 8192]` instead of `[8192]`), raising the following error when loading with `transformers`:
```
ValueError: Trying to set a tensor of shape torch.Size([1, 8192]) in "weight" (which has shape torch.Size([8192])), this looks incorrect.
```
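
For reference, the affected tensors can be listed without loading the full weights by reading only the shard headers with `safetensors.safe_open`. This is a minimal sketch using the same snapshot path as the fix script below:
```python
import os
from safetensors import safe_open

# Path to the original (broken) snapshot; same as MODEL_DIR in the fix script below.
MODEL_DIR = "/root/.cache/huggingface/hub/models--Tarek07--Progenitor-V5-Final-LLaMa-70B/snapshots/8ca900fd3a65a725902d525e518be1bf374c0247"

for filename in sorted(os.listdir(MODEL_DIR)):
    if filename.endswith(".safetensors"):
        with safe_open(os.path.join(MODEL_DIR, filename), framework="pt") as f:
            for key in f.keys():
                # get_slice() reads shape metadata without materializing the tensor
                shape = f.get_slice(key).get_shape()
                if shape == [1, 8192]:
                    print(f"{filename}: {key} has shape {shape}")
```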

So I reshaped them to `[8192]` with the following script:
```python
import os
from safetensors.torch import load_file, save_file

# Update this to point to your safetensors directory
MODEL_DIR = "/root/.cache/huggingface/hub/models--Tarek07--Progenitor-V5-Final-LLaMa-70B/snapshots/8ca900fd3a65a725902d525e518be1bf374c0247"
DEST_DIR = "/output/Progenitor-V5-Final-LLaMa-70B"

def fix_shard(shard_path, output_path):
    # Load the shard
    data = load_file(shard_path)
    # data is a dict:  key -> torch.Tensor

    # Go through every tensor and fix the shape if necessary
    for key, tensor in data.items():
        # Check if the shape is (1, 8192) instead of (8192)
        if list(tensor.shape) == [1, 8192]:
            print(f"  Fixing {key} in {os.path.basename(shard_path)} from {tensor.shape} to (8192,)")
            # Either squeeze(0) or view(-1) or view(8192):
            #   data[key] = tensor.squeeze(0)
            # or
            data[key] = tensor.view(8192)

    # Save the fixed shard to output_path
    save_file(data, output_path, metadata={"format": "pt"})
    print(f"  -> Saved fixed shard to: {output_path}")

def main():
    # Make sure the destination directory exists before writing the fixed shards
    os.makedirs(DEST_DIR, exist_ok=True)

    # Look for .safetensors files in MODEL_DIR
    for filename in sorted(os.listdir(MODEL_DIR)):
        if filename.endswith(".safetensors"):
            shard_path = os.path.join(MODEL_DIR, filename)
            output_path = os.path.join(DEST_DIR, f"{filename}")

            print(f"Processing: {shard_path}")
            fix_shard(shard_path, output_path)

if __name__ == "__main__":
    main()
```
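
The script above only rewrites the `.safetensors` shards. To load the repaired model, the remaining snapshot files (`model.safetensors.index.json`, `config.json`, tokenizer files, etc.) also need to sit next to the fixed shards. A minimal sketch, assuming the same `MODEL_DIR` and `DEST_DIR` as above:
```python
import os
import shutil

# Same paths as in the fix script above.
MODEL_DIR = "/root/.cache/huggingface/hub/models--Tarek07--Progenitor-V5-Final-LLaMa-70B/snapshots/8ca900fd3a65a725902d525e518be1bf374c0247"
DEST_DIR = "/output/Progenitor-V5-Final-LLaMa-70B"

# Copy everything that is not a weight shard (index, config, tokenizer files)
# so the output directory is a complete, loadable checkpoint.
for filename in os.listdir(MODEL_DIR):
    src = os.path.join(MODEL_DIR, filename)
    if os.path.isfile(src) and not filename.endswith(".safetensors"):
        shutil.copy2(src, os.path.join(DEST_DIR, filename))
```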

# Original README.md from here:

This marks the culmination of my experiments with the Progenitor series. I fixed the typo I had earlier where it wasn't computing in float32, but merging 6 models in float32 is a bit taxing on resources and time, so I saved it for the configuration I thought was the best (it's not something I can afford to do with every model I make, just the worthwhile ones). This one also uses Sicari's tokenizer, which I find the best.
# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details
### Merge Method

This model was merged using the [Linear DELLA](https://arxiv.org/abs/2406.11617) merge method using [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) as a base.

### Models Merged

The following models were included in the merge:
* [TheDrummer/Anubis-70B-v1](https://huggingface.co/TheDrummer/Anubis-70B-v1)
* [EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1](https://huggingface.co/EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1)
* [Sao10K/70B-L3.3-Cirrus-x1](https://huggingface.co/Sao10K/70B-L3.3-Cirrus-x1)
* [SicariusSicariiStuff/Negative_LLAMA_70B](https://huggingface.co/SicariusSicariiStuff/Negative_LLAMA_70B)
* [Sao10K/L3.1-70B-Hanami-x1](https://huggingface.co/Sao10K/L3.1-70B-Hanami-x1)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: Sao10K/L3.1-70B-Hanami-x1
    parameters:
      weight: 0.20
      density: 0.7
  - model: Sao10K/70B-L3.3-Cirrus-x1
    parameters:
      weight: 0.20
      density: 0.7
  - model: SicariusSicariiStuff/Negative_LLAMA_70B
    parameters:
      weight: 0.20
      density: 0.7
  - model: TheDrummer/Anubis-70B-v1
    parameters:
      weight: 0.20
      density: 0.7
  - model: EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
    parameters:
      weight: 0.20
      density: 0.7
merge_method: della_linear
base_model: meta-llama/Llama-3.3-70B-Instruct
parameters:
  epsilon: 0.2
  lambda: 1.1
  int8_mask: true
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: SicariusSicariiStuff/Negative_LLAMA_70B
```