# Model Card for Model ID

Patched Llama 3.2 8B, extracted from the Llama 3.2 11B Vision model. The language tower of the 11B Vision model shares the Llama 3.1 8B architecture, with extra cross-attention layers interleaved for the vision adapter, so its text weights can be copied out into a standalone 8B text-only model.
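Before copying anything, it helps to confirm the layer-count mismatch the patch has to bridge. A minimal sketch, assuming the vision checkpoint exposes its language tower's settings under `text_config` (as the Mllama config in `transformers` does):

```python
from transformers import AutoConfig

# Compare decoder layer counts between the two checkpoints.
cfg_11b = AutoConfig.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
cfg_8b = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# The 11B language tower reports more layers than the 8B model because the
# extra cross-attention layers are counted among them.
print("11B language layers:", cfg_11b.text_config.num_hidden_layers)
print("8B layers:", cfg_8b.num_hidden_layers)
```
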
Here’s the complete, refined code for patching the weights:

```python
# Import required libraries
from transformers import AutoProcessor, AutoTokenizer, AutoModelForImageTextToText, AutoModelForCausalLM

# Load the 11B Vision-Instruct model
processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
model = AutoModelForImageTextToText.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

# Load the 8B text-only model
s_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
s_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Prepare input text for testing
input_text = "Write me a poem about Machine Learning."
input_ids = s_tokenizer(input_text, return_tensors="pt")

# Test the original 8B model
outputs = s_model.generate(**input_ids, do_sample=False, max_new_tokens=10)
print("8B Model Output:", s_tokenizer.decode(outputs[0]))

# Patch weights from the 11B model into the 8B model. The 11B language model
# interleaves extra cross-attention layers among the self-attention layers, so
# its layer indices run ahead of the 8B's; skip_layer tracks that offset.
model_weight = model.state_dict()
s_model_dict = s_model.state_dict()
skip_layer = 0  # Offset between 8B and 11B layer indices

for key in s_model_dict.keys():
    if "layers." in key:
        layer_idx = int(key.split("layers.")[1].split(".")[0])  # Extract layer index
        try:
            s_model_dict[key] = model_weight[
                "language_model." + key.replace(f"layers.{layer_idx}.", f"layers.{layer_idx + skip_layer}.")
            ]
        except KeyError:
            # The 11B layer at this index is a cross-attention layer whose
            # parameters have different names; step past it and retry.
            skip_layer += 1
            s_model_dict[key] = model_weight[
                "language_model." + key.replace(f"layers.{layer_idx}.", f"layers.{layer_idx + skip_layer}.")
            ]
    else:
        s_model_dict[key] = model_weight["language_model." + key]

# Load the patched weights back into the model; mutating the dict returned by
# state_dict() does not by itself update the module's parameters.
s_model.load_state_dict(s_model_dict)

# Test the patched 8B model
outputs = s_model.generate(**input_ids, do_sample=False, max_new_tokens=10)
print("Patched 8B Model Output:", s_tokenizer.decode(outputs[0]))

# Test the original 11B model
outputs = model.generate(**input_ids, do_sample=False, max_new_tokens=10)
print("11B Model Output:", s_tokenizer.decode(outputs[0]))
```
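
The `KeyError` branch is what steps over the 11B model's cross-attention layers. As a quick way to see which layer indices those are, here is a minimal sketch reusing `model_weight` from the snippet above; it assumes the checkpoint names its cross-attention parameters with `cross_attn`, as the Mllama implementation in `transformers` does:

```python
# Minimal sketch: list the decoder-layer indices in the 11B language model
# whose parameter names contain "cross_attn" (assumption: Mllama checkpoint
# layout); these are the layers the patching loop skips past.
cross_attn_layers = sorted({
    int(key.split("layers.")[1].split(".")[0])
    for key in model_weight
    if "layers." in key and "cross_attn" in key
})
print("Cross-attention layers in the 11B language model:", cross_attn_layers)
```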

### **Example Outputs**

**Prompt:** "Write me a poem about Machine Learning."

**Outputs:**

1. **8B Model Output (Before Patching):**
   ```
   <|begin_of_text|>Write me a poem about Machine Learning.
   Artificial minds, born from code,
   Learning
   ```

2. **Patched 8B Model Output:**
   ```
   <|begin_of_text|>Write me a poem about Machine Learning.
   In silicon halls, where data reigns
   ```

3. **11B Model Output:**
   ```
   <|begin_of_text|>Write me a poem about Machine Learning.
   In silicon halls, where data reigns
   ```
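
Under greedy decoding the patched 8B model reproduces the 11B model's continuation exactly, as the outputs above show. To keep the patched model around for later use, a minimal sketch (the output directory name is illustrative):

```python
# Save the patched model and tokenizer to a local directory (name is illustrative).
s_model.save_pretrained("Llama-3.2-8B-patched")
s_tokenizer.save_pretrained("Llama-3.2-8B-patched")
```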

---

## Model Details