Update README.md
README.md CHANGED
@@ -28,7 +28,7 @@ tags:
 
 ---
 # Quantization Description
-This model is quantized using *selective quantization* from the Qwen2.5-Coder-0.5B base model to increase its speed while
+This model is quantized using *selective quantization* from the Qwen2.5-Coder-0.5B base model to increase its speed while preserving its ability to generate relevant and accurate responses related to Python programming.
 The quantization method included *32-bit* quantization of the following layers:
 - q_proj
 - v_proj
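The hunk above documents the selective quantization recipe: `q_proj` and `v_proj` are kept at 32-bit precision while the remaining layers drop to *q3_k_l*. As a minimal sketch of that per-tensor selection logic (an illustration only, not the conversion script actually used for this repo; the function name and example tensor names are assumptions):

```python
# Minimal sketch of the selective recipe described above (an assumption,
# not this repo's actual conversion script). Tensors named q_proj / v_proj
# stay at full 32-bit precision; every other weight tensor gets the
# aggressive q3_k_l bucket.

KEEP_FP32 = ("q_proj", "v_proj")   # layers the card keeps at 32-bit
DEFAULT_QUANT = "q3_k_l"           # everything else, per the card

def pick_quant_type(tensor_name: str) -> str:
    """Return the quantization type to use for a given weight tensor."""
    if any(key in tensor_name for key in KEEP_FP32):
        return "f32"
    return DEFAULT_QUANT

# Illustrative tensor names in the Hugging Face checkpoint layout
for name in (
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
):
    print(f"{name} -> {pick_quant_type(name)}")
```

In a GGUF/llama.cpp-based pipeline the same intent is expressed as per-tensor type overrides passed to the quantization tool; the exact flags vary between builds, so check the tool's help output rather than relying on this sketch.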
@@ -54,7 +54,7 @@ Rest of the remaining layers were quantized to *q3_k_l*
 
 ---
 # Model Architecture
-
+Qwen2ForCausalLM(
 (model): Qwen2Model(
 (embed_tokens): Embedding(151936, 896, padding_idx=151665)
 (layers): ModuleList(
@@ -80,8 +80,7 @@ Rest of the remaining layers were quantized to *q3_k_l*
 (rotary_emb): LlamaRotaryEmbedding()
 )
 (lm_head): Linear(in_features=896, out_features=151936, bias=False)
-)
-
+)
 
 ---
 # Performance & Limitations
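The module tree in the hunks above is simply the printed `Qwen2ForCausalLM` hierarchy. It can be reproduced by loading the base checkpoint with `transformers` and printing the model; a minimal sketch, assuming the public `Qwen/Qwen2.5-Coder-0.5B` repo id (the quantized artifact itself is not what this snippet loads):

```python
# Minimal sketch: print the module tree shown above. Assumes the public
# Qwen/Qwen2.5-Coder-0.5B checkpoint; the exact printout can differ slightly
# between transformers versions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
print(model)  # Qwen2ForCausalLM( (model): Qwen2Model( ... ) (lm_head): Linear(...) )
```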