---
base_model: Qwen/Qwen2.5-Coder-0.5B
datasets: None
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- torch
- trl
- unsloth
- llama
- gguf
---
# Uploaded model
- **Developed by:** student-abdullah
- **License:** apache-2.0
- **Quantized from model:** Qwen2.5-Coder-0.5B
- **Created on:** 6th July, 2025
---
# Acknowledgement
<div style="display: flex; gap: 10px; align-items: center;">
<img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="200"/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/ChatGPT-Logo.svg/2048px-ChatGPT-Logo.svg.png" width="140"/>
<img src="https://compareaimodels.com/content/images/2024/08/qwen-square.svg" width="200"/>
</div>
---
# Quantization Description
This model was quantized using *selective quantization* of the Qwen2.5-Coder-0.5B base model to increase its inference speed while preserving its ability to generate relevant and accurate responses related to Python programming.
The quantization method kept the following layers at *32-bit* precision:
- q_proj
- v_proj
- o_proj
- down_proj
- lm_head
The remaining layers were quantized to *q3_k_l*.
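As a quick sanity check (not part of the original upload pipeline), the per-tensor quantization types of the resulting GGUF file can be inspected with the `gguf` package that ships with llama.cpp. This is a minimal sketch; the file name is a hypothetical placeholder.
<pre><code># Minimal sketch: list which tensors of the quantized GGUF were kept at F32
# and which were quantized (e.g. to Q3_K).
# Assumes `pip install gguf`; the file name below is a hypothetical placeholder.
from collections import Counter

from gguf import GGUFReader

reader = GGUFReader("qwen2.5-coder-0.5b-q3_k_l.gguf")  # hypothetical path

# Map every tensor name to its GGML quantization type.
tensor_types = {t.name: t.tensor_type.name for t in reader.tensors}

# Overall distribution of quantization types across the file.
print(Counter(tensor_types.values()))

# Tensors kept at full 32-bit precision (expected to cover q_proj, v_proj,
# o_proj, down_proj and lm_head, per the description above).
for name, qtype in sorted(tensor_types.items()):
    if qtype == "F32":
        print(name, qtype)</code></pre>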
---
# Model Description
| Layer Name | Role (Short) | Type |
| ---------------------------- | ----------------------------------------------------- | -------------- |
| `q_proj`, `k_proj`, `v_proj` | Compute query, key, and value for attention mechanism | Attention Proj |
| `o_proj` | Projects attention output back to model hidden size | Attention Proj |
| `down_proj` | Projects MLP output down to hidden size | MLP |
| `gate_proj` | First part of Gated MLP, controls info flow | MLP |
| `up_proj` | Expands hidden size in MLP | MLP |
| `lm_head` | Final linear layer for logits | Output Head |
| `embed_tokens` | Token embedding layer | Input Embed |
| `norm` | Final layernorm | Normalization |
| `*_layernorm` | Normalize inputs to layers | Normalization |
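As an illustrative sketch (not from the original card), the layer names in the table can be cross-checked against the base model's actual parameters; the grouping below is an assumption for demonstration only.
<pre><code># Minimal sketch: group the base model's parameters by the layer names listed
# in the table above and report the parameter count per group.
# Assumes `transformers` is installed and the Hugging Face Hub is reachable.
from collections import defaultdict

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")

# Order matters: "layernorm" must be tested before the bare "norm".
groups = ("q_proj", "k_proj", "v_proj", "o_proj",
          "gate_proj", "up_proj", "down_proj",
          "lm_head", "embed_tokens", "layernorm", "norm")

counts = defaultdict(int)
for name, param in model.named_parameters():
    for group in groups:
        if group in name:
            counts[group] += param.numel()
            break

# Note: tied weights (e.g. lm_head sharing embed_tokens) may not appear
# as separate entries in named_parameters().
for group, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{group:>12}: {n / 1e6:.2f}M parameters")</code></pre>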
---
# Model Architecture
<pre><code>Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 896, padding_idx=151665)
(layers): ModuleList(
(0-23): 24 x Qwen2DecoderLayer(
(self_attn): Qwen2Attention(
(q_proj): Linear(in_features=896, out_features=896, bias=True)
(k_proj): Linear(in_features=896, out_features=128, bias=True)
(v_proj): Linear(in_features=896, out_features=128, bias=True)
(o_proj): Linear(in_features=896, out_features=896, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): Linear(in_features=896, out_features=4864, bias=False)
(up_proj): Linear(in_features=896, out_features=4864, bias=False)
(down_proj): Linear(in_features=4864, out_features=896, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
(post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
)
)
(norm): Qwen2RMSNorm((896,), eps=1e-06)
(rotary_emb): LlamaRotaryEmbedding()
)
(lm_head): Linear(in_features=896, out_features=151936, bias=False)
)</code></pre>
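The module tree above can be reproduced by printing the base model with `transformers`; exact module names may vary slightly between library versions. A minimal sketch:
<pre><code># Minimal sketch: load the base model and print its module tree, which should
# closely match the summary above (module names can differ across transformers
# versions, e.g. the rotary-embedding class name).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
print(model)</code></pre>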
---
# Performance & Limitations
- YET TO BE EXAMINED
---
# Model Performance Evaluation:
- YET TO BE EVALUATED