|
--- |
|
base_model: Qwen/Qwen2.5-Coder-0.5B |
|
datasets: None |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- torch |
|
- trl |
|
- unsloth |
|
- llama |
|
- gguf |
|
--- |
|
|
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** student-abdullah |
|
- **License:** apache-2.0 |
|
- **Quantized from model:** Qwen2.5-Coder-0.5B |
|
- **Created on:** 6th July, 2025
|
|
|
--- |
|
# Acknowledgement |
|
<div style="display: flex; gap: 10px; align-items: center;"> |
|
<img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="200"/> |
|
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/ChatGPT-Logo.svg/2048px-ChatGPT-Logo.svg.png" width="140"/> |
|
<img src="https://compareaimodels.com/content/images/2024/08/qwen-square.svg" width="200"/> |
|
</div> |
|
|
|
--- |
|
# Quantization Description |
|
This model was quantized with *selective quantization* from the Qwen2.5-Coder-0.5B base model to increase inference speed while preserving its ability to generate relevant and accurate responses related to Python programming.
|
The quantization scheme keeps the following layers in *32-bit* precision:
|
- q_proj |
|
- v_proj |
|
- o_proj |
|
- down_proj |
|
- lm_head |
|
|
|
The remaining layers were quantized to *q3_k_l*; the inspection sketch below shows how to verify the per-tensor types.
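
As a sanity check, the per-tensor quantization types inside the resulting GGUF file can be listed with the `gguf` Python package. This is a minimal sketch, not part of the release: the file name below is an assumed placeholder and should be replaced with the actual GGUF artifact from this repository.

```python
# Minimal sketch: inspect which tensors of the GGUF were kept in 32-bit
# precision and which were quantized to Q3_K variants.
# Requires `pip install gguf`; the file name below is an assumed placeholder.
from gguf import GGUFReader

reader = GGUFReader("qwen2.5-coder-0.5b-selective-q3_k_l.gguf")
for tensor in reader.tensors:
    # In GGUF naming, q_proj/v_proj/o_proj/down_proj/lm_head appear as
    # attn_q, attn_v, attn_output, ffn_down and output; per the scheme above,
    # these should print F32 while the remaining tensors print Q3_K types.
    print(f"{tensor.name:40s} {tensor.tensor_type.name}")
```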
|
|
|
--- |
|
# Model Description |
|
| Layer Name | Role (Short) | Type |
| ---------------------------- | ----------------------------------------------------- | -------------- |
| `q_proj`, `k_proj`, `v_proj` | Compute query, key, and value for attention mechanism | Attention Proj |
| `o_proj` | Projects attention output back to model hidden size | Attention Proj |
| `down_proj` | Projects MLP output down to hidden size | MLP |
| `gate_proj` | First part of Gated MLP, controls info flow | MLP |
| `up_proj` | Expands hidden size in MLP | MLP |
| `lm_head` | Final linear layer for logits | Output Head |
| `embed_tokens` | Token embedding layer | Input Embed |
| `norm` | Final layernorm | Normalization |
| `*_layernorm` | Normalize inputs to layers | Normalization |
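
The names in this table correspond one-to-one to the `transformers` module hierarchy of the base model. A small sketch, for illustration only, of how to print that hierarchy and pick out the layers kept in 32-bit precision:

```python
# Sketch: print the module tree of the base model (this produces the
# architecture dump shown in the next section) and list the projection
# layers that were kept in 32-bit precision during quantization.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
print(model)  # full Qwen2ForCausalLM module tree

kept_in_32bit = ("q_proj", "v_proj", "o_proj", "down_proj", "lm_head")
for name, module in model.named_modules():
    if name.endswith(kept_in_32bit):
        print(name, "->", module)
```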
|
|
|
--- |
|
# Model Architecture
|
<pre><code>Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896, padding_idx=151665)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)</code></pre>
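
For completeness, a hedged usage sketch for running the quantized GGUF build of this architecture locally with `llama-cpp-python`; the file name, context size, and sampling settings are assumptions, not part of this release.

```python
# Sketch: run the quantized GGUF with llama-cpp-python (`pip install llama-cpp-python`).
# The model_path below is an assumed placeholder; point it at the GGUF file
# from this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-coder-0.5b-selective-q3_k_l.gguf",
    n_ctx=4096,   # context window (assumption)
    n_threads=8,  # adjust to the local CPU
)

prompt = "Write a Python function that checks whether a string is a palindrome."
output = llm(prompt, max_tokens=256, temperature=0.2)
print(output["choices"][0]["text"])
```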
|
|
|
--- |
|
# Performance & Limitations |
|
- YET TO BE EXAMINED |
|
|
|
--- |
|
# Model Performance Evaluation
|
- YET TO BE EVALUATED |
|
|
|
<p align="center"> |
|
<img src="" width="20%" style="display:inline-block;"/> |
|
<img src="" width="35%" style="display:inline-block;"/> |
|
<img src="" width="35%" style="display:inline-block;"/> |
|
</p> |