|
--- |
|
base_model: Qwen/Qwen2.5-Coder-0.5B |
|
datasets: None |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- torch |
|
- trl |
|
- unsloth |
|
- llama |
|
- gguf |
|
--- |
|
|
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** student-abdullah |
|
- **License:** apache-2.0 |
|
- **Quantized from model:** Qwen2.5-Coder-0.5B |
|
- **Created on:** 6th July, 2025
|
|
|
--- |
|
# Acknowledgement |
|
<div style="display: flex; gap: 10px; align-items: center;"> |
|
<img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="200"/> |
|
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/ChatGPT-Logo.svg/2048px-ChatGPT-Logo.svg.png" width="140"/> |
|
<img src="https://compareaimodels.com/content/images/2024/08/qwen-square.svg" width="200"/> |
|
</div> |
|
|
|
--- |
|
# Quantization Description |
|
This model was quantized with *selective quantization* from the Qwen2.5-Coder-0.5B base model to increase inference speed while preserving its ability to generate relevant and accurate responses related to Python programming.
|
The quantization scheme keeps the following layers in *32-bit* precision:
|
- q_proj |
|
- v_proj |
|
- o_proj |
|
- down_proj |
|
- lm_head |
|
|
|
The remaining layers were quantized to *q3_k_l*; the inspection sketch below shows how to verify the per-tensor types.
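
As a sanity check, the per-tensor quantization types inside the resulting GGUF file can be listed with the `gguf` Python package. This is a minimal sketch, not part of the release: the file name below is an assumed placeholder and should be replaced with the actual GGUF artifact from this repository.

```python
# Minimal sketch: inspect which tensors of the GGUF were kept in 32-bit
# precision and which were quantized to Q3_K variants.
# Requires `pip install gguf`; the file name below is an assumed placeholder.
from gguf import GGUFReader

reader = GGUFReader("qwen2.5-coder-0.5b-selective-q3_k_l.gguf")
for tensor in reader.tensors:
    # In GGUF naming, q_proj/v_proj/o_proj/down_proj/lm_head appear as
    # attn_q, attn_v, attn_output, ffn_down and output; per the scheme above,
    # these should print F32 while the remaining tensors print Q3_K types.
    print(f"{tensor.name:40s} {tensor.tensor_type.name}")
```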
|
|
|
--- |
|
# Model Description |
|
| Layer Name | Role (Short) | Type |
| ---------------------------- | ----------------------------------------------------- | -------------- |
| `q_proj`, `k_proj`, `v_proj` | Compute query, key, and value for attention mechanism | Attention Proj |
| `o_proj` | Projects attention output back to model hidden size | Attention Proj |
| `down_proj` | Projects MLP output down to hidden size | MLP |
| `gate_proj` | First part of Gated MLP, controls info flow | MLP |
| `up_proj` | Expands hidden size in MLP | MLP |
| `lm_head` | Final linear layer for logits | Output Head |
| `embed_tokens` | Token embedding layer | Input Embed |
| `norm` | Final layernorm | Normalization |
| `*_layernorm` | Normalize inputs to layers | Normalization |
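
The names in this table correspond one-to-one to the `transformers` module hierarchy of the base model. A small sketch, for illustration only, of how to print that hierarchy and pick out the layers kept in 32-bit precision:

```python
# Sketch: print the module tree of the base model (this produces the
# architecture dump shown in the next section) and list the projection
# layers that were kept in 32-bit precision during quantization.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
print(model)  # full Qwen2ForCausalLM module tree

kept_in_32bit = ("q_proj", "v_proj", "o_proj", "down_proj", "lm_head")
for name, module in model.named_modules():
    if name.endswith(kept_in_32bit):
        print(name, "->", module)
```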
|
|
|
--- |
|
# Model Architecture
|
<pre><code>Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896, padding_idx=151665)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)</code></pre>
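
For completeness, a hedged usage sketch for running the quantized GGUF build of this architecture locally with `llama-cpp-python`; the file name, context size, and sampling settings are assumptions, not part of this release.

```python
# Sketch: run the quantized GGUF with llama-cpp-python (`pip install llama-cpp-python`).
# The model_path below is an assumed placeholder; point it at the GGUF file
# from this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-coder-0.5b-selective-q3_k_l.gguf",
    n_ctx=4096,   # context window (assumption)
    n_threads=8,  # adjust to the local CPU
)

prompt = "Write a Python function that checks whether a string is a palindrome."
output = llm(prompt, max_tokens=256, temperature=0.2)
print(output["choices"][0]["text"])
```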
|
|
|
--- |
|
# Performance & Limitations |
|
- YET TO BE EXAMINED |
|
|
|
--- |
|
# Model Performance Evaluation
|
- YET TO BE EVALUATED |
|
|
|
<p align="center"> |
|
<img src="" width="20%" style="display:inline-block;"/> |
|
<img src="" width="35%" style="display:inline-block;"/> |
|
<img src="" width="35%" style="display:inline-block;"/> |
|
</p> |