---
base_model: Qwen/Qwen2.5-Coder-0.5B
datasets: None
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- torch
- trl
- unsloth
- llama
- gguf
---

# Uploaded model

- **Developed by:** student-abdullah
- **License:** apache-2.0
- **Quantized from model:** Qwen2.5-Coder-0.5B
- **Created on:** 6th July, 2025

---
# Acknowledgement
<img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="200"/>

---
# Quantization Description
This model is quantized using *selective quantization* from the Qwen2.5-Coder-0.5B base model to increase its speed while preserving its ability to generate relevant and accurate responses related to Python programming.
The quantization method kept the following layers at *32-bit* precision:
- q_proj
- v_proj
- o_proj
- down_proj
- lm_head

The remaining layers were quantized to *q3_k_l* (see the verification sketch below).
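
As a quick sanity check on this selective scheme, the per-tensor quantization types stored in the GGUF file can be listed directly. The snippet below is a minimal sketch only: it assumes the `gguf` Python package that ships with llama.cpp, and the filename is a hypothetical placeholder rather than a file guaranteed to exist in this repository.

```python
# Minimal sketch: list the per-tensor quantization types stored in a GGUF file.
# Assumes the `gguf` Python package (from llama.cpp); the path below is hypothetical.
from gguf import GGUFReader

reader = GGUFReader("Qwen2.5-Coder-0.5B-selective-q3_k_l.gguf")  # hypothetical filename
for tensor in reader.tensors:
    # tensor_type is a GGMLQuantizationType enum value, e.g. F32 or Q3_K
    print(f"{tensor.name:40s} {tensor.tensor_type.name}")
```

Tensors belonging to the layers listed above should report full precision, while the rest should report a Q3_K variant.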
---
# Model Description
| Layer Name                   | Role (Short)                                          | Type           |
| ---------------------------- | ----------------------------------------------------- | -------------- |
| `q_proj`, `k_proj`, `v_proj` | Compute query, key, and value for attention mechanism | Attention Proj |
| `o_proj`                     | Projects attention output back to model hidden size   | Attention Proj |
| `down_proj`                  | Projects MLP output down to hidden size               | MLP            |
| `gate_proj`                  | First part of gated MLP, controls info flow           | MLP            |
| `up_proj`                    | Expands hidden size in MLP                            | MLP            |
| `lm_head`                    | Final linear layer for logits                         | Output Head    |
| `embed_tokens`               | Token embedding layer                                 | Input Embed    |
| `norm`                       | Final layernorm                                       | Normalization  |
| `*_layernorm`                | Normalizes inputs to layers                           | Normalization  |

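These module names can be listed programmatically from the base checkpoint. The snippet below is a minimal sketch, assuming the Hugging Face `transformers` library; it loads the original full-precision base model (not the GGUF file in this repository) purely to enumerate the projection and output layers named in the table.

```python
# Minimal sketch: enumerate the projection/output modules named in the table above.
# Assumes the transformers library; this loads the full-precision base model, not the GGUF.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
for name, module in model.named_modules():
    if name.endswith(("q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj", "lm_head")):
        print(name, "->", module)
```

Printing `model` itself reproduces the full structure shown in the next section.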
---
# Model Architecture
```text
Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 896, padding_idx=151665)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=896, out_features=151936, bias=False)
)
```

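The projection widths in the dump above follow from the base model's configuration: `q_proj`/`o_proj` match the hidden size (896), while `k_proj`/`v_proj` are sized for grouped-query attention (key/value heads × head dimension). The snippet below is a minimal sketch assuming the `transformers` library; the head counts noted in the comments come from the published Qwen2.5-0.5B configuration.

```python
# Minimal sketch: derive the projection widths shown in the printout above
# from the base model's configuration (assumes the transformers library).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")

head_dim = cfg.hidden_size // cfg.num_attention_heads   # 896 // 14 = 64
print(cfg.hidden_size)                                   # 896  -> q_proj / o_proj out_features
print(cfg.num_key_value_heads * head_dim)                # 2 * 64 = 128 -> k_proj / v_proj out_features
print(cfg.intermediate_size)                             # 4864 -> gate_proj / up_proj out_features
```
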
---
# Performance & Limitations
- YET TO BE EXAMINED

---
# Model Performance Evaluation
- YET TO BE EVALUATED

<p align="center">
  <img src="" width="20%" style="display:inline-block;"/>
  <img src="" width="35%" style="display:inline-block;"/>
  <img src="" width="35%" style="display:inline-block;"/>
</p>