Update README.md
README.md CHANGED
@@ -28,7 +28,7 @@ tags:
 
 ---
 # Quantization Description
-This model is quantized using *selective quantization* from the Qwen2.5-Coder-0.5B base model to increase its speed while
+This model is quantized using *selective quantization* from the Qwen2.5-Coder-0.5B base model to increase its speed while preserving its ability to generate relevant and accurate responses related to Python programming.
 The quantization method included *32-bit* quantization of the following layers:
 - q_proj
 - v_proj
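The hunk above documents the selective quantization recipe: `q_proj` and `v_proj` are kept at 32-bit precision while the remaining layers drop to *q3_k_l*. As a minimal sketch of that per-tensor selection logic (an illustration only, not the conversion script actually used for this repo; the function name and example tensor names are assumptions):

```python
# Minimal sketch of the selective recipe described above (an assumption,
# not this repo's actual conversion script). Tensors named q_proj / v_proj
# stay at full 32-bit precision; every other weight tensor gets the
# aggressive q3_k_l bucket.

KEEP_FP32 = ("q_proj", "v_proj")   # layers the card keeps at 32-bit
DEFAULT_QUANT = "q3_k_l"           # everything else, per the card

def pick_quant_type(tensor_name: str) -> str:
    """Return the quantization type to use for a given weight tensor."""
    if any(key in tensor_name for key in KEEP_FP32):
        return "f32"
    return DEFAULT_QUANT

# Illustrative tensor names in the Hugging Face checkpoint layout
for name in (
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
):
    print(f"{name} -> {pick_quant_type(name)}")
```

In a GGUF/llama.cpp-based pipeline the same intent is expressed as per-tensor type overrides passed to the quantization tool; the exact flags vary between builds, so check the tool's help output rather than relying on this sketch.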
@@ -54,7 +54,7 @@ Rest of the remaining layers were quantized to *q3_k_l*
 
 ---
 # Model Architecture
-
+Qwen2ForCausalLM(
 (model): Qwen2Model(
 (embed_tokens): Embedding(151936, 896, padding_idx=151665)
 (layers): ModuleList(
@@ -80,8 +80,7 @@ Rest of the remaining layers were quantized to *q3_k_l*
 (rotary_emb): LlamaRotaryEmbedding()
 )
 (lm_head): Linear(in_features=896, out_features=151936, bias=False)
-)
-
+)
 
 ---
 # Performance & Limitations
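The module tree in the hunks above is simply the printed `Qwen2ForCausalLM` hierarchy. It can be reproduced by loading the base checkpoint with `transformers` and printing the model; a minimal sketch, assuming the public `Qwen/Qwen2.5-Coder-0.5B` repo id (the quantized artifact itself is not what this snippet loads):

```python
# Minimal sketch: print the module tree shown above. Assumes the public
# Qwen/Qwen2.5-Coder-0.5B checkpoint; the exact printout can differ slightly
# between transformers versions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-0.5B")
print(model)  # Qwen2ForCausalLM( (model): Qwen2Model( ... ) (lm_head): Linear(...) )
```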