feihu.hf committed
Commit · edc3bdc
Parent(s): 28409f2
update config.json
README.md CHANGED
@@ -34,7 +34,8 @@ Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (
 - Number of Parameters (Non-Embedding): 1.31B
 - Number of Layers: 28
 - Number of Attention Heads (GQA): 12 for Q and 2 for KV
-
+- Context Length: Full 32,768 tokens
+- Note: Currently, only vLLM supports YaRN for length extrapolation. If you want to process sequences up to 131,072 tokens, please refer to the non-GGUF models.
 - Quantization: q2_K, q3_K_M, q4_0, q4_K_M, q5_0, q5_K_M, q6_K, q8_0
 
 For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5-coder/), [GitHub](https://github.com/QwenLM/Qwen2.5-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).
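The added note says YaRN-based length extrapolation to 131,072 tokens is only available through the non-GGUF models served with vLLM. As a point of reference, and matching this commit's subject line ("update config.json"), the Qwen2.5 model cards document enabling YaRN in the non-GGUF checkpoints by adding a `rope_scaling` entry to `config.json`. A minimal sketch follows; the factor of 4.0 assumes the 32,768-token native context scaled to 131,072 tokens, so adjust it if your target length differs.

```json
{
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```

This entry applies to the Transformers/vLLM `config.json` of the non-GGUF checkpoints; the GGUF files listed above carry their own embedded metadata and remain at the full 32,768-token context.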