Qwen/Qwen2.5-Coder-32B-Instruct

Update README.md

#9

by AngelVenerov - opened 2 days ago

base: refs/heads/main ← from: refs/pr/9

README.md +4 -4

@@ -5,7 +5,7 @@ language:
 - en
 base_model:
 - Qwen/Qwen2.5-Coder-32B
-pipeline_tag: text-generation
+pipeline_tag: image-to-text
 library_name: transformers
 tags:
 - code
@@ -32,7 +32,7 @@ Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (
 - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
 - Number of Parameters: 32.5B
 - Number of Paramaters (Non-Embedding): 31.0B
-- Number of Layers: 64
+- Number of Layers: 512
 - Number of Attention Heads (GQA): 40 for Q and 8 for KV
 - Context Length: Full 131,072 tokens
   - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
@@ -78,7 +78,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 generated_ids = model.generate(
     **model_inputs,
-    max_new_tokens=512
+    max_new_tokens=4096
 )
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
@@ -132,4 +132,4 @@ If you find our work helpful, feel free to give us a cite.
       journal={arXiv preprint arXiv:2407.10671},
       year={2024}
 }
-```
+```
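
For readers of the third hunk: the list comprehension after `model.generate(...)` drops the prompt tokens from each returned sequence so that only newly generated tokens remain. Below is a minimal sketch of that slicing step, assuming plain Python lists stand in for the token-ID tensors. The names `prompt_ids` and `full_outputs` and all ID values are made up for illustration and do not come from the README.

```python
# Hypothetical token IDs: plain lists stand in for the tensors in the README.
prompt_ids = [[101, 7, 8], [101, 9]]                 # stand-in for model_inputs.input_ids
full_outputs = [[101, 7, 8, 42, 43], [101, 9, 55]]   # stand-in for generate() output

# Same pattern as the README: keep only what follows each prompt.
new_tokens = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(prompt_ids, full_outputs)
]

print(new_tokens)
```

Changing `max_new_tokens` only raises the cap on how many tokens can follow each prompt; the slicing itself is unaffected.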