Update README.md
README.md CHANGED
@@ -30,7 +30,7 @@ pipeline_tag: text-generation
 - **Training Stages**: Pretraining & Instruction Tuning
 - **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
 - **Layers**: 64
-- **Attention Heads (GQA)**:
+- **Attention Heads (GQA)**: 16 for Q and 2 for KV
 - **Context Length**: Supports a full context of 131,072 tokens and generation of up to 8,192 tokens
 - **Quantization**: AWQ 4 bit
 - **Base model**: Qwen/Qwen2.5-3B-Instruct-AWQ
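The `+` line above fills in the previously missing GQA head counts: 16 query heads attend over 2 shared key/value heads, so each KV head serves a group of 8 query heads. As a rough illustration (not code from this repository; all names and shapes here are hypothetical), grouped-query attention with that layout can be sketched in plain PyTorch:

```python
import torch

# Hypothetical sketch of the GQA layout described above: 16 query heads
# share 2 key/value heads, i.e. 8 query heads per KV head.
batch, seq_len, head_dim = 1, 8, 128
n_q_heads, n_kv_heads = 16, 2
group = n_q_heads // n_kv_heads  # 8 query heads per KV head

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each KV head across its query-head group before attention,
# which is roughly how GQA implementations broadcast the shared KV heads.
k = k.repeat_interleave(group, dim=1)  # -> (1, 16, seq_len, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
out = torch.softmax(scores, dim=-1) @ v  # (1, 16, seq_len, head_dim)
```

The win is memory: only 2 KV heads need to be cached during generation instead of 16, which matters at the 131,072-token context this model supports.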
@@ -109,7 +109,7 @@ def generate(messages):
     generated_text = tokenizer.decode(output[0], skip_special_tokens=False)#.split('<|im_start|>assistant')[1]
     return generated_text
 
-model_name = 'FractalGPT/RuQwen2.5-
+model_name = 'FractalGPT/RuQwen2.5-3B-Instruct-AWQ'
 model = Qwen2ForCausalLMWithBias.from_pretrained(model_name, torch_dtype=torch.float16)
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
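This hunk completes the previously truncated `model_name` string so the README's loading snippet actually runs. For completeness, a hedged usage sketch: it assumes the `generate(messages)` helper shown in the hunk header, along with the `model` and `tokenizer` globals it relies on, are already defined as in the full README (only part of that code is visible in this diff); the prompt string is illustrative.

```python
# Assumes the README's generate(messages) helper and the model/tokenizer
# globals it uses are defined as in the snippet above.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly describe what AWQ 4-bit quantization does."},
]
# Prints the decoded completion; special tokens are kept because the
# helper decodes with skip_special_tokens=False.
print(generate(messages))
```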