marattt committed
Commit 2b97d63
1 parent: 9e3a8f8

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -30,7 +30,7 @@ pipeline_tag: text-generation
 - **Training Stages**: Pretraining & Instruction Tuning
 - **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
 - **Layers**: 64
-- **Attention Heads (GQA)**: 40 for Q and 8 for KV
+- **Attention Heads (GQA)**: 16 for Q and 2 for KV
 - **Context Length**: Supports a full context of 131,072 tokens and generation of up to 8,192 tokens
 - **Quantization**: AWQ 4 bit
 - **Base model**: Qwen/Qwen2.5-3B-Instruct-AWQ
@@ -109,7 +109,7 @@ def generate(messages):
     generated_text = tokenizer.decode(output[0], skip_special_tokens=False)#.split('<|im_start|>assistant')[1]
     return generated_text
 
-model_name = 'FractalGPT/RuQwen2.5-32B-Instruct-AWQ'
+model_name = 'FractalGPT/RuQwen2.5-3B-Instruct-AWQ'
 model = Qwen2ForCausalLMWithBias.from_pretrained(model_name, torch_dtype=torch.float16)
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
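The head-count correction matches the 3B-scale configuration: under grouped-query attention (GQA), the 16 query heads share 2 key/value heads, so each KV head serves a group of 8 query heads. A minimal sketch of that grouping arithmetic (illustrative only — the variable names are ours, not from the model config):

```python
# Illustrative GQA head grouping for the corrected config: 16 Q heads, 2 KV heads.
# Each KV head is shared by a contiguous group of query heads.

num_q_heads = 16   # query heads (per the corrected README line)
num_kv_heads = 2   # key/value heads

assert num_q_heads % num_kv_heads == 0, "Q heads must divide evenly among KV heads"
group_size = num_q_heads // num_kv_heads  # query heads per KV head

# Map each query-head index to the KV head whose K/V projections it reuses.
q_to_kv = [q // group_size for q in range(num_q_heads)]

print(group_size)  # 8
print(q_to_kv)     # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
```

This is why the old line (40 Q / 8 KV) described a larger model: those are 32B-scale head counts, consistent with the `32B` → `3B` fix in the `model_name` hunk below it.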