marattt committed
Commit 9e3a8f8 (1 parent: 5917fbe)

Update README.md

Files changed (1): README.md (+3 -3)
@@ -29,8 +29,8 @@ pipeline_tag: text-generation
 - **Type**: Instruction-tuned Causal Language Model
 - **Training Stages**: Pretraining & Instruction Tuning
 - **Architecture**: Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
-- **Layers**: 36
-- **Attention Heads (GQA)**: 24 for Q, 4 for KV
+- **Layers**: 64
+- **Attention Heads (GQA)**: 40 for Q and 8 for KV
 - **Context Length**: Supports a full context of 131,072 tokens and generation of up to 8,192 tokens
 - **Quantization**: AWQ 4 bit
 - **Base model**: Qwen/Qwen2.5-3B-Instruct-AWQ
@@ -109,7 +109,7 @@ def generate(messages):
     generated_text = tokenizer.decode(output[0], skip_special_tokens=False)#.split('<|im_start|>assistant')[1]
     return generated_text
 
-model_name = 'FractalGPT/RuQwen2.5-3B-Instruct-AWQ'
+model_name = 'FractalGPT/RuQwen2.5-32B-Instruct-AWQ'
 model = Qwen2ForCausalLMWithBias.from_pretrained(model_name, torch_dtype=torch.float16)
 tokenizer = AutoTokenizer.from_pretrained(model_name)
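The `generate` function in the diff decodes the full token sequence and carries a commented-out `.split('<|im_start|>assistant')` for trimming the transcript down to the model's reply. A minimal, model-free sketch of that post-processing step, assuming Qwen-style chat markup (`<|im_start|>` / `<|im_end|>` delimiters; the helper name is hypothetical):

```python
def extract_assistant_reply(generated_text: str) -> str:
    """Return only the assistant's turn from a decoded Qwen-style chat transcript.

    Mirrors the commented-out split in the README's generate(): take everything
    after the last-opened assistant turn, then drop the end-of-turn token.
    """
    marker = '<|im_start|>assistant'
    reply = generated_text.split(marker)[1]
    # Cut at the end-of-turn token if the model emitted one.
    return reply.split('<|im_end|>')[0].strip()

# Hypothetical decoded output, for illustration only.
sample = (
    "<|im_start|>user\nПривет!<|im_end|>\n"
    "<|im_start|>assistant\nЗдравствуйте!<|im_end|>"
)
print(extract_assistant_reply(sample))  # -> Здравствуйте!
```

Note that `generate` as written returns the whole transcript (with special tokens, since `skip_special_tokens=False`), so a trim step like this is needed before showing the reply to a user.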