LLaMA-3.2-1B-Instruct Post-trained with GRPO (from DeepSeek)

This model is a version of LLaMA-3.2-1B-Instruct post-trained with GRPO (Group Relative Policy Optimization, the reinforcement-learning method introduced by DeepSeek) on the GSM8K math word-problem dataset.

Model Details

  • Base Model: LLaMA-3.2-1B-Instruct
  • Training Data: openai/gsm8k
  • Post-training Method: GRPO (a reproduction sketch follows this list)
  • Post-training Steps: 1000
  • Checkpoint: checkpoint-1000/
  • Model Size: 1.24B parameters
  • Tensor Type: BF16 (safetensors)
  • Framework: Hugging Face transformers
  • Usage: Mathematical reasoning
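
The card does not say which framework was used for the GRPO run. As an illustration only, here is a minimal reproduction sketch assuming the trl library's GRPOTrainer and a simple exact-match reward on GSM8K final answers; the model ID, reward function, and every config value other than the 1000 steps are assumptions, not details taken from this card.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward: 1.0 if the completion contains the gold final answer, else 0.0.
# GSM8K stores the final answer after "####" in its "answer" column; trl forwards
# extra dataset columns (here: answer) to reward functions as keyword arguments.
def correctness_reward(completions, answer, **kwargs):
    rewards = []
    for completion, gold in zip(completions, answer):
        gold_number = gold.split("####")[-1].strip()
        rewards.append(1.0 if gold_number in completion else 0.0)
    return rewards

# GSM8K ships "question"/"answer" columns; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

training_args = GRPOConfig(output_dir="checkpoint-grpo", max_steps=1000)
trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-1B-Instruct",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()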

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "accuracy-maker/Llama-3.2-1B-GRPO-gsm8k"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Minimal streaming helper (assumed implementation; the card does not define it):
# apply the chat template and print the model's answer to stdout as it is generated.
def generate_with_stream(prompt, max_new_tokens=256):
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    streamer = TextStreamer(tokenizer, skip_prompt=True)
    model.generate(inputs, max_new_tokens=max_new_tokens, streamer=streamer)

input_text = "A notebook costs $3 and a pen costs $2. How much do 4 notebooks and 3 pens cost in total?"
generate_with_stream(input_text)
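
To capture the answer as a string instead of streaming it to stdout, a non-streaming variant under the same assumptions is:

messages = [{"role": "user", "content": input_text}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
# Generate, then decode only the newly produced tokens (everything after the prompt).
outputs = model.generate(inputs, max_new_tokens=256)
answer = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(answer)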