Mohammad Shojaei

mshojaei77

Recent Activity

updated a dataset about 2 hours ago

mshojaei77/Persian_sft_jsonl

published a dataset about 2 hours ago

mshojaei77/Persian_sft_jsonl

reacted to etemiz's post with 👍 about 2 hours ago

Benchmarked Gemma 3 today. It has better knowledge compared to 2 but still in the median area in the leaderboard.

liked a dataset about 2 hours ago

mshojaei77/PersianTelegramChannels

Viewer • Updated 10 days ago • 12.1k • 119 • 4

reacted to burtenshaw's post with 🤗 about 3 hours ago

everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I find. let's go!

1. you have to install everything from main and nightly. this is what I'm working with to get unsloth and TRL running

git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft

plus this with --no-deps

git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly
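
The two install steps above can be sketched as a pair of pip invocations. This is only an illustration of the ordering and the --no-deps split described in the post; the helper name pip_commands is made up for this example, and nothing is installed when the snippet runs.

```python
import sys

# Packages listed in the post, installed together with their dependencies.
WITH_DEPS = [
    "git+https://github.com/huggingface/transformers@main",
    "git+https://github.com/huggingface/trl.git@main",
    "bitsandbytes",
    "peft",
]

# Unsloth nightlies, installed with --no-deps as the post suggests.
NO_DEPS = [
    "git+https://github.com/unslothai/unsloth-zoo.git@nightly",
    "git+https://github.com/unslothai/unsloth.git@nightly",
]


def pip_commands(python=sys.executable):
    """Return the two pip invocations as argument lists (nothing is executed)."""
    base = [python, "-m", "pip", "install"]
    return [base + WITH_DEPS, base + ["--no-deps"] + NO_DEPS]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in pip_commands():
        print(" ".join(cmd))
```

To actually install, each list could be passed to subprocess.run(cmd, check=True), or the printed lines pasted into a shell.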

2. will brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb

3. with a learning rate of 5e-6, rewards and loss stayed flat for the first 100 or so steps.

4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters.

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,  # 768 tokens left for the completion
    num_train_epochs = 1,
    max_steps = 250,  # a positive max_steps takes precedence over num_train_epochs
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)
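
To get a feel for what these numbers imply, here is a rough sketch of the quantities derived from the config: the completion budget, the number of warmup steps, and the learning rate at a given step. It assumes the "cosine" scheduler is a linear warmup followed by a half-cosine decay to zero, which is my understanding of the default but is not confirmed in the post. The function lr_at is a hypothetical helper, not part of TRL.

```python
import math

# Values copied from the GRPOConfig in the post.
LEARNING_RATE = 5e-6
WARMUP_RATIO = 0.1
MAX_STEPS = 250
MAX_PROMPT_LENGTH = 256
MAX_COMPLETION_LENGTH = 1024 - 256

# Assumption: warmup length is the step budget times the ratio, rounded up.
WARMUP_STEPS = math.ceil(MAX_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at `step`: linear warmup, then cosine decay (assumed shape)."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("completion budget:", MAX_COMPLETION_LENGTH)
    print("warmup steps:", WARMUP_STEPS)
    for step in (0, 10, 25, 100, 250):
        print(step, lr_at(step))
```

Under these assumptions the peak of 5e-6 is only reached once warmup ends, and the rate decays from there. Whether that has anything to do with the flat early rewards is not something this sketch can tell you.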

5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but no need to load the model differently in transformers or Unsloth

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")

if you want an introduction to GRPO, check out the reasoning course. it walks you through the algorithm, theory, and implementation in a smooth way.

https://huggingface.co/reasoning-course
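
For a quick taste of the core idea before taking the course: GRPO scores each completion relative to the other completions sampled for the same prompt. The sketch below shows that group-relative normalization in plain Python. It is a simplified illustration, not TRL's implementation; the function name, the eps value, and the use of the population standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is a hypothetical small constant that avoids division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # With num_generations = 2, each group holds two completions.
    print(group_relative_advantages([1.0, 0.0]))
    # Identical rewards carry no signal: every advantage is zero.
    print(group_relative_advantages([0.5, 0.5]))
```

One thing this makes visible: when every completion in a group gets the same reward, all advantages are zero, so that group contributes nothing to the update.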