Will Brooks

TornButter

AI & ML interests

None yet

Recent Activity

liked a model 2 days ago
Comfy-Org/Wan_2.1_ComfyUI_repackaged
liked a model 2 days ago
Kijai/WanVideo_comfy
liked a model 2 days ago
city96/Wan2.1-T2V-14B-gguf

Organizations

None yet

TornButter's activity

reacted to MoritzLaurer's post with 🔥 about 2 months ago
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of each intermediate step rather than only the final answer, promoting structured reasoning. Well suited to tasks that require stepwise reasoning.
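To make the difference between outcome and process supervision concrete, here is a minimal sketch in plain Python. The record layout (a prompt, the solution split into steps, one label per step) is an assumption modeled on stepwise-supervision datasets, and the helper names are hypothetical, not TRL's API. Reducing step scores with `min` is just one common way to turn per-step scores into a single solution score.

```python
from typing import Callable, Sequence

# Hypothetical stepwise-supervision record: a prompt, the solution split into
# steps, and one correctness label per step (the second step is wrong here).
example = {
    "prompt": "What is 3 * (4 + 5)?",
    "completions": ["First, 4 + 5 = 9.", "Then, 3 * 9 = 24.", "So the answer is 24."],
    "labels": [True, False, False],
}


def outcome_reward(labels: Sequence[bool]) -> float:
    """Outcome supervision: a single score for the whole solution."""
    return 1.0 if all(labels) else 0.0


def process_rewards(labels: Sequence[bool]) -> list[float]:
    """Process supervision: one score per intermediate step."""
    return [1.0 if ok else 0.0 for ok in labels]


def solution_score(step_scores: Sequence[float],
                   combine: Callable[[Sequence[float]], float] = min) -> float:
    """Collapse per-step scores into one solution score (min is one common choice)."""
    return combine(step_scores)


print(outcome_reward(example["labels"]))   # 0.0: only says the solution failed
print(process_rewards(example["labels"]))  # [1.0, 0.0, 0.0]: points at step 2
```

The outcome reward only says the answer is wrong, while the per-step rewards locate the first faulty step. That per-step signal is what a PRM learns to predict.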

🔀 Model merging: A new callback uses mergekit to merge models during training, blending the reference and policy models to improve performance, and can optionally push the merged models to the Hugging Face Hub.
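At its simplest, merging combines the weights of two models that share an architecture. The sketch below shows a linear (weighted-average) merge, which is one of several methods mergekit implements. Plain Python lists stand in for tensors, and `linear_merge` and its `weight` parameter are hypothetical illustrations, not the callback's API or its default settings.

```python
from typing import Dict, List

# Parameter name -> flattened weights (a toy stand-in for real tensors).
StateDict = Dict[str, List[float]]


def linear_merge(policy: StateDict, reference: StateDict, weight: float = 0.5) -> StateDict:
    """Return weight * policy + (1 - weight) * reference, parameter by parameter."""
    if policy.keys() != reference.keys():
        raise ValueError("models must have the same parameter names")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    merged: StateDict = {}
    for name, p in policy.items():
        r = reference[name]
        if len(p) != len(r):
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [weight * a + (1.0 - weight) * b for a, b in zip(p, r)]
    return merged


policy = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
reference = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
print(linear_merge(policy, reference))  # {'layer.weight': [2.0, 3.0], 'layer.bias': [0.5]}
```

Other merge methods (for example SLERP or TIES) combine weights differently, but they share the same precondition: both models must expose matching parameter names and shapes.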

πŸ› οΈ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

βš–οΈ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)