Reinforced Token Optimization

Recent Activity

zkshan2002 published a model about 1 month ago

RTO-RL/Llama3-8B-TDPO

zkshan2002 updated a model about 1 month ago

RTO-RL/Llama3-8B-TDPO

zkshan2002 published a model about 1 month ago

RTO-RL/Llama3-8B-SimPO

Models (8)

RTO-RL/Llama3-8B-TDPO: updated Feb 11, 14 downloads, 1 like

RTO-RL/Llama3-8B-SimPO: updated Feb 11, 16 downloads

RTO-RL/Llama3-8B-RDPO: updated Feb 11, 16 downloads, 1 like

RTO-RL/Llama3-8B-PPO: updated Feb 11, 12 downloads, 1 like

RTO-RL/Llama3-8B-RTO: updated Feb 11, 23 downloads, 1 like

RTO-RL/Llama3.2-1B-RewardModel: updated Feb 11, 25 downloads

RTO-RL/Llama3-8B-RewardModel: updated Feb 11, 7 downloads

RTO-RL/Llama3-8B-DPO: updated Feb 11, 18 downloads

Datasets

None public yet