Hugging Face
Models
321
Active filters: reward-trainer
JayHyeon/Qwen2-0.5B-Reward_VPO_1e-4 • Text Classification • Updated 28 days ago • 16
JayHyeon/Qwen2-0.5B-Reward_5e-4 • Text Classification • Updated 28 days ago • 13
JayHyeon/Qwen2-0.5B-Reward_VPO_5e-4 • Text Classification • Updated 28 days ago • 17
JayHyeon/Qwen2-0.5B-Reward_1e-3 • Text Classification • Updated 28 days ago • 31
JayHyeon/Qwen2-0.5B-Reward_VPO_1e-3 • Text Classification • Updated 28 days ago • 29
JayHyeon/Qwen2-0.5B-Reward_5e-3 • Text Classification • Updated 28 days ago • 29
JayHyeon/Qwen2-0.5B-Reward_VPO_5e-3 • Text Classification • Updated 28 days ago • 15
bikalnetomi/rlhf-ppo-llama31-8B-Reward-model-lora-r64-bikal • Updated 27 days ago
bikalnetomi/rlhf-ppo-llama31-8B-Reward-model-lora-r128-bikal • Updated 27 days ago
bikalnetomi/rlhf-ppo-llama31-8B-Reward-model-lora-r256-bikal • Updated 27 days ago
bikalnetomi/rlhf-ppo-llama32-3B-Reward-model-lora-r64-bikal • Updated 27 days ago
bikalnetomi/rlhf-ppo-llama31-8B-Reward-model-lora-r16-bikal • Updated 27 days ago
bikalnetomi/rlhf-ppo-llama31-8B-Reward-model-lora-r8-bikal • Updated 27 days ago
kmjae/Qwen2.5-0.5B-RM • Text Classification • Updated 23 days ago • 147
bikalnetomi/RLHF-PPO-RewardModel-LLama3-3B-v1 • Text Generation • Updated 25 days ago • 16
bikalnetomi/RLHF-PPO-RewardModel-LLama3-3B-v2 • Text Classification • Updated 25 days ago • 26
bikalnetomi/RLHF-PPO-RewardModel-LLama3-1B-v1.1 • Text Classification • Updated 25 days ago • 15
bikalnetomi/rlhf-ppo-llama3-1B-Reward-model-lora-bikal • Updated 25 days ago
bikalnetomi/RLHF-PPO-RewardModel-LLama3-1B-v2 • Updated 25 days ago
bikalnetomi/RLHF-PPO-RewardModel-LLama3-1B-v1 • Text Classification • Updated 24 days ago • 84
borisshapa/rm-opt-350m-hs2 • Text Generation • Updated 24 days ago • 13
paulovsantanas/reward_model • Text Classification • Updated 24 days ago • 17
HFXM/RM_HHRLHF_Rule0 • Text Classification • Updated 22 days ago • 16
HFXM/RM_HHRLHF_Rule4 • Text Classification • Updated 22 days ago • 10
HFXM/RM_HHRLHF_Rule9 • Text Classification • Updated 22 days ago • 9
HFXM/RM_HHRLHF_Rule5 • Text Classification • Updated 22 days ago • 7
HFXM/RM_HHRLHF_Rule7 • Text Classification • Updated 22 days ago • 10
HFXM/RM_HHRLHF_Rule6 • Text Classification • Updated 22 days ago • 7
HFXM/RM_HHRLHF_Rule8 • Text Classification • Updated 22 days ago • 12
HFXM/RM_HHRLHF_Rule3 • Text Classification • Updated 22 days ago • 7