Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback Paper • 2501.12895 • Published 4 days ago • 45
ringos/output_Llama-3.1-8B-simpleqa-0_1000-m_generation-n_128-t_1.0-k_50-p_0.95-l_128 Updated Dec 25, 2024 • 46
ringos/output_Llama-3.1-8B-simpleqa-0_-1-m_generation-n_128-t_1.0-k_50-p_0.95-l_128 Updated Dec 17, 2024 • 42
ringos/output_Mistral-Nemo-Base-2407-simpleqa-0_1000-m_generation-n_32-t_1.0-k_40-p_0.9-l_128 • Updated Dec 2, 2024 • 216 • 44
ringos/bio-detailed-Llama-3.1-8B-gemma2-rm-gold_True-n_32 • Updated Nov 13, 2024 • 371 • 47
ringos/ultrafeedback_binarized-vanilla-filtered_as_Llama_n32 • Updated Nov 13, 2024 • 58.2k • 39