RL LLM AGENT

community

https://www.sanjibanchoudhury.com/

rl-llm-agent's models (13)

rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-leap-iter1

Text Generation • 3B • Updated Feb 12 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter1

Updated Jan 20 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50

Updated Jan 16 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iter2-70k

Updated Jan 16 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-shaped-iter0

Updated Jan 14 • 3

rl-llm-agent/Llama-3.2-3B-Instruct-value-alfworld-8b-sft

Updated Jan 13 • 3

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iqlearn-iter0

Updated Jan 13 • 3

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter0

Updated Jan 13 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter2

Updated Jan 11 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1

Text Generation • 3B • Updated Jan 10 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter0

Updated Jan 8 • 2

rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-iter0

Text Generation • 3B • Updated Jan 4 • 11

rl-llm-agent/Llama-3.1-8B-Instruct-sft-alfworld-iter0

Text Generation • 8B • Updated Jan 3 • 2