RL LLM AGENT

https://www.sanjibanchoudhury.com/

rl-llm-agent's models (13)

rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-leap-iter1

Text Generation • 3B • Updated Feb 12, 2025 • 1

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter1

Updated Jan 20, 2025

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50

Updated Jan 16, 2025 • 1

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iter2-70k

Updated Jan 16, 2025

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-shaped-iter0

Updated Jan 14, 2025

rl-llm-agent/Llama-3.2-3B-Instruct-value-alfworld-8b-sft

Updated Jan 13, 2025

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iqlearn-iter0

Updated Jan 13, 2025

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter0

Updated Jan 13, 2025

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter2

Updated Jan 11, 2025

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1

Text Generation • 3B • Updated Jan 10, 2025 • 1

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter0

Updated Jan 8, 2025

rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-iter0

Text Generation • 3B • Updated Jan 4, 2025

rl-llm-agent/Llama-3.1-8B-Instruct-sft-alfworld-iter0

Text Generation • 8B • Updated Jan 3, 2025