AI & ML interests
None defined yet.
rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-leap-iter1 • Text Generation • 3B
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter1
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iter2-70k
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-shaped-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-value-alfworld-8b-sft
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iqlearn-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter2
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1 • Text Generation • 3B
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-iter0 • Text Generation • 3B
rl-llm-agent/Llama-3.1-8B-Instruct-sft-alfworld-iter0 • Text Generation • 8B