Originally, I wanted to fine-tune my model with DPO, but I couldn't figure out how to get Unsloth to do it with Gemma-based models, so this is based on regular old SFT. It still got that abrasive edge, though, so I'm calling it a success, if only a partial one, since it seems a little bit unstable. Next plan: try out a new architecture.
Uploaded finetuned model
- Developed by: DrRiceIO7
- License: apache-2.0
- Finetuned from model: DrRiceIO7/HereticFT
This gemma3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.
Model tree for DrRiceIO7/HereticFT-Aggressive
- Base model: DrRiceIO7/mergedheretic
- Finetuned: DrRiceIO7/heretic-checkpoint
- Finetuned: DrRiceIO7/HereticFT
