DrRiceIO7/HereticFT-Aggressive

Text Generation

text-generation-inference


Originally, I wanted to fine-tune my model with DPO, but I couldn't figure out how to get Unsloth to do it with Gemma-based models, so this one is based on regular old SFT. It still got that abrasive edge, though. I'm calling it only a partial success because it seems a little bit unstable. Next plan: try out a new architecture.

Uploaded finetuned model

Developed by: DrRiceIO7
License: apache-2.0
Finetuned from model: DrRiceIO7/HereticFT

This gemma3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Downloads last month: 717

Safetensors
Model size: 4B params
Tensor type: BF16
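As a rough sanity check on what those two numbers mean for memory, here is a minimal sketch that estimates the size of the weights alone. It assumes the rounded "4B" figure shown above (the exact parameter count is not stated here) and 2 bytes per BF16 value; the function name and the lookup table are illustrative, not part of any library. Activations, the KV cache, and framework overhead are not included, so real usage will be higher.

```python
# Bytes used to store one parameter for a few common tensor types.
BYTES_PER_PARAM = {"BF16": 2, "F16": 2, "F32": 4}


def weight_footprint_gib(num_params: float, tensor_type: str = "BF16") -> float:
    """Estimate the size of the model weights in GiB (weights only)."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 2**30


# Assumption: "4B params" is taken as exactly 4e9 for this estimate.
approx = weight_footprint_gib(4e9, "BF16")
print(f"~{approx:.2f} GiB of weights in BF16")  # ~7.45 GiB
```

The same function shows that loading the weights in F32 instead would roughly double that figure.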


Model tree for DrRiceIO7/HereticFT-Aggressive

Base model: DrRiceIO7/mergedheretic
Finetuned: DrRiceIO7/heretic-checkpoint
Finetuned: DrRiceIO7/HereticFT
Finetuned (2): this model
Finetunes: 1 model

Dataset used to train DrRiceIO7/HereticFT-Aggressive