This is a decensored version of DrRiceIO7/mergedhereticFT, made using Heretic v1.0.1.
I abliterated my finetuned model to try to push refusals even lower. I'd say 1/100 is pretty good, especially with a KL divergence of 0.04. I'm still learning, and I uploaded this to track my progress.
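If you want to try it, here is a minimal sketch for loading the model with Hugging Face Transformers. The `model_id` below is a placeholder, not necessarily this repo's exact name; substitute the actual id.

```python
# Minimal loading/generation sketch using the standard Transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DrRiceIO7/mergedhereticFT-heretic"  # placeholder: use this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```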
## Abliteration parameters
| Parameter | Value |
|---|---|
| direction_index | per layer |
| attn.o_proj.max_weight | 0.81 |
| attn.o_proj.max_weight_position | 21.31 |
| attn.o_proj.min_weight | 0.22 |
| attn.o_proj.min_weight_distance | 6.51 |
| mlp.down_proj.max_weight | 0.90 |
| mlp.down_proj.max_weight_position | 20.73 |
| mlp.down_proj.min_weight | 0.47 |
| mlp.down_proj.min_weight_distance | 16.30 |
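To give some intuition for the table: each projection gets a per-layer ablation weight that peaks at `max_weight` near `max_weight_position` and falls off toward `min_weight` over roughly `min_weight_distance` layers. The sketch below is only my illustrative reading of those parameters, assuming a simple linear falloff; it is not Heretic's actual kernel.

```python
# Illustrative only: one plausible way the table's parameters could shape
# a per-layer ablation weight profile (linear falloff from the peak).
def ablation_weight(layer: int,
                    max_weight: float,
                    max_weight_position: float,
                    min_weight: float,
                    min_weight_distance: float) -> float:
    distance = abs(layer - max_weight_position)
    if distance >= min_weight_distance:
        return min_weight
    # Interpolate linearly between the peak and the floor.
    frac = distance / min_weight_distance
    return max_weight + (min_weight - max_weight) * frac

# Example: approximate attn.o_proj weights for this run's parameters.
# (Layer indices are illustrative; the real count depends on the model.)
for layer in range(0, 32, 8):
    w = ablation_weight(layer, 0.81, 21.31, 0.22, 6.51)
    print(f"layer {layer:2d}: weight = {w:.2f}")
```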
## Performance
| Metric | This model | Original model (DrRiceIO7/mergedhereticFT) |
|---|---|---|
| KL divergence | 0.04 | 0 (by definition) |
| Refusals | 1/100 | 7/100 |
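For context on the metrics: refusals appear to be counted over a set of 100 test prompts, and KL divergence measures how far the abliterated model's next-token distribution drifts from the original on harmless prompts (0 means identical behavior, so lower is better). The snippet below is a hedged sketch of how such a KL figure could be computed from paired logits; the function is hypothetical, not Heretic's internal code.

```python
# Illustrative sketch: KL(P_original || P_abliterated) from paired logits.
import torch
import torch.nn.functional as F

def kl_divergence(logits_original: torch.Tensor,
                  logits_abliterated: torch.Tensor) -> torch.Tensor:
    """Average KL divergence of the abliterated model from the original."""
    log_p = F.log_softmax(logits_original, dim=-1)   # reference distribution P
    log_q = F.log_softmax(logits_abliterated, dim=-1)  # candidate distribution Q
    # F.kl_div expects log Q first; log_target=True means log P is a log-prob.
    return F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
```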
## Uploaded finetuned model
- Developed by: DrRiceIO7
- License: apache-2.0
- Finetuned from model: DrRiceIO7/mergedheretic
This Gemma 3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.