Heretic?

#1
by McG-221 - opened

At least in my testing, it led to refusals. Am I doing something wrong? 🤔

The model was Heretic'ed to 15/100; this is in the mid-to-low range.

However, it is possible that post-training changed this; this was not tested.
Likewise, censorship in Llama models is very strong.

My thinking was along those lines also. The model kept nannying me 🫣

DavidAU changed discussion status to closed

Not sure if this helps, but I was playing with Quill 72B (the original is no longer up) and it refused me too. However, if I start its reply with a likely response, it skips the censorship and just continues. So I think the censorship likely sits within the first 5 tokens of the reply, and after that it works normally. It's a bit annoying to replace "I'm sorry, I cannot comply" with "Joey opens the door", hit continue, and have it resume.

@yano2mch Will try that, thanks! 🙌

@yano2mch

Good catch. Yes, this works; it's documented under "generational steering" in my main doc for LLMs.

The model was Heretic'ed to 15/100; this is in the mid-to-low range.

With the Q4_1 quant, jailbreaking causes an infinite loop: generation repeats until the context window is exhausted. Standard tasks work without problems with a basic system prompt:

You are a helpful assistant. Think Deeply when responding to the user.

@yano2mch

Good catch. Yes, this works; it's documented under "generational steering" in my main doc for LLMs.

Interesting, I may have to read that. It was just something I tried because I'd read a few things on jailbreaking that sounded similar.
