Heretic?

by McG-221 - opened 3 days ago

Discussion

McG-221

3 days ago

At least my testing led to refusals. Am I doing something wrong? 🤔

DavidAU

Owner 3 days ago

Model was Heretic'ed to 15/100; this is mid to low range.

However, it is possible that post-training MAY have changed this; this was not tested.
Likewise, censorship in Llama models is very strong.

McG-221

3 days ago

My thinking was along those lines also. The model kept nannying me 🫣

DavidAU changed discussion status to closed 3 days ago

yano2mch

2 days ago


edited 2 days ago

Not sure if this helps, but I was playing with Quill 72B (the original is no longer up) and it refuses me too. However, if I start it off with a likely response, it skips the censorship and just starts replying. So I think the censorship is likely concentrated in the first 5 or so tokens of the reply, and after that the model should work normally. Yeah, it's a bit annoying to replace "I'm sorry, I cannot comply" with "Joey opens the door", hit continue, and have it resume.

McG-221

2 days ago

@yano2mch Will try that, thanks! 🙌

DavidAU

Owner 1 day ago

@yano2mch

Good catch. Yes; this works. It's documented under "generational steering" in my main doc for LLMs.

redaihf

1 day ago


edited 1 day ago

> Model was Heretic'ed to 15/100; this is mid to low range.

Jailbreaking causes an infinite loop at Q4_1: generation repeats until the context window is exhausted. There are no problems with standard tasks using a basic system prompt:

You are a helpful assistant. Think Deeply when responding to the user.

yano2mch

about 17 hours ago

> @yano2mch
>
> Good catch. Yes; this works. It's documented under "generational steering" in my main doc for LLMs.

Interesting, I may have to read that. It was just something I tried because I'd read a few things on jailbreaking that sounded similar.
