Q7 quantization
Hi @DavidAU
Why does no one supply Q7 quantizations?
It could be really useful for people who can almost fit a Q8, or who can fit a Q8 but then have no VRAM left for context.
I don't know if you remember, but we discussed how although Q8s and Q6s are a bit smarter than Q4s and Q5s, the Q4s and Q5s are a bit more creative than the Q8s and Q6s...
Is there a technical obstacle to making Q7s, or why are people not making them?
Then again, are things like Q9s, Q10s, etc. even possible?
There is no "native" Q7 mix in llama.cpp.
You can set individual tensors to different bit levels, likewise for the output tensor and embeddings -> to make a "quasi" Q7.
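As a rough sketch of what that can look like in practice (not an exact recipe): quantize to a Q6_K base while bumping the output tensor and token embeddings to Q8_0, using llama.cpp's llama-quantize. This assumes a reasonably recent build that supports the `--output-tensor-type` and `--token-embedding-type` overrides (check `llama-quantize --help` on your build); the file names are placeholders.

```python
# Hedged sketch: build a "quasi Q7" GGUF with llama.cpp's llama-quantize.
# The bulk of the tensors go to Q6_K, while the output tensor and token
# embeddings are kept at Q8_0. File names are placeholders; verify the flag
# names against your build of llama-quantize.
import subprocess

cmd = [
    "./llama-quantize",
    "--output-tensor-type", "q8_0",    # keep the output (head) tensor at 8 bits
    "--token-embedding-type", "q8_0",  # keep the token embeddings at 8 bits
    "model-f16.gguf",                  # full-precision source GGUF (placeholder)
    "model-quasi-q7.gguf",             # destination GGUF (placeholder)
    "Q6_K",                            # base quant type for the remaining tensors
]
subprocess.run(cmd, check=True)
```

The resulting file lands a bit above plain Q6_K and well below Q8_0 in size, depending on how large the embedding and output tensors are relative to the rest of the model.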
As for Q9/Q10 -> I can see how these could help, as sometimes a few more bits can make a difference depending on the use case.
But with the option to set bits for tensors anywhere from IQ1 up to F16/BF16 and F32, a custom per-tensor mix can already cover most of that ground...
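To put rough numbers on both the "fits in VRAM" point from the question and the "few more bits" point: a back-of-the-envelope size comparison. The 12B parameter count and the 5% embedding/output share are illustrative assumptions, the bits-per-weight figures for Q6_K and Q8_0 are approximate, and the estimate ignores GGUF metadata and the KV cache.

```python
# Back-of-the-envelope GGUF size estimate, ignoring metadata and KV cache.
# Approximate bits-per-weight: Q6_K ~= 6.5625 bpw, Q8_0 = 8.5 bpw.
# The 12e9 parameter count and the 5% embed/output share are illustrative
# assumptions, not measurements of any particular model.
GIB = 1024 ** 3

def size_gib(n_params: float, bpw: float) -> float:
    """Approximate file size in GiB for n_params weights at bpw bits each."""
    return n_params * bpw / 8 / GIB

n_params = 12e9      # assumed model size (placeholder)
embed_share = 0.05   # assumed fraction of weights in embeddings + output head

q6_k = size_gib(n_params, 6.5625)
q8_0 = size_gib(n_params, 8.5)
# "Quasi Q7": bulk of the tensors at Q6_K, embeddings/output bumped to Q8_0.
quasi = (size_gib(n_params * (1 - embed_share), 6.5625)
         + size_gib(n_params * embed_share, 8.5))

print(f"Q6_K     : {q6_k:5.2f} GiB")
print(f"quasi Q7 : {quasi:5.2f} GiB")
print(f"Q8_0     : {q8_0:5.2f} GiB")
```

Under these assumptions the quasi-Q7 mix costs only a fraction of a GiB over plain Q6_K, while full Q8_0 costs a couple of GiB more, which is exactly the kind of headroom that otherwise gets eaten by context.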