Q7 quantization
Hi @DavidAU
Why does no one supply Q7 quantizations?
It could be really useful for people who can almost fit a Q8, or who can fit a Q8 but then have no VRAM left for context.
I don't know if you remember, but we discussed how although Q8s and Q6s are a bit smarter than Q4s and Q5s, the Q4s and Q5s are a bit more creative than the Q8s and Q6s...
Is there a technical obstacle to making Q7s, or why are people not making them?
Then again, are things like Q9s, Q10s, etc. even possible?
There is no "native" Q7 mix in llama.cpp.
You can set individual tensors to different bit levels, likewise for the output tensor and embeddings -> to make a "quasi" Q7.
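As a rough sketch of what that can look like in practice (not an exact recipe): quantize to a Q6_K base while bumping the output tensor and token embeddings to Q8_0, using llama.cpp's llama-quantize. This assumes a reasonably recent build that supports the `--output-tensor-type` and `--token-embedding-type` overrides (check `llama-quantize --help` on your build); the file names are placeholders.

```python
# Hedged sketch: build a "quasi Q7" GGUF with llama.cpp's llama-quantize.
# The bulk of the tensors go to Q6_K, while the output tensor and token
# embeddings are kept at Q8_0. File names are placeholders; verify the flag
# names against your build of llama-quantize.
import subprocess

cmd = [
    "./llama-quantize",
    "--output-tensor-type", "q8_0",    # keep the output (head) tensor at 8 bits
    "--token-embedding-type", "q8_0",  # keep the token embeddings at 8 bits
    "model-f16.gguf",                  # full-precision source GGUF (placeholder)
    "model-quasi-q7.gguf",             # destination GGUF (placeholder)
    "Q6_K",                            # base quant type for the remaining tensors
]
subprocess.run(cmd, check=True)
```

The resulting file lands a bit above plain Q6_K and well below Q8_0 in size, depending on how large the embedding and output tensors are relative to the rest of the model.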
As for Q9/Q10 -> I can see how these could help, as sometimes a few more bits can make a difference depending on the use case.
But with the option to set bits for tensors anywhere from IQ1 up to F16/BF16 and F32, a custom per-tensor mix can already cover most of that ground...
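To put rough numbers on both the "fits in VRAM" point from the question and the "few more bits" point: a back-of-the-envelope size comparison. The 12B parameter count and the 5% embedding/output share are illustrative assumptions, the bits-per-weight figures for Q6_K and Q8_0 are approximate, and the estimate ignores GGUF metadata and the KV cache.

```python
# Back-of-the-envelope GGUF size estimate, ignoring metadata and KV cache.
# Approximate bits-per-weight: Q6_K ~= 6.5625 bpw, Q8_0 = 8.5 bpw.
# The 12e9 parameter count and the 5% embed/output share are illustrative
# assumptions, not measurements of any particular model.
GIB = 1024 ** 3

def size_gib(n_params: float, bpw: float) -> float:
    """Approximate file size in GiB for n_params weights at bpw bits each."""
    return n_params * bpw / 8 / GIB

n_params = 12e9      # assumed model size (placeholder)
embed_share = 0.05   # assumed fraction of weights in embeddings + output head

q6_k = size_gib(n_params, 6.5625)
q8_0 = size_gib(n_params, 8.5)
# "Quasi Q7": bulk of the tensors at Q6_K, embeddings/output bumped to Q8_0.
quasi = (size_gib(n_params * (1 - embed_share), 6.5625)
         + size_gib(n_params * embed_share, 8.5))

print(f"Q6_K     : {q6_k:5.2f} GiB")
print(f"quasi Q7 : {quasi:5.2f} GiB")
print(f"Q8_0     : {q8_0:5.2f} GiB")
```

Under these assumptions the quasi-Q7 mix costs only a fraction of a GiB over plain Q6_K, while full Q8_0 costs a couple of GiB more, which is exactly the kind of headroom that otherwise gets eaten by context.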