Why is Q8 just 13 GiB?
#36 by zvwgvx
Why is Q8 just 13 GiB?
The original model is already very small
Thanks. I didn't know the model uses mixed quantization (MXFP4 and BF16), so my size calculation didn't match.
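For anyone who wants to redo the calculation, here is a rough sketch of how a mixed-precision size estimate differs from a naive "every weight is 8-bit" estimate. The parameter counts are hypothetical placeholders, not the real split for this model, and `size_gib` is just an illustrative helper. The bits-per-weight values assume the usual block layouts: MXFP4 stores 32 4-bit values plus one 8-bit shared scale per block, and GGUF Q8_0 stores 32 8-bit values plus one 16-bit scale per block.

```python
GIB = 1024 ** 3

# Effective bits per weight, including per-block scale overhead.
BITS_PER_WEIGHT = {
    "MXFP4": 4.25,  # (32 * 4 + 8) / 32
    "BF16": 16.0,
    "Q8_0": 8.5,    # (32 * 8 + 16) / 32
}


def size_gib(param_counts):
    """Estimate storage in GiB from a {format: parameter_count} mapping."""
    total_bits = sum(BITS_PER_WEIGHT[fmt] * n for fmt, n in param_counts.items())
    return total_bits / 8 / GIB


# HYPOTHETICAL parameter split, for illustration only.
mixed = {"MXFP4": 18e9, "BF16": 3e9}
# Naive assumption: the same total parameter count, all stored as Q8_0.
naive_q8 = {"Q8_0": 21e9}

print(f"mixed MXFP4 + BF16: {size_gib(mixed):.1f} GiB")
print(f"naive all-Q8_0:     {size_gib(naive_q8):.1f} GiB")
```

If most weights are already stored at about 4 bits, the file size is dominated by that portion, so an estimate that assumes 8 bits for every weight comes out much larger than the actual file.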