See Solar-Open-100B MLX in action - demonstration video
The q6.5-bit mixed quantization typically achieves a perplexity of 1.128 in our testing, the same as q8.5.
| Quantization | Perplexity |
|---|---|
| q2.5 | 41.293 |
| q3.5 | 1.900 |
| q4.5 | 1.168 |
| q4.8 | 1.140 |
| q5.5 | 1.141 |
| q6.5 | 1.128 |
| q8.5 | 1.128 |
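For readers unfamiliar with the metric, perplexity is the exponential of the mean negative log-likelihood per token, so a value near 1 means the model assigns high probability to the reference text. The sketch below shows only the standard formula. The evaluation dataset and settings behind the table are not stated here, and the log-probabilities in the example are made up for illustration.

```python
import math

def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) over a token sequence.

    token_logprobs: natural-log probabilities the model assigned to each
    reference token (hypothetical input, not the card's evaluation data).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Illustrative values only: three tokens predicted with high confidence.
example = [math.log(0.9), math.log(0.8), math.log(0.95)]
print(f"perplexity = {perplexity(example):.3f}")
```

Under this definition, the jump from 1.900 at q3.5 to 41.293 at q2.5 corresponds to a sharp drop in the average probability assigned to each correct token.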
Usage Notes
Tested on an M3 Ultra using Inferencer app v1.9.1
- Single inference ~45 tokens/s @ 1000 tokens
- Batched inference ~72 total tokens/s across four inferences
- Memory usage: ~78 GB
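The batched figure is a total across all four inferences, so each individual stream runs slower than a single inference even though aggregate throughput is higher. A small sketch of that arithmetic, using the approximate numbers above (the function name is hypothetical):

```python
def batch_summary(single_tps, batched_total_tps, n_streams):
    """Split a total batched throughput into per-stream and aggregate terms."""
    per_stream = batched_total_tps / n_streams        # tokens/s seen by each inference
    aggregate_gain = batched_total_tps / single_tps   # total throughput vs. one inference
    return per_stream, aggregate_gain

# Approximate figures from the notes above: ~45 tokens/s single, ~72 total over 4.
per_stream, gain = batch_summary(45.0, 72.0, 4)
print(f"{per_stream:.1f} tokens/s per stream, {gain:.2f}x aggregate throughput")
```

With these inputs each stream gets roughly 18 tokens/s, while the machine as a whole produces about 1.6 times as many tokens per second as in the single-inference case.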
Quantized with a modified version of MLX 0.30
For more details, see the demonstration video or visit Solar-Open-100B.