---
language: en
license: apache-2.0
tags:
- mlx
- quantized
- 8-bit
- diffusion
- wedlm
library_name: mlx
pipeline_tag: text-generation
base_model: tencent/WeDLM-8B-Instruct
model_type: wedlm
quantized_by: zimengxiong
model_relations:
- type: quantization
  base_model_id: tencent/WeDLM-8B-Instruct
---

# WeDLM-8B-Instruct-MLX-8bit

This is an 8-bit quantized [MLX](https://github.com/ml-explore/mlx) version of [tencent/WeDLM-8B-Instruct](https://huggingface.co/tencent/WeDLM-8B-Instruct) for efficient inference on Apple Silicon.

It currently does not work well and does not provide a meaningful speedup, due to the lack of precompilation.

Conversion and inference code: https://github.com/ZimengXiong/WeDLM-MLX/tree/main

## Related Models

| Variant | HuggingFace |
|---------|-------------|
| 4-bit | [zimengxiong/WeDLM-8B-Instruct-MLX-4bit](https://huggingface.co/zimengxiong/WeDLM-8B-Instruct-MLX-4bit) |
| 8-bit (this model) | [zimengxiong/WeDLM-8B-Instruct-MLX-8bit](https://huggingface.co/zimengxiong/WeDLM-8B-Instruct-MLX-8bit) |
| fp16 | [zimengxiong/WeDLM-8B-Instruct-MLX](https://huggingface.co/zimengxiong/WeDLM-8B-Instruct-MLX) |

## License

This model inherits the license from the base model [tencent/WeDLM-8B-Instruct](https://huggingface.co/tencent/WeDLM-8B-Instruct).
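
## Usage

A minimal sketch of loading the model with the standard `mlx-lm` Python API is shown below. This is an assumption, not a tested recipe: because `model_type` is the custom `wedlm` (a diffusion LM), stock `mlx-lm` may not recognize the architecture, in which case the inference code from the WeDLM-MLX repository linked above is required instead.

```python
# Hedged sketch: generic mlx-lm loading. The custom "wedlm" model type may
# not be supported by stock mlx-lm; if loading fails, use the code from
# https://github.com/ZimengXiong/WeDLM-MLX instead.
from mlx_lm import load, generate

# Download and load the 8-bit quantized weights from the Hugging Face Hub.
model, tokenizer = load("zimengxiong/WeDLM-8B-Instruct-MLX-8bit")

# Format the prompt with the model's chat template (this is an instruct model).
messages = [{"role": "user", "content": "Explain diffusion language models in one paragraph."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Generate a completion.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```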