zimengxiong/WeDLM-8B-Instruct-MLX-8bit

Text Generation

8-bit precision


zimengxiong committed 5 days ago

Commit fdfd42a · verified · 1 Parent(s): 3f403fb

Update README.md

Files changed (1)

README.md +2 -0


@@ -21,7 +21,9 @@ model_relations:
 This is an 8-bit quantized [MLX](https://github.com/ml-explore/mlx) version of [tencent/WeDLM-8B-Instruct](https://huggingface.co/tencent/WeDLM-8B-Instruct) for efficient inference on Apple Silicon.
+It currently does not work well or provide a meaningful speedup due to the lack of precompilation.
 https://github.com/ZimengXiong/WeDLM-MLX/tree/main
 ## Related Models
 | Variant | HuggingFace |