# Moondream2 Q4 ONNX
A quantized version of vikhyatk/moondream2 for efficient in-browser deployment with Transformers.js, based on the Xenova/moondream2 ONNX conversion.
## Features
- INT4/Q4 weight quantization for a ~60% smaller download than FP16
- Optimized for WebGPU inference
- Compatible with @huggingface/transformers
## Model Size

- Original FP16: ~3.8 GB
- This Q4 version: ~1.5 GB (~60% reduction)
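The measured ~60% reduction is smaller than the theoretical 75% saving of 4-bit weights because not every tensor is quantized. A back-of-envelope sketch, assuming roughly 1.9B parameters (the parameter count is an assumption, not stated on this card):

```javascript
// Rough on-disk size estimates, assuming ~1.9B parameters
// (an assumption for illustration, not taken from this model card).
const params = 1.9e9;
const fp16GB = params * 2 / 1e9;   // 2 bytes per weight
const q4GB = params * 0.5 / 1e9;   // 4 bits (0.5 bytes) per weight
console.log(fp16GB, q4GB);         // → 3.8 0.95
// The actual Q4 files are ~1.5 GB because some tensors (typically
// embeddings and norms) are kept at higher precision.
```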
## Usage with Transformers.js
```javascript
import {
  AutoProcessor,
  AutoTokenizer,
  Moondream1ForConditionalGeneration,
  RawImage,
} from '@huggingface/transformers';

const model_id = 'YOUR_USERNAME/moondream2-q4-onnx';

// Load the processor, tokenizer, and Q4 model weights.
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await Moondream1ForConditionalGeneration.from_pretrained(model_id, {
  dtype: 'q4',
  device: 'webgpu',
});

// Prepare the inputs; the prompt must contain the `<image>` placeholder.
const image = await RawImage.fromURL('your-image-url');
const text = '<image>\n\nQuestion: Describe this image\n\nAnswer:';
const textInputs = tokenizer(text);
const visionInputs = await processor(image);

// Generate and decode the answer.
const output = await model.generate({
  ...textInputs,
  ...visionInputs,
  max_new_tokens: 256,
});
const result = tokenizer.batch_decode(output, { skip_special_tokens: true });
console.log(result[0]);
```
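`device: 'webgpu'` requires a browser with WebGPU enabled. A minimal, hypothetical fallback helper (sketch only; it assumes Transformers.js' WASM backend as the fallback device name) could look like:

```javascript
// Hypothetical helper: prefer WebGPU when the browser exposes it via
// navigator.gpu, otherwise fall back to the WASM backend.
function pickDevice() {
  const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator;
  return hasWebGPU ? 'webgpu' : 'wasm';
}

// Usage: pass the result as the `device` option to from_pretrained, e.g.
// { dtype: 'q4', device: pickDevice() }.
console.log(pickDevice());
```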
## License

Apache 2.0 (same as the base model)