Moondream2 Q4 ONNX

Quantized version of vikhyatk/moondream2 for efficient browser deployment with Transformers.js.

Based on the Xenova/moondream2 ONNX conversion.

Features

  • INT4/Q4 quantization for a ~60% size reduction
  • Optimized for WebGPU inference
  • Compatible with @huggingface/transformers
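WebGPU is not yet available in every browser, so it is worth detecting support before requesting the `webgpu` device and falling back to the WASM backend otherwise. A minimal sketch (the `pickDevice` helper is hypothetical, not part of Transformers.js):

```javascript
// Hypothetical helper: choose an inference backend for Transformers.js.
// WebGPU support is signalled by the presence of `navigator.gpu`;
// otherwise fall back to the WASM backend.
function pickDevice(nav = globalThis.navigator) {
  return nav && 'gpu' in nav ? 'webgpu' : 'wasm';
}
```

The result can then be passed as the `device` option to `from_pretrained`, e.g. `{ dtype: 'q4', device: pickDevice() }`.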

Model Size

  • Original FP16: ~3.8 GB
  • This Q4 version: ~1.5 GB (60% reduction)

Usage with Transformers.js

import {
  AutoProcessor,
  AutoTokenizer,
  Moondream1ForConditionalGeneration,
  RawImage,
} from '@huggingface/transformers';

const model_id = 'Nagafi/moondream2-q4-onnx';

const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await Moondream1ForConditionalGeneration.from_pretrained(model_id, {
    dtype: 'q4',
    device: 'webgpu',
});

const image = await RawImage.fromURL('your-image-url');
const text = '<image>\n\nQuestion: Describe this image\n\nAnswer:';

const textInputs = tokenizer(text);
const visionInputs = await processor(image);

const output = await model.generate({
    ...textInputs,
    ...visionInputs,
    max_new_tokens: 256,
});

const result = tokenizer.batch_decode(output, { skip_special_tokens: true });
console.log(result[0]);
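The prompt shown above is a plain string template, so a small helper keeps the `<image>` placeholder and Question/Answer framing consistent across different questions (a sketch; `buildPrompt` is a hypothetical name, not part of the model's API):

```javascript
// Hypothetical helper: wrap a question in the prompt format the
// example above uses ("<image>" placeholder, Question/Answer framing).
function buildPrompt(question) {
  return `<image>\n\nQuestion: ${question}\n\nAnswer:`;
}

// Usage: const textInputs = tokenizer(buildPrompt('What objects are on the table?'));
```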

License

Apache 2.0 (same as base model)
