# AWQ 4-bit quantization of SicariusSicarii's Impish_Bloodmoon_12B

Quantized on a single NVIDIA RTX 4090.

Recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

dataset = "gsm8k"
model_id = "/path/to/model/"
SAVE_DIR = "/save/dir/"
MAX_SEQUENCE_LENGTH = 1024
NUM_CALIBRATION_SAMPLES = 128

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize all Linear layers to 4-bit asymmetric weights with 16-bit
# activations (W4A16_ASYM), leaving the lm_head in full precision.
recipe = [
    AWQModifier(
        targets=["Linear"],
        scheme="W4A16_ASYM",
        ignore=["lm_head"],
    )
]

# Calibrate on GSM8K samples and write the compressed model to SAVE_DIR.
oneshot(
    model=model,
    dataset=dataset,
    dataset_config_name="main",
    recipe=recipe,
    output_dir=SAVE_DIR,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

tokenizer.save_pretrained(SAVE_DIR)
```
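
The resulting compressed-tensors checkpoint can be served with vLLM. A minimal sketch, assuming vLLM is installed and the quantized model was saved to `/save/dir/` (the prompt and sampling settings are illustrative):

```python
from vllm import LLM, SamplingParams

# vLLM picks up the compressed-tensors (AWQ W4A16) format from the
# quantization_config saved in the model directory.
llm = LLM(model="/save/dir/")
sampling = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Tell me a story about a blood moon."], sampling)
print(outputs[0].outputs[0].text)
```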