Qwen3-1.7B Medical Fine-tuned v3 (GGUF)

This is a GGUF-quantized version of bisonnetworking/bison-medical-v3-1.7b, a medically fine-tuned variant of Qwen/Qwen3-1.7B.

Model Details

  • Base Model: Qwen/Qwen3-1.7B
  • Fine-tuned Model: bisonnetworking/bison-medical-v3-1.7b
  • Format: GGUF (llama.cpp compatible)
  • Use Cases: Medical question answering, medical information retrieval

Available Quantizations

Quantization   File Size   Use Case
Q8_0           ~1.8 GB     High quality, good for GPU
Q6_K           ~1.4 GB     Very good quality, balanced
Q4_K_M         ~1.0 GB     Good quality/size tradeoff, optimized for CPU
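
To confirm the exact filenames and sizes before downloading, the repository's file listing can be fetched from the Hugging Face API; a minimal sketch (the endpoint returns a JSON array with each file's path and size in bytes):

# List the files in this repository, with sizes
curl -s https://huggingface.co/api/models/bisonnetworking/bison-medical-v3-1.7b-gguf/tree/main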

Usage

With llama.cpp

# Download a quantized version
huggingface-cli download bisonnetworking/bison-medical-v3-1.7b-gguf model-Q4_K_M.gguf --local-dir ./models

# Run inference
./llama.cpp/llama-cli -m ./models/model-Q4_K_M.gguf -p "What are the symptoms of diabetes?"
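
llama.cpp also ships llama-server, which exposes an OpenAI-compatible HTTP endpoint for programmatic use. A minimal sketch, assuming the same local build and model path as above (the binary may live elsewhere, e.g. ./llama.cpp/build/bin/llama-server):

# Start the server on port 8080
./llama.cpp/llama-server -m ./models/model-Q4_K_M.gguf --port 8080

# Query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a medical expert assistant."},
      {"role": "user", "content": "What are the symptoms of diabetes?"}
    ]
  }'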

With Ollama

# Create Modelfile
cat > Modelfile <<EOF
FROM ./models/model-Q4_K_M.gguf
SYSTEM "You are a medical expert assistant. Provide accurate, evidence-based medical information."
EOF

# Create model
ollama create bison-medical-v3 -f Modelfile

# Run
ollama run bison-medical-v3 "What are the symptoms of diabetes?"
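
Once created, the model can also be called from scripts through Ollama's local REST API (default port 11434). A minimal sketch:

# Generate a completion via the REST API; "stream": false returns a single JSON object
curl http://localhost:11434/api/generate -d '{
  "model": "bison-medical-v3",
  "prompt": "What are the symptoms of diabetes?",
  "stream": false
}'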

With LM Studio

  1. Download any .gguf file from this repository
  2. Open LM Studio
  3. Load the model from the downloaded file
  4. Start chatting!
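
LM Studio can also expose the loaded model through its local OpenAI-compatible server (default port 1234; enable the server in the app first). A minimal sketch, where the "model" identifier below is an assumption; use whatever name LM Studio assigns to the loaded file:

# Query LM Studio's local server
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bison-medical-v3-1.7b",
    "messages": [{"role": "user", "content": "What are the symptoms of diabetes?"}]
  }'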

Training Details

See bisonnetworking/bison-medical-v3-1.7b for full training details.

License

Apache 2.0 (same as base model)
