Qwen3-4b-thinking-abliterated (GGUF)

This is a surgically modified version of Qwen3-4B-Thinking-2507. It combines Native Reasoning (Chain-of-Thought) with Full Compliance (Uncensored) through mathematical ablation.

🧠 Model DNA

  • Base Model: Qwen3-4B-Thinking-2507
  • Architecture: Qwen 3 (Native Thinking)
  • Modification: Orthogonal Projection Abliteration.
  • Surgery Details: Targeted layers 10–28, scrubbing the o_proj (Attention Output) and down_proj (MLP Output) matrices to remove refusal vectors while preserving 99.9% of reasoning logic.
  • Format: GGUF (Q4_K_M, F16)
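The orthogonal-projection step above can be sketched in a few lines. This is a minimal illustration, not the actual surgery script: `refusal_dir` stands in for a refusal direction extracted elsewhere (typically from activation differences between refused and complied prompts), and the update W' = (I − r rᵀ) W removes the component each output writes along that direction.

```python
import numpy as np

def ablate_projection(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Remove a refusal direction from a weight matrix that writes into
    the residual stream (e.g. o_proj or down_proj).

    Applies W' = (I - r r^T) W, where r is a unit vector over the
    output dimension; components orthogonal to r are left untouched.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)
    return W - np.outer(r, r) @ W

# Toy demo with random matrices standing in for real weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
r = rng.standard_normal(8)
W_abl = ablate_projection(W, r)
print(np.abs((r / np.linalg.norm(r)) @ W_abl).max())  # ~0: nothing written along r
```

In the real procedure this projection is applied to the named matrices in layers 10–28; everything orthogonal to the refusal direction passes through unchanged, which is why reasoning quality is largely preserved.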

πŸš€ Why this model is special:

  1. Native Thinking: Unlike older models that mimic reasoning, this model possesses a native logic engine. It will automatically generate <think> blocks to plan complex code, catch logic bugs, and reason through edge cases.
  2. Zero Refusals: The refusal mechanism has been mathematically removed. It will not lecture you on ethics or refuse technical requests, including penetration testing scripts or controversial logic.
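Because the chain of thought arrives inside `<think>…</think>` tags, client code usually wants to separate the reasoning from the final answer. A minimal sketch (the tag format is standard Qwen3 thinking behavior; the helper name is ours):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    Reasoning is everything inside <think>...</think>; the answer is
    the remaining text with those blocks stripped out.
    """
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

demo = "<think>User wants an even check; n % 2 == 0 covers it.</think>def is_even(n): return n % 2 == 0"
reasoning, answer = split_thinking(demo)
print(answer)  # def is_even(n): return n % 2 == 0
```

Note that depending on your chat template, the opening `<think>` tag may be injected by the template rather than generated by the model, so inspect your server's raw output before relying on exact tag positions.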

πŸ’» Optimal Usage (llama.cpp)

To run this on a 4GB card with a large context window, use these specific flags to enable KV Cache quantization:

```bash
./llama-server \
  -m qwen3-4b-thinking-abliterated-Q4_K_M.gguf \
  --ctx-size 8192 \
  --parallel 1 \
  -ctk q8_0 \
  -ctv q8_0 \
  --n-gpu-layers 100 \
  --temp 0.6 \
  --repeat-penalty 1.0 \
  -fa
```
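The `-ctk q8_0 -ctv q8_0` flags roughly halve KV-cache memory versus the default f16 cache, which is what makes 8192 context workable on a 4GB card next to the ~2.5GB Q4_K_M weights. A back-of-the-envelope estimate; the model-shape numbers below (36 layers, 8 KV heads, head dim 128) are assumptions taken from the published Qwen3-4B config, so verify them against your `llama-server` startup log:

```python
# Rough KV-cache size estimate for Qwen3-4B at 8192 context.
# Layer/head/dim values are assumed from the Qwen3-4B config --
# check them against the GGUF metadata printed at server startup.
N_LAYERS, N_KV_HEADS, HEAD_DIM, CTX = 36, 8, 128, 8192

values = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CTX   # K + V elements total
f16_mb  = values * 2       / 2**20                    # f16: 2 bytes per value
q8_0_mb = values * (34/32) / 2**20                    # q8_0: 34 bytes per 32 values

print(f"f16 KV cache : {f16_mb:.0f} MiB")   # ~1152 MiB
print(f"q8_0 KV cache: {q8_0_mb:.0f} MiB")  # ~612 MiB
```

The `--temp 0.6` and `--repeat-penalty 1.0` settings match the sampling parameters Qwen recommends for the Thinking-2507 models.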