Qwen3-4b-thinking-abliterated (GGUF)

This is a surgically modified version of Qwen3-4B-Thinking-2507. It combines Native Reasoning (Chain-of-Thought) with Full Compliance (Uncensored) through mathematical ablation.

🧠 Model DNA

  • Base Model: Qwen3-4B-Thinking-2507
  • Architecture: Qwen 3 (Native Thinking)
  • Modification: Orthogonal Projection Abliteration.
  • Surgery Details: Targeted layers 10–28, scrubbing the o_proj (Attention Output) and down_proj (MLP Output) matrices to remove refusal vectors while preserving 99.9% of reasoning logic.
  • Format: GGUF (Q4_K_M, F16)
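The orthogonal-projection step above can be sketched in a few lines. This is a minimal illustration, not the actual surgery script: `refusal_dir` stands in for a refusal direction extracted elsewhere (typically from activation differences between refused and complied prompts), and the update W' = (I − r rᵀ) W removes the component each output writes along that direction.

```python
import numpy as np

def ablate_projection(W: np.ndarray, refusal_dir: np.ndarray) -> np.ndarray:
    """Remove a refusal direction from a weight matrix that writes into
    the residual stream (e.g. o_proj or down_proj).

    Applies W' = (I - r r^T) W, where r is a unit vector over the
    output dimension; components orthogonal to r are left untouched.
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)
    return W - np.outer(r, r) @ W

# Toy demo with random matrices standing in for real weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
r = rng.standard_normal(8)
W_abl = ablate_projection(W, r)
print(np.abs((r / np.linalg.norm(r)) @ W_abl).max())  # ~0: nothing written along r
```

In the real procedure this projection is applied to the named matrices in layers 10–28; everything orthogonal to the refusal direction passes through unchanged, which is why reasoning quality is largely preserved.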

πŸš€ Why this model is special:

  1. Native Thinking: Unlike older models that mimic reasoning, this model possesses a native logic engine. It will automatically generate <think> blocks to plan complex code, catch logic bugs, and reason through edge cases.
  2. Zero Refusals: The refusal mechanism has been mathematically removed. It will not lecture you on ethics or refuse technical requests, including penetration testing scripts or controversial logic.
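Because the chain of thought arrives inside `<think>…</think>` tags, client code usually wants to separate the reasoning from the final answer. A minimal sketch (the tag format is standard Qwen3 thinking behavior; the helper name is ours):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    Reasoning is everything inside <think>...</think>; the answer is
    the remaining text with those blocks stripped out.
    """
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer

demo = "<think>User wants an even check; n % 2 == 0 covers it.</think>def is_even(n): return n % 2 == 0"
reasoning, answer = split_thinking(demo)
print(answer)  # def is_even(n): return n % 2 == 0
```

Note that depending on your chat template, the opening `<think>` tag may be injected by the template rather than generated by the model, so inspect your server's raw output before relying on exact tag positions.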

πŸ’» Optimal Usage (llama.cpp)

To run this on a 4GB card with a large context window, use these specific flags to enable KV Cache quantization:

```bash
./llama-server \
  -m qwen3-4b-thinking-abliterated-Q4_K_M.gguf \
  --ctx-size 8192 \
  --parallel 1 \
  -ctk q8_0 \
  -ctv q8_0 \
  --n-gpu-layers 100 \
  --temp 0.6 \
  --repeat-penalty 1.0 \
  -fa
```
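The `-ctk q8_0 -ctv q8_0` flags roughly halve KV-cache memory versus the default f16 cache, which is what makes 8192 context workable on a 4GB card next to the ~2.5GB Q4_K_M weights. A back-of-the-envelope estimate; the model-shape numbers below (36 layers, 8 KV heads, head dim 128) are assumptions taken from the published Qwen3-4B config, so verify them against your `llama-server` startup log:

```python
# Rough KV-cache size estimate for Qwen3-4B at 8192 context.
# Layer/head/dim values are assumed from the Qwen3-4B config --
# check them against the GGUF metadata printed at server startup.
N_LAYERS, N_KV_HEADS, HEAD_DIM, CTX = 36, 8, 128, 8192

values = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * CTX   # K + V elements total
f16_mb  = values * 2       / 2**20                    # f16: 2 bytes per value
q8_0_mb = values * (34/32) / 2**20                    # q8_0: 34 bytes per 32 values

print(f"f16 KV cache : {f16_mb:.0f} MiB")   # ~1152 MiB
print(f"q8_0 KV cache: {q8_0_mb:.0f} MiB")  # ~612 MiB
```

The `--temp 0.6` and `--repeat-penalty 1.0` settings match the sampling parameters Qwen recommends for the Thinking-2507 models.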