LFM2-700M-GRPO-NuminaMath-10K-GGUF

GGUF quantized versions of LFM2-700M-GRPO-NuminaMath-10K for efficient CPU and mixed CPU/GPU inference.

Model Overview

This is a quantized version of LFM2-700M-GRPO-NuminaMath-10K, a 700M parameter model fine-tuned using Group Relative Policy Optimization (GRPO) on the NuminaMath-CoT dataset for mathematical reasoning tasks.

Key Features

  • Mathematical Reasoning: Optimized for step-by-step math problem solving
  • GRPO Training: Uses reinforcement learning with verifiable rewards (a toy reward is sketched after this list)
  • Efficient Inference: Quantized for fast CPU/GPU inference
  • Wide Compatibility: Works with Ollama, llama.cpp, LM Studio, and more
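
GRPO's "verifiable rewards" means each sampled solution is scored automatically rather than by a learned reward model. A toy Python sketch of such a reward (a hypothetical helper, not the actual training code):

import re

def math_reward(completion: str, reference_answer: str) -> float:
    """Toy verifiable reward: 1.0 if the last number in the completion
    matches the reference answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference_answer else 0.0

print(math_reward("15% of 80 is 0.15 * 80 = 12", "12"))  # 1.0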

Available Quantizations

Quantization  File                                   Size              Description
Q4_K_M        lfm2-700m-grpo-numina-10k-q4_k_m.gguf  ~40% of original  Best balance of quality and size

Quick Start

Using Ollama

# Pull and run directly from HuggingFace
ollama pull hf.co/ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K-GGUF:Q4_K_M "Solve step by step: What is 15% of 80?"

Alternative: Create Custom Modelfile

# Download the GGUF file first
huggingface-cli download ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K-GGUF \
    lfm2-700m-grpo-numina-10k-q4_k_m.gguf --local-dir ./models

# Create Modelfile with custom system prompt
cat > Modelfile << 'EOF'
FROM ./models/lfm2-700m-grpo-numina-10k-q4_k_m.gguf

SYSTEM "You are a helpful math tutor. When given a math problem, solve it step by step, showing your reasoning clearly. Always verify your final answer."

PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create and run the model
ollama create lfm2-700m-grpo-numina-10k -f Modelfile
ollama run lfm2-700m-grpo-numina-10k
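
Once created, the model can also be queried programmatically through Ollama's local HTTP API (a minimal sketch assuming the default port 11434 and the requests library):

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "lfm2-700m-grpo-numina-10k",  # the name created above
        "prompt": "Solve step by step: What is 15% of 80?",
        "stream": False,
    },
)
print(resp.json()["response"])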

Using llama.cpp

# Download the GGUF file
huggingface-cli download ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K-GGUF \
    lfm2-700m-grpo-numina-10k-q4_k_m.gguf --local-dir ./models

# Run inference
./llama-cli -m ./models/lfm2-700m-grpo-numina-10k-q4_k_m.gguf \
    -p "Solve step by step: If a train travels at 60 mph for 2.5 hours, how far does it travel?" \
    -n 256

# Or start a server
./llama-server -m ./models/lfm2-700m-grpo-numina-10k-q4_k_m.gguf \
    --host 0.0.0.0 --port 8080
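
The server exposes an OpenAI-compatible chat endpoint; a minimal Python sketch querying the server started above (assumes port 8080 and the requests library):

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful math tutor."},
            {"role": "user", "content": "Solve step by step: What is 23 × 17?"},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    },
)
print(resp.json()["choices"][0]["message"]["content"])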

Using llama-cpp-python

from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./models/lfm2-700m-grpo-numina-10k-q4_k_m.gguf",
    n_ctx=2048,
    n_gpu_layers=-1  # Use all GPU layers if available
)

# Generate response
prompt = '''Solve step by step:
A store has a 25% off sale. If an item originally costs $80, what is the sale price?

Solution:'''

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    echo=False
)

print(output['choices'][0]['text'])
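
For multi-turn use, llama-cpp-python also offers a chat interface that applies the chat template embedded in the GGUF metadata (if the file lacks one, pass chat_format explicitly). This reuses the llm object loaded above:

# Chat-style generation via the model's embedded chat template
messages = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "Solve step by step: If 3x + 7 = 22, what is x?"},
]
chat_output = llm.create_chat_completion(
    messages=messages,
    max_tokens=256,
    temperature=0.7,
)
print(chat_output["choices"][0]["message"]["content"])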

Using LM Studio

  1. Download the GGUF file from this repository
  2. Open LM Studio and navigate to the Models tab
  3. Click "Import Model" and select the downloaded GGUF file
  4. Load the model and start chatting about math problems!
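
LM Studio can also serve the loaded model over a local OpenAI-compatible API from its server tab; a minimal sketch assuming the default port 1234 (the request shape mirrors the llama-server example above):

import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Solve step by step: What is 15% of 80?"},
        ],
        "temperature": 0.7,
    },
)
print(resp.json()["choices"][0]["message"]["content"])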

Example Prompts

Here are some example prompts that work well with this model:

Solve step by step: What is 23 × 17?

Solve step by step: A rectangle has a length of 12 cm and a width of 8 cm. What is its area and perimeter?

Solve step by step: If 3x + 7 = 22, what is the value of x?

Solve step by step: A car travels 150 miles in 2.5 hours. What is its average speed in miles per hour?
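
A quick way to smoke-test the quantized model is to run these prompts in a loop (reuses the llm object from the llama-cpp-python section above):

prompts = [
    "Solve step by step: What is 23 × 17?",
    "Solve step by step: If 3x + 7 = 22, what is the value of x?",
]
for p in prompts:
    out = llm(p, max_tokens=256, temperature=0.7)
    print(p, "\n", out["choices"][0]["text"], "\n", "-" * 40)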

Source Model

This is a quantized version of LFM2-700M-GRPO-NuminaMath-10K.

Training Details

Property          Value
Base Model        LiquidAI/LFM2-700M
Training Method   GRPO (Group Relative Policy Optimization)
Dataset           AI-MO/NuminaMath-CoT
Training Samples  10,000
LoRA Rank         16
LoRA Alpha        32
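
For reference, a setup matching these hyperparameters might look like the following sketch using TRL's GRPOTrainer with PEFT (a hypothetical reconstruction; the actual training script is documented with the source model):

from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# 10,000 samples; GRPOTrainer expects a "prompt" column, so the raw
# NuminaMath-CoT fields would need to be mapped accordingly.
dataset = load_dataset("AI-MO/NuminaMath-CoT", split="train[:10000]")

def reward_fn(completions, **kwargs):
    # Placeholder with GRPOTrainer's expected signature: score each
    # completion, e.g. by verifying the final answer (see the toy
    # math_reward sketch earlier in this card).
    return [float(len(c) > 0) for c in completions]

trainer = GRPOTrainer(
    model="LiquidAI/LFM2-700M",
    reward_funcs=reward_fn,
    args=GRPOConfig(output_dir="lfm2-700m-grpo-numina-10k"),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()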

See the source model card for full training details and usage examples with Transformers.

Hardware Requirements

Quantization  RAM Required  GPU VRAM (optional)
Q4_K_M        ~1-2 GB       ~1-2 GB
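
As a back-of-envelope check on those numbers: Q4_K_M stores roughly 4.5-5 effective bits per weight (an approximation), so the weights alone come to about 0.4 GB, with the remainder of the footprint going to the KV cache and runtime overhead:

# Rough weight-size estimate for a 700M-parameter Q4_K_M file
params = 700e6
for bpw in (4.5, 5.0):  # typical effective bits/weight for Q4_K_M
    print(f"{bpw} bpw -> ~{params * bpw / 8 / 1e9:.2f} GB")  # ~0.39-0.44 GB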

Conversion Details

Property         Value
Source Model     ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K
Conversion Date  2025-12-29
Quantization     Q4_K_M
Converter        llama.cpp

License

CC-BY-NC-4.0 (same as source model)

Acknowledgments


Quantized using the HF-TRL GGUF conversion pipeline on Compute Canada infrastructure
