LFM2-700M-GRPO-NuminaMath-10K-GGUF

GGUF quantized versions of LFM2-700M-GRPO-NuminaMath-10K for efficient CPU and mixed CPU/GPU inference.

Model Overview

This is a quantized version of LFM2-700M-GRPO-NuminaMath-10K, a 700M parameter model fine-tuned using Group Relative Policy Optimization (GRPO) on the NuminaMath-CoT dataset for mathematical reasoning tasks.

Key Features

  • Mathematical Reasoning: Optimized for step-by-step math problem solving
  • GRPO Training: Uses reinforcement learning with verifiable rewards (a toy reward is sketched after this list)
  • Efficient Inference: Quantized for fast CPU/GPU inference
  • Wide Compatibility: Works with Ollama, llama.cpp, LM Studio, and more
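
GRPO's "verifiable rewards" means each sampled solution is scored automatically rather than by a learned reward model. A toy Python sketch of such a reward (a hypothetical helper, not the actual training code):

import re

def math_reward(completion: str, reference_answer: str) -> float:
    """Toy verifiable reward: 1.0 if the last number in the completion
    matches the reference answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference_answer else 0.0

print(math_reward("15% of 80 is 0.15 * 80 = 12", "12"))  # 1.0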

Available Quantizations

Quantization  File                                   Size              Description
Q4_K_M        lfm2-700m-grpo-numina-10k-q4_k_m.gguf  ~40% of original  Best balance of quality and size

Quick Start

Using Ollama

# Pull and run directly from HuggingFace
ollama pull hf.co/ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K-GGUF:Q4_K_M "Solve step by step: What is 15% of 80?"

Alternative: Create Custom Modelfile

# Download the GGUF file first
huggingface-cli download ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K-GGUF \
    lfm2-700m-grpo-numina-10k-q4_k_m.gguf --local-dir ./models

# Create Modelfile with custom system prompt
cat > Modelfile << 'EOF'
FROM ./models/lfm2-700m-grpo-numina-10k-q4_k_m.gguf

SYSTEM "You are a helpful math tutor. When given a math problem, solve it step by step, showing your reasoning clearly. Always verify your final answer."

PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create and run the model
ollama create lfm2-700m-grpo-numina-10k -f Modelfile
ollama run lfm2-700m-grpo-numina-10k
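
Once created, the model can also be queried programmatically through Ollama's local HTTP API (a minimal sketch assuming the default port 11434 and the requests library):

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "lfm2-700m-grpo-numina-10k",  # the name created above
        "prompt": "Solve step by step: What is 15% of 80?",
        "stream": False,
    },
)
print(resp.json()["response"])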

Using llama.cpp

# Download the GGUF file
huggingface-cli download ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K-GGUF \
    lfm2-700m-grpo-numina-10k-q4_k_m.gguf --local-dir ./models

# Run inference
./llama-cli -m ./models/lfm2-700m-grpo-numina-10k-q4_k_m.gguf \
    -p "Solve step by step: If a train travels at 60 mph for 2.5 hours, how far does it travel?" \
    -n 256

# Or start a server
./llama-server -m ./models/lfm2-700m-grpo-numina-10k-q4_k_m.gguf \
    --host 0.0.0.0 --port 8080
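
The server exposes an OpenAI-compatible chat endpoint; a minimal Python sketch querying the server started above (assumes port 8080 and the requests library):

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful math tutor."},
            {"role": "user", "content": "Solve step by step: What is 23 × 17?"},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    },
)
print(resp.json()["choices"][0]["message"]["content"])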

Using llama-cpp-python

from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./models/lfm2-700m-grpo-numina-10k-q4_k_m.gguf",
    n_ctx=2048,
    n_gpu_layers=-1  # Use all GPU layers if available
)

# Generate response
prompt = '''Solve step by step:
A store has a 25% off sale. If an item originally costs $80, what is the sale price?

Solution:'''

output = llm(
    prompt,
    max_tokens=256,
    temperature=0.7,
    top_p=0.9,
    echo=False
)

print(output['choices'][0]['text'])
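
For multi-turn use, llama-cpp-python also offers a chat interface that applies the chat template embedded in the GGUF metadata (if the file lacks one, pass chat_format explicitly). This reuses the llm object loaded above:

# Chat-style generation via the model's embedded chat template
messages = [
    {"role": "system", "content": "You are a helpful math tutor."},
    {"role": "user", "content": "Solve step by step: If 3x + 7 = 22, what is x?"},
]
chat_output = llm.create_chat_completion(
    messages=messages,
    max_tokens=256,
    temperature=0.7,
)
print(chat_output["choices"][0]["message"]["content"])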

Using LM Studio

  1. Download the GGUF file from this repository
  2. Open LM Studio and navigate to the Models tab
  3. Click "Import Model" and select the downloaded GGUF file
  4. Load the model and start chatting about math problems!
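
LM Studio can also serve the loaded model over a local OpenAI-compatible API from its server tab; a minimal sketch assuming the default port 1234 (the request shape mirrors the llama-server example above):

import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Solve step by step: What is 15% of 80?"},
        ],
        "temperature": 0.7,
    },
)
print(resp.json()["choices"][0]["message"]["content"])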

Example Prompts

Here are some example prompts that work well with this model:

Solve step by step: What is 23 × 17?

Solve step by step: A rectangle has a length of 12 cm and a width of 8 cm. What is its area and perimeter?

Solve step by step: If 3x + 7 = 22, what is the value of x?

Solve step by step: A car travels 150 miles in 2.5 hours. What is its average speed in miles per hour?
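
A quick way to smoke-test the quantized model is to run these prompts in a loop (reuses the llm object from the llama-cpp-python section above):

prompts = [
    "Solve step by step: What is 23 × 17?",
    "Solve step by step: If 3x + 7 = 22, what is the value of x?",
]
for p in prompts:
    out = llm(p, max_tokens=256, temperature=0.7)
    print(p, "\n", out["choices"][0]["text"], "\n", "-" * 40)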

Source Model

This is a quantized version of LFM2-700M-GRPO-NuminaMath-10K.

Training Details

Property          Value
Base Model        LiquidAI/LFM2-700M
Training Method   GRPO (Group Relative Policy Optimization)
Dataset           AI-MO/NuminaMath-CoT
Training Samples  10,000
LoRA Rank         16
LoRA Alpha        32
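
For reference, a setup matching these hyperparameters might look like the following sketch using TRL's GRPOTrainer with PEFT (a hypothetical reconstruction; the actual training script is documented with the source model):

from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# 10,000 samples; GRPOTrainer expects a "prompt" column, so the raw
# NuminaMath-CoT fields would need to be mapped accordingly.
dataset = load_dataset("AI-MO/NuminaMath-CoT", split="train[:10000]")

def reward_fn(completions, **kwargs):
    # Placeholder with GRPOTrainer's expected signature: score each
    # completion, e.g. by verifying the final answer (see the toy
    # math_reward sketch earlier in this card).
    return [float(len(c) > 0) for c in completions]

trainer = GRPOTrainer(
    model="LiquidAI/LFM2-700M",
    reward_funcs=reward_fn,
    args=GRPOConfig(output_dir="lfm2-700m-grpo-numina-10k"),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()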

See the source model card for full training details and usage examples with Transformers.

Hardware Requirements

Quantization  RAM Required  GPU VRAM (optional)
Q4_K_M        ~1-2 GB       ~1-2 GB
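
As a back-of-envelope check on those numbers: Q4_K_M stores roughly 4.5-5 effective bits per weight (an approximation), so the weights alone come to about 0.4 GB, with the remainder of the footprint going to the KV cache and runtime overhead:

# Rough weight-size estimate for a 700M-parameter Q4_K_M file
params = 700e6
for bpw in (4.5, 5.0):  # typical effective bits/weight for Q4_K_M
    print(f"{bpw} bpw -> ~{params * bpw / 8 / 1e9:.2f} GB")  # ~0.39-0.44 GB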

Conversion Details

Property         Value
Source Model     ermiaazarkhalili/LFM2-700M-GRPO-NuminaMath-10K
Conversion Date  2025-12-29
Quantization     Q4_K_M
Converter        llama.cpp

License

CC-BY-NC-4.0 (same as source model)

Acknowledgments


Quantized using the HF-TRL GGUF conversion pipeline on Compute Canada infrastructure
