LIMO: Less is More for Reasoning
Paper: [arXiv:2502.03387](https://arxiv.org/abs/2502.03387)
This is the updated version (v2) of the LIMO model, corresponding to the latest paper version as of July 30, 2025.
| Model | Backbone | Size |
|---|---|---|
| LIMO-v2 | Qwen2.5-32B-Instruct | 32B |
If you need the original LIMO model (corresponding to the initial paper version), you can access it at: GAIR/LIMO.

Our model is fine-tuned on Qwen2.5-32B-Instruct and is compatible with most mainstream frameworks such as HF Transformers, vLLM, and TensorRT-LLM.
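If you want to fetch the weights ahead of time (for example, to warm a local cache before launching jobs), the sketch below uses `huggingface_hub`; the repo ID is the one listed above, everything else is illustrative.

```python
from huggingface_hub import snapshot_download

# Download every file of the model repo into the local HF cache
# and print where the snapshot landed.
local_path = snapshot_download(repo_id="GAIR/LIMO-v2")
print(local_path)
```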
Quick start with HF Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Initialize model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
"GAIR/LIMO-v2",
torch_dtype="auto",
trust_remote_code=True,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)
# Prepare input messages
messages = [
{"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
{"role": "user", "content": "What is the result of 1+1?"}
]
# Format input using chat template
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# Tokenize input
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Generate response
outputs = model.generate(
**inputs,
max_new_tokens=32768,
temperature=0.7,
top_p=0.95,
do_sample=True
)
# Decode and print response
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```
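Since the system prompt asks the model to put its final answer inside `\boxed{}`, you will usually want to recover that answer from the decoded text. Below is a minimal sketch of such a parser (the `extract_boxed_answer` helper is ours, not part of the model repo); it takes the last boxed expression in the `response` string from the example above and tolerates one level of nested braces.

```python
import re

def extract_boxed_answer(response: str) -> str | None:
    """Return the content of the last \\boxed{...} in `response`, or None."""
    # Allow one level of nested braces, e.g. \boxed{\frac{1}{2}}
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", response)
    return matches[-1] if matches else None

print(extract_boxed_answer(response))  # e.g. "2" for the 1+1 prompt
```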
Inference with vLLM:

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
# Initialize the model
llm = LLM(
model="GAIR/LIMO-v2",
tensor_parallel_size=4, # adjust based on available GPUs
trust_remote_code=True,
swap_space=60,
gpu_memory_utilization=0.96,
)
# Prepare input messages
messages = [
{"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
{"role": "user", "content": "What is the result of 1+1?"}
]
# Setup tokenizer
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# Configure generation parameters
sampling_params = SamplingParams(
temperature=0.7,
max_tokens=32768,
top_p=0.95,
)
# Generate response
output = llm.generate(text, sampling_params)
print(output[0].outputs[0].text)
```
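One reason to use vLLM is batched decoding: `llm.generate` also accepts a list of prompts and schedules them together. A minimal sketch reusing the `llm`, `tokenizer`, and `sampling_params` objects from above (the second question is illustrative):

```python
# Render each question through the same chat template
questions = [
    "What is the result of 1+1?",
    "How many positive divisors does 36 have?",
]
system = "Please reason step by step, and put your final answer within \\boxed{}."
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "system", "content": system},
         {"role": "user", "content": q}],
        tokenize=False,
        add_generation_prompt=True,
    )
    for q in questions
]

# vLLM batches all prompts in a single call for higher throughput
outputs = llm.generate(prompts, sampling_params)
for q, out in zip(questions, outputs):
    print(q, "->", out.outputs[0].text[-120:])  # tail of each reasoning trace
```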
Citation:

```bibtex
@misc{ye2025limoreasoning,
title={LIMO: Less is More for Reasoning},
author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
year={2025},
eprint={2502.03387},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.03387},
}
```
For more details and training code, please visit our GitHub repository.