Add model-index with reasoning benchmark evaluations

#13
by davidlms - opened

Added structured evaluation results (model-index) from benchmarks of the model's reasoning capabilities:

  • AIME 2025 (Pass@1): 96.0%
  • HMMT 2025 (Pass@1): 99.2%
  • HLE - Humanity's Last Exam (Pass@1): 30.6%
  • Codeforces (Rating): 2701

These benchmarks evaluate the model's performance on:

  • Mathematical reasoning (AIME 2025, HMMT 2025)
  • General reasoning capabilities (HLE)
  • Competitive programming (Codeforces)

This allows the model to appear on leaderboards and makes it easier to compare with other models; a sketch of the resulting metadata is included below.
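For reference, a minimal sketch of what the added model-index block might look like in the card's YAML front matter, assuming the Hub's standard model-index schema. The model name, task types, and dataset `type` identifiers below are placeholders; only the metric values come from this PR.

```yaml
model-index:
  - name: Example-Model            # placeholder; replace with the model's actual name
    results:
      - task:
          type: text-generation
          name: Mathematical Reasoning
        dataset:
          name: AIME 2025
          type: aime-2025          # dataset "type" identifiers are assumptions
        metrics:
          - name: Pass@1
            type: pass@1
            value: 96.0
      - task:
          type: text-generation
          name: Mathematical Reasoning
        dataset:
          name: HMMT 2025
          type: hmmt-2025
        metrics:
          - name: Pass@1
            type: pass@1
            value: 99.2
      - task:
          type: text-generation
          name: General Reasoning
        dataset:
          name: Humanity's Last Exam
          type: hle
        metrics:
          - name: Pass@1
            type: pass@1
            value: 30.6
      - task:
          type: text-generation
          name: Competitive Programming
        dataset:
          name: Codeforces
          type: codeforces
        metrics:
          - name: Rating
            type: rating
            value: 2701
```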

Note: This PR adds benchmark metadata independently of PR #11 (which only updates model naming). Both changes affect different parts of the file and should be compatible.

