Add model-index with reasoning benchmark evaluations

#13
by davidlms - opened

Added structured evaluation results (model-index) from benchmarks of the model's reasoning capabilities:

  • AIME 2025 (Pass@1): 96.0%
  • HMMT 2025 (Pass@1): 99.2%
  • HLE - Humanity's Last Exam (Pass@1): 30.6%
  • Codeforces (Rating): 2701

These benchmarks evaluate the model's performance on:

  • Mathematical reasoning (AIME 2025, HMMT 2025)
  • General reasoning capabilities (HLE)
  • Competitive programming (Codeforces)

This allows the model to appear on leaderboards and makes it easier to compare with other models; a sketch of the resulting metadata is included below.
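For reference, a minimal sketch of what the added model-index block might look like in the card's YAML front matter, assuming the Hub's standard model-index schema. The model name, task types, and dataset `type` identifiers below are placeholders; only the metric values come from this PR.

```yaml
model-index:
  - name: Example-Model            # placeholder; replace with the model's actual name
    results:
      - task:
          type: text-generation
          name: Mathematical Reasoning
        dataset:
          name: AIME 2025
          type: aime-2025          # dataset "type" identifiers are assumptions
        metrics:
          - name: Pass@1
            type: pass@1
            value: 96.0
      - task:
          type: text-generation
          name: Mathematical Reasoning
        dataset:
          name: HMMT 2025
          type: hmmt-2025
        metrics:
          - name: Pass@1
            type: pass@1
            value: 99.2
      - task:
          type: text-generation
          name: General Reasoning
        dataset:
          name: Humanity's Last Exam
          type: hle
        metrics:
          - name: Pass@1
            type: pass@1
            value: 30.6
      - task:
          type: text-generation
          name: Competitive Programming
        dataset:
          name: Codeforces
          type: codeforces
        metrics:
          - name: Rating
            type: rating
            value: 2701
```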

Note: This PR adds benchmark metadata independently of PR #11 (which only updates model naming). Both changes affect different parts of the file and should be compatible.

