Add model-index with reasoning benchmark evaluations
#13 · opened by davidlms
Added structured evaluation results for reasoning benchmarks (a YAML sketch follows the list below):
- AIME 2025 (Pass@1): 96.0%
- HMMT 2025 (Pass@1): 99.2%
- HLE - Humanity's Last Exam (Pass@1): 30.6%
- Codeforces (Rating): 2701
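For reference, this is a minimal sketch of how these results could be expressed in the model card's `model-index` front matter. The model name, task type, and dataset slugs are placeholders (assumptions for illustration); the actual diff in this PR is authoritative.

```yaml
model-index:
- name: Example-Model            # placeholder; use the repo's actual model name
  results:
  - task:
      type: text-generation      # assumed task type
    dataset:
      name: AIME 2025
      type: aime-2025            # dataset slug is an assumption
    metrics:
    - type: pass@1
      value: 96.0
      name: Pass@1
  - task:
      type: text-generation
    dataset:
      name: HMMT 2025
      type: hmmt-2025
    metrics:
    - type: pass@1
      value: 99.2
      name: Pass@1
  - task:
      type: text-generation
    dataset:
      name: HLE (Humanity's Last Exam)
      type: hle
    metrics:
    - type: pass@1
      value: 30.6
      name: Pass@1
  - task:
      type: text-generation
    dataset:
      name: Codeforces
      type: codeforces
    metrics:
    - type: rating
      value: 2701
      name: Rating
```

Once merged, the Hub parses this block from the README front matter and surfaces the metrics on the model page and in leaderboard views.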
These benchmarks evaluate the model's performance on:
- Mathematical reasoning (AIME 2025, HMMT 2025)
- General reasoning capabilities (HLE)
- Competitive programming (Codeforces)
This lets the model appear on leaderboards and makes it easier to compare against other models.
Note: This PR adds benchmark metadata independently of PR #11 (which only updates the model naming). The two changes touch different parts of the file and should not conflict.