We store evaluation results for reasoning-gym datasets (including LLM outputs) in this repository.
Progress and LLM accuracy metrics are tracked on our main Google Spreadsheet.
- Joe Sharratt (joesharratt1229)
- Abdulhakeem Adefioye (Adefioye)
- Zafir Stojanovski (zafstojano)
- Rich Jones (Miserlou)
- Andreas Koepf (andreaskoepf)
- You can reach the eval-team in the #reasoning-gym channel of the GPU-Mode Discord server.
- We would be very grateful for donations in the form of OpenRouter API keys (or keys for other inference API providers)!