GSM8K

Paper

Training Verifiers to Solve Math Word Problems https://arxiv.org/abs/2110.14168

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high test performance, despite the conceptual simplicity of this problem distribution.

NOTE: For how to use the dataset's calculator annotations in your language model's sample/generation function, see the official implementation: https://github.com/openai/grade-school-math/blob/master/grade_school_math/calculator.py
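As a rough illustration of the idea in the note above, the sketch below completes an open calculator annotation during generation. It assumes annotations of the form `<<48/2=24>>`, where the model writes the expression up to `=` and the calculator supplies the result and the closing `>>`. The names `safe_eval`, `format_number`, and `complete_annotation` are hypothetical and are not taken from the official `calculator.py`, which remains the reference.

```python
import re

# Assumption: only plain arithmetic is allowed inside an annotation.
ALLOWED = re.compile(r"[0-9+\-*/(). ]+")


def safe_eval(expr):
    """Evaluate a plain arithmetic expression, or return None if it is rejected."""
    # Reject anything that is not digits/operators, and reject exponentiation,
    # which could otherwise produce extremely expensive computations.
    if not ALLOWED.fullmatch(expr) or "**" in expr:
        return None
    try:
        return eval(expr, {"__builtins__": {}}, {})
    except Exception:
        return None


def format_number(value):
    """Render whole-valued floats without a trailing '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def complete_annotation(text):
    """If text ends with an open annotation such as '<<48/2=', append the
    computed result and the closing '>>'. Otherwise return text unchanged."""
    start = text.rfind("<<")
    if start == -1 or not text.endswith("="):
        return text
    expr = text[start + 2:-1]
    value = safe_eval(expr)
    if value is None:
        return text
    return text + format_number(value) + ">>"
```

In a sampling loop, a function like this would be called whenever the model emits `=`, so that the arithmetic result comes from the calculator rather than from the model's next tokens.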

Homepage: https://github.com/openai/grade-school-math

Citation

@misc{cobbe2021training,
      title={Training Verifiers to Solve Math Word Problems},
      author={Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman},
      year={2021},
      eprint={2110.14168},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Groups and Tasks

Groups

math_word_problems
chain_of_thought
self_consistency

Tasks

gsm8k_yaml
gsm8k_cot: GSM8K with Chain-of-Thought
gsm8k_cot_self_consistency: GSM8K with Chain-of-Thought and Self-Consistency
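The self-consistency variant aggregates several sampled chain-of-thought completions into one answer by majority vote. The sketch below shows that aggregation step only, assuming the final answer is the last number appearing in each completion. The names `extract_answer` and `majority_vote` are hypothetical and this is not the harness's own filtering code.

```python
import re
from collections import Counter

# Assumption: an answer is a (possibly negative) number, optionally with
# thousands separators and a decimal part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_answer(completion):
    """Return the last number in a completion as a string, or None."""
    matches = NUMBER.findall(completion)
    if not matches:
        return None
    # Drop thousands separators so "1,000" and "1000" count as the same answer.
    return matches[-1].replace(",", "")


def majority_vote(completions):
    """Return the most frequent extracted answer, or None if none was found."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```

For example, given three sampled completions ending in "18", "18", and "20", `majority_vote` would return `"18"`. Answers are compared as strings here, so `"72.5"` and `"72.50"` would be counted separately; a stricter version might normalize them numerically first.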

Checklist

Is in Eval-harness v1.0?
Has been checked for regression from v1.0?
Has been checked for equivalence with original paper methodology?
"Main" checked variant clearly denoted?

Variant Wishlist

Variant with Calculator (see https://github.com/openai/grade-school-math/blob/master/grade_school_math/calculator.py for example implementation)
Using Verifiers
Majority voting "without CoT"
