Crazytieguy/codenames-oversight

CodeNames Oversight

CodeNames is a party card game where players need to find creative word associations. It has some properties that make it compelling as a testbed for scalable oversight experiments:

  • It should be easy for language models to learn.
  • Generating a clue is computationally much more expensive than finding an issue with a clue, which in turn is more expensive than evaluating an issue.
  • It's easy to procedurally generate many games.
  • It's easy to simulate overseers with various kinds of flaws, or artificially limit the oversight budget.
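
As a rough illustration of the last two properties, here is a minimal sketch of procedural game generation and a simulated flawed overseer. It is not this repository's implementation: the word list, the `Game` type, the function names, and the `neglect_last_n` and `budget` parameters are all hypothetical, chosen only to show the idea.

```python
import random
from dataclasses import dataclass

# Hypothetical placeholder vocabulary; a real experiment would use a much larger list.
WORDS = [
    "apple", "river", "piano", "castle", "tiger", "rocket",
    "garden", "mirror", "bridge", "candle", "forest", "anchor",
]


@dataclass(frozen=True)
class Game:
    good_words: tuple  # words the clue is supposed to point at
    bad_words: tuple   # words the clue must avoid


def generate_game(rng, n_good=3, n_bad=3):
    """Draw disjoint sets of good and bad words from the vocabulary."""
    sample = rng.sample(WORDS, n_good + n_bad)
    return Game(tuple(sample[:n_good]), tuple(sample[n_good:]))


def overseer_accepts(game, targets, neglect_last_n=0, budget=None):
    """Toy overseer: accept if no checked target is a bad word it can see.

    neglect_last_n: simulated flaw, the overseer ignores the last n bad words.
    budget: simulated oversight limit, only the first `budget` targets are checked.
    """
    n_visible = max(0, len(game.bad_words) - neglect_last_n)
    visible_bad = game.bad_words[:n_visible]
    checked = targets if budget is None else targets[:budget]
    return all(t not in visible_bad for t in checked)


rng = random.Random(0)  # fixed seed so the generated games are reproducible
game = generate_game(rng)
```

Under these assumptions, a target list containing the last bad word is rejected by the unflawed overseer but accepted when `neglect_last_n=1`, and a bad word placed beyond the `budget` cutoff is never inspected.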

This project aims to expand the theory for predicting whether a scalable oversight technique will robustly succeed for a given problem domain and overseer, and then to test that theory with many small experiments.

For more detail, see the LW post draft

Relevant background
