You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Run a policy scan before first probing, so we can discover the model’s actual policy. Then we know what to test - and what the model will do without any adversarial action in the first place. Band these into “not observed”, “occasional”, “frequent”. Configurably, set how many times we'll ask (fixed count; just once; until we get the failure mode, with a cap; until s.d. converges down).
The text was updated successfully, but these errors were encountered:
Run a policy scan before first probing, so we can discover the model’s actual policy. Then we know what to test - and what the model will do without any adversarial action in the first place. Band these into “not observed”, “occasional”, “frequent”. Configurably, set how many times we'll ask (fixed count; just once; until we get the failure mode, with a cap; until s.d. converges down).
The text was updated successfully, but these errors were encountered: