Pseudocode for "better" policy evaluation in CEM #399

dniku · 2020-04-29T00:16:06Z

The end of the notebook suggests evaluating the policy in a "theoretically better" way: for each initial state, sample the initial action uniformly, then play with the current policy until the end of the episode. A user on the Coursera forum reports that pseudocode would make the idea clearer.
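As a starting point, here is a minimal sketch of one possible reading of that idea, not the notebook's actual code. `ChainEnv`, `play_session`, and `evaluate_first_actions` are hypothetical names invented for this example, and the toy environment only stands in for the real one. Averaging the returns of all sessions that share the same (initial state, first action) pair is also an assumption about how the scores would be aggregated.

```python
from collections import defaultdict

import numpy as np


class ChainEnv:
    """Hypothetical toy env: states 0..4, reaching state 4 ends the episode."""
    n_states, n_actions = 5, 2

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def reset(self):
        # Random non-terminal initial state, mimicking a task with many starts.
        self.state = int(self.rng.integers(self.n_states - 1))
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clipped at state 0).
        self.state = max(0, self.state + (1 if action == 1 else -1))
        done = self.state == self.n_states - 1
        return self.state, -1.0, done


def play_session(env, policy, rng, t_max=50):
    """First action is uniform; all later actions follow the current policy."""
    s = env.reset()
    s0, a0 = s, None
    total_reward = 0.0
    for t in range(t_max):
        if t == 0:
            a = int(rng.integers(env.n_actions))  # uniform first action
            a0 = a
        else:
            a = int(rng.choice(env.n_actions, p=policy[s]))
        s, r, done = env.step(a)
        total_reward += r
        if done:
            break
    return s0, a0, total_reward


def evaluate_first_actions(env, policy, n_sessions=200, seed=0):
    """Average return for each (initial state, first action) pair."""
    rng = np.random.default_rng(seed)
    returns = defaultdict(list)
    for _ in range(n_sessions):
        s0, a0, total_reward = play_session(env, policy, rng)
        returns[(s0, a0)].append(total_reward)
    return {pair: float(np.mean(rs)) for pair, rs in returns.items()}


# Uniform tabular policy: policy[s, a] = probability of action a in state s.
policy = np.full((ChainEnv.n_states, ChainEnv.n_actions), 1.0 / ChainEnv.n_actions)
scores = evaluate_first_actions(ChainEnv(seed=0), policy)
```

Under this reading, `scores[(s0, a0)]` could replace the raw session reward when selecting elites, so that an action is compared only against other actions taken from the same initial state.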

dniku added the text in notebooks label Apr 29, 2020
