The end of the notebook suggests evaluating the policy in a "theoretically better" way: for each initial state, sample the first action uniformly at random, then follow the current policy until the episode ends. A user on the Coursera forum reported that pseudocode would make this idea clearer.
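A minimal Python sketch of the procedure described above might help as a starting point. It assumes a hypothetical environment interface with `reset(state)` (start an episode from a chosen initial state) and `step(action)` returning `(next_state, reward, done)`; the toy `ChainEnv` and the function `evaluate_uniform_first_action` are illustrative names, not the notebook's actual code, so the real environment and helpers may differ.

```python
import random
from statistics import mean


class ChainEnv:
    """Hypothetical toy environment (not from the notebook): states 0..n-1 on a line.
    Action 0 moves left, action 1 moves right; reaching the last state ends the
    episode with reward 1. Episodes are also cut off after max_steps."""

    def __init__(self, n_states=5, max_steps=20):
        self.n_states = n_states
        self.n_actions = 2
        self.max_steps = max_steps

    def reset(self, state):
        # Hypothetical interface: start the episode from an explicitly chosen state.
        self.state = state
        self.t = 0
        return self.state

    def step(self, action):
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        self.t += 1
        reached_goal = self.state == self.n_states - 1
        done = reached_goal or self.t >= self.max_steps
        reward = 1.0 if reached_goal else 0.0
        return self.state, reward, done


def evaluate_uniform_first_action(env, policy, initial_states,
                                  n_rollouts=10, gamma=1.0, rng=None):
    """For each initial state: pick the FIRST action uniformly at random,
    then act with `policy` until the episode ends. Returns the mean return
    over all rollouts from all initial states."""
    rng = rng or random.Random(0)
    returns = []
    for s0 in initial_states:
        for _ in range(n_rollouts):
            env.reset(s0)
            action = rng.randrange(env.n_actions)  # uniform first action
            total, discount, done = 0.0, 1.0, False
            while not done:
                state, reward, done = env.step(action)
                total += discount * reward
                discount *= gamma
                if not done:
                    action = policy(state)  # afterwards, follow the current policy
            returns.append(total)
    return mean(returns)


if __name__ == "__main__":
    env = ChainEnv()
    always_right = lambda s: 1
    print(evaluate_uniform_first_action(env, always_right, initial_states=[0, 1, 2, 3]))
```

Because the first action is drawn uniformly, every action gets tried from every starting state, so the estimate reflects how well the policy recovers after an arbitrary first move rather than only after the moves it would choose itself. With `gamma=1.0` and the always-right policy above, every rollout eventually reaches the goal, so the printed mean is `1.0`.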