Python implementations of problems from Reinforcement Learning: An Introduction by Richard S. Sutton & Andrew G. Barto.
Many ideas in the solutions are inspired by or taken from this well-known repo by Shangtong Zhang.
- Gridworld - Bellman equation: source code, result (shown on the command prompt after running the code).
- Gridworld - Iterative Policy Evaluation: source code, result (shown on the command prompt after running the code).
- Jack's Car Rental - Policy Iteration: source code, result.
- Gambler's Problem - Value Iteration: source code, result.
- Blackjack - First-visit Monte Carlo: source code, result.
- Blackjack - Monte Carlo Exploring Starts: source code, result.
- Blackjack - Off-policy Monte Carlo Prediction: source code, result.
- Infinite Variance - Ordinary Importance Sampling (OIS): source code, result.
- Racetrack - Off-policy Monte Carlo Control: source code, result.
- Random Walk - Constant-alpha Monte Carlo: source code, result.
- Random Walk - TD(0) w/ & w/o Batch Updating: source code, result.
- Windy Gridworld - Sarsa: source code, result.
- Cliff Walking - Q-Learning: source code, result 1, result 2.
- Cliff Walking - Sarsa: source code, result 1, result 2.
- Cliff Walking - Expected Sarsa: source code, result.
- Mountain Car - Q-Learning: source code, result.
- Random Walk - n-step TD: source code, result.
- Dyna Maze - Dyna-Q: source code, result.
- Blocking Maze - Dyna-Q: source code, result.
- Blocking Maze - Dyna-Q+: source code, result.
- Shortcut Maze - Dyna-Q: source code, result.
- Shortcut Maze - Dyna-Q+: source code, result.
- Mazes - Dyna-Q: source code, result.
- Mazes - Prioritized Sweeping: source code, result.
- Task - Trajectory Sampling: source code, result.
- Random Walk - Gradient Monte Carlo w/ State Aggregation: source code, result.
- Random Walk - Semi-gradient TD: source code, result.
- Random Walk - Gradient Monte Carlo w/ Fourier basis: source code, result.
- Random Walk - Gradient Monte Carlo w/ Polynomial basis: source code, result.
- Random Walk - Gradient Monte Carlo w/ Tile Coding: source code, result.
- Square Wave - Linear function approximation w/ Coarse Coding: source code, result.
- Mountain Car - Episodic Semi-gradient Sarsa: source code, result.
- Mountain Car - Episodic Semi-gradient n-step Sarsa: source code, result.
- Access Control - Differential Semi-gradient Sarsa: source code, result.
- Random Walk - Offline λ-return: source code, result.
- Random Walk - TD(λ): source code, result.
- Random Walk - True online TD(λ): source code, result.
- Mountain Car - Sarsa(λ) w/ replacing traces: source code, result.
- Mountain Car - Sarsa(λ) w/ accumulating traces: source code, result.
- Mountain Car - True online Sarsa(λ): source code, result.
- Short Corridor - REINFORCE: source code, result.
- Short Corridor - REINFORCE w/ baseline: source code, result.
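
To give a feel for what the entries above contain, here is a minimal sketch of iterative policy evaluation on a small gridworld. It is not the code from this repo: it assumes the 4x4 gridworld from the book (terminal states in the top-left and bottom-right corners, a reward of -1 on every transition, an equiprobable random policy, no discounting), and the names `step`, `evaluate_random_policy`, `theta`, and `gamma` are chosen here for illustration only.

```python
# Illustrative sketch only; names and layout are assumptions, not the repo's code.
GRID_SIZE = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
TERMINALS = {(0, 0), (GRID_SIZE - 1, GRID_SIZE - 1)}


def step(state, action):
    """Return (next_state, reward). A move off the grid leaves the state unchanged."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        row, col = state
    return (row, col), -1.0


def evaluate_random_policy(theta=1e-6, gamma=1.0):
    """Sweep the Bellman expectation backup until the largest change is below theta."""
    values = [[0.0] * GRID_SIZE for _ in range(GRID_SIZE)]
    while True:
        new_values = [[0.0] * GRID_SIZE for _ in range(GRID_SIZE)]
        delta = 0.0
        for row in range(GRID_SIZE):
            for col in range(GRID_SIZE):
                if (row, col) in TERMINALS:
                    continue  # terminal states keep value 0
                total = 0.0
                for action in ACTIONS:
                    (next_row, next_col), reward = step((row, col), action)
                    # Each of the four actions is taken with probability 0.25.
                    total += 0.25 * (reward + gamma * values[next_row][next_col])
                new_values[row][col] = total
                delta = max(delta, abs(total - values[row][col]))
        values = new_values
        if delta < theta:
            return values


if __name__ == "__main__":
    for row_values in evaluate_random_policy():
        print(" ".join(f"{v:7.2f}" for v in row_values))
```

Run as a script, this prints the estimated state values row by row. The sketch uses a two-array update (a fresh value table per sweep); an in-place update is an equally valid choice and usually converges in fewer sweeps.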