GitHub - trunghng/reinforcement_learning_an_introduction: Python Implementation for problems in Reinforcement Learning

Reinforcement Learning: An Introduction

Python implementations of problems from the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Many ideas in the solutions are inspired by or taken from the well-known repository by Shangtong Zhang.

DETAILS

Chapter 3: Finite MDPs

Gridworld - Bellman equation: source code, result (shown on the command prompt after running the code).
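
As a rough illustration of what the Bellman-equation entry computes, the sketch below evaluates the equiprobable random policy on a 5x5 gridworld by repeatedly applying the Bellman expectation equation until the values stop changing. It is not the repository's code. The grid layout (special states A and B, their rewards of +10 and +5, the -1 penalty for stepping off the grid, and the discount 0.9) is assumed from the book's gridworld example, and all function and constant names are hypothetical.

```python
GAMMA = 0.9          # discount factor (assumed, as in the book's example)
SIZE = 5             # 5x5 grid
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east

# Special states: every action taken there jumps to a fixed cell with a bonus.
SPECIAL = {
    (0, 1): ((4, 1), 10.0),  # A -> A', reward +10
    (0, 3): ((2, 3), 5.0),   # B -> B', reward +5
}


def step(state, action):
    """Return (next_state, reward) for a deterministic move."""
    if state in SPECIAL:
        return SPECIAL[state]
    row, col = state[0] + action[0], state[1] + action[1]
    if 0 <= row < SIZE and 0 <= col < SIZE:
        return (row, col), 0.0
    return state, -1.0  # bumped into the wall: stay put, reward -1


def evaluate_random_policy(tol=1e-8):
    """Iterate v(s) = sum_a 0.25 * (r + GAMMA * v(s')) to a fixed point."""
    V = [[0.0] * SIZE for _ in range(SIZE)]
    while True:
        new_V = [[0.0] * SIZE for _ in range(SIZE)]
        for r in range(SIZE):
            for c in range(SIZE):
                for a in ACTIONS:
                    (nr, nc), reward = step((r, c), a)
                    new_V[r][c] += 0.25 * (reward + GAMMA * V[nr][nc])
        diff = max(
            abs(new_V[r][c] - V[r][c])
            for r in range(SIZE) for c in range(SIZE)
        )
        V = new_V
        if diff < tol:
            return V
```

Calling `evaluate_random_policy()` returns a 5x5 list of state values. Solving the same equations as a linear system would give the same answer; the fixed-point loop is used here only to keep the sketch dependency-free.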

Chapter 4: Dynamic Programming

Gridworld - Iterative Policy Evaluation: source code, result (shown on the command prompt after running the code).
Jack's Car Rental - Policy Iteration: source code, result.
Gambler's Problem - Value Iteration: source code, result.
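
For readers who want a feel for the value-iteration entry before opening the source, here is a minimal sketch for the Gambler's Problem. It is an illustration under stated assumptions, not the repository's implementation: the probability of heads (0.4), the goal of 100, and the name `gambler_value_iteration` are all assumed. Reaching the goal is encoded by fixing the value of the goal state at 1, so no explicit reward term is needed.

```python
def gambler_value_iteration(p_heads=0.4, goal=100, theta=1e-10):
    """In-place value iteration; V[s] estimates the chance of reaching `goal`."""
    V = [0.0] * (goal + 1)
    V[goal] = 1.0  # winning is encoded as a terminal value of 1
    while True:
        delta = 0.0
        for s in range(1, goal):
            # Try every legal stake and keep the best expected value.
            best = max(
                p_heads * V[s + stake] + (1 - p_heads) * V[s - stake]
                for stake in range(1, min(s, goal - s) + 1)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V
```

With the assumed coin bias, `gambler_value_iteration()[50]` comes out at 0.4: staking everything at a capital of 50 wins with probability 0.4, and no other stake does better.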

Chapter 5: Monte Carlo Methods

Blackjack - First-visit Monte Carlo: source code, result.
Blackjack - Monte Carlo Exploring Starts: source code, result.
Blackjack - Off-policy Monte Carlo Prediction: source code, result.
Infinite Variance - Ordinary Importance Sampling (OIS): source code, result.
Racetrack - Off-policy Monte Carlo Control: source code, result.

Chapter 6: TD Learning

Random Walk - Constant-alpha Monte Carlo: source code, result.
Random Walk - TD(0) w/ & w/o Batch Updating: source code, result.
Windy Gridworld - Sarsa: source code, result.
Cliff Walking - Q-Learning: source code, result 1, result 2.
Cliff Walking - Sarsa: source code, result 1, result 2.
Cliff Walking - Expected Sarsa: source code, result.
Mountain Car - Q-Learning: source code, result.
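
The TD entries above all build on the same one-step update. As a hedged sketch (again, not the repository's code), the snippet below applies tabular TD(0) to a five-state random walk. The setup is assumed from the book's random-walk example: states A to E, start in C, reward 1 only when exiting on the right, no discounting, and all values initialised to 0.5. The names `td0_update`, `run_episode`, and `td0_random_walk` are hypothetical.

```python
import random

STATES = ["A", "B", "C", "D", "E"]


def td0_update(V, state, reward, next_value, alpha):
    """One TD(0) step with gamma = 1: V(s) += alpha * (r + V(s') - V(s))."""
    V[state] += alpha * (reward + next_value - V[state])


def run_episode(V, alpha, rng):
    """Walk left or right with equal probability from C until termination."""
    idx = 2  # index of the start state C
    while True:
        nxt = idx + rng.choice((-1, 1))
        if nxt < 0:                      # exited on the left: reward 0
            td0_update(V, STATES[idx], 0.0, 0.0, alpha)
            return
        if nxt >= len(STATES):           # exited on the right: reward 1
            td0_update(V, STATES[idx], 1.0, 0.0, alpha)
            return
        td0_update(V, STATES[idx], 0.0, V[STATES[nxt]], alpha)
        idx = nxt


def td0_random_walk(episodes=100, alpha=0.1, seed=0):
    """Run TD(0) for a number of episodes and return the value estimates."""
    rng = random.Random(seed)
    V = {s: 0.5 for s in STATES}
    for _ in range(episodes):
        run_episode(V, alpha, rng)
    return V
```

Under these assumptions the true values are 1/6 through 5/6 for A through E, so the estimates returned by `td0_random_walk()` can be compared against them. With a constant step size the estimates keep fluctuating around those targets rather than settling exactly.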

Chapter 7: n-step Bootstrapping

Random Walk - n-step TD: source code, result.

Chapter 8: Planning and Learning with Tabular Methods

Dyna Maze - Dyna-Q: source code, result.
Blocking Maze - Dyna-Q: source code, result.
Blocking Maze - Dyna-Q+: source code, result.
Shortcut Maze - Dyna-Q: source code, result.
Shortcut Maze - Dyna-Q+: source code, result.
Mazes - Dyna-Q: source code, result.
Mazes - Prioritized Sweeping: source code, result.
Task - Trajectory Sampling: source code, result.

Chapter 9: On-policy Prediction with Approximation

Random Walk - Gradient Monte Carlo w/ State Aggregation: source code, result.
Random Walk - Semi-gradient TD: source code, result.
Random Walk - Gradient Monte Carlo w/ Fourier basis: source code, result.
Random Walk - Gradient Monte Carlo w/ Polynomial basis: source code, result.
Random Walk - Gradient Monte Carlo w/ Tile Coding: source code, result.
Square Wave - Linear function approximation w/ Coarse Coding: source code, result.

Chapter 10: On-policy Control with Approximation

Mountain Car - Episodic Semi-gradient Sarsa: source code, result.
Mountain Car - Episodic Semi-gradient n-step Sarsa: source code, result.
Access Control - Differential Semi-gradient Sarsa: source code, result.

Chapter 11: Off-policy Methods with Approximation

Chapter 12: Eligibility Traces

Random Walk - Offline λ-return: source code, result.
Random Walk - TD(λ): source code, result.
Random Walk - True online TD(λ): source code, result.
Mountain Car - Sarsa(λ) w/ replacing traces: source code, result.
Mountain Car - Sarsa(λ) w/ accumulating traces: source code, result.
Mountain Car - True online Sarsa(λ): source code, result.

Chapter 13: Policy Gradient

Short Corridor - REINFORCE: source code, result.
Short Corridor - REINFORCE w/ baseline: source code, result.
