Modular Q-Learning on a Toy Litter-Gathering Example

A simple toy environment is designed in which an agent starts from any position on a 2D grid world and must accomplish several goals:

Four separate modules are trained independently, and the actions recommended by each module are merged by simply weighting them.
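The training and merging steps can be sketched as follows. This is a minimal sketch, not the author's code. It assumes a 5x5 grid, four moves, and a hypothetical single-goal reward per module (+1 for entering that module's goal cell). Each module learns its own Q-table with tabular Q-learning. The modules are then merged by taking a fixed weighted sum of their Q-values and acting greedily, which is one common way to "weight" module outputs. The names `train_module` and `merged_action`, the goal cells, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: grid size, moves and per-module goals are assumptions,
# not the environment used in this project.
N = 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell in the chosen direction, clipping at the grid border."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), N - 1)
    c = min(max(state[1] + dc, 0), N - 1)
    return (r, c)


def train_module(goal, episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning for one module, trained on its own reward only:
    +1 for entering `goal`, which also ends the episode."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N, N, len(ACTIONS)))
    for _ in range(episodes):
        # Random non-goal start so every cell gets visited.
        s = goal
        while s == goal:
            s = (int(rng.integers(N)), int(rng.integers(N)))
        for _ in range(50):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(q[s[0], s[1]]))
            s2 = step(s, a)
            done = s2 == goal
            reward = 1.0 if done else 0.0
            # Terminal transitions do not bootstrap from the next state.
            target = reward if done else reward + gamma * q[s2[0], s2[1]].max()
            q[s[0], s[1], a] += alpha * (target - q[s[0], s[1], a])
            if done:
                break
            s = s2
    return q


def merged_action(q_tables, weights, state):
    """Weight each module's Q-values at `state`, sum them, act greedily."""
    combined = sum(w * q[state[0], state[1]] for w, q in zip(weights, q_tables))
    return int(np.argmax(combined))


modules = [train_module((0, 4), seed=0), train_module((4, 0), seed=1)]
print(merged_action(modules, [1.0, 0.0], (0, 3)))  # 3 (right, toward (0, 4))
```

Because each module sees only its own reward, the modules can be trained in isolation. Changing the weights then changes the merged policy without any retraining.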

Below are two images: the first shows each individual policy, and the second shows their weighted policy.
