Visualizing_RL

Visualizing how different agents perceive their environment in the game Snake: Algorithms in Reinforcement Learning

Authors

Snake

The Snake game environment is a visualization tool for evaluating different RL algorithms on the game Snake. Each algorithm must learn to guide the snake to the food without hitting the wall or eating itself (self-loop). The algorithms do this by using image processing to build an input vector of values that determines the snake's best next step, and each time the snake eats the food the algorithm receives a reward. The input vector has 3 values, one for each possible next step (forward, left, right); how these values are defined differs between algorithms, which use state or action values.
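As a concrete illustration of the relative action encoding, the sketch below maps a one-hot (forward, left, right) vector onto the snake's next heading. The clockwise direction ordering and the function name are assumptions made for this example, not the project's actual code.

```python
import numpy as np

# Illustrative clockwise ordering of headings; not necessarily the project's
# internal representation of directions.
DIRECTIONS = ["RIGHT", "DOWN", "LEFT", "UP"]

def next_direction(current, action):
    """Map a one-hot action vector (forward, left, right) to the next heading."""
    idx = DIRECTIONS.index(current)
    if np.array_equal(action, [1, 0, 0]):   # forward: keep the current heading
        return DIRECTIONS[idx]
    if np.array_equal(action, [0, 1, 0]):   # left: counter-clockwise turn
        return DIRECTIONS[(idx - 1) % 4]
    return DIRECTIONS[(idx + 1) % 4]        # right: clockwise turn

print(next_direction("RIGHT", [0, 0, 1]))  # -> "DOWN"
```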

Project setup

  1. Clone Repository

    git clone https://github.com/YaadR/Visualizing_RL.git
    cd Visualizing_RL
  2. Create Virtual Environment

    python3 -m venv venv
  3. Activate Environment

    # Linux/MacOS
    source venv/bin/activate
    # Windows
    venv\Scripts\activate
  4. Install requirements

    pip install -r requirements.txt
  5. Run Project

    python3 Ver1/main.py
    

Visualization solutions:

  1. Heatmap
  2. Certainty Arrows
  3. Neural network weights visualization (where NN is used)
  4. Certainty Bar (entropy-based; see the sketch after this list)
  5. State Activation Layer
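
For the entropy-based certainty bar (item 4), the sketch below shows one plausible way to collapse the agent's action probabilities into a single certainty score; the function name and the normalization by log 3 are assumptions, not necessarily how the project implements it.

```python
import numpy as np

def action_certainty(probs):
    """Certainty in [0, 1]: one minus the normalized Shannon entropy of the
    action distribution (with 3 actions, the maximum entropy is log 3)."""
    probs = np.clip(probs, 1e-12, 1.0)
    entropy = -np.sum(probs * np.log(probs))
    return 1.0 - entropy / np.log(len(probs))

print(action_certainty(np.array([1/3, 1/3, 1/3])))    # ~0.0: fully uncertain
print(action_certainty(np.array([0.98, 0.01, 0.01]))) # ~0.9: very certain
```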

Reinforcement algorithms & concepts

Agent State Value

  • Value-based: state value
  • Model-based
  • Off-policy
  • Online

RL Algorithm: $$V(s_t)' = V(s_t) + \alpha \left[ R_{t+1} + (1 - s_{t \to \text{terminal}}) \, \gamma \, V(s_{t+1}) - V(s_t) \right]$$
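
A minimal tabular sketch of this update, assuming the state values live in a plain Python dictionary; the project itself may estimate V(s) differently, so this is illustrative only.

```python
def td_state_value_update(V, s, s_next, reward, done, alpha=0.1, gamma=0.9):
    """Tabular TD(0) update corresponding to the state-value formula above.

    V: dict mapping states to value estimates (an illustrative data structure).
    The (1 - done) factor zeroes the bootstrap term on terminal transitions.
    """
    bootstrap = 0.0 if done else gamma * V.get(s_next, 0.0)
    td_error = reward + bootstrap - V.get(s, 0.0)
    V[s] = V.get(s, 0.0) + alpha * td_error
    return td_error
```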

Agent Action Value

  • Value-based: action value
  • Model-free
  • Off-policy
  • Online

RL Algorithm: $$Q(s_t, a_t)' = R_{t+1} + (1 - s_{t \to \text{terminal}}) \, \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$$
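
The bootstrapped target in this formula could be computed as in the hedged sketch below, again assuming a dictionary-based tabular Q for illustration.

```python
def q_target(Q, reward, s_next, actions, done, gamma=0.9):
    """Bootstrapped target matching the action-value formula above.

    Q: dict mapping (state, action) pairs to value estimates (illustrative).
    On a terminal transition the target collapses to the immediate reward.
    """
    if done:
        return reward
    return reward + gamma * max(Q.get((s_next, a), 0.0) for a in actions)

# Example with the three relative actions used in the Snake setup above:
# target = q_target(Q, reward=10, s_next=state, actions=("forward", "left", "right"), done=False)
```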

Agent Policy

  • Policy-based
  • Model-free
  • On-policy
  • Online

RL Algorithm: Critic: $$A_{\text{critic}}(s_t) = R_t + (1 - s_{t \to \text{terminal}}) \, \gamma \, V(s_{t+1}) - V(s_t)$$

Actor: $$\nabla_{\theta_{\text{actor}}} J = \nabla_{\theta_{\text{actor}}} \log \pi_{\theta_{\text{actor}}}(a_t \mid s_t) \, A_{\text{critic}}(s_t)$$
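
A hedged PyTorch sketch of how the critic's advantage and the actor's log-probability term can be combined into trainable losses; the tensor shapes, argument names, and the squared-error critic loss are assumptions, not the project's exact interface.

```python
import torch

def actor_critic_losses(policy_logits, value, next_value, action, reward, done, gamma=0.9):
    """One-step advantage actor-critic losses matching the two formulas above.

    policy_logits: actor output for s_t, shape (3,) for (forward, left, right)
    value, next_value: critic estimates V(s_t) and V(s_{t+1}), scalar tensors
    action: index of the action taken; done: 1.0 on terminal transitions
    """
    # Advantage: R_t + (1 - done) * gamma * V(s_{t+1}) - V(s_t)
    advantage = reward + (1.0 - done) * gamma * next_value.detach() - value
    # Actor: maximize log pi(a_t | s_t) * advantage (so minimize its negative)
    log_prob = torch.log_softmax(policy_logits, dim=-1)[action]
    actor_loss = -log_prob * advantage.detach()
    # Critic: move V(s_t) toward the bootstrapped target
    critic_loss = advantage.pow(2)
    return actor_loss, critic_loss
```

In practice the two losses are typically summed (sometimes with an entropy bonus) and back-propagated through the actor and critic networks.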

Training - Stability, Mean & STD - 20 Rounds

Algorithms compete in an Arena

Arena

Additional Notes

Please follow the project steps carefully and ensure that all dependencies are correctly installed before running the solution.

Acknowledgments

This project is inspired by Patrick Loeber and his work in Teach AI To Play Snake - Reinforcement Learning Tutorial With PyTorch And Pygame.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute it according to the terms of the license.