Implementation of the greedy, $\epsilon$-greedy and Upper Confidence Bound (UCB) algorithms for the multi-armed bandit problem.

KaleabTessera/Multi-Armed-Bandit

Multi-Armed-Bandit

Description

This is an implementation of the $\epsilon$-greedy, greedy, and Upper Confidence Bound (UCB) algorithms for the multi-armed bandit problem. The algorithms are described in Chapter 2 of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.
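For orientation, the three action-selection rules can be sketched as below. This is a minimal illustration based on the textbook definitions, not the code in this repository: the function names (`argmax_random_tie`, `epsilon_greedy_action`, `ucb_action`, `update_estimate`) are hypothetical, and the sample-average update is an assumption. Greedy selection is the special case `epsilon = 0`, and UCB scores each arm as $Q(a) + c\sqrt{\ln t / N(a)}$.

```python
import math
import random


def argmax_random_tie(values, rng):
    """Index of the largest value, breaking ties uniformly at random."""
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


def epsilon_greedy_action(q, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily.

    With epsilon = 0 this is plain greedy selection.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return argmax_random_tie(q, rng)


def ucb_action(q, counts, t, c, rng):
    """Pick the arm maximizing Q(a) + c * sqrt(ln t / N(a)).

    Arms that have never been pulled are tried first.
    """
    untried = [a for a, n in enumerate(counts) if n == 0]
    if untried:
        return rng.choice(untried)
    scores = [q[a] + c * math.sqrt(math.log(t) / counts[a])
              for a in range(len(q))]
    return argmax_random_tie(scores, rng)


def update_estimate(q, counts, action, reward):
    """Sample-average update (an assumption): Q <- Q + (R - Q) / N."""
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]
```

One design note: with a sample-average update, an optimistic initial value is overwritten by the first observed reward for that arm, so it mainly forces each arm to be tried once. A constant step size would let the optimism decay more gradually.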

How to Install:

# In project root folder
pip install -r requirements.txt

How to Run:

# In project root folder
./run.sh

Tasks

Part 1

A plot of reward over time (averaged over 100 runs each) on the same axes, for $\epsilon$-greedy with $\epsilon = 0.1$, greedy with $Q_1 = 5$, and UCB with $c = 2$.

(Figure: Part1)
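A rough sketch of how such averaged reward curves might be computed is shown below. Several things here are assumptions rather than details taken from this repository: the 10-armed Gaussian testbed (arm means and rewards drawn with unit variance), the horizon of 1000 steps, the sample-average update, and every function name (`run_bandit`, `greedy`, `make_epsilon_greedy`, `make_ucb`, `average_rewards`). Only the three hyperparameter settings and the 100 runs come from the task description.

```python
import math
import random


def run_bandit(select, q1, steps, rng, k=10):
    """Run one bandit problem and return the reward received at each step."""
    true_means = [rng.gauss(0.0, 1.0) for _ in range(k)]  # assumed testbed
    q = [q1] * k          # value estimates, initialized to Q1
    counts = [0] * k      # number of pulls per arm
    rewards = []
    for t in range(1, steps + 1):
        a = select(q, counts, t, rng)
        r = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # sample-average update (assumption)
        rewards.append(r)
    return rewards


def greedy(q, counts, t, rng):
    """Pick the arm with the highest estimate, ties broken at random."""
    best = max(q)
    return rng.choice([a for a, v in enumerate(q) if v == best])


def make_epsilon_greedy(epsilon):
    def select(q, counts, t, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(q))
        return greedy(q, counts, t, rng)
    return select


def make_ucb(c):
    def select(q, counts, t, rng):
        for a, n in enumerate(counts):
            if n == 0:
                return a  # try every arm once before using the bound
        scores = [v + c * math.sqrt(math.log(t) / n)
                  for v, n in zip(q, counts)]
        return scores.index(max(scores))
    return select


def average_rewards(select, q1, runs=100, steps=1000, seed=0):
    """Average the per-step reward over independent runs."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(runs):
        for t, r in enumerate(run_bandit(select, q1, steps, rng)):
            totals[t] += r
    return [x / runs for x in totals]


curves = {
    "epsilon-greedy (epsilon=0.1)": average_rewards(make_epsilon_greedy(0.1), q1=0.0),
    "greedy (Q1=5)": average_rewards(greedy, q1=5.0),
    "UCB (c=2)": average_rewards(make_ucb(2.0), q1=0.0),
}
```

Each entry in `curves` is a list of 1000 averaged rewards, which could then be drawn on shared axes with any plotting library.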

Part 2

A summary comparison plot of rewards over the first 1000 steps for the three algorithms with different values of the hyperparameters.

(Figure: Part2)
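A comparison of this kind could be produced by a small hyperparameter sweep, sketched below. This is an illustration under stated assumptions, not the repository's code: the function `mean_reward`, the 10-armed Gaussian testbed, the sample-average update, the swept values, and the reduced count of 20 runs per setting (chosen to keep the sketch fast) are all hypothetical. Ties in the greedy choice are broken by lowest arm index for brevity.

```python
import math
import random


def mean_reward(epsilon=0.0, q1=0.0, c=None, runs=20, steps=1000, k=10, seed=0):
    """Average reward per step over `runs` independent bandit problems.

    epsilon > 0 gives epsilon-greedy, q1 > 0 gives optimistic greedy,
    and a non-None c switches action selection to UCB.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        means = [rng.gauss(0.0, 1.0) for _ in range(k)]  # assumed testbed
        q, n = [q1] * k, [0] * k
        for t in range(1, steps + 1):
            if c is not None:
                # UCB: untried arms get an infinite score so they go first.
                scores = [
                    q[a] + c * math.sqrt(math.log(t) / n[a]) if n[a] else math.inf
                    for a in range(k)
                ]
            else:
                scores = q
            if rng.random() < epsilon:
                a = rng.randrange(k)
            else:
                a = max(range(k), key=lambda i: scores[i])
            r = rng.gauss(means[a], 1.0)
            n[a] += 1
            q[a] += (r - q[a]) / n[a]  # sample-average update (assumption)
            total += r
    return total / (runs * steps)


# Hypothetical grids of hyperparameter values for each algorithm.
sweep = {
    "epsilon-greedy": {eps: mean_reward(epsilon=eps) for eps in (0.01, 0.1, 0.5)},
    "greedy (optimistic Q1)": {q1: mean_reward(q1=q1) for q1 in (0.0, 1.0, 5.0)},
    "UCB": {c: mean_reward(c=c) for c in (0.5, 1.0, 2.0, 4.0)},
}

for name, results in sweep.items():
    for value, avg in results.items():
        print(f"{name:24s} {value:<5} {avg:.3f}")
```

Plotting each algorithm's average reward against its hyperparameter value on a shared axis would give a one-figure summary of how sensitive each method is to its setting.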
