QLearner failing to stabilize due to odd spikes in QTable #48

Open
bwliv opened this issue Nov 15, 2020 · 4 comments
@bwliv
Collaborator

bwliv commented Nov 15, 2020

These don't seem organic. Something odd is clearly happening in the episodes with the spikes: you can see the beginnings of stabilization early on, but then it goes haywire. Some random phenomenon or bug is causing large spikes in the Q-table that shouldn't happen once the table has stabilized, and they appear to undo the training done so far (you can see the table re-stabilizing after each one).

This explains why the Q-table is unstable and nonsensical. The spikes persisted even when I tried 1000 episodes. They don't go away, and each one resets the training process.

@shreyasj3006 @mariemayadi this might require your knowledge of the implementation you wrote. These are definitely nonsensical jumps. I've debugged this to death and have isolated and analyzed the episodes where it happens. One major note: I temporarily set commission and the sell penalty to zero to mitigate the jumps. Even so, something is preventing our Q-learner from working properly, which might explain why the Q-tables have always looked wrong.

visualizations.ipynb (pushing a new version momentarily) has a good visualization of this phenomenon.

The jumps appear to happen across all states. Here are the 10 biggest row changes for one of these weird episode jumps:
image
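For anyone who wants to reproduce this kind of "biggest row changes" listing, here is a minimal sketch. It assumes the Q-table can be snapshotted as a dict mapping each state to a list of Q-values (one per action). The function name `top_row_changes`, the snapshot layout, and the sample numbers are all hypothetical, not taken from the project's code.

```python
def top_row_changes(q_before, q_after, k=10):
    """Return the k states whose Q-values moved the most between two snapshots.

    q_before / q_after: dict mapping state -> list of Q-values, one per action.
    This layout is an assumption; the real Q-table may be stored differently.
    """
    changes = []
    for state, before in q_before.items():
        after = q_after[state]
        # Size of the change for this row: sum of absolute per-action deltas.
        delta = sum(abs(a - b) for a, b in zip(after, before))
        changes.append((state, delta))
    # Largest change first.
    changes.sort(key=lambda item: item[1], reverse=True)
    return changes[:k]


# Made-up snapshots taken before and after a suspicious episode.
q_prev = {"s0": [0.5, 0.25, 0.0], "s1": [0.5, 0.5, 0.5], "s2": [0.0, 0.0, 0.0]}
q_next = {"s0": [0.5, 0.25, 0.0], "s1": [2.5, 0.5, -0.5], "s2": [0.0, 0.25, 0.0]}

for state, delta in top_row_changes(q_prev, q_next, k=2):
    print(state, delta)
```

Comparing the snapshot from just before a spike episode with the one just after it should show whether the jump is concentrated in a few states or spread across the whole table.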

Here's a visualization of the spikes. The smaller (but still significant) ones continue like this indefinitely (I tried 1000 episodes earlier to confirm this):
image

Here is what 1k episodes looks like (ignore the incorrect title of the graph):
image

@mariemayadi
Collaborator

Thanks. Keen to discuss this further, including what the peaks mean and whether to create a new issue for us to investigate on our end.

@bwliv
Collaborator Author

bwliv commented Nov 16, 2020

Looks better in the latest push: I separated epsilon from alpha and now reduce epsilon by .005% every day via the epsilon_decay parameter. It keeps decaying further through each episode.
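A minimal sketch of what a per-day multiplicative decay like this could look like, assuming ".005%" means multiplying epsilon by (1 - 0.00005) after each trading day and never resetting it between episodes. The helper names `choose_action` and `decay_epsilon`, the starting epsilon of 0.1, and the 252-day episode length are illustrative assumptions, not the project's actual values.

```python
import random


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def decay_epsilon(epsilon, epsilon_decay):
    """Shrink epsilon by a fixed fraction after each trading day."""
    return epsilon * (1.0 - epsilon_decay)


epsilon = 0.1             # assumed starting exploration rate
epsilon_decay = 0.00005   # .005% per day, written as a fraction
days_per_episode = 252    # assumed number of trading days per episode
episodes = 100

for episode in range(episodes):
    for day in range(days_per_episode):
        # ... choose_action(...) and the Q-update would run here ...
        epsilon = decay_epsilon(epsilon, epsilon_decay)  # carried across episodes

print(epsilon)
```

Because epsilon is independent of alpha here, exploration can be wound down without also shrinking the learning rate.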

Are we satisfied by this or does @shreyasj3006 feel epsilon and gamma shouldn't have been separated? (CC @mariemayadi)

This stabilized the results (although at the cost of reduced performance, so epsilon and epsilon_decay might require some tuning). It looks like there are a few days with huge stock returns or losses, and when a random action is taken on those days (through epsilon), it throws the Q-table into disarray because a massive reward (whether positive or negative) is applied in a different place than usual.
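To illustrate why one outlier day can move a cell so much, here is a sketch using the textbook tabular Q-learning update, Q(s, a) += alpha * (reward + gamma * max Q(s', ·) - Q(s, a)). Whether the project's learner uses exactly this form is an assumption, and every number below (alpha, gamma, rewards, the settled Q-value) is made up for illustration.

```python
def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """One textbook tabular Q-learning update for a single (state, action) cell."""
    return q_sa + alpha * (reward + gamma * max_q_next - q_sa)


alpha, gamma = 0.2, 0.9   # hypothetical learning rate and discount factor
q_settled = 0.02          # hypothetical value of a cell that has stabilized
max_q_next = 0.02         # hypothetical best value in the next state

# Typical day: a small reward barely moves the cell.
typical_shift = q_update(q_settled, 0.01, max_q_next, alpha, gamma) - q_settled

# Outlier day: a huge return lands on whichever action epsilon happened to pick.
outlier_shift = q_update(q_settled, 0.30, max_q_next, alpha, gamma) - q_settled

print(typical_shift, outlier_shift)
```

Under these assumed numbers the outlier update is dozens of times larger than a typical one, and since the randomly chosen action is not the one the table normally reinforces in that state, the ranking of actions in that row can flip.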

I think this is OK. We can note it in the report.

There's definitely a bigger theme of the Q-table being funky, and that's still a struggle. But this is progress.

@bwliv
Collaborator Author

bwliv commented Nov 16, 2020

So basically:

Question: how long does it take the Q-table to stabilize?

Answer: it never stabilizes unless we decay epsilon (the percent chance of a random action) over time. If it weren't for a few days that throw the data into disarray, it would stabilize in the first 10-20 episodes.

Additional note: forcing it to stabilize leads to worse performance. It performed much better without epsilon decay, so maybe letting the Q-table never stabilize (and show spikes like the ones above) is optimal?
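One way to make "how long until it stabilizes" concrete is to record the largest absolute Q-table change per episode and find the first episode after which every later change stays under a tolerance. The sketch below assumes such a per-episode series already exists. The function name `stabilization_episode`, the tolerance, and the sample series are hypothetical.

```python
def stabilization_episode(changes, tol):
    """Return the index of the first episode from which all changes are <= tol.

    changes: per-episode maximum absolute change in the Q-table.
    Returns None if the final episode still exceeds the tolerance.
    """
    last_unstable = -1
    for i, change in enumerate(changes):
        if change > tol:
            last_unstable = i
    first_stable = last_unstable + 1
    return first_stable if first_stable < len(changes) else None


# Made-up series: settles quickly, then a late spike undoes it.
with_spike = [0.9, 0.4, 0.1, 0.02, 0.01, 0.8, 0.3, 0.02]
# Same series with the spike removed.
without_spike = [0.9, 0.4, 0.1, 0.02, 0.01, 0.01, 0.01, 0.02]

print(stabilization_episode(with_spike, tol=0.05))
print(stabilization_episode(without_spike, tol=0.05))
```

A single late spike pushes the reported stabilization point past it, which matches the pattern of the table settling early and then being knocked out of place.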

@bwliv
Collaborator Author

bwliv commented Nov 16, 2020

Here's the current state: decaying epsilon hurt performance because it led to holding more cash. Maybe we just skip decaying epsilon? @shreyasj3006 @mariemayadi curious to hear your thoughts.

image

image

image

3 participants