TMorville changed the title from "Any real learning happening for DCBTrainer?" to "Evaluating performance of agents in examples" on Sep 2, 2020.
It is. However, I think my baseline might be wrong.
I think the relevant baseline to compare performance against should be a Bayesian regression trained directly on the data, rather than the output from the neural network. Do you agree?
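For concreteness, here is a hypothetical sketch of the kind of baseline I mean: one Bayesian ridge regression per arm, fit directly on the raw data, then acting greedily on a held-out split. The file path and column positions are placeholders, not necessarily the exact files used in the examples.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split

# Placeholder: Statlog shuttle data, label in the last column.
df = pd.read_csv("shuttle.trn", sep=" ", header=None)
X = df.iloc[:, :-1].to_numpy(dtype=float)
y = df.iloc[:, -1].to_numpy()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One Bayesian ridge regression per arm, predicting the 0/1 reward
# that arm would receive for a given context.
arms = np.unique(y_tr)
models = {a: BayesianRidge().fit(X_tr, (y_tr == a).astype(float)) for a in arms}

# Greedy policy: pick the arm with the highest predicted reward.
scores = np.column_stack([models[a].predict(X_te) for a in arms])
chosen = arms[scores.argmax(axis=1)]
print("baseline mean reward:", (chosen == y_te).mean())
```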
I have been playing around with the DCBTrainer and found some potential inconsistencies. Roughly, the code below defines the bandit, trains the agent, and then evaluates it.
Define the bandit:
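Something along these lines; the import paths, class names, and constructor arguments are assumptions based on the genrl bandit examples and may differ from the exact version used here.

```python
# Import paths and class names are assumptions; they may differ by genrl version.
from genrl.utils import StatlogDataBandit, WineDataBandit
from genrl.agents import NeuralLinearPosteriorAgent

# download=True is assumed to fetch the UCI csv files if they are not present locally.
statlog_bandit = StatlogDataBandit(download=True)
wine_bandit = WineDataBandit(download=True)

# One contextual agent per bandit, default hyperparameters.
statlog_agent = NeuralLinearPosteriorAgent(statlog_bandit)
wine_agent = NeuralLinearPosteriorAgent(wine_bandit)
```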
Training:
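Training would then look roughly like this; the DCBTrainer import path and the train() argument names are assumptions, and the hyperparameters are only illustrative.

```python
from genrl.trainers import DCBTrainer  # import path is an assumption

# Illustrative hyperparameters; argument names may differ between versions.
statlog_trainer = DCBTrainer(statlog_agent, statlog_bandit)
statlog_trainer.train(timesteps=5000, batch_size=64, update_interval=20)

wine_trainer = DCBTrainer(wine_agent, wine_bandit)
wine_trainer.train(timesteps=5000, batch_size=64, update_interval=20)
```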
And evaluation:
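An evaluation loop along these lines, assuming the data-bandit interface exposes reset() returning a context and step(action) returning (next_context, reward); the method names and signatures are assumptions.

```python
def evaluate(agent, bandit, n_steps=500):
    """Roll out the trained agent greedily and report the mean reward.

    Assumes bandit.reset() -> context and bandit.step(action) ->
    (next_context, reward); the actual genrl signatures may differ.
    """
    context = bandit.reset()
    total_reward = 0.0
    for _ in range(n_steps):
        action = agent.select_action(context)
        context, reward = bandit.step(action)
        total_reward += float(reward)
    return total_reward / n_steps

print("statlog mean reward:", evaluate(statlog_agent, statlog_bandit))
print("wine mean reward:", evaluate(wine_agent, wine_bandit))
```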
For both cases (Statlog and wine, plus the third titanic case referenced in #301), both reward and regret increase during training, which could point to no actual learning happening: the increase in reward may come purely from random guessing rather than from learning.
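As a sanity check on that point (plain numpy, not genrl code): with K arms and a 0/1 reward for picking the correct class, a uniformly random policy still accumulates reward at roughly 1/K per step, so a rising cumulative-reward curve by itself is not evidence of learning.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 7, 5000                               # K=7 matches the Statlog shuttle classes
labels = rng.integers(0, K, size=T)          # true best arm at each step
random_actions = rng.integers(0, K, size=T)  # uniformly random policy, no learning

cumulative_reward = np.cumsum(random_actions == labels)
print("final cumulative reward:", cumulative_reward[-1], "~= T/K =", T // K)
```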
Notice that for the Statlog data the evaluation column is the last column, while in the wine data it is the first column.
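To spell that out (placeholder file names, standard UCI layouts):

```python
import pandas as pd

# Placeholder paths; the point is only where the label column sits.
statlog = pd.read_csv("shuttle.trn", sep=" ", header=None)
statlog_labels = statlog.iloc[:, -1]   # Statlog: evaluation column is the last column

wine = pd.read_csv("wine.data", header=None)
wine_labels = wine.iloc[:, 0]          # Wine: evaluation column is the first column
```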