Improvements/Bug Fixes
Misc
PR: #131
- fix overflow error in `np.exp` of `SoftmaxPolicy`, `BoltzmannPolicy` by casting to `float64` instead of `float32`
- improve overall `np.isfinite` asserts
- remove index after reset in `*analysis.csv`
- remove unused specs
- reorganize and expand test specs
- guard continuous action value range in continuous policies
- fix analytics param variable sourcing
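
To illustrate the `np.exp` overflow fix listed above, here is a minimal sketch, not the actual `SoftmaxPolicy` or `BoltzmannPolicy` code. The helper name `softmax_probs` and the temperature parameter `tau` are hypothetical and chosen only for this example. It shows that `np.exp` overflows to `inf` in `float32` for inputs above roughly 88.7, while the same inputs stay finite after casting to `float64`.

```python
import numpy as np


def softmax_probs(logits, tau=1.0):
    """Hypothetical helper: softmax/Boltzmann probabilities computed in float64."""
    # Cast before exponentiating; float32 exp overflows for inputs above ~88.7.
    x = np.asarray(logits, dtype=np.float64) / tau
    exp_x = np.exp(x)
    probs = exp_x / exp_x.sum()
    # Fail fast on inf/nan, in the spirit of the np.isfinite asserts.
    assert np.isfinite(probs).all(), 'non-finite action probabilities'
    return probs


logits = np.array([100.0, 99.0, 98.0], dtype=np.float32)

# In float32 every exponent overflows to inf, so normalizing would give nan.
with np.errstate(over='ignore'):
    overflowed = np.isinf(np.exp(logits)).all()

probs = softmax_probs(logits)
print(overflowed, probs)
```

Casting to `float64` raises the overflow threshold to roughly 709 rather than removing it, so a finiteness assert is still a useful guard.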
DDPG
PR: #131
- add `EpsilonGreedyNoisePolicy`
PER
PR: #131
- add `memory.update(errors)` throughout all agents
- add shape assert for Q values and errors throughout
- auto `max_mem_len` as `max_timestep * max_epis/3` if not specified
- put the missing `abs` for init reward
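
The `max_mem_len` default described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name `resolve_max_mem_len` is hypothetical, and truncating the result to an integer is an assumption, since the note only gives the formula `max_timestep * max_epis/3`.

```python
def resolve_max_mem_len(max_timestep, max_epis, max_mem_len=None):
    """Hypothetical helper: pick the memory length, falling back to the auto default."""
    if max_mem_len is not None:
        # An explicitly specified value always wins.
        return int(max_mem_len)
    # Auto default from the release note: max_timestep * max_epis / 3.
    # Truncation to int is an assumption made for this sketch.
    return int(max_timestep * max_epis / 3)


auto_len = resolve_max_mem_len(max_timestep=200, max_epis=300)
explicit_len = resolve_max_mem_len(max_timestep=200, max_epis=300, max_mem_len=5000)
print(auto_len, explicit_len)
```

With this default, the memory holds about a third of the transitions a full run could generate, so older experiences are eventually evicted.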