Fix Numerical Errors; Improve PER

Latest

kengz released this 27 Apr 02:47


Improvements/Bug Fixes

Misc

fix overflow error in np.exp of SoftmaxPolicy, BoltzmannPolicy by casting to float64 instead of float32
improve overall np.isfinite asserts
remove index after reset in *analysis.csv
remove unused specs
reorganize and expand test specs
guard continuous action value range in continuous policies
fix analytics param variable sourcing
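The float64 cast and the np.isfinite asserts listed above can be illustrated with a minimal sketch. The function name softmax_probs and the tau parameter are hypothetical and are not taken from the project's SoftmaxPolicy or BoltzmannPolicy code. The sketch only shows why casting before np.exp avoids the overflow.

```python
import numpy as np


def softmax_probs(q_values, tau=1.0):
    """Hypothetical helper, not the project's policy implementation."""
    # Cast up front: np.exp on float32 overflows to inf once the input
    # passes roughly 88.7, while float64 stays finite up to roughly 709.
    logits = np.asarray(q_values, dtype=np.float64) / tau
    exp_vals = np.exp(logits)
    probs = exp_vals / exp_vals.sum()
    # Fail loudly if anything non-finite slipped through.
    assert np.all(np.isfinite(probs)), 'non-finite action probabilities'
    return probs


# Q values stored as float32, large enough to overflow a float32 exp.
q_values = np.array([100.0, 90.0, 0.0], dtype=np.float32)
probs = softmax_probs(q_values)
```

Casting only raises the overflow threshold. A common alternative, which these notes do not mention, is to subtract the maximum value from the logits before exponentiating.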

DDPG

PR: #131

add EpsilonGreedyNoisePolicy

PER

PR: #131

add memory.update(errors) throughout all agents
add shape assert for Q values and errors throughout
default max_mem_len to max_timestep * max_epis / 3 when not specified
add the missing abs for the init reward
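The PER items above can be read together in a toy sketch. Everything here is an assumption made for illustration: the class ToyPrioritizedMemory, the constant e, the add and sample methods, and the idea that the init reward seeds an experience's priority are not confirmed by these notes. Only the update(errors) call and the max_timestep * max_epis / 3 default come from the list.

```python
import random


class ToyPrioritizedMemory:
    """Hypothetical illustration, not the project's memory class."""

    def __init__(self, max_timestep, max_epis, max_mem_len=None, e=0.01):
        # Fall back to the default named in the notes when no length is given.
        if max_mem_len is None:
            max_mem_len = int(max_timestep * max_epis / 3)
        self.max_mem_len = max_mem_len
        self.e = e  # assumed small constant keeping every priority positive
        self.experiences = []
        self.priorities = []
        self.last_batch_idx = []

    def add(self, experience, reward):
        # Assumption: the initial priority comes from the reward. Without
        # abs(), a negative reward would produce a negative sampling weight.
        self.experiences.append(experience)
        self.priorities.append(abs(reward) + self.e)
        if len(self.experiences) > self.max_mem_len:
            # Drop the oldest entry. This shifts indices, so update()
            # should be called right after sample(), before the next add().
            self.experiences.pop(0)
            self.priorities.pop(0)

    def sample(self, batch_size):
        # Sample indices in proportion to priority and remember them.
        idx = random.choices(
            range(len(self.experiences)), weights=self.priorities, k=batch_size)
        self.last_batch_idx = idx
        return [self.experiences[i] for i in idx]

    def update(self, errors):
        # Shape check: one error per sampled experience.
        assert len(errors) == len(self.last_batch_idx)
        for i, err in zip(self.last_batch_idx, errors):
            self.priorities[i] = abs(err) + self.e


mem = ToyPrioritizedMemory(max_timestep=300, max_epis=100)
mem.add('transition_a', reward=-2.0)
mem.add('transition_b', reward=1.0)
batch = mem.sample(2)
mem.update([-0.5, 3.0])  # in an agent these would be the batch errors
```

In this sketch an agent would call sample, compute errors for the batch, then call update with those errors so that later sampling favors high-error experiences.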
