This repository has been archived by the owner on Jun 13, 2024. It is now read-only.

Gradient calc in deterministic OAC #38

Open
kbkartik opened this issue May 5, 2023 · 0 comments

Comments


kbkartik commented May 5, 2023

Hi Quan,

I came across your paper and found it interesting. One doubt I have concerns the implementation of the optimistic policy. Why are the gradients of the upper bound computed w.r.t. the pre-tanh output of the policy? As per the paper, isn't the gradient supposed to be taken w.r.t. the deterministic action (the output of the tanh policy)?

Regards,
Kartik
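
For readers following the question: by the chain rule, the two gradients differ only by the derivative of tanh. The snippet below is a minimal sketch of that relationship, not the repository's code. It assumes a toy scalar upper bound; `q_upper_bound`, `dq_daction`, and the other function names are hypothetical and stand in for the critic's upper bound on Q.

```python
import math


def q_upper_bound(action):
    # Hypothetical toy stand-in for an upper bound Q_UB(s, a); NOT the OAC critic.
    return -(action - 0.3) ** 2


def dq_daction(action):
    # Analytic derivative of the toy upper bound w.r.t. the post-tanh action.
    return -2.0 * (action - 0.3)


def grad_wrt_action(pre_tanh):
    # Gradient taken w.r.t. the deterministic action a = tanh(u).
    action = math.tanh(pre_tanh)
    return dq_daction(action)


def grad_wrt_pre_tanh(pre_tanh):
    # Gradient taken w.r.t. the pre-tanh value u.
    # Chain rule: dQ/du = dQ/da * da/du, with da/du = 1 - tanh(u)^2.
    action = math.tanh(pre_tanh)
    return dq_daction(action) * (1.0 - action ** 2)


def finite_difference(pre_tanh, eps=1e-6):
    # Central-difference check of dQ/du through the composed function Q(tanh(u)).
    def composed(u):
        return q_upper_bound(math.tanh(u))

    return (composed(pre_tanh + eps) - composed(pre_tanh - eps)) / (2.0 * eps)


if __name__ == "__main__":
    u = 1.5
    print("dQ/da:", grad_wrt_action(u))
    print("dQ/du:", grad_wrt_pre_tanh(u))
    print("dQ/du (finite difference):", finite_difference(u))
```

In one dimension the two gradients share a sign and differ only in magnitude. With a multi-dimensional action, the factor 1 - tanh(u)^2 is applied per dimension, so the two gradients can point in different directions, which is why the choice of variable matters here.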
