Gradient calc in deterministic OAC #38

kbkartik · 2023-05-05T14:36:02Z

Hi Quan,

I came across your paper and found it interesting. One question I have is about the implementation of the optimistic policies. Why are you computing the gradients of the upper bound w.r.t. the pre-tanh output of the policy? As per the paper, isn't the gradient supposed to be taken w.r.t. the deterministic action (the output of the tanh policy)?
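To make the distinction concrete, here is a minimal sketch of the two gradients being contrasted. It is not taken from the OAC code base: `q_upper_bound` is a hypothetical toy stand-in for the critic's upper bound, and the scalar action is an assumption made only for illustration. By the chain rule, the two gradients differ by the tanh Jacobian factor `1 - tanh(u)^2`.

```python
import math


def q_upper_bound(action):
    # Hypothetical toy upper bound (assumption); not the OAC critic.
    return -(action - 0.5) ** 2


def grad_wrt_action(pre_tanh):
    # Gradient of the upper bound w.r.t. the squashed action a = tanh(u).
    action = math.tanh(pre_tanh)
    return -2.0 * (action - 0.5)


def grad_wrt_pre_tanh(pre_tanh):
    # Gradient w.r.t. the pre-tanh value u, via the chain rule:
    # dQ/du = dQ/da * da/du, with da/du = 1 - tanh(u)^2.
    action = math.tanh(pre_tanh)
    return grad_wrt_action(pre_tanh) * (1.0 - action ** 2)


pre_tanh = 1.2
g_action = grad_wrt_action(pre_tanh)
g_pre_tanh = grad_wrt_pre_tanh(pre_tanh)
print("dQ/da:", g_action)
print("dQ/du:", g_pre_tanh)
```

In this toy case the two gradients share the same sign but differ in magnitude, and the gap widens as the action saturates towards ±1, which is why the choice of differentiation variable can matter.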

Regards,
Kartik
