Hi, thank you very much for sharing the code for the paper. Integrating contrastive learning into skill discovery is very attractive.
However, I found that in this implementation, the state encoder and skill encoder in the cic module ($g_{\psi_1}$ and $g_{\psi_2}$ in the paper) are never applied to the observations and skills before they are fed into the policy network. In cic/agent/cic.py line 222, the parameters of cic are updated once, but the module is never called afterwards to encode obs and skill.
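To make the concern concrete, here is a minimal sketch of what I mean (illustrative module names and dimensions, not the repository's actual code):

```python
import torch
import torch.nn as nn

obs_dim, skill_dim, hidden_dim, action_dim = 24, 64, 256, 6

# Encoders corresponding to g_psi1 / g_psi2 in the paper; in the repo these live
# inside the CIC module and are updated by the CPC loss.
state_encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
                              nn.Linear(hidden_dim, obs_dim))
skill_encoder = nn.Sequential(nn.Linear(skill_dim, hidden_dim), nn.ReLU(),
                              nn.Linear(hidden_dim, skill_dim))

# Policy network that takes an observation concatenated with a skill vector.
actor = nn.Sequential(nn.Linear(obs_dim + skill_dim, hidden_dim), nn.ReLU(),
                      nn.Linear(hidden_dim, action_dim))

obs = torch.randn(8, obs_dim)
skill = torch.rand(8, skill_dim)

# What the implementation appears to do: the encoders are trained by the CPC loss,
# but the actor is fed the raw obs and skill, so the learned encoders go unused.
action_actual = actor(torch.cat([obs, skill], dim=-1))

# What I expected from the paper: encode obs and skill first, then condition the policy.
action_expected = actor(torch.cat([state_encoder(obs), skill_encoder(skill)], dim=-1))
```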
Another question: how can the agent guarantee that the policy is "indeed conditioned on z" when the intrinsic reward has nothing to do with z? In other words, $\tau$ can be arbitrarily diverse, which is good for exploration, but there is no mechanism to ensure the agent knows "what the influence of z is".
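For context, my understanding is that the intrinsic reward is a particle-based entropy estimate over state representations, roughly like the sketch below (illustrative only; names and details may differ from the actual code). The skill z does not appear anywhere in this computation, which is what makes me wonder how the policy's dependence on z is enforced.

```python
# Simplified sketch of a kNN particle-entropy intrinsic reward (illustrative only;
# the actual implementation may differ). The skill z does not enter this computation.
import torch

def knn_entropy_reward(source: torch.Tensor, target: torch.Tensor, k: int = 12) -> torch.Tensor:
    # source: (B, D) representations of the current batch of transitions
    # target: (N, D) representations to compare against (e.g. the same batch or a queue)
    dists = torch.cdist(source, target, p=2)            # (B, N) pairwise distances
    knn_dists, _ = dists.topk(k, dim=1, largest=False)  # distances to the k nearest neighbours
    # Reward grows with the distance to the k-th neighbour: sparse regions -> high reward.
    return torch.log(1.0 + knn_dists[:, -1])

reps = torch.randn(16, 64)               # e.g. g_psi1(tau) embeddings of transitions
reward = knn_entropy_reward(reps, reps)  # intrinsic reward; independent of the skill z
```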
I really like your work, but these issues confuse me a lot. Please correct me if I am wrong or have missed something. Thank you again for your kindness in sharing.
Hi, I have the same confusion. May I ask whether your question has been resolved? I also think the parameters updated by the contrastive learning are not being used.
@pickxiguapi Hi, sorry, it is not solved. I think this is a mistake the author has not noticed, since the work still seems to be in progress / not fully finished.
I would like to ask a simple question. During pre-training, I found that neg in compute_cpc_loss is approximately 1200, while pos is around 6. Is this a normal phenomenon?
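For reference, this is how I understand pos and neg to be formed in an InfoNCE-style loss (simplified sketch; it may not match the repository's compute_cpc_loss exactly):

```python
# Simplified InfoNCE-style sketch of how pos/neg might be computed (illustrative only).
import torch
import torch.nn.functional as F

temperature = 0.5
query = F.normalize(torch.randn(1024, 64), dim=1)  # e.g. skill embeddings for a batch
key = F.normalize(torch.randn(1024, 64), dim=1)    # e.g. state embeddings for the same batch

sim = torch.exp(query @ key.T / temperature)          # (B, B) exponentiated similarities
pos = torch.exp((query * key).sum(-1) / temperature)  # matched pair: a single term per row
neg = sim.sum(dim=-1)                                  # sum over the whole batch (~B terms per row)
loss = -torch.log(pos / neg)
print(pos.mean().item(), neg.mean().item())            # neg scales with the batch size
```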