
The state and skill encoders learned with contrastive learning are never used? #6

Open
xf-zhao opened this issue Jun 25, 2022 · 4 comments

Comments

@xf-zhao

xf-zhao commented Jun 25, 2022

Hi, thank you very much for sharing the code for the paper. Integrating contrastive learning into skill discovery is very attractive.

However, I found that in this implementation, the state encoder and skill encoder in the cic module ($g_{\psi_1}$ and $g_{\psi_2}$ in the paper) are never used to encode the observation and skill before they are fed into the policy networks. In cic/agent/cic.py line 222, the cic parameters are updated once, but the encoders are never called to encode obs and skill afterwards.
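
To make the point concrete, here is a rough sketch of the pattern I am describing (names and dimensions are hypothetical, not the repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CIC(nn.Module):
    """Hypothetical sketch of the cic module: g_psi1 (state) and g_psi2 (skill)."""
    def __init__(self, obs_dim, skill_dim, hidden_dim=256):
        super().__init__()
        # g_psi1: encodes the transition (obs, next_obs)
        self.state_net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, skill_dim))
        # g_psi2: encodes the skill z
        self.skill_net = nn.Sequential(
            nn.Linear(skill_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, skill_dim))

    def forward(self, obs, next_obs, skill):
        query = self.state_net(torch.cat([obs, next_obs], dim=-1))
        key = self.skill_net(skill)
        return query, key

def info_nce(query, key, temperature=0.5):
    # simple CPC-style loss: positives on the diagonal, rest of the batch as negatives
    query, key = F.normalize(query, dim=-1), F.normalize(key, dim=-1)
    logits = query @ key.T / temperature
    labels = torch.arange(query.shape[0])
    return F.cross_entropy(logits, labels)

# Toy update step illustrating my point:
obs, next_obs = torch.randn(8, 24), torch.randn(8, 24)
skill = torch.rand(8, 64)
cic = CIC(obs_dim=24, skill_dim=64)
opt = torch.optim.Adam(cic.parameters(), lr=1e-4)

query, key = cic(obs, next_obs, skill)   # encoders are used here, for the CPC loss ...
loss = info_nce(query, key)
opt.zero_grad(); loss.backward(); opt.step()

policy_input = torch.cat([obs, skill], dim=-1)  # ... but the policy sees the raw obs and z,
                                                # not g_psi1(obs) or g_psi2(z)
```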

Another question: how can the agent guarantee that the policy is indeed conditioned on z, given that the intrinsic reward has nothing to do with z? In other words, $\tau$ can be arbitrarily diverse, which is good for exploration, but there seems to be no mechanism that ensures the agent learns the influence of z.
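
For reference, the decomposition I have in mind from the paper (my own reading, so please correct me if this is off):

$$
I(\tau; z) \;=\; H(\tau) \;-\; H(\tau \mid z),
$$

where, as far as I can tell, $H(\tau)$ is estimated with a particle estimator and used as the intrinsic reward, while $H(\tau \mid z)$ is handled by the contrastive loss. So the reward itself contains no $z$, which is exactly what worries me.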

I really like your work, but these issues confuse me a lot. Please correct me if I am wrong or missing something. Thank you again for your kindness in sharing.

@pickxiguapi

Hi, I have the same confusion. May I ask whether your question has been resolved? I think the parameters updated by contrastive learning are not being used.

@xf-zhao
Author

xf-zhao commented Jul 13, 2022

> Hi, I have the same confusion. May I ask whether your question has been resolved? I think the parameters updated by contrastive learning are not being used.

@pickxiguapi Hi, sorry, it has not been solved. I think this is a mistake the author has not noticed, since the work still seems to be in progress / not fully finished.

@seolhokim

Why should g1 and g2 be used again after being updated? I think there is no reason to call them from anywhere else before finetuning.

@kc-ustc

kc-ustc commented Jun 20, 2024

I would like to ask a simple question: during pre-training, I found that neg in compute_cpc_loss is approximately 1200, while pos is around 6. Is this normal?
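
For context, this is roughly how I understand pos and neg to be computed in an exponential-similarity CPC loss (a hypothetical sketch, not the repo's exact code), which would explain why neg, being a sum over the whole batch, ends up much larger than pos:

```python
import torch
import torch.nn.functional as F

def cpc_loss_sketch(query, key, temperature=0.5, eps=1e-6):
    """Hypothetical CPC loss with explicit pos/neg terms (not the repo's exact code)."""
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)
    # pos: exponentiated similarity of each matched (query, key) pair
    pos = torch.exp(torch.sum(query * key, dim=-1) / temperature)
    # neg: sum of exponentiated similarities against every key in the batch
    sim = torch.exp(query @ key.T / temperature)   # (batch, batch)
    neg = sim.sum(dim=-1)                          # grows with batch size
    loss = -torch.log(pos / (neg + eps))
    return loss.mean(), pos.mean().item(), neg.mean().item()

# toy check with random features: pos stays close to exp(0/T) = 1 and neg is roughly the
# batch size; once training aligns query and key, pos moves toward exp(1/T) (~7 for T=0.5),
# so magnitudes like pos ~ 6 and neg ~ 1200 for a batch of ~1024 look plausible to me
query, key = torch.randn(1024, 64), torch.randn(1024, 64)
loss, pos_mean, neg_mean = cpc_loss_sketch(query, key)
print(pos_mean, neg_mean)
```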
