APT reward #4
Comments
I tried both setups, and the performance with (obs, next_obs) is slightly better than with (next_obs, next_obs).
@FaisalAhmed0 thanks for the empirical study. Have you tried passing in (z, z), as written in the CIC paper, by passing in Lines 199 to 201 in b523c38
Also, related to the reward question, what is the purpose of: Line 186 in b523c38
What should it compute, and for what reason?
I don't agree with that. They just wanted to estimate the novelty of the next states with kNN. The compute_apt_message function does not compute what you describe (tau novelty). If you want that, the simplest way is to concatenate the two.
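To make the distinction concrete, here is a minimal sketch of a kNN novelty score applied first to next states alone and then to concatenated transitions. It is not the repository's implementation: `knn_novelty`, the toy 2-D embeddings, and the `log(1 + distance)` form are assumptions chosen for illustration only.

```python
import math

def knn_novelty(points, k=1):
    """Score each point by log(1 + distance to its k-th nearest neighbour).

    Hypothetical helper for illustration; not a function from the CIC repo.
    """
    rewards = []
    for i, p in enumerate(points):
        # Distances from point i to every other point in the batch, ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        rewards.append(math.log(1.0 + dists[k - 1]))
    return rewards

# Toy batch of embedded states and next states (assumed values).
obs = [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)]
next_obs = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]

# Novelty of the next states alone: identical next states look equally familiar.
r_next = knn_novelty(next_obs, k=1)

# Novelty of transitions tau = (s, s'), built by concatenating the two.
taus = [s + s_next for s, s_next in zip(obs, next_obs)]
r_tau = knn_novelty(taus, k=1)
```

In this toy batch every next state is identical, so the next-state score cannot tell the transitions apart, while the concatenated version gives the third transition a higher score because its starting state differs.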
A follow-up question on lines 199 and 200: shouldn't the source use state_net and the target use next_state_net?
Why do we pass (next_obs, next_obs)? It should be (obs, next_obs), right? Because you are optimizing for the entropy of $\tau=(s, s^{'})$
cic/agent/cic.py
Line 224 in b523c38