Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APT reward #4

Open
FaisalAhmed0 opened this issue May 20, 2022 · 6 comments
Open

APT reward #4

FaisalAhmed0 opened this issue May 20, 2022 · 6 comments

Comments

@FaisalAhmed0
Copy link

Why we pass (next_obs, next_obs)? It should (obs, next_obs) right? Because you are optimizing for the entropy of $\tau=(s, s^{'})$

intr_reward = self.compute_apt_reward(next_obs,next_obs)

@Kaixhin
Copy link

Kaixhin commented May 20, 2022

I had exactly the same question. Checking the original APT paper, the text seems to indicate they only take s' prime as the particle, whereas the equation indicates that s is taken as the particle. But agreed that the CIC paper indicates that tau (the projection of (s, s')) should be taken.

Screenshot from 2022-05-20 18-06-01

@FaisalAhmed0
Copy link
Author

FaisalAhmed0 commented May 26, 2022

I tried both setups, and the performance with (obs, next_obs) is slightly better than (next_obs, next_obs).

@Kaixhin
Copy link

Kaixhin commented May 26, 2022

@FaisalAhmed0 thanks for the empirical study. Have you tried passing in (z, z), as written in the CIC paper, by passing in source and target into pred_net to compute compute_apt_reward(z, z, args)?

cic/agent/cic.py

Lines 199 to 201 in b523c38

source = self.cic.state_net(obs)
target = self.cic.state_net(next_obs)
reward = compute_apt_reward(source, target, args) # (b,)

@Howuhh
Copy link

Howuhh commented Jun 9, 2022

Also, related to the reward question, what the purpose of:

def compute_intr_reward(self, obs, skill, next_obs, step):

What it should compute and for what reason?

@seolhokim
Copy link

seolhokim commented Jul 28, 2022

I don't agree with that. They just wanted to estimate the novelty of the next states with knn. The compute_apt_message function does not compute what you say(tau novelty). If you want to do so, the simplest way is to concatenate the two.

@Sou0602
Copy link

Sou0602 commented Aug 3, 2023

A follow-up question to line 199 and line 200, shouldn't the source be using the state_net and the target be using the next_state_net?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants