APT reward #4

FaisalAhmed0 · 2022-05-20T07:23:14Z

Why do we pass (next_obs, next_obs)? Shouldn't it be (obs, next_obs)? You are optimizing the entropy of $\tau = (s, s')$.

cic/agent/cic.py

Line 224 in b523c38

intr_reward = self.compute_apt_reward(next_obs,next_obs)
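To make the difference between the two argument orders concrete, here is a simplified NumPy sketch of a k-nearest-neighbour particle reward. It is not the repository's compute_apt_reward: the name knn_particle_reward, the Euclidean distance, averaging over the k nearest neighbours and the log(1 + ·) squashing are all assumptions for illustration. Each row of source is scored by its distances to the rows of target, so (next_obs, next_obs) scores every next state against the other next states in the batch (the row itself contributes a zero distance), while (obs, next_obs) scores every current state against the batch of next states.

```python
import numpy as np


def knn_particle_reward(source, target, k=3):
    """Hypothetical k-NN particle reward (simplified sketch, not the repo's code).

    source: (b1, d) array of particles to score.
    target: (b2, d) array of particles used as neighbours.
    Returns a (b1,) array: log(1 + mean distance to the k nearest targets).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    # Pairwise Euclidean distances, shape (b1, b2).
    dists = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    # The k smallest distances per source row (order does not matter, we average).
    k = min(k, target.shape[0])
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return np.log1p(nearest.mean(axis=1))


rng = np.random.default_rng(0)
obs = rng.normal(size=(8, 4))
next_obs = obs + 0.1 * rng.normal(size=(8, 4))

# Novelty of s' among the batch of s' (each row also "sees" itself at distance 0).
r_next = knn_particle_reward(next_obs, next_obs, k=3)
# Distance of each s to the batch of s'.
r_pair = knn_particle_reward(obs, next_obs, k=3)
print(r_next.shape, r_pair.shape)  # prints: (8,) (8,)
```

With (x, x) and k = 1 the only neighbour is the row itself, so the reward collapses to zero; k must exceed 1 for the self-comparison to carry any signal.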

Kaixhin · 2022-05-20T09:08:34Z

I had exactly the same question. Checking the original APT paper, the text seems to indicate that only s' is taken as the particle, whereas the equation indicates that s is taken as the particle. But I agree that the CIC paper indicates that tau (the projection of (s, s')) should be used.
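For reference, a common simplified form of a k-NN particle reward (a hedged paraphrase, not the exact equation from either paper) scores a particle $h_i$ against its $k$ nearest neighbours $\mathcal{N}_k(h_i)$ in the batch:

$$
r_i = \log\Big(1 + \frac{1}{k} \sum_{h_j \in \mathcal{N}_k(h_i)} \lVert h_i - h_j \rVert_2\Big)
$$

The question in this thread is only what $h_i$ should be: $s'_i$ (the APT text), $s_i$ (the APT equation, per the reading above), or $\tau_i = (s_i, s'_i)$ or a learned projection of it (the CIC paper).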

FaisalAhmed0 · 2022-05-26T08:18:49Z

I tried both setups, and performance with (obs, next_obs) is slightly better than with (next_obs, next_obs).

Kaixhin · 2022-05-26T13:01:55Z

@FaisalAhmed0 thanks for the empirical study. Have you tried using (z, z), as written in the CIC paper, i.e. passing source and target into pred_net to get z and then calling compute_apt_reward(z, z, args)?

cic/agent/cic.py

Lines 199 to 201 in b523c38

    source = self.cic.state_net(obs)
    target = self.cic.state_net(next_obs)
    reward = compute_apt_reward(source, target, args) # (b,)
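A rough sketch of the (z, z) variant, assuming, hypothetically, that z is obtained by concatenating the two state embeddings and passing them through pred_net, and that z is then scored against itself. The random linear maps state_net and pred_net are placeholders for the CIC networks, and knn_particle_reward is a simplified k-NN reward helper introduced here as an assumption, not the repository's compute_apt_reward.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, hidden_dim, batch = 6, 16, 10

# Hypothetical stand-ins for the CIC networks: single random linear layers.
W_state = rng.normal(size=(obs_dim, hidden_dim))
W_pred = rng.normal(size=(2 * hidden_dim, hidden_dim))


def state_net(x):
    # Placeholder encoder: linear map followed by ReLU.
    return np.maximum(x @ W_state, 0.0)


def pred_net(x):
    # Placeholder projection of the concatenated (s, s') embedding.
    return x @ W_pred


def knn_particle_reward(source, target, k=3):
    # Simplified k-NN reward: log(1 + mean distance to the k nearest targets).
    dists = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    k = min(k, target.shape[0])
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return np.log1p(nearest.mean(axis=1))


obs = rng.normal(size=(batch, obs_dim))
next_obs = rng.normal(size=(batch, obs_dim))

source = state_net(obs)       # embedding of s
target = state_net(next_obs)  # embedding of s' (same encoder, as in the quoted lines)
z = pred_net(np.concatenate([source, target], axis=1))  # projection of tau = (s, s')
reward = knn_particle_reward(z, z, k=3)  # one reward per transition, shape (batch,)
```

The design difference from (source, target) is that novelty is measured among transitions in the learned z space rather than between current and next states.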

Howuhh · 2022-06-09T12:02:30Z

Also, related to the reward question, what is the purpose of:

cic/agent/cic.py

Line 186 in b523c38

def compute_intr_reward(self, obs, skill, next_obs, step):

What should it compute, and why?

seolhokim · 2022-07-28T08:24:44Z

I don't agree with that. They just wanted to estimate the novelty of the next states with k-NN. The compute_apt_reward function does not compute what you describe (the novelty of tau). If you want that, the simplest way is to concatenate the two (obs and next_obs).
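A minimal sketch of the concatenation suggestion, scoring transitions rather than next states: each particle is tau_i = [s_i, s'_i] and the reward is a simplified k-NN distance among those particles. The helper knn_particle_reward and its log(1 + mean distance) form are assumptions for illustration, not the repository's compute_apt_reward.

```python
import numpy as np


def knn_particle_reward(source, target, k=3):
    # Hypothetical simplified k-NN reward: log(1 + mean distance to k nearest targets).
    dists = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    k = min(k, target.shape[0])
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return np.log1p(nearest.mean(axis=1))


rng = np.random.default_rng(1)
obs = rng.normal(size=(12, 5))
next_obs = rng.normal(size=(12, 5))

# Build transition particles tau = (s, s') by concatenating along the feature axis.
tau = np.concatenate([obs, next_obs], axis=1)          # shape (12, 10)
r_tau = knn_particle_reward(tau, tau, k=4)             # novelty of each transition
r_next = knn_particle_reward(next_obs, next_obs, k=4)  # novelty of s' only
```

Because the squared distance between two tau particles is the sum of the squared distances between their s parts and their s' parts, every pairwise tau distance is at least the corresponding s' distance, so r_tau is never smaller than r_next for the same batch and k. What changes is which transitions rank as novel.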

Sou0602 · 2023-08-03T01:06:02Z

A follow-up question on lines 199 and 200: shouldn't the source use state_net and the target use next_state_net?
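A sketch of the distinction this question draws: the encoder for s' either shares weights with the encoder for s (as the quoted lines do) or uses its own weights (the next_state_net alternative). The weights are random placeholders and encode_pair is a hypothetical helper; whether separate weights are intended is exactly what is being asked.

```python
import numpy as np

rng = np.random.default_rng(2)
obs_dim, hidden_dim = 4, 8

# Hypothetical encoder weights: one for s, and an optional separate one for s'.
W_state = rng.normal(size=(obs_dim, hidden_dim))
W_next_state = rng.normal(size=(obs_dim, hidden_dim))


def encode_pair(obs, next_obs, separate_next_encoder=False):
    """Embed (s, s') with linear+ReLU encoders; shared weights unless told otherwise."""
    W_next = W_next_state if separate_next_encoder else W_state
    source = np.maximum(obs @ W_state, 0.0)
    target = np.maximum(next_obs @ W_next, 0.0)
    return source, target


obs = rng.normal(size=(5, obs_dim))
shared_src, shared_tgt = encode_pair(obs, obs)
sep_src, sep_tgt = encode_pair(obs, obs, separate_next_encoder=True)
```

With a shared encoder, identical states map to identical embeddings, so distances between source and target rows compare like with like. With separate encoders that holds only to the extent that training aligns the two embedding spaces.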
