-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
question about strategy for sampling goals for replay? #24
Comments
yes, I also have the same question. I see other project, but for the strategy of future they have different sampling methods. However, The result of this project is very good. I am also confused with you, whether to replace it with a specific k target or just replace it with a ratio of k. |
I don't change the implement of Future strategy, but there is another project which replace 'desired goal' with a specific k target. See https://github.com/kaixindelele/DRLib.git @captainzhu123 |
I think it is the author of this project who modified the way of selecting sub-targets. You also raised this question earlier. I also think that the sampling experience should not be modified directly according to the ratio, which is similar to a kind of data enhancement. @whynpt |
Okay so after some research I think he mention the reasons behind these code here in this paper (page 32): https://link.springer.com/content/pdf/10.1007/978-3-030-89370-5.pdf?pdf=button. Paper is titled "Diversity-Based Trajectory and Goal Selection with Hindsight Experience Replay". The code is also available here where the same HER sampling method was used: https://github.com/TianhongDai/div-hindsight/blob/master/baselines/her/her.py |
thanks a lot! tihs project works well with my own robotic environment. But I am confused about
her.her_sampler.sample_her_transitions
, because it's quite different from the strategy future as I think.In paper, for every transition in buffer, k goals are sampled for every transition in buffer. Then k new transitions are stored in buffer, which seems to be data augmentation .In code,
replay-k
means ratio to replace, not the number of goals. Asher.her_sampler.sample_her_transitions
shows, when updating the network, 256 transitions are choosen and part of their goals are replaced with achieved goal.Does replacing goals proportionally eual the strategy future?
The text was updated successfully, but these errors were encountered: