# Reward Modeling from GPT4-Vision Preferences

View Demo

## About


This project replicates the behavior of OpenAI and DeepMind's Deep Reinforcement Learning from Human Preferences, substituting preferences elicited from GPT4-V for human feedback. The code and architecture are based on Matthew Rahtz's implementation of the original paper, simplified for our purposes and translated into PyTorch.
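The core of this approach is training a reward model from pairwise preferences over trajectory segments, as in the original paper. A minimal PyTorch sketch of that training objective is below; the network architecture, tensor shapes, and function names are illustrative assumptions, not this repository's actual API:

```python
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Maps an observation to a scalar reward estimate (hypothetical architecture)."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # (batch, steps, obs_dim) -> (batch, steps) per-step rewards
        return self.net(obs).squeeze(-1)


def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    prefs: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry preference loss over segment pairs.

    seg_a, seg_b: (batch, steps, obs_dim) trajectory segments.
    prefs: (batch,) labels -- 1.0 if segment A is preferred, 0.0 if B
    (here the labels would come from GPT4-V rather than a human rater).
    """
    r_a = model(seg_a).sum(dim=1)  # total predicted reward of segment A
    r_b = model(seg_b).sum(dim=1)
    # P(A preferred) = sigmoid(r_a - r_b); cross-entropy against the labels
    logits = r_a - r_b
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)
```

The learned reward model is then used in place of the environment reward when training the RL policy.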

The writeup is available here.

## Roadmap

- [ ] Test more tasks and environments

## Contact

Jonathan Lu - [email protected]

(back to top)