# Reward Modeling from GPT4-Vision Preferences

View Demo

## About


This project replicates the behavior of OpenAI and DeepMind's Deep Reinforcement Learning from Human Preferences, substituting preferences elicited from GPT4-V for human feedback. The code and architecture are based on Matthew Rahtz's implementation of the original paper, simplified for our purposes and translated into PyTorch.
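The core of this approach is training a reward model from pairwise preferences over trajectory segments, as in the original paper. A minimal PyTorch sketch of that training objective is below; the network architecture, tensor shapes, and function names are illustrative assumptions, not this repository's actual API:

```python
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Maps an observation to a scalar reward estimate (hypothetical architecture)."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # (batch, steps, obs_dim) -> (batch, steps) per-step rewards
        return self.net(obs).squeeze(-1)


def preference_loss(model: RewardModel,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    prefs: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry preference loss over segment pairs.

    seg_a, seg_b: (batch, steps, obs_dim) trajectory segments.
    prefs: (batch,) labels -- 1.0 if segment A is preferred, 0.0 if B
    (here the labels would come from GPT4-V rather than a human rater).
    """
    r_a = model(seg_a).sum(dim=1)  # total predicted reward of segment A
    r_b = model(seg_b).sum(dim=1)
    # P(A preferred) = sigmoid(r_a - r_b); cross-entropy against the labels
    logits = r_a - r_b
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)
```

The learned reward model is then used in place of the environment reward when training the RL policy.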

The writeup is available here.

## Roadmap

- [ ] Test more tasks and environments

## Contact

Jonathan Lu - [email protected]

(back to top)