
AI Alignment Experiments

This repository contains my research and implementations exploring various aspects of AI alignment. Below is an overview of its current contents.

Contents

1. ai-safety-debate

  • Description: My implementation of AI Safety via Debate, introduced by Irving et al. (2018).
  • So far, I have experimented with GPT-4o and with smaller open-source models; only GPT-4o has worked. A minimal sketch of the debate loop follows this list.
  • Currently, I am building a web app to allow users to play around with the safety debate.
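
For readers unfamiliar with the protocol, below is a minimal sketch of what a two-debater loop with an LLM judge can look like. This is an illustration, not this repository's code: the model name, prompts, and round count are placeholder assumptions, and it uses the openai (>=1.0) Python client.

    # Minimal two-debater loop with an LLM judge (illustrative sketch,
    # not this repo's actual implementation).
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    MODEL = "gpt-4o"   # placeholder; any chat model works here

    def ask(system: str, transcript: str) -> str:
        """Send the transcript to the model under a given role prompt."""
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": transcript},
            ],
        )
        return resp.choices[0].message.content

    def debate(question: str, answer_a: str, answer_b: str, rounds: int = 3) -> str:
        transcript = f"Question: {question}\n"
        for r in range(rounds):
            for name, answer in (("A", answer_a), ("B", answer_b)):
                system = (
                    f"You are debater {name}. Argue that the answer is "
                    f"'{answer}'. Rebut your opponent in 2-3 sentences."
                )
                reply = ask(system, transcript)
                transcript += f"\nDebater {name} (round {r + 1}): {reply}"
        # The judge sees only the transcript and picks the more convincing side.
        return ask(
            "You are the judge. Based only on the transcript, say which "
            "debater argued more convincingly and why.",
            transcript,
        )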

2. machine-unlearning

  • Description: My experiments with machine unlearning.
  • So far, I have built a basic CNN to classify images from the CIFAR-10 dataset.
  • My goal is to experiment with this base model: I hope to build a tool that visualizes which neural network connections "light up" for a given input (see the sketch after this list).
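
Since the goal is to see which parts of the network activate for a given input, PyTorch forward hooks are a natural starting point. The sketch below is illustrative and assumes a small CNN of roughly the shape described; the repo's actual model, layer names, and shapes may differ.

    # Capturing per-layer activations with forward hooks (illustrative sketch).
    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        """A basic CIFAR-10-sized CNN, standing in for the repo's model."""
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
            self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
            self.pool = nn.MaxPool2d(2)
            self.fc = nn.Linear(64 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.pool(torch.relu(self.conv1(x)))  # -> (N, 32, 16, 16)
            x = self.pool(torch.relu(self.conv2(x)))  # -> (N, 64, 8, 8)
            return self.fc(x.flatten(1))

    model = SmallCNN().eval()
    activations = {}

    def save_activation(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    # Register a hook on every conv layer; each fires on the forward pass.
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            module.register_forward_hook(save_activation(name))

    model(torch.randn(1, 3, 32, 32))  # one CIFAR-10-shaped input
    for name, act in activations.items():
        print(name, tuple(act.shape))  # e.g. conv1 (1, 32, 32, 32)

From here, each channel of a stored activation map can be rendered as a heatmap (e.g., with matplotlib's imshow) to show which connections "light up" for that input.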

Installation

  1. Clone this repository:

    git clone https://github.com/ratch/alignment.git
  2. Navigate to the desired project directory:

    cd <experiment>
  3. Follow the setup instructions in the respective project folders.


Feel free to reach out with questions or feedback!
