This work presents a novel strategy to measure bias in text-to-image models. Using paired prompts that specify a gender and vaguely reference an object (e.g. “a man/woman holding an item”) we can examine whether certain objects are associated with a certain gender. We found that male prompts were more prone to generate objects including ties, backpacks, knives, and trucks. Female prompts were more likely to generate objects including handbags, umbrellas, bottles, and cups.
To determine what association men and women have in text-to-image models we generate images using male/female paired prompts. For example, the following two prompts (1) “A man holding an item” (2) “A woman holding an item” will generate similar, but distinct, images. Object detection is then run on the resulting images. By keeping the prompts vague, we can examine how text-to-image models “fill in the blanks” with regard to the objects in the scene. Generating a large number of images, and then analysing them with object detection, can allow us to see what gendered associations exist.
Before running the notebooks you should run the following pip install commands
pip install diffusers transformers accelerate scipy safetensors
pip install -q dalle-mini==0.1.3
pip install -q git+https://github.com/patil-suraj/vqgan-jax.git
Before running the shell scripts, you'll need to set download YOLOv3 from the following link https://pjreddie.com/darknet/yolo/. Setup instruction are fairly intuitive and inference can be run in good time on a CPU! However, make sure the shell scripts are in the same directory as the darknet files before running them.