This repository contains the code and resources for replicating the experiments and analysis from our paper *Measuring agreeableness bias in Multimodal Models*.
## Table of Contents
- Introduction
- Repository Structure
- Setup
- Generating Experiments
- Evaluating Models
- Analyzing Results
- Models Tested
- Benchmarks
- Results
- Contributing
- License
## Introduction
This project investigates the phenomenon of "agreeableness bias" in multimodal language models: the tendency of these models to disproportionately favor visually presented information, even when it contradicts their prior knowledge. We present a systematic methodology for measuring this effect across various model architectures and benchmarks.
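At its core, the measurement compares a model's answers on a neutral rendering of each question with its answers when the image visually suggests a particular (possibly incorrect) option. The sketch below conveys that idea with a simple answer-flip rate; it is illustrative only, not necessarily the exact statistic reported in the paper.

```python
# Illustrative only: a simple answer-flip rate toward the visually cued option,
# not necessarily the exact statistic reported in the paper.

def agreement_shift_rate(baseline_answers, cued_answers, cued_options):
    """Fraction of questions whose answer moved to the visually cued option.

    baseline_answers: answers on a neutral rendering of each question
    cued_answers:     answers when the image visually suggests an option
    cued_options:     the option each image suggested
    """
    shifted = sum(
        1
        for base, cued, target in zip(baseline_answers, cued_answers, cued_options)
        if base != target and cued == target
    )
    return shifted / len(baseline_answers)

# Two of four answers flip to the suggested option -> bias estimate of 0.5.
print(agreement_shift_rate(["A", "B", "C", "D"], ["B", "B", "B", "D"], ["B", "B", "B", "B"]))
```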
## Repository Structure
- `python-src/`: Python source code for generating experiments and analyzing results
- `results/`: Output of experiments and analysis
- `question_template.html`: HTML template for rendering questions
- `plots/`: Probability-delta plots for gpt-4o-mini and LLaVA-1.5-13B
- `plots_and_distribution/`: Benchmark scores and answer distributions for each variation
- `full_results.csv`: Comprehensive results in CSV format
## Setup
To set up the project environment:
- Ensure you have Python 3.10 installed on your system.
- Clone this repository:
  `git clone https://github.com/jasonlim131/looksRdeceiving.git`
  `cd looksRdeceiving`
- Create a virtual environment, e.g. `agreeable_venv`:
  `python3.10 -m venv agreeable_venv`
- Activate the environment in your terminal:
  `source agreeable_venv/bin/activate`
- Install the dependencies:
  `pip3.10 install -r requirements.txt`

If you keep getting dependency errors, try:
- installing the missing packages individually with pip
- uninstalling and reinstalling the offending package with the exact version pinned in requirements.txt (`pip3.10 install '{package_name}==VERSION'`)
- deleting the virtual environment and creating a new one, making sure it uses the correct Python version (3.10)
## Generating Experiments
To generate the experimental prompts:
- For vMMLU:
  `python python-src/generate_vmmlu.py`
  This renders each prompt to the default output path `output_directory/vmmlu_{variation}_rendered/question{i}.png`.
- For vSocialIQa:
  `python python-src/generate_vsocialiqa.py`
  This renders prompts into:
  `output_directory/vmmlu_centered_{variation}_rendered`
  `output_directory/vmmlu_{variation}_rendered`
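Both generation scripts follow roughly the pattern sketched below. The template placeholder names, the variation names, and the final HTML-to-PNG rasterization step are assumptions for illustration, not the exact code in `python-src/generate_vmmlu.py`.

```python
# Hypothetical sketch of the generation loop; the actual template placeholders,
# variation names, and HTML-to-PNG rasterization step in generate_vmmlu.py may differ.
from pathlib import Path
from string import Template

OUTPUT_DIR = Path("output_directory")

def render_variation(questions: list[dict], variation: str) -> None:
    template = Template(Path("question_template.html").read_text())
    out_dir = OUTPUT_DIR / f"vmmlu_{variation}_rendered"
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, q in enumerate(questions):
        html = template.substitute(question=q["question"], choices=q["choices"])
        # The real script rasterizes the filled-in HTML to question{i}.png;
        # writing the HTML file stands in for that step here.
        (out_dir / f"question{i}.html").write_text(html)

# Usage, with `questions` loaded from the benchmark and hypothetical variation names:
# for variation in ["plain", "highlighted"]:
#     render_variation(questions, variation)
```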
## Evaluating Models
To evaluate a model on a given task format, run the matching script:
`python3.10 evaluate_{model}_{task_format}.py`
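Each evaluation script is essentially a loop that sends the rendered question images to the model and records the returned answer. Below is a rough sketch of that shape, assuming the official OpenAI Python client; the prompt wording, the variation directory name, and the result handling are illustrative rather than the repository's exact code.

```python
# Rough sketch of an evaluation script (cf. evaluate_gpt4.py), assuming the official
# OpenAI Python client; prompt wording and output handling are illustrative.
import base64
from pathlib import Path

from openai import OpenAI  # pip install openai

MODEL = "gpt-4o-mini"  # swap in any other vision-capable GPT model
client = OpenAI()      # reads OPENAI_API_KEY from the environment

def ask(image_path: Path) -> str:
    """Send one rendered question image to the model and return its answer letter."""
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    response = client.chat.completions.create(
        model=MODEL,
        max_tokens=1,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Answer the multiple-choice question in the image with a single letter."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()

# Hypothetical variation directory name; iterate over every rendered question.
for png in sorted(Path("output_directory/vmmlu_plain_rendered").glob("question*.png")):
    print(png.name, ask(png))
```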
## Analyzing Results
To compute the bias metrics for a task format, run:
`python3.10 calculate_bias_{task_format}.py`
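The bias calculation boils down to comparing answers across variations of the same question. The sketch below assumes hypothetical column names (`question_id`, `variation`, `answer`, `cued_option`) and variation labels in `full_results.csv`; the actual `calculate_bias_*.py` scripts and CSV schema may differ.

```python
# Rough sketch of the bias calculation over full_results.csv; the column names
# (question_id, variation, answer, cued_option) and the "plain"/"highlighted"
# variation labels are assumptions, not the repository's actual schema.
import pandas as pd

df = pd.read_csv("full_results.csv")

baseline = df[df["variation"] == "plain"].set_index("question_id")
cued = df[df["variation"] == "highlighted"].set_index("question_id")

# A question counts as "flipped" if the baseline answer differed from the cued
# option but the answer under the cued rendering matches it.
joined = baseline.join(cued, lsuffix="_base", rsuffix="_cued")
flipped = (joined["answer_base"] != joined["cued_option_cued"]) & (
    joined["answer_cued"] == joined["cued_option_cued"]
)
print(f"Agreeableness bias (flip rate toward the cued option): {flipped.mean():.3f}")
```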
## Models Tested
We evaluated agreeableness bias in the following models:
- GPT-4o-mini (other GPT models work with `evaluate_gpt4` and `evaluate_gpt4_social`; just change the value of the `MODEL` variable)
- Claude 3 Haiku (`evaluate_claude` works with any of the Claude models, if you have enough credits)
- Gemini-1.5-Flash
- LLaVA-1.5-13B (vMMLU only)

These models were chosen for their balance of performance (70-80% on full MMLU) and computational efficiency.
## Benchmarks
- Visual MMLU (vMMLU): A multimodal adaptation of the Massive Multitask Language Understanding (MMLU) benchmark.
- Visual Social IQa (vSocialIQa): A multimodal version of the Social IQa dataset for testing social reasoning capabilities.
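If you want to regenerate or extend the visual variants, the underlying text benchmarks are available on the Hugging Face Hub; the repository may instead ship its own question files, so treat the snippet below as an optional starting point with assumed dataset IDs and field names.

```python
# Optional: pull the source text benchmarks from the Hugging Face Hub. The dataset
# IDs and field names below refer to the public Hub copies, which may not match the
# question files this repository actually uses.
from datasets import load_dataset  # pip install datasets

mmlu = load_dataset("cais/mmlu", "all", split="test")
social_iqa = load_dataset("allenai/social_i_qa", split="validation")

print(mmlu[0]["question"], mmlu[0]["choices"])
print(social_iqa[0]["question"], social_iqa[0]["answerA"])
```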
## Contributing
We encourage further research in the following areas:
- Model Expansion: Extend our benchmarks to additional architectures such as Flamingo, OFA, MiniGPT-4, and BLIP, and apply our analysis methodology to them to broaden our understanding of agreeableness bias across architectures.
- Mechanistic Analysis: Conduct in-depth investigations of open-source models that exhibit visual bias. This could begin with embedding projections to understand the overlap in representations; a minimal starting sketch follows this list.
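The sketch below is purely illustrative: random arrays stand in for whatever image and text embeddings an open-source model exposes, and PCA is just one of several possible projections.

```python
# Purely illustrative: random arrays stand in for whatever image and text embeddings
# an open-source model (e.g. a vision tower vs. a text encoder) exposes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(100, 768))  # placeholder: image-token embeddings
text_embeddings = rng.normal(size=(100, 768))   # placeholder: text-token embeddings

# Project both sets into the same 2D space and compare where they land.
pca = PCA(n_components=2)
projected = pca.fit_transform(np.vstack([image_embeddings, text_embeddings]))
image_2d, text_2d = projected[:100], projected[100:]

print("image centroid:", image_2d.mean(axis=0))
print("text centroid: ", text_2d.mean(axis=0))
```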
We welcome contributions to this project! Please see our Contributing Guidelines for more information.
## License
This project is licensed under the MIT License; see the LICENSE.txt file for details.