ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods 🔍

📝 Overview

This is the official repository for ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods (EMNLP 2024). It contains the original ReCaLL implementation, evaluated on the WikiMIA benchmark. Check out the project website for more information.
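ReCaLL compares how a target model scores a candidate text with and without a prefix. The sketch below computes that relative score from per-token log-probabilities. It assumes the formulation ReCaLL(x) = LL(x | P) / LL(x), where LL is the average per-token log-likelihood and P is the prefix. The function names are hypothetical and are not taken from `src/run.py`. Obtaining the log-probabilities from a model is left out.

```python
from math import fsum
from typing import Sequence


def avg_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """LL(x): mean of log p(x_i | x_<i) over the tokens of x."""
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token log-probability")
    return fsum(token_logprobs) / len(token_logprobs)


def recall_score(cond_logprobs: Sequence[float], uncond_logprobs: Sequence[float]) -> float:
    """Hypothetical helper: ReCaLL(x) = LL(x | P) / LL(x).

    cond_logprobs:   log p(x_i | P, x_<i) for each token of x, where the prefix P
                     is prepended to the input but its own tokens are not scored.
    uncond_logprobs: log p(x_i | x_<i) for the same tokens without the prefix.
    """
    if len(cond_logprobs) != len(uncond_logprobs):
        raise ValueError("both lists must cover the same tokens of x")
    return avg_log_likelihood(cond_logprobs) / avg_log_likelihood(uncond_logprobs)


# Toy numbers, not real model output: the prefix lowers every token's log-prob by 0.5.
uncond = [-2.0, -1.0, -3.0]  # LL(x) = -2.0
cond = [-2.5, -1.5, -3.5]    # LL(x | P) = -2.5
print(recall_score(cond, uncond))  # 1.25
```

Both log-likelihoods are negative, so the ratio is positive. A score like this is then thresholded, or swept to compute an AUC, to separate members from non-members.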

⭐ If you find our implementation or paper helpful, please consider citing our work ⭐:

@inproceedings{xie-etal-2024-recall,
    title = "{R}e{C}a{LL}: Membership Inference via Relative Conditional Log-Likelihoods",
    author = "Xie, Roy  and
      Wang, Junlin  and
      Huang, Ruomin  and
      Zhang, Minxing  and
      Ge, Rong  and
      Pei, Jian  and
      Gong, Neil Zhenqiang  and
      Dhingra, Bhuwan",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.493",
    pages = "8671--8689",
}

🛠 Installation

pip install -r requirements.txt

🚀 Usage

Run ReCaLL with the following command:

cd src
python run.py --target_model <TARGET_MODEL> --ref_model <REFERENCE_MODEL> --output_dir <OUTPUT_PATH> --dataset <DATASET> --sub_dataset <SUB_DATASET> --num_shots <NUM_SHOTS>

Example:

python run.py --target_model "EleutherAI/pythia-6.9b" --ref_model "EleutherAI/pythia-70m" --output_dir ./output --dataset "wikimia" --sub_dataset "128" --num_shots 7
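A multi-shot sweep, such as the 1 to 28 shot visualization, runs the same command once per value of `--num_shots`. The sketch below generates those commands using only the flags documented here. `build_recall_cmd` and `run_sweep` are hypothetical helpers. The sweep assumes it is started from the repository root, so that `src/` becomes the working directory for `run.py`.

```python
import subprocess
from typing import List, Sequence


def build_recall_cmd(
    target_model: str,
    ref_model: str,
    output_dir: str,
    dataset: str,
    sub_dataset: str,
    num_shots: int,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: argv list equivalent to the documented run.py command."""
    return [
        "python", "run.py",
        "--target_model", target_model,
        "--ref_model", ref_model,
        "--output_dir", output_dir,
        "--dataset", dataset,
        "--sub_dataset", sub_dataset,
        "--num_shots", str(num_shots),
        *extra_args,
    ]


def run_sweep(shots: Sequence[int], repo_root: str = ".") -> None:
    """Hypothetical helper: run one evaluation per shot count, stopping on the first failure."""
    for n in shots:
        cmd = build_recall_cmd(
            "EleutherAI/pythia-6.9b", "EleutherAI/pythia-70m",
            "./output", "wikimia", "128", n,
        )
        # run.py is invoked from inside src/, matching `cd src` in the usage section.
        subprocess.run(cmd, cwd=f"{repo_root}/src", check=True)
```

Calling `run_sweep(range(1, 29))` would launch one run per shot count. Passing an argv list instead of a shell string avoids quoting problems with model names.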

🔧 Parameters

Parameter	Description
`--target_model`	Target model to evaluate (e.g., "EleutherAI/pythia-6.9b")
`--ref_model`	Reference model for comparison (e.g., "EleutherAI/pythia-70m")
`--output_dir`	Directory to save output files
`--dataset`	Dataset to use ("wikimia")
`--sub_dataset`	Subset of the dataset (e.g., "128" for the WikiMIA length-128 split)
`--num_shots`	Number of shots (examples) used to build the prefix
`--pass_window`	(Optional) Allow inputs to exceed the model's context window
`--synthetic_prefix`	(Optional) Use synthetic prefixes generated by GPT-4o
`--api_key_path`	(Optional) Path to OpenAI API key file (required for synthetic prefixes)
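The table can be read as an argument-parser specification. The sketch below mirrors it with Python's `argparse` so flag combinations can be checked locally. It is not the parser in `src/run.py`. Several details are assumptions based on the descriptions above:

- the argument types;
- treating the non-optional flags as required;
- treating `--pass_window` and `--synthetic_prefix` as on/off (`store_true`) switches;
- the check that `--synthetic_prefix` comes with `--api_key_path`.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented ReCaLL flags."""
    p = argparse.ArgumentParser(description="ReCaLL membership inference (sketch)")
    p.add_argument("--target_model", required=True, help="e.g. EleutherAI/pythia-6.9b")
    p.add_argument("--ref_model", required=True, help="e.g. EleutherAI/pythia-70m")
    p.add_argument("--output_dir", required=True)
    p.add_argument("--dataset", required=True, help='e.g. "wikimia"')
    p.add_argument("--sub_dataset", required=True, help='e.g. "128"')
    p.add_argument("--num_shots", type=int, required=True)
    # Assumption: the optional switches are plain on/off flags.
    p.add_argument("--pass_window", action="store_true")
    p.add_argument("--synthetic_prefix", action="store_true")
    p.add_argument("--api_key_path", default=None)
    return p


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # The table states the API key file is required for synthetic prefixes.
    if args.synthetic_prefix and not args.api_key_path:
        parser.error("--synthetic_prefix requires --api_key_path")
    return args
```

`parser.error` prints a usage message and exits with status 2, so a bad flag combination fails before any model is loaded.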

📊 Example Output

The script outputs results in JSON format and generates visualizations for the following scores:

ReCaLL score
Loss
Reference
Zlib
Min-k%
Min-k++
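The baseline scores listed above are commonly defined on per-token log-probabilities as follows. This is a minimal sketch under that assumption, not the repository's code. The exact formulas in `src/` (signs, normalization, the choice of k) may differ, and the helper names are hypothetical. Min-k++ is omitted because it needs the model's full next-token distribution, not just the log-probabilities of the observed tokens.

```python
import zlib
from math import fsum
from typing import Sequence


def loss_score(token_logprobs: Sequence[float]) -> float:
    """Loss: average negative log-likelihood of the text under the target model."""
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token log-probability")
    return -fsum(token_logprobs) / len(token_logprobs)


def reference_score(target_logprobs: Sequence[float], ref_logprobs: Sequence[float]) -> float:
    """Reference: target-model loss calibrated by subtracting a reference-model loss."""
    return loss_score(target_logprobs) - loss_score(ref_logprobs)


def zlib_score(token_logprobs: Sequence[float], text: str) -> float:
    """Zlib: loss divided by the zlib-compressed size of the text in bytes."""
    return loss_score(token_logprobs) / len(zlib.compress(text.encode("utf-8")))


def min_k_percent_score(token_logprobs: Sequence[float], k: float = 0.2) -> float:
    """Min-k%: mean log-probability of the k fraction of least likely tokens."""
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token log-probability")
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return fsum(lowest) / n


# Toy log-probabilities, not real model output.
lp = [-0.5, -4.0, -1.0, -3.0, -0.5]
print(loss_score(lp))                  # 1.8
print(min_k_percent_score(lp, k=0.4))  # -3.5
```

Each function reduces a text to a single number, so all of these baselines can be compared with the ReCaLL score using the same thresholding or AUC evaluation.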

Example visualization for 1 to 28 shots:

📬 Contact

For questions or issues, please open an issue on GitHub or contact the authors directly.
