This is the official repository for ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods (EMNLP 2024). The repo contains the original ReCaLL implementation on the WikiMIA benchmark dataset. Check out the project website for more information.
⭐ If you find our implementation or paper helpful, please consider citing our work ⭐:
@inproceedings{xie-etal-2024-recall,
title = "{R}e{C}a{LL}: Membership Inference via Relative Conditional Log-Likelihoods",
author = "Xie, Roy and
Wang, Junlin and
Huang, Ruomin and
Zhang, Minxing and
Ge, Rong and
Pei, Jian and
Gong, Neil Zhenqiang and
Dhingra, Bhuwan",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.493",
pages = "8671--8689",
}
Install the dependencies:
pip install -r requirements.txt
Run ReCaLL with the following command:
cd src
python run.py --target_model <TARGET_MODEL> --ref_model <REFERENCE_MODEL> --output_dir <OUTPUT_PATH> --dataset <DATASET> --sub_dataset <SUB_DATASET> --num_shots <NUM_SHOTS>
Example:
python run.py --target_model "EleutherAI/pythia-6.9b" --ref_model "EleutherAI/pythia-70m" --output_dir ./output --dataset "wikimia" --sub_dataset "128" --num_shots 7
Parameter | Description |
---|---|
--target_model | Target model to evaluate (e.g., "EleutherAI/pythia-6.9b") |
--ref_model | Reference model for comparison (e.g., "EleutherAI/pythia-70m") |
--output_dir | Directory to save output files |
--dataset | Dataset to use ("wikimia") |
--sub_dataset | Subset of the dataset (e.g., "128" from the wikimia dataset) |
--num_shots | Number of shots for the prefix |
--pass_window | (Optional) Allow inputs to exceed the context window |
--synthetic_prefix | (Optional) Use synthetic prefixes generated by GPT-4o |
--api_key_path | (Optional) Path to OpenAI API key file (required for synthetic prefixes) |
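To compare several values of --num_shots, the documented command can be generated in a loop. The following is a minimal sketch, not part of the repository: the helper names (`build_command`, `build_sweep`, `run_sweep`) and the per-run `shots_<n>` output subdirectories are assumptions made for illustration. Only the flags listed in the table above are used.

```python
import shlex
import subprocess


def build_command(target_model, ref_model, output_dir, dataset, sub_dataset, num_shots):
    """Return the run.py invocation as an argument list, using the documented flags."""
    return [
        "python", "run.py",
        "--target_model", target_model,
        "--ref_model", ref_model,
        "--output_dir", output_dir,
        "--dataset", dataset,
        "--sub_dataset", str(sub_dataset),
        "--num_shots", str(num_shots),
    ]


def build_sweep(shot_counts, target_model, ref_model, output_root,
                dataset="wikimia", sub_dataset="128"):
    """Build one command per shot count.

    Assumption: each run gets its own output subdirectory (shots_<n>) so that
    results from different shot counts are kept apart.
    """
    return [
        build_command(target_model, ref_model, f"{output_root}/shots_{n}",
                      dataset, sub_dataset, n)
        for n in shot_counts
    ]


def run_sweep(commands, src_dir="src"):
    """Execute the commands from the src directory, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=src_dir, check=True)


# Dry run: print the commands instead of executing them.
commands = build_sweep([1, 7, 28], "EleutherAI/pythia-6.9b",
                       "EleutherAI/pythia-70m", "./output")
for cmd in commands:
    print(shlex.join(cmd))
```

Calling `run_sweep(commands)` from the repository root would then execute each command in turn. Printing first makes it easy to check the arguments before starting long model runs.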
The script outputs results in JSON format and generates visualizations for:
- ReCaLL score
- Loss
- Reference
- Zlib
- Min-k%
- Min-k++
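For orientation, most of the scores listed above can be computed from per-token log-probabilities. The sketch below follows the commonly used definitions of these membership inference scores and may differ from the code in this repository. All function names and the default `k=0.2` are assumptions. Min-k++ is left out because it needs the full next-token distribution at each position, not just the log-probability of the observed token.

```python
import zlib


def avg_log_likelihood(token_log_probs):
    """Mean log-probability of the observed tokens."""
    return sum(token_log_probs) / len(token_log_probs)


def loss(token_log_probs):
    """Average negative log-likelihood of the text under the target model."""
    return -avg_log_likelihood(token_log_probs)


def reference_score(target_log_probs, ref_log_probs):
    """Target-model loss minus reference-model loss on the same text."""
    return loss(target_log_probs) - loss(ref_log_probs)


def zlib_score(token_log_probs, text):
    """Loss normalized by the zlib-compressed size of the text in bytes."""
    return loss(token_log_probs) / len(zlib.compress(text.encode("utf-8")))


def min_k_score(token_log_probs, k=0.2):
    """Mean of the lowest k fraction of token log-probabilities (Min-k%)."""
    n = max(1, int(len(token_log_probs) * k))
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n


def recall_score(conditional_log_probs, unconditional_log_probs):
    """ReCaLL: ratio of the conditional to the unconditional log-likelihood.

    conditional_log_probs are the target text's token log-probs when a prefix
    is prepended; unconditional_log_probs are the log-probs without a prefix.
    """
    return avg_log_likelihood(conditional_log_probs) / avg_log_likelihood(unconditional_log_probs)
```

In practice the token log-probabilities would come from a forward pass of the target (or reference) model; here they are plain lists of floats so the arithmetic is easy to follow.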
Example visualization from 1 - 28 shots:
For questions or issues, please open an issue on GitHub or contact the authors directly.