
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang

School of Electronic and Computer Engineering, Peking University



πŸ’‘ We also have other Copyright Protection projects that may interest you ✨.

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection [CVPR 2024]
Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang

V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection [ACM MM 2024]
Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

GS-Hider: Hiding Messages into 3D Gaussian Splatting [NeurIPS 2024]
Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang

πŸ“° News

  • [2025.02.14] πŸ€— We are progressively open-sourcing all code and pre-trained model weights. Feel free to watch πŸ‘€ this repository for the latest updates.
  • [2025.01.23] πŸŽ‰πŸŽ‰πŸŽ‰ Our FakeShield has been accepted at ICLR 2025!
  • [2024.10.03] πŸ”₯ We have released FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models. We introduce the explainable IFDL task and construct the MMTD-Set dataset and the FakeShield framework. Check out the paper. The code and dataset are coming soon.

FakeShield Overview

FakeShield is a novel multi-modal framework designed for explainable image forgery detection and localization (IFDL). Unlike traditional black-box IFDL methods, FakeShield integrates multi-modal large language models (MLLMs) to analyze manipulated images, generate tampered region masks, and provide human-understandable explanations based on pixel-level artifacts and semantic inconsistencies. To improve generalization across diverse forgery types, FakeShield introduces domain tags, which guide the model to recognize different manipulation techniques effectively. Additionally, we construct MMTD-Set, a richly annotated dataset containing multi-modal descriptions of manipulated images, fostering better interpretability. Through extensive experiments, FakeShield demonstrates superior performance in detecting and localizing various forgeries, including copy-move, splicing, removal, DeepFake, and AI-generated manipulations.


πŸ† Contributions

  • FakeShield Introduction. We introduce FakeShield, a multi-modal framework for explainable image forgery detection and localization, the first to leverage MLLMs for the IFDL task. We also propose the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and the Multi-modal Forgery Localization Module (MFLM) to improve the generalization and robustness of the models.

  • Novel Explainable-IFDL Task. We propose the first explainable image forgery detection and localization (e-IFDL) task, addressing the opacity of traditional IFDL methods by providing both pixel-level and semantic-level explanations.

  • MMTD-Set Dataset Construction. We create the MMTD-Set by enriching existing IFDL datasets using GPT-4o, generating high-quality β€œimage-mask-description” triplets for enhanced multimodal learning.

πŸ› οΈ Requirements and Installation

Note: If you want to reproduce the results from our paper, please prioritize using the Docker image to set up the environment. For more details, see this issue.

Installation via Pip

  1. Ensure your environment meets the following requirements:

    • Python == 3.9
    • PyTorch == 1.13.0
    • CUDA Version == 11.6
  2. Clone the repository:

    git clone https://github.com/zhipeixu/FakeShield.git
    cd FakeShield
  3. Install dependencies:

    apt update && apt install git
    pip install -r requirements.txt
    
    ## Install MMCV
    git clone https://github.com/open-mmlab/mmcv
    cd mmcv
    git checkout v1.4.7
    MMCV_WITH_OPS=1 pip install -e .
  4. Install DTE-FDM:

    cd ../DTE-FDM
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
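
To confirm the environment matches the pinned versions above, a quick sanity check (a minimal sketch, not part of the official setup):

    # Expect Python 3.9.x, PyTorch 1.13.0, CUDA 11.6, and a visible GPU
    python --version
    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"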

Installation via Docker

  1. Pull the pre-built Docker image:

    docker pull zhipeixu/dte-fdm
    docker pull zhipeixu/mflm
  2. Clone the repository:

    git clone https://github.com/zhipeixu/FakeShield.git
    cd FakeShield
  3. Run the container:

    docker run --gpus all -it --rm \
        -v $(pwd):/workspace/FakeShield \
        zhipeixu/dte-fdm:latest /bin/bash
    
    docker run --gpus all -it --rm \
        -v $(pwd):/workspace/FakeShield \
        zhipeixu/mflm:latest /bin/bash
  4. Inside the container, navigate to the repository:

    cd /workspace/FakeShield
  5. Install MMCV:

    git clone https://github.com/open-mmlab/mmcv
    cd mmcv
    git checkout v1.4.7
    MMCV_WITH_OPS=1 pip install -e .

πŸ€– Prepare Model

  1. Download FakeShield weights from Hugging Face

    The model weights consist of three parts: DTE-FDM, MFLM, and DTG. For convenience, we have packaged them together and uploaded them to the Hugging Face repository.

    We recommend using huggingface_hub to download the weights:

    pip install huggingface_hub
    huggingface-cli download --resume-download zhipeixu/fakeshield-v1-22b --local-dir weight/fakeshield-v1-22b
  2. Download the pretrained SAM weights

    MFLM builds on SAM, so its pre-trained weights are required. You can download the sam_vit_h_4b8939.pth model with wget:

    wget https://huggingface.co/ybelkada/segment-anything/resolve/main/checkpoints/sam_vit_h_4b8939.pth -P weight/
  3. Ensure the weights are placed correctly

    Organize your weight/ folder as follows:

     FakeShield/
     β”œβ”€β”€ weight/
     β”‚   β”œβ”€β”€ fakeshield-v1-22b/
     β”‚   β”‚   β”œβ”€β”€ DTE-FDM/
     β”‚   β”‚   β”œβ”€β”€ MFLM/
     β”‚   β”‚   β”œβ”€β”€ DTG.pth
     β”‚   β”œβ”€β”€ sam_vit_h_4b8939.pth
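
As a quick sanity check (a minimal sketch, assuming the layout above and the default download paths), confirm that every expected weight path exists:

    # Each of these paths should exist after the downloads above
    ls weight/fakeshield-v1-22b/DTE-FDM \
       weight/fakeshield-v1-22b/MFLM \
       weight/fakeshield-v1-22b/DTG.pth \
       weight/sam_vit_h_4b8939.pth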
    

πŸš€ Quick Start

CLI Demo

You can quickly run the demo script by executing:

bash scripts/cli_demo.sh

The cli_demo.sh script allows customization through the following environment variables:

  • WEIGHT_PATH: Path to the FakeShield weight directory (default: ./weight/fakeshield-v1-22b)
  • IMAGE_PATH: Path to the input image (default: ./playground/image/Sp_D_CRN_A_ani0043_ani0041_0373.jpg)
  • DTE_FDM_OUTPUT: Path for saving the DTE-FDM output (default: ./playground/DTE-FDM_output.jsonl)
  • MFLM_OUTPUT: Path for saving the MFLM output (default: ./playground/MFLM_output.jsonl)

Modify these variables to suit different use cases.
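
For example, a hypothetical invocation that overrides all four variables (assuming cli_demo.sh reads them from the environment, as the defaults above suggest; the image path is a placeholder):

    WEIGHT_PATH=./weight/fakeshield-v1-22b \
    IMAGE_PATH=./playground/image/your_image.jpg \
    DTE_FDM_OUTPUT=./playground/DTE-FDM_output.jsonl \
    MFLM_OUTPUT=./playground/MFLM_output.jsonl \
    bash scripts/cli_demo.sh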

πŸ‹οΈβ€β™‚οΈ Train

Training Data Preparation

The training dataset consists of four types of data:

  1. Photoshop Manipulation Dataset: CASIAv2, Fantastic Reality
  2. DeepFake Manipulation Dataset: FFHQ, FaceAPP
  3. AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
  4. MMTD-Set Dataset: MMTD-Set (Coming soon)

Validation Data Preparation

The validation dataset consists of four types of data:

  1. Photoshop Manipulation Dataset: CASIAv1+, IMD2020, Columbia, COVERAGE, NIST16, DSO, Korus
  2. DeepFake Manipulation Dataset: FFHQ, FaceAPP
  3. AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
  4. MMTD-Set Dataset: MMTD-Set (Coming soon)

Download them from the above links and organize them as follows:

dataset/
β”œβ”€β”€ photoshop/                # Photoshop Manipulation Dataset
β”‚   β”œβ”€β”€ CASIAv2_Tp/           # CASIAv2 Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ CASIAv2_Au/           # CASIAv2 Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ FR_Tp/                # Fantastic Reality Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FR_Au/                # Fantastic Reality Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ CASIAv1+_Tp/          # CASIAv1+ Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ CASIAv1+_Au/          # CASIAv1+ Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ IMD2020_Tp/           # IMD2020 Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ IMD2020_Au/           # IMD2020 Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ Columbia/             # Columbia Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ coverage/             # Coverage Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ NIST16/               # NIST16 Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ DSO/                  # DSO Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   └── Korus/                # Korus Dataset
β”‚       β”œβ”€β”€ image/
β”‚       └── mask/
β”‚
β”œβ”€β”€ deepfake/                 # DeepFake Manipulation Dataset
β”‚   β”œβ”€β”€ FaceAPP_Train/        # FaceAPP Training Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FaceAPP_Val/          # FaceAPP Validation Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FFHQ_Train/           # FFHQ Training Data
β”‚   β”‚   └── image/
β”‚   └── FFHQ_Val/             # FFHQ Validation Data
β”‚       └── image/
β”‚
β”œβ”€β”€ aigc/                     # AIGC Editing Manipulation Dataset
β”‚   β”œβ”€β”€ SD_inpaint_Train/     # Stable Diffusion Inpainting Training Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ SD_inpaint_Val/       # Stable Diffusion Inpainting Validation Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ COCO2017_Train/       # COCO2017 Training Data
β”‚   β”‚   └── image/
β”‚   └── COCO2017_Val/         # COCO2017 Validation Data
β”‚       └── image/
β”‚
└── MMTD_Set/                 # Multi-Modal Tamper Description Dataset
    └── MMTD-Set-34k.json     # JSON Training File
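
Before launching training, it can help to verify the layout. Below is a minimal, abridged sketch (the checked subfolders are a sample; extend the loop to every dataset you downloaded):

    # Report any dataset folder that is missing its image/ or mask/ subfolder
    for d in photoshop/CASIAv2_Tp photoshop/FR_Tp deepfake/FaceAPP_Train aigc/SD_inpaint_Train; do
        [ -d "dataset/$d/image" ] || echo "missing dataset/$d/image"
        [ -d "dataset/$d/mask" ]  || echo "missing dataset/$d/mask"
    done
    [ -f dataset/MMTD_Set/MMTD-Set-34k.json ] || echo "missing MMTD-Set-34k.json"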

LoRA Finetune DTE-FDM

You can fine-tune DTE-FDM using LoRA with the following script:

bash ./scripts/DTE-FDM/finetune_lora.sh

The script allows customization through the following environment variables:

  • OUTPUT_DIR: Directory for saving training output
  • DATA_PATH: Path to the training dataset (JSON format)
  • WEIGHT_PATH: Path to the pre-trained weights

Modify these variables as needed to adapt the training process to different datasets and setups.
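
A hypothetical invocation (the output directory is a placeholder; the data and weight paths assume the layouts shown earlier, and the script is assumed to read these variables from the environment):

    OUTPUT_DIR=./output/DTE-FDM_lora \
    DATA_PATH=./dataset/MMTD_Set/MMTD-Set-34k.json \
    WEIGHT_PATH=./weight/fakeshield-v1-22b/DTE-FDM \
    bash ./scripts/DTE-FDM/finetune_lora.sh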

LoRA Finetune MFLM

You can fine-tune MFLM using LoRA with the following script:

bash ./scripts/MFLM/finetune_lora.sh

The script allows customization through the following environment variables:

  • OUTPUT_DIR: Directory for saving training output
  • DATA_PATH: Path to the training dataset
  • WEIGHT_PATH: Path to the pre-trained weights
  • TRAIN_DATA_CHOICE: Selects the training dataset
  • VAL_DATA_CHOICE: Selects the validation dataset

Modify these variables as needed to adapt the training process to different datasets and setups.
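
A hypothetical invocation (all values are placeholders; check the script for the exact dataset names that TRAIN_DATA_CHOICE and VAL_DATA_CHOICE accept):

    # "photoshop" is an illustrative choice, not a documented option
    OUTPUT_DIR=./output/MFLM_lora \
    DATA_PATH=./dataset \
    WEIGHT_PATH=./weight/fakeshield-v1-22b/MFLM \
    TRAIN_DATA_CHOICE=photoshop \
    VAL_DATA_CHOICE=photoshop \
    bash ./scripts/MFLM/finetune_lora.sh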

🎯 Test

You can test FakeShield using the following script:

bash ./scripts/test.sh

The script allows customization through the following environment variables:

  • WEIGHT_PATH: Path to the directory containing the FakeShield model weights.
  • QUESTION_PATH: Path to the test dataset in JSONL format. This file can be generated using ./playground/eval_jsonl.py.
  • DTE_FDM_OUTPUT: Path for saving the output of the DTE-FDM model.
  • MFLM_OUTPUT: Path for saving the output of the MFLM model.

Modify these variables as needed to adapt the evaluation process to different datasets and setups.
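
A hypothetical invocation (output paths are placeholders; generate QUESTION_PATH with ./playground/eval_jsonl.py first):

    WEIGHT_PATH=./weight/fakeshield-v1-22b \
    QUESTION_PATH=./playground/test.jsonl \
    DTE_FDM_OUTPUT=./playground/DTE-FDM_output.jsonl \
    MFLM_OUTPUT=./playground/MFLM_output.jsonl \
    bash ./scripts/test.sh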

πŸ“š Main Results

Comparison of detection performance with advanced IFDL methods

πŸ“œ Citation

    @inproceedings{xu2024fakeshield,
        title={FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models},
        author={Xu, Zhipei and Zhang, Xuanyu and Li, Runyi and Tang, Zecheng and Huang, Qing and Zhang, Jian},
        booktitle={International Conference on Learning Representations},
        year={2025}
    }

πŸ™ Acknowledgement

We are grateful to LLaVA, groundingLMM, and LISA for open-sourcing their models and code.
