
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang

School of Electronic and Computer Engineering, Peking University



πŸ’‘ We also have other Copyright Protection projects that may interest you ✨.

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection [CVPR 2024]
Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang

V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection [ACM MM 2024]
Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

GS-Hider: Hiding Messages into 3D Gaussian Splatting [NeurIPS 2024]
Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang

πŸ“° News

  • [2025.02.14] πŸ€— We are progressively open-sourcing all code and pre-trained model weights. Feel free to watch πŸ‘€ this repository for the latest updates.
  • [2025.01.23] πŸŽ‰πŸŽ‰πŸŽ‰ Our FakeShield has been accepted at ICLR 2025!
  • [2024.10.03] πŸ”₯ We have released FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models. We introduce the explainable IFDL task and construct the MMTD-Set dataset and the FakeShield framework. Check out the paper. The code and dataset are coming soon.

FakeShield Overview

FakeShield is a novel multi-modal framework designed for explainable image forgery detection and localization (IFDL). Unlike traditional black-box IFDL methods, FakeShield integrates multi-modal large language models (MLLMs) to analyze manipulated images, generate tampered region masks, and provide human-understandable explanations based on pixel-level artifacts and semantic inconsistencies. To improve generalization across diverse forgery types, FakeShield introduces domain tags, which guide the model to recognize different manipulation techniques effectively. Additionally, we construct MMTD-Set, a richly annotated dataset containing multi-modal descriptions of manipulated images, fostering better interpretability. Through extensive experiments, FakeShield demonstrates superior performance in detecting and localizing various forgeries, including copy-move, splicing, removal, DeepFake, and AI-generated manipulations.


πŸ† Contributions

  • FakeShield Introduction. We introduce FakeShield, a multi-modal framework for explainable image forgery detection and localization, the first to leverage MLLMs for the IFDL task. We also propose the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and the Multi-modal Forgery Localization Module (MFLM) to improve the generalization and robustness of the models.

  • Novel Explainable-IFDL Task. We propose the first explainable image forgery detection and localization (e-IFDL) task, addressing the opacity of traditional IFDL methods by providing both pixel-level and semantic-level explanations.

  • MMTD-Set Dataset Construction. We create the MMTD-Set by enriching existing IFDL datasets using GPT-4o, generating high-quality β€œimage-mask-description” triplets for enhanced multimodal learning.

πŸ› οΈ Requirements and Installation

Note: If you want to reproduce the results from our paper, please prioritize using the Docker image to set up the environment. For more details, see this issue.

Installation via Pip

  1. Ensure your environment meets the following requirements:

    • Python == 3.9
    • PyTorch == 1.13.0
    • CUDA Version == 11.6
  2. Clone the repository:

    git clone https://github.com/zhipeixu/FakeShield.git
    cd FakeShield
  3. Install dependencies:

    apt update && apt install git
    pip install -r requirements.txt
    
    ## Install MMCV
    git clone https://github.com/open-mmlab/mmcv
    cd mmcv
    git checkout v1.4.7
    MMCV_WITH_OPS=1 pip install -e .
  4. Install DTE-FDM:

    cd ../DTE-FDM
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
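
To confirm the environment matches the pinned versions above, a quick sanity check (a minimal sketch, not part of the official setup):

    # Expect Python 3.9.x, PyTorch 1.13.0, CUDA 11.6, and a visible GPU
    python --version
    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"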

Installation via Docker

  1. Pull the pre-built Docker image:

    docker pull zhipeixu/dte-fdm
    docker pull zhipeixu/mflm
  2. Clone the repository:

    git clone https://github.com/zhipeixu/FakeShield.git
    cd FakeShield
  3. Run the container:

    docker run --gpus all -it --rm \
        -v $(pwd):/workspace/FakeShield \
        zhipeixu/dte-fdm:latest /bin/bash
    
    docker run --gpus all -it --rm \
        -v $(pwd):/workspace/FakeShield \
        zhipeixu/mflm:latest /bin/bash
  4. Inside the container, navigate to the repository:

    cd /workspace/FakeShield
  5. Install MMCV:

    git clone https://github.com/open-mmlab/mmcv
    cd mmcv
    git checkout v1.4.7
    MMCV_WITH_OPS=1 pip install -e .

πŸ€– Prepare Model

  1. Download FakeShield weights from Hugging Face

    The model weights consist of three parts: DTE-FDM, MFLM, and DTG. For convenience, we have packaged them together and uploaded them to the Hugging Face repository.

    We recommend using huggingface_hub to download the weights:

    pip install huggingface_hub
    huggingface-cli download --resume-download zhipeixu/fakeshield-v1-22b --local-dir weight/fakeshield-v1-22b
  2. Download the pretrained SAM weights

    MFLM builds on SAM, so its pre-trained weights are required. You can download the sam_vit_h_4b8939.pth model with wget:

    wget https://huggingface.co/ybelkada/segment-anything/resolve/main/checkpoints/sam_vit_h_4b8939.pth -P weight/
  3. Ensure the weights are placed correctly

    Organize your weight/ folder as follows:

     FakeShield/
     β”œβ”€β”€ weight/
     β”‚   β”œβ”€β”€ fakeshield-v1-22b/
     β”‚   β”‚   β”œβ”€β”€ DTE-FDM/
     β”‚   β”‚   β”œβ”€β”€ MFLM/
     β”‚   β”‚   β”œβ”€β”€ DTG.pth
     β”‚   β”œβ”€β”€ sam_vit_h_4b8939.pth
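
As a quick sanity check (a minimal sketch, assuming the layout above and the default download paths), confirm that every expected weight path exists:

    # Each of these paths should exist after the downloads above
    ls weight/fakeshield-v1-22b/DTE-FDM \
       weight/fakeshield-v1-22b/MFLM \
       weight/fakeshield-v1-22b/DTG.pth \
       weight/sam_vit_h_4b8939.pth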
    

πŸš€ Quick Start

CLI Demo

You can quickly run the demo script by executing:

bash scripts/cli_demo.sh

The cli_demo.sh script allows customization through the following environment variables:

  • WEIGHT_PATH: Path to the FakeShield weight directory (default: ./weight/fakeshield-v1-22b)
  • IMAGE_PATH: Path to the input image (default: ./playground/image/Sp_D_CRN_A_ani0043_ani0041_0373.jpg)
  • DTE_FDM_OUTPUT: Path for saving the DTE-FDM output (default: ./playground/DTE-FDM_output.jsonl)
  • MFLM_OUTPUT: Path for saving the MFLM output (default: ./playground/MFLM_output.jsonl)

Modify these variables to suit different use cases.
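
For example, a hypothetical invocation that overrides all four variables (assuming cli_demo.sh reads them from the environment, as the defaults above suggest; the image path is a placeholder):

    WEIGHT_PATH=./weight/fakeshield-v1-22b \
    IMAGE_PATH=./playground/image/your_image.jpg \
    DTE_FDM_OUTPUT=./playground/DTE-FDM_output.jsonl \
    MFLM_OUTPUT=./playground/MFLM_output.jsonl \
    bash scripts/cli_demo.sh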

πŸ‹οΈβ€β™‚οΈ Train

Training Data Preparation

The training dataset consists of four types of data:

  1. Photoshop Manipulation Dataset: CASIAv2, Fantastic Reality
  2. DeepFake Manipulation Dataset: FFHQ, FaceAPP
  3. AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
  4. MMTD-Set Dataset: MMTD-Set (Coming soon)

Validation Data Preparation

The validation dataset consists of four types of data:

  1. Photoshop Manipulation Dataset: CASIAv1+, IMD2020, Columbia, COVERAGE, NIST16, DSO, Korus
  2. DeepFake Manipulation Dataset: FFHQ, FaceAPP
  3. AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
  4. MMTD-Set Dataset: MMTD-Set (Coming soon)

Download them from the above links and organize them as follows:

dataset/
β”œβ”€β”€ photoshop/                # Photoshop Manipulation Dataset
β”‚   β”œβ”€β”€ CASIAv2_Tp/           # CASIAv2 Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ CASIAv2_Au/           # CASIAv2 Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ FR_Tp/                # Fantastic Reality Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FR_Au/                # Fantastic Reality Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ CASIAv1+_Tp/          # CASIAv1+ Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ CASIAv1+_Au/          # CASIAv1+ Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ IMD2020_Tp/           # IMD2020 Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ IMD2020_Au/           # IMD2020 Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ Columbia/             # Columbia Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ coverage/             # Coverage Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ NIST16/               # NIST16 Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ DSO/                  # DSO Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   └── Korus/                # Korus Dataset
β”‚       β”œβ”€β”€ image/
β”‚       └── mask/
β”‚
β”œβ”€β”€ deepfake/                 # DeepFake Manipulation Dataset
β”‚   β”œβ”€β”€ FaceAPP_Train/        # FaceAPP Training Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FaceAPP_Val/          # FaceAPP Validation Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FFHQ_Train/           # FFHQ Training Data
β”‚   β”‚   └── image/
β”‚   └── FFHQ_Val/             # FFHQ Validation Data
β”‚       └── image/
β”‚
β”œβ”€β”€ aigc/                     # AIGC Editing Manipulation Dataset
β”‚   β”œβ”€β”€ SD_inpaint_Train/     # Stable Diffusion Inpainting Training Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ SD_inpaint_Val/       # Stable Diffusion Inpainting Validation Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ COCO2017_Train/       # COCO2017 Training Data
β”‚   β”‚   └── image/
β”‚   └── COCO2017_Val/         # COCO2017 Validation Data
β”‚       └── image/
β”‚
└── MMTD_Set/                 # Multi-Modal Tamper Description Dataset
    └── MMTD-Set-34k.json     # JSON Training File
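
Before launching training, it can help to verify the layout. Below is a minimal, abridged sketch (the checked subfolders are a sample; extend the loop to every dataset you downloaded):

    # Report any dataset folder that is missing its image/ or mask/ subfolder
    for d in photoshop/CASIAv2_Tp photoshop/FR_Tp deepfake/FaceAPP_Train aigc/SD_inpaint_Train; do
        [ -d "dataset/$d/image" ] || echo "missing dataset/$d/image"
        [ -d "dataset/$d/mask" ]  || echo "missing dataset/$d/mask"
    done
    [ -f dataset/MMTD_Set/MMTD-Set-34k.json ] || echo "missing MMTD-Set-34k.json"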

LoRA Finetune DTE-FDM

You can fine-tune DTE-FDM using LoRA with the following script:

bash ./scripts/DTE-FDM/finetune_lora.sh

The script allows customization through the following environment variables:

  • OUTPUT_DIR: Directory for saving training output
  • DATA_PATH: Path to the training dataset (JSON format)
  • WEIGHT_PATH: Path to the pre-trained weights

Modify these variables as needed to adapt the training process to different datasets and setups.
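
A hypothetical invocation (the output directory is a placeholder; the data and weight paths assume the layouts shown earlier, and the script is assumed to read these variables from the environment):

    OUTPUT_DIR=./output/DTE-FDM_lora \
    DATA_PATH=./dataset/MMTD_Set/MMTD-Set-34k.json \
    WEIGHT_PATH=./weight/fakeshield-v1-22b/DTE-FDM \
    bash ./scripts/DTE-FDM/finetune_lora.sh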

LoRA Finetune MFLM

You can fine-tune MFLM using LoRA with the following script:

bash ./scripts/MFLM/finetune_lora.sh

The script allows customization through the following environment variables:

  • OUTPUT_DIR: Directory for saving training output
  • DATA_PATH: Path to the training dataset
  • WEIGHT_PATH: Path to the pre-trained weights
  • TRAIN_DATA_CHOICE: Selects the training dataset
  • VAL_DATA_CHOICE: Selects the validation dataset

Modify these variables as needed to adapt the training process to different datasets and setups.
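
A hypothetical invocation (all values are placeholders; check the script for the exact dataset names that TRAIN_DATA_CHOICE and VAL_DATA_CHOICE accept):

    # "photoshop" is an illustrative choice, not a documented option
    OUTPUT_DIR=./output/MFLM_lora \
    DATA_PATH=./dataset \
    WEIGHT_PATH=./weight/fakeshield-v1-22b/MFLM \
    TRAIN_DATA_CHOICE=photoshop \
    VAL_DATA_CHOICE=photoshop \
    bash ./scripts/MFLM/finetune_lora.sh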

🎯 Test

You can test FakeShield using the following script:

bash ./scripts/test.sh

The script allows customization through the following environment variables:

  • WEIGHT_PATH: Path to the directory containing the FakeShield model weights.
  • QUESTION_PATH: Path to the test dataset in JSONL format. This file can be generated using ./playground/eval_jsonl.py.
  • DTE_FDM_OUTPUT: Path for saving the output of the DTE-FDM model.
  • MFLM_OUTPUT: Path for saving the output of the MFLM model.

Modify these variables as needed to adapt the evaluation process to different datasets and setups.
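
A hypothetical invocation (output paths are placeholders; generate QUESTION_PATH with ./playground/eval_jsonl.py first):

    WEIGHT_PATH=./weight/fakeshield-v1-22b \
    QUESTION_PATH=./playground/test.jsonl \
    DTE_FDM_OUTPUT=./playground/DTE-FDM_output.jsonl \
    MFLM_OUTPUT=./playground/MFLM_output.jsonl \
    bash ./scripts/test.sh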

πŸ“š Main Results

Comparison of detection performance with advanced IFDL methods

πŸ“œ Citation

    @inproceedings{xu2024fakeshield,
        title={FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models},
        author={Xu, Zhipei and Zhang, Xuanyu and Li, Runyi and Tang, Zecheng and Huang, Qing and Zhang, Jian},
        booktitle={International Conference on Learning Representations},
        year={2025}
    }

πŸ™ Acknowledgement

We are grateful to LLaVA, groundingLMM, and LISA for open-sourcing their models and code.
