This repository provides the official PyTorch implementation of the paper Improving Foundation Model for Endoscopy Video Analysis via Representation Learning on Long Sequences by Zhao Wang, Chang Liu, Lingting Zhu, Tongtong Wang, Shaoting Zhang†, and Qi Dou†.
- First foundation model for learning from long endoscopy videos via self-supervised pre-training.
- A large-scale long endoscopy video dataset consisting of 6,469 long sequences with an average duration of 68.1 seconds.
- Promising performance on 4 different types of typical downstream endoscopic tasks, including classification, segmentation, detection, and workflow recognition.
Recent advancements in endoscopy video analysis have relied on the utilization of relatively short video clips extracted from longer videos or millions of individual frames. However, these approaches tend to neglect the domain-specific characteristics of endoscopy data, which is typically presented as a long stream containing valuable semantic spatial and temporal information. To address this limitation, we propose EndoFM-LV, a foundation model developed under a minute-level pre-training framework upon long endoscopy video sequences. To be specific, we propose a novel masked token modeling scheme within a teacher-student framework for self-supervised video pre-training, which is tailored for learning representations from long video sequences. For pre-training, we construct a large-scale long endoscopy video dataset comprising 6,469 long endoscopic video samples, each longer than 1 minute and totaling over 13 million frames. Our EndoFM-LV is evaluated on four types of endoscopy tasks, namely classification, segmentation, detection, and workflow recognition, serving as the backbone or temporal module. Extensive experimental results demonstrate that our framework outperforms previous state-of-the-art video-based and frame-based approaches by a significant margin on those various downstream tasks.
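The core pre-training idea above (masked token modeling within a teacher-student framework) can be sketched as below. This is a deliberately simplified, dependency-free illustration and not the actual EndoFM-LV implementation: the student encodes a sequence with some token positions masked out, the loss is computed against the teacher's features at the masked positions only, and the teacher is an exponential moving average (EMA) of the student. All helper names (`ema_update`, `encode`, `masked_mse`) are hypothetical.

```python
import random

def ema_update(teacher_w, student_w, momentum=0.99):
    """EMA teacher update: teacher <- m * teacher + (1 - m) * student."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_w, student_w)]

def encode(weights, tokens):
    """Toy 'encoder': element-wise scaling of each token's features."""
    return [[w * x for w, x in zip(weights, tok)] for tok in tokens]

def masked_mse(student_out, teacher_out, mask_idx):
    """MSE between student and teacher features at masked positions only."""
    diffs = [(s - t) ** 2
             for i in mask_idx
             for s, t in zip(student_out[i], teacher_out[i])]
    return sum(diffs) / len(diffs)

random.seed(0)
dim, seq_len = 4, 16
tokens = [[random.random() for _ in range(dim)] for _ in range(seq_len)]
student_w = [1.0] * dim
teacher_w = [0.5] * dim

# Mask a random subset of token positions; the student must predict the
# teacher's features at these positions from the remaining context.
mask_idx = random.sample(range(seq_len), k=seq_len // 4)
masked = [([0.0] * dim if i in mask_idx else tok)
          for i, tok in enumerate(tokens)]

loss = masked_mse(encode(student_w, masked), encode(teacher_w, tokens), mask_idx)
teacher_w = ema_update(teacher_w, student_w)
```

In the real model the encoders are video transformers and the student can actually infer masked content from spatio-temporal context; this sketch only shows the shape of the objective and the EMA update.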
We utilize 4 public datasets and 1 private dataset for pre-training, and 4 datasets for the downstream tasks. Except for Cholec80, we provide our preprocessed data for both pre-training and the downstream tasks; you can download it directly via the following links:
- Pre-training
- Downstream: PolypDiag, CVC-12k, KUMC
Note: for the preprocessing of Cholec80 for workflow recognition, please refer to SV-RCNet.
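Since pre-training operates on minute-level sequences, each long video is typically subsampled into fixed-length clips before being fed to the model. Below is a minimal sketch of uniform temporal sampling; the repository's actual preprocessing may use a different strategy, and `sample_clip_indices` is a hypothetical helper.

```python
def sample_clip_indices(num_frames, clip_len):
    """Uniformly pick `clip_len` frame indices spanning a video of
    `num_frames` frames (e.g. a >1-minute endoscopy sequence)."""
    if num_frames < clip_len:
        # Pad short videos by repeating the last frame index.
        return list(range(num_frames)) + [num_frames - 1] * (clip_len - num_frames)
    stride = num_frames / clip_len
    return [int(i * stride) for i in range(clip_len)]

# A 68.1-second video at 25 fps has ~1702 frames; sample a 16-frame clip.
indices = sample_clip_indices(1702, 16)
```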
- torch==1.13.1
- torchvision==0.14.1
- pillow==10.0.1
- timm==0.9.7
We suggest using Anaconda to set up the environment on Linux. If you have already installed Anaconda, you can skip this step.
wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh && bash Anaconda3-2020.11-Linux-x86_64.sh
Then, we can install the required packages using the provided environment.yaml:
cd EndoFM-LV
conda env create -f environment.yaml
conda activate endofm-lv
You can directly download our pre-trained EndoFM-LV via this link and put it under checkpoints/ for downstream fine-tuning.
Also, we provide the fine-tuned weights for the 4 downstream tasks for direct downstream testing.
| Dataset | PolypDiag | CVC-12k | KUMC | Cholec80 |
|---|---|---|---|---|
| Weights | link | link | link | link |
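When loading downloaded weights into your own model, checkpoint keys sometimes carry a wrapper prefix (e.g. `module.` added by DataParallel/DistributedDataParallel) that must be stripped before `load_state_dict` succeeds. A small pure-Python sketch of this remapping is shown below; the example keys are hypothetical and the actual EndoFM-LV checkpoint layout may differ.

```python
def strip_prefix(state_dict, prefix="module."):
    """Remove a wrapper prefix (e.g. from DataParallel) from checkpoint
    keys so they match the bare model's parameter names."""
    return {(k[len(prefix):] if k.startswith(prefix) else k): v
            for k, v in state_dict.items()}

# Hypothetical checkpoint keys for illustration only.
ckpt = {"module.patch_embed.weight": 1,
        "module.blocks.0.attn.qkv.weight": 2,
        "head.weight": 3}
clean = strip_prefix(ckpt)
```

With PyTorch, `clean` would then be passed to `model.load_state_dict(clean, strict=False)` so that task-specific heads missing from the checkpoint do not raise errors.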
cd EndoFM-LV
bash scripts/train_endofm_lv.sh
# PolypDiag (Classification)
cd EndoFM-LV
bash scripts/eval_finetune_polypdiag.sh
# CVC (Segmentation)
cd EndoFM-LV/TransUNet
python train.py
# KUMC (Detection)
cd EndoFM-LV/STMT
python setup.py build develop
python -m torch.distributed.launch \
--nproc_per_node=1 \
--master_port=$((RANDOM + 10000)) \
tools/train_net.py \
--config-file configs/STFT/kumc_R_50_STFT.yaml \
OUTPUT_DIR log_dir/kumc_finetune
# Cholec80 (Workflow Recognition)
cd EndoFM-LV/SV-RCNet
python train_singlenet_phase_1fc.py --exp endofm_lv
# PolypDiag (Classification)
cd EndoFM-LV
bash scripts/test_finetune_polypdiag.sh
# CVC (Segmentation)
cd EndoFM-LV/TransUNet
python train.py --test
# KUMC (Detection)
cd EndoFM-LV/STMT
python setup.py build develop
python -m torch.distributed.launch \
--nproc_per_node=1 \
--master_port=$((RANDOM + 10000)) \
tools/test_net.py \
--config-file configs/STFT/kumc_R_50_STFT.yaml \
MODEL.WEIGHT kumc.pth \
OUTPUT_DIR log_dir/kumc_finetune
# Cholec80 (Workflow Recognition)
cd EndoFM-LV/SV-RCNet
python train_singlenet_phase_1fc.py --exp endofm_lv --test
For further questions, please feel free to contact Zhao Wang.
This project is under the Apache License 2.0 license. See LICENSE for details.
Our code is based on DINO, TimeSformer, SVT, TransUNet, and STFT. Thanks to them for releasing their code.
If you find this code useful, please consider citing our paper:
@article{wang2025improving,
  title={Improving Foundation Model for Endoscopy Video Analysis via Representation Learning on Long Sequences},
  author={Zhao Wang and Chang Liu and Lingting Zhu and Tongtong Wang and Shaoting Zhang and Qi Dou},
  journal={IEEE Journal of Biomedical and Health Informatics},
  pages={},
  year={2025}
}