Welcome! This repository contains the code and checkpoints for the paper DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training. DNTextSpotter decomposes the queries of the denoising part into noised positional queries and noised content queries. It generates the noised positional queries from the four Bezier control points of the Bezier center curve. For the noised content queries, since emitting text in a fixed positional order hinders the alignment of position and content, it initializes them with a masked character sliding method, which helps align text content with position. In addition, to improve the model's perception of the background, it adds a loss function for background-character classification to the denoising training part.
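The noised positional queries described above can be sketched roughly as follows. This is a minimal illustration, not the training code: the function names, the uniform noise model, and the normalized coordinates are all simplifying assumptions made for this example.

```python
import random


def cubic_bezier(ctrl, t):
    """Evaluate a cubic Bezier curve at parameter t from 4 control points."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    u = 1.0 - t
    x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
    y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
    return x, y


def noised_positional_queries(ctrl, n_points=8, noise_scale=0.02, seed=0):
    """Perturb the 4 center-curve control points with small random noise,
    then sample n_points along the noised curve (coords assumed in [0, 1])."""
    rng = random.Random(seed)
    noised = [(x + rng.uniform(-noise_scale, noise_scale),
               y + rng.uniform(-noise_scale, noise_scale)) for x, y in ctrl]
    return [cubic_bezier(noised, i / (n_points - 1)) for i in range(n_points)]
```

The sampled points along the noised center curve then serve as positional anchors that the denoising decoder learns to reconstruct back to the clean curve.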
- [2024/7/16] 🎉🎉🎉 DNTextSpotter is accepted by ACM'MM 2024!
## 1. Pre-trained Models for Total-Text & Inverse-Text & IC15
| Backbone | Training Data | Weights |
|---|---|---|
| Res-50 | Synth150K+Total-Text+MLT17+IC13+IC15+TextOCR | Drive |
| ViTAEv2-S | Synth150K+Total-Text+MLT17+IC13+IC15+TextOCR | Drive |
### Finetune on Total-Text

| Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-None | E2E-Full | Weights |
|---|---|---|---|---|---|---|---|
| Res-50 | Synth150K+MLT17+IC13+IC15+TextOCR | 91.5 | 87.0 | | | | Drive |
| ViTAEv2-S | Synth150K+MLT17+IC13+IC15+TextOCR | 92.9 | 88.6 | 90.7 | 85.0 | 90.5 | Drive |
### Finetune on ICDAR 2015 (IC15)

| Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-S | E2E-W | E2E-G | Weights |
|---|---|---|---|---|---|---|---|---|
| Res-50 | Synth150K+Total-Text+MLT17+IC13+TextOCR | 92.5 | 87.2 | 89.8 | | | | OneDrive |
| ViTAEv2-S | Synth150K+Total-Text+MLT17+IC13+TextOCR | 92.4 | 87.9 | 89.4 | 85.2 | 80.6 | | OneDrive |
### Inverse-Text (using the same weights as the model finetuned on Total-Text)

| Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-None | E2E-Full | Weights |
|---|---|---|---|---|---|---|---|
| Res-50 | Synth150K+MLT17+IC13+IC15+TextOCR | 94.3 | 77.2 | | | | Drive |
| ViTAEv2-S | Synth150K+MLT17+IC13+IC15+TextOCR | 95.4 | 79.2 | 86.4 | 78.1 | 83.8 | Drive |
## 2. Pre-trained Model for CTW1500

| Backbone | Training Data | Weights |
|---|---|---|
| Res-50 | Synth150K+Total-Text+MLT17+IC13+IC15+TextOCR | Drive |
### Finetune on CTW1500

| Backbone | External Data | Det-P | Det-R | Det-F1 | E2E-None | E2E-Full | Weights |
|---|---|---|---|---|---|---|---|
| Res-50 | Synth150K+Total-Text+MLT17+IC13+IC15+TextOCR | 93.5 | 87.1 | 90.2 | 67.0 | 84.2 | Drive |
## Requirements

- Python==3.8
- PyTorch>=2.0.1
- CUDA>=11.7
- Detectron2
```shell
git clone https://github.com/yyyyyxie/DNTextSpotter.git
cd DNTextSpotter
conda create -n dnts python=3.8 -y
conda activate dnts
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
cd detectron2
pip install -e .
cd ..
pip install -r requirements.txt
python setup.py build develop
```
You can find the datasets here.
```
|- ./datasets
   |- syntext1
   |  |- train_images
   |  └  annotations
   |     |- train_37voc.json
   |     └  train_96voc.json
   |- syntext2
   |  |- train_images
   |  └  annotations
   |     |- train_37voc.json
   |     └  train_96voc.json
   |- mlt2017
   |  |- train_images
   |  └  annotations
   |     |- train_37voc.json
   |     └  train_96voc.json
   |- totaltext
   |  |- train_images
   |  |- test_images
   |  |- train_37voc.json
   |  |- train_96voc.json
   |  |- weak_voc_new.txt
   |  |- weak_voc_pair_list.txt
   |  └  test.json
   |- ic13
   |  |- train_images
   |  |- train_37voc.json
   |  └  train_96voc.json
   |- ic15
   |  |- train_images
   |  |- test_images
   |  |- train_37voc.json
   |  |- train_96voc.json
   |  └  test.json
   |- CTW1500
   |  |- train_images
   |  |- test_images
   |  └  annotations
   |     |- train_96voc.json
   |     └  test.json
   |- textocr
   |  |- train_images
   |  |- train_37voc_1.json
   |  |- train_37voc_2.json
   |  |- train_96voc_1.json
   |  └  train_96voc_2.json
   |- inversetext
   |  |- test_images
   |  |- inversetext_lexicon.txt
   |  |- inversetext_pair_list.txt
   |- evaluation
      |- gt_*.zip
```
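Before launching training, a small helper can sanity-check that the expected files exist under the dataset root. This is a hypothetical convenience function, not part of the repository; `EXPECTED` lists only two datasets from the tree above as an example and can be extended with the rest.

```python
import os

# A subset of the dataset tree, as an example; extend with the other datasets.
EXPECTED = {
    "totaltext": ["train_images", "test_images", "train_37voc.json",
                  "train_96voc.json", "test.json"],
    "ic15": ["train_images", "test_images", "train_37voc.json",
             "train_96voc.json", "test.json"],
}


def missing_entries(root, expected=EXPECTED):
    """Return paths from the expected layout that are absent under root."""
    missing = []
    for dataset, entries in expected.items():
        for entry in entries:
            path = os.path.join(root, dataset, entry)
            if not os.path.exists(path):
                missing.append(path)
    return missing
```

Running `missing_entries("./datasets")` and checking the result is empty is a cheap way to catch path mistakes before an 8-GPU job fails at data-loading time.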
## Training

### Total-Text & ICDAR2015

1. Pre-train

For example, pre-train DNTextSpotter:

```shell
python tools/train_net.py --config-file configs/R_50/pretrain/150k_tt_mlt_13_15.yaml --num-gpus 8
```

2. Fine-tune

Fine-tune on Total-Text or ICDAR2015:

```shell
python tools/train_net.py --config-file configs/R_50/TotalText/finetune_150k_tt_mlt_13_15_textocr.yaml --num-gpus 8
python tools/train_net.py --config-file configs/R_50/IC15/finetune_150k_tt_mlt_13_15_textocr.yaml --num-gpus 8
```
### CTW1500

1. Pre-train

```shell
python tools/train_net.py --config-file configs/R_50/CTW1500/pretrain_96voc_50maxlen.yaml --num-gpus 8
```

2. Fine-tune

```shell
python tools/train_net.py --config-file configs/R_50/CTW1500/finetune_96voc_50maxlen.yaml --num-gpus 8
```
## Evaluation

```shell
python tools/train_net.py --config-file ${CONFIG_FILE} --eval-only MODEL.WEIGHTS ${MODEL_PATH}
```
## Demo

```shell
python demo/demo.py --config-file ${CONFIG_FILE} --input ${IMAGES_FOLDER_OR_ONE_IMAGE_PATH} --output ${OUTPUT_PATH} --opts MODEL.WEIGHTS ${MODEL_PATH}
```
🤩 If you encounter any difficulty using our code, please do not hesitate to open an issue or contact us directly!

😍 If you find our work helpful (or simply want to encourage us), please consider giving this repository a star and citing our paper.
```bibtex
@article{xie2024dntextspotter,
  title={DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training},
  author={Xie, Yu and Qiao, Qian and Gao, Jun and Wu, Tianxiang and Huang, Shaoyao and Fan, Jiaqing and Cao, Ziqiang and Wang, Zili and Zhang, Yue and Zhang, Jielei and others},
  journal={arXiv preprint arXiv:2408.00355},
  year={2024}
}
```
or
```bibtex
@inproceedings{qiao2024dntextspotter,
  title={DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training},
  author={Qiao, Qian and Xie, Yu and Gao, Jun and Wu, Tianxiang and Huang, Shaoyao and Fan, Jiaqing and Cao, Ziqiang and Wang, Zili and Zhang, Yue},
  booktitle={ACM Multimedia 2024},
  year={2024}
}
```
This project is based on AdelaiDet and DeepSolo. For academic use, this project is licensed under the 2-clause BSD License.