v0.5.1
This minor release includes documentation improvements thanks to @felixdittrich92, bug fixes, rotation support extended to the TensorFlow backend, a switch from PyMuPDF to pypdfium2, and integration with the Hugging Face Hub thanks to @fg-mindee!
Note: doctr 0.5.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
Improvement of the documentation
The documentation has a new theme and new illustrations, and the docstrings have been completed and expanded.
This is how it renders:
Rotated text detection extended to the TensorFlow backend
We provide weights for the linknet_resnet18_rotation model, which has been substantially reworked. We implemented a new loss (based on Dice loss and focal loss), changed the target computation so that polygons are shrunk the same way as in DBNet, which greatly improves the precision of the segmenter, and trained the model while preserving the aspect ratio of the images.
All these improvements led to much better results, and the pretrained model is now very robust.
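To make the idea of a loss "based on Dice loss and focal loss" more concrete, here is a minimal pure-Python sketch that combines the two terms on a flattened segmentation map. It is not the docTR implementation: the function names, the equal weighting of the two terms, and the `gamma` and `alpha` defaults are all assumptions chosen for illustration.

```python
import math


def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2 * overlap / (sum of predictions + sum of targets)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def focal_loss(probs, targets, gamma=2.0, alpha=0.5, eps=1e-7):
    """Binary focal loss, averaged over pixels.

    Well-classified pixels are down-weighted by (1 - p_t) ** gamma, so the
    loss concentrates on hard pixels. gamma and alpha are illustrative values.
    """
    loss = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        p_t = p if t == 1 else 1.0 - p
        alpha_t = alpha if t == 1 else 1.0 - alpha
        loss += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return loss / len(probs)


def segmentation_loss(probs, targets, dice_weight=1.0, focal_weight=1.0):
    """Weighted sum of the two terms (the weights are assumptions)."""
    return dice_weight * dice_loss(probs, targets) + focal_weight * focal_loss(probs, targets)


# Flattened 2x2 text/background mask and two candidate predictions
target = [1, 0, 1, 0]
good = segmentation_loss([1.0, 0.0, 1.0, 0.0], target)
bad = segmentation_loss([0.0, 1.0, 0.0, 1.0], target)
```

The Dice term measures region overlap and is fairly insensitive to class imbalance between text and background, while the focal term keeps a per-pixel signal focused on hard examples, which is one common motivation for combining them.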
Preserving the aspect ratio in the detection task
You can now choose to preserve the aspect ratio in the detection_predictor:
>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
This option can also be activated in the high level end-to-end predictor:
>>> from doctr.models import ocr_predictor
>>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
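As a rough illustration of what preserving the aspect ratio means geometrically, the sketch below computes the resized shape and the padding needed to fit a page into a fixed model input. The helper name `fit_preserving_aspect_ratio` and the `symmetric_pad` option are hypothetical, and the padding convention docTR actually uses may differ.

```python
def fit_preserving_aspect_ratio(src_hw, dst_hw, symmetric_pad=False):
    """Return the resized (h, w) and the (top, bottom, left, right) padding.

    The image is scaled by a single factor so that it fits inside dst_hw
    without distortion; the remaining space is filled with padding.
    """
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    # One scale for both axes: the most constraining dimension wins
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h = min(dst_h, max(1, round(src_h * scale)))
    new_w = min(dst_w, max(1, round(src_w * scale)))
    pad_h, pad_w = dst_h - new_h, dst_w - new_w
    if symmetric_pad:
        top, left = pad_h // 2, pad_w // 2  # center the image
    else:
        top, left = 0, 0  # keep the image in the top-left corner
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)


# A portrait page of 2000x1000 pixels fitted into a 1024x1024 input
shape, padding = fit_preserving_aspect_ratio((2000, 1000), (1024, 1024))
centered_shape, centered_padding = fit_preserving_aspect_ratio(
    (2000, 1000), (1024, 1024), symmetric_pad=True
)
```

Without this option, a plain resize to 1024x1024 would stretch the page to twice its relative width, which distorts character shapes; with it, the page keeps its proportions and half of the input width is padding.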
Integration with the Hugging Face Hub
The artefact detection model is now available on the Hugging Face Hub, which is great news:
In docTR, you can now use the .from_hub() method, so the following two snippets are equivalent:
# Pretrained
from doctr.models.obj_detection import fasterrcnn_mobilenet_v3_large_fpn
model = fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
and:
# HF Hub
from doctr.models.obj_detection.factory import from_hub
model = from_hub("mindee/fasterrcnn_mobilenet_v3_large_fpn")
Breaking changes
Replacing the PyMuPDF dependency with pypdfium2, which is license-compatible
We replaced the PyMuPDF dependency with pypdfium2 to resolve a license-compatibility issue. As a result, we lose the extraction of words and objects from source PDFs, which relied on PyMuPDF. It wasn't used by any model, so the impact is limited, but we plan to reintegrate such a feature in the future.
Full changelog
What's Changed
Breaking Changes 🛠
- fix: polygon orientation + line aggregation by @charlesmindee in #801
- refactor: Switched from PyMuPDF to pypdfium2 by @fg-mindee in #829
New Features
- feat: Added RandomHorizontalFLip in TF by @SiddhantBahuguna in #779
- Imgur5k dataset integration by @felixdittrich92 in #785
- feat: Added support of GPU for predictors in PyTorch by @fg-mindee in #808
- Add SynthWordGenerator to text reco training scripts by @felixdittrich92 in #825
- fix: Fixed some ResNet architecture imprecisions by @fg-mindee in #828
- feat: Added shadow augmentation for all backends by @fg-mindee in #811
- feat: Added loading method for PyTorch artefact detection models from HF Hub by @fg-mindee in #836
- feat: add rotated linknet_resnet18 tensorflow ckpts by @charlesmindee in #817
Bug Fixes
- fix: Fixed rotation of img + target by @fg-mindee in #784
- fix: show sample when batch size is 1 by @charlesmindee in #787
- ci: Fixed PR label check job by @fg-mindee in #792
- ci: Fixed typo in the script ref by @fg-mindee in #794
- [datasets] fix description by @felixdittrich92 in #795
- fix: linknet target computation by @charlesmindee in #803
- ci: Fixed issue templates by @fg-mindee in #806
- fix: Reverted mistake in demo by @fg-mindee in #810
- Restore remap boxes by @Rob192 in #812
- fix: Fixed SAR model for training and inference in PyTorch by @fg-mindee in #831
- fix: Fixed expand_line for horizontal & vertical cases by @fg-mindee in #842
- fix: Fixes inplace target modifications for AbstractDatasets by @fg-mindee in #848
- fix: Fixed landing page and title underlines by @fg-mindee in #860
- docs: Fixed HTML title by @fg-mindee in #864
Improvements
- docs: Updated headers of python files by @fg-mindee in #781
- [datasets] unify np_dtype and fix comments by @felixdittrich92 in #782
- fix: Clip in rotation transform + eval_straight mode for training by @charlesmindee in #786
- refactor: Avoids instantiating orientation predictor when unnecessary by @fg-mindee in #809
- feat: add straight-eval arg in evaluate script by @charlesmindee in #793
- feat: add dice loss in linknet by @charlesmindee in #816
- feat: add shrinked target in linknet + dilation in postprocessing by @charlesmindee in #822
- feat: replace bce by focal loss in linknet loss by @charlesmindee in #824
- docs: add rotation in docs by @charlesmindee in #846
- feat: add aspect ratio for ocr predictor by @charlesmindee in #835
- feat: add target to resize transform for aspect ratio training (detection task) by @charlesmindee in #823
- update bug report ticket with Active backend field by @felixdittrich92 in #853
- Theme + css #1 by @felixdittrich92 in #856
- docs: Adds illustration in the docstrings of doctr.datasets by @felixdittrich92 in #857
- docs: Updated docstrings of io, transforms & utils by @felixdittrich92 in #859
- docs: Updated folder hierarchy of doc source and nootbooks to rst file by @felixdittrich92 in #862
- Doc models #5 by @felixdittrich92 in #861
- fix: linknet hyperparameters postprocessing + demo for rotation model by @charlesmindee in #865
Miscellaneous
- chore: Applied post release modifications by @fg-mindee in #780
- Switch to new pypdfium2 API by @mara004 in #845
New Contributors
Full Changelog: v0.5.0...v0.5.1