workflows: recommend parameters and recipes #104

Open
Tracked by #337
bertsky opened this issue May 12, 2020 · 8 comments

@bertsky
Collaborator

bertsky commented May 12, 2020

Sometimes a word on parameter choices would be helpful. For example,

  • threshold (ocrd-cis-ocropy-binarize) or k (ocrd-olena-binarize) parameter for binarization,
  • maxskew (ocrd-cis-ocropy-deskew) angle,
  • find_tables (ocrd-tesserocr-segment-region),
  • padding (in ocrd-tesserocr-segment-region, ocrd-tesserocr-recognize) or range (in ocrd-cis-ocropy-dewarp)
  • ...

Also, beyond full-blown workflow recommendations, simple recipes could be discussed, like:

  • for deskewing, combine ocrd-tesserocr-deskew (which includes orientation) with ocrd-cis-ocropy-deskew (which is more accurate w.r.t. skew and can run afterwards)
  • for binarization, combine certain noisy methods with binary denoising afterwards (like ocrd-cis-ocropy-denoise)
  • for binarization, combine certain sensitive methods with raw denoising beforehand (like ocrd-im6convert with wavelet-denoise)
  • for page segmentation, combine processors without reading order with incremental/RO-only processors (like ocrd-cis-ocropy-segment)
  • ...
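
As an illustration of what such a recipe could look like in practice, here is a minimal Python sketch that chains the two deskewing processors from the first recipe above into task strings, with each step reading the file group written by the previous one. Everything beyond the processor and parameter names mentioned above is an assumption made for illustration: the `Step` and `chain` helpers are hypothetical, the file group names (`OCR-D-IMG`, `OCR-D-STEP1`, ...) are placeholders, the `maxskew` value of 5 is not a recommendation, and the `-I`/`-O`/`-P` task layout should be checked against the OCR-D version in use.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One processor call in a recipe (hypothetical helper, not part of OCR-D)."""
    processor: str                       # executable name without the "ocrd-" prefix
    params: dict = field(default_factory=dict)


def chain(steps, input_grp="OCR-D-IMG"):
    """Build one task string per step; each step reads the previous step's output group."""
    tasks = []
    current = input_grp
    for i, step in enumerate(steps, start=1):
        output = f"OCR-D-STEP{i}"        # placeholder file group name
        parts = [step.processor, "-I", current, "-O", output]
        for key, value in step.params.items():
            parts += ["-P", key, str(value)]
        tasks.append(" ".join(parts))
        current = output
    return tasks


# Recipe: orientation-aware deskewing first, then the more accurate skew estimate.
deskew_recipe = [
    Step("tesserocr-deskew"),
    Step("cis-ocropy-deskew", {"maxskew": 5}),   # 5 is an illustrative value only
]

tasks = chain(deskew_recipe)
for task in tasks:
    print(task)
```

Writing a recipe as a short list of steps keeps the processor order and the parameter choice visible in one place, which is the kind of middle ground between single-processor documentation and a full workflow that the recipes above aim for.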
@EEngl52
Collaborator

EEngl52 commented May 20, 2020

This is an important issue for sure, thanks for pointing it out, @bertsky! I suggest we continuously amend our user guide with such information as we gain further experience with parameter and processor choices. The pilot phase might come in quite handy in this respect.

@bertsky
Collaborator Author

bertsky commented May 20, 2020

@EEngl52 you mean the workflow guide, don't you?

If you make a draft PR showing how and where to put such info (i.e. parameters and recipes), then I (or others) can comment/review. Or you could wait for others to write about their experiences in the wiki. Maybe we should also discuss this in the VC next week. After all, we ideally want to spawn a user-oriented discussion!

@EEngl52
Collaborator

EEngl52 commented Jun 4, 2020

I added some template pages to the wiki where such detailed recommendations can be added.

@EEngl52
Collaborator

EEngl52 commented Jul 9, 2020

As we decided to add more in-depth recommendations to the Website Wiki, can we close this issue, @bertsky? Or do you want to keep it as a reminder for your first ideas?

@bertsky
Collaborator Author

bertsky commented Jul 9, 2020

Yes, good idea. I'll try to add the info somewhere in the wiki, and then close here. Hopefully it gets integrated into the workflow recommendations at some point (probably after having a good working evaluation).

@kba
Member

kba commented Apr 25, 2023

@bertsky Is this addressed by the workflow-guide-from-wiki mechanism and the Notes sections there?

@bertsky
Collaborator Author

bertsky commented Apr 25, 2023

Not quite. The original idea was to provide some middle ground between single processor description and full-blown workflows: simple reusable recipes for special tasks (e.g. how to segment handwriting, how to detect and segment tables, how to do multi-OCR alignment, how to do OCR model selection, how to do cropping with or without facing pages, when and how to do deskewing and dewarping, how to combine segmentation from various tools, how to extract training data suitable for segmentation or for OCR). I'm afraid we don't have that yet, despite some supplementary pages in the Wiki.

Perhaps we can keep this open for our current effort to collect workflow experiences, and try to work this into the WF Guide and (if necessary) additional pages (which we could then link to on the website)?

@kba
Member

kba commented Apr 26, 2023

Perhaps we should revisit the original idea of the "OCR-D cookbook" with "recipes" for common tasks and problems like the ones you mentioned. The user guide is too high-level for those, the workflow guide too low-level.

Perhaps it would make sense to combine this with the FAQ (cf. #32). We could have a documentation sprint where we collect and answer common questions and, based on that, decide which groups of questions merit a more in-depth analysis. And perhaps have those documents (FAQ and cookbook) live in the Wiki with integration into the website, like the workflows.md mechanism.


4 participants