Fix OCR and add summaries to content. #251

Snify89 · 2025-01-27T19:12:51Z

Even though the OCR layer in the PDF might then differ, it might be useful to also correct the OCR content. For example, a good LLM prompt could take all of the extracted OCR content and fix common artifacts (e.g. add missing or broken umlauts, fix whitespace positioning, correct grammar and syntax). This works very well. The corrected output can then be added to, or replace, the content stored in the Paperless content database. You could also use an approach such as RAPTOR, which summarizes the content as a whole, but this might be counterproductive. Paperless AI should also be very time/date aware, to improve the extraction and handling of dates (due dates, etc.).
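To make the idea more concrete, here is a rough sketch of what such a correction step could look like. It is only an illustration, not how Paperless AI works today: the function names `build_ocr_fix_prompt` and `fix_ocr_content` are made up, and `complete` stands in for whatever LLM client is configured. Passing today's date into the prompt is one simple way to make the model date aware.

```python
from datetime import date
from typing import Callable, Optional


def build_ocr_fix_prompt(ocr_text: str, today: date) -> str:
    """Build a correction prompt (hypothetical wording) for raw OCR text."""
    return (
        f"Today's date is {today.isoformat()}.\n"
        "The text below was extracted by OCR and may contain artifacts.\n"
        "Fix missing or broken umlauts, whitespace, and obvious grammar errors.\n"
        "Do not add, remove, or summarize any information.\n"
        "Return only the corrected text.\n"
        "---\n"
        f"{ocr_text}"
    )


def fix_ocr_content(
    ocr_text: str,
    complete: Callable[[str], str],  # assumption: any function mapping prompt -> reply
    today: Optional[date] = None,
) -> str:
    """Return the LLM-corrected text, or the original if the reply is empty."""
    prompt = build_ocr_fix_prompt(ocr_text, today or date.today())
    corrected = complete(prompt).strip()
    # Keep the original OCR text rather than storing an empty result.
    return corrected or ocr_text
```

The returned string would then be what gets written back as the document content. Keeping the original text as a fallback avoids wiping the content when the model returns nothing.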

Thank you for considering and developing this awesome tool :)
