Even though the OCR layer in the PDF might then differ, it could be useful to also change the OCR content. For example, a good LLM prompt might be able to take all of the extracted OCR content and fix some artifacts (e.g. add forgotten or broken umlauts, fix whitespace positioning, correct grammar and syntax, etc.). This works very well. The new output can then be added to, or replace, the content in the Paperless content database. You could also use an approach such as RAPTOR, which summarizes the content all together, but this might be counterproductive. Paperless AI should also be very "time/date" aware to improve the extraction and handling of (due) dates, etc.
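To make the idea more concrete, here is a minimal sketch of what such a cleanup step could look like. It is written in Python purely for illustration and is not how Paperless AI works today. The function names and the prompt wording are my own assumptions, and `llm` is a hypothetical callable standing in for whatever model backend is configured. Writing the result back to Paperless is deliberately left out.

```python
from datetime import date
from typing import Callable, Optional


def build_ocr_cleanup_prompt(ocr_text: str, today: date) -> str:
    """Build a prompt that asks the model to repair OCR artifacts only.

    The current date is included so the model can reason about
    relative or partial dates such as due dates (assumed prompt wording).
    """
    return (
        f"Today's date is {today.isoformat()}.\n"
        "The text below was produced by OCR and may contain artifacts.\n"
        "Repair broken or missing umlauts, whitespace, and obvious grammar "
        "or syntax errors. Do not summarize. Do not add or remove facts.\n"
        "Return only the corrected text.\n\n"
        f"--- OCR TEXT ---\n{ocr_text}"
    )


def clean_ocr_content(
    ocr_text: str,
    llm: Callable[[str], str],  # hypothetical: takes a prompt, returns text
    today: Optional[date] = None,
) -> str:
    """Return the LLM-corrected text, or the original if the model returns nothing."""
    prompt = build_ocr_cleanup_prompt(ocr_text, today or date.today())
    corrected = llm(prompt).strip()
    # Never replace the stored content with an empty string.
    return corrected or ocr_text
```

Keeping the model behind a plain callable means the same step could be used with any provider, and the fallback to the original text avoids wiping the document content when the model returns an empty answer.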
Thank you for considering and developing this awesome tool :)