Explicitly specify behavior of baseline property in the presence of textangle property #112

Open
p12tic wants to merge 2 commits into master

p12tic commented Apr 21, 2022

This PR fixes an error in the specification related to the interaction between the baseline and textangle properties.

Currently, the baseline property is underspecified: the polyline refers to the "coordinate system of the line", which is not defined anywhere else in the document. This makes it unclear how baseline should be specified when textangle is non-zero.

The interpretation that textangle should be ignored would result in an error in the specification, because completely vertical text would have a slope equal to positive or negative infinity, which cannot be represented by the current grammar.
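
To illustrate the problem, here is a minimal sketch (not taken from the specification). It assumes a straight baseline whose slope is measured against the page's horizontal axis, i.e. textangle is ignored. The function name `slope_in_page_frame` is hypothetical.

```python
import math


def slope_in_page_frame(text_angle_deg: float) -> float:
    """Slope dy/dx of a straight baseline rotated by text_angle_deg,
    measured against the page's horizontal axis (assumed convention)."""
    return math.tan(math.radians(text_angle_deg))


# The slope grows without bound as the text approaches vertical.
for angle in (0.0, 45.0, 89.0, 89.99):
    print(f"{angle:6.2f} deg -> slope {slope_in_page_frame(angle):.2f}")
```

At exactly 90 degrees floating-point `math.tan` returns a very large finite number rather than infinity, but the point stands: no finite slope coefficient describes a vertical baseline in page coordinates.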

Therefore, textangle should be taken into account. However, it's not clear how it should affect baseline because the specification does not constrain textangle to any specific angle.

This PR fixes the issue by explicitly specifying which coordinate system is used for the baseline polynomial for all possible values of the textangle property.
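
As an illustration only, the sketch below shows one possible convention: rotating baseline points into a frame aligned with the text direction, so that the slope stays finite for any textangle. This is an assumption for demonstration and not necessarily the definition adopted by this PR. The helper name `to_text_frame`, the counter-clockwise sign convention and the y-axis-up orientation are all assumed.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_text_frame(points: List[Point], text_angle_deg: float) -> List[Point]:
    """Rotate page-frame points by -text_angle_deg so that the text
    direction becomes the x axis (assumed: counter-clockwise angle, y up)."""
    theta = math.radians(text_angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t + y * sin_t, -x * sin_t + y * cos_t) for x, y in points]


# A vertical baseline in page coordinates, with textangle = 90 degrees.
vertical_baseline = [(0.0, 0.0), (0.0, 10.0)]
(x0, y0), (x1, y1) = to_text_frame(vertical_baseline, 90.0)

# In the text-aligned frame the baseline runs along x, so the slope is finite.
slope = (y1 - y0) / (x1 - x0)
print(slope)
```

In the rotated frame, a straight baseline that follows the text direction has a slope close to zero, which fits the existing polynomial grammar.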

Because of this issue, tesseract-ocr currently does not output a baseline for non-horizontal text at all. Fixing the specification will hopefully allow baseline information to be output in all cases.

p12tic added 2 commits April 23, 2022 23:32
p12tic force-pushed the baseline-textangle branch from 78e0333 to bd774e6 April 23, 2022 20:33