-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DEID-2740] add control of ocr options #54
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a quick look, do we need to set default ocr option here??
I feel it'll override the default set by deid container.
@@ -512,6 +512,41 @@ def fromdict(cls, values: dict): | |||
) | |||
|
|||
|
|||
class OCROptions(BaseRequestObject): | |||
default_ocr_system = "azure_computer_vision" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, this is paddleocr
in deid 🤔
Do we need to set the default here though? It'll be handled by the pydantic in deid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Girish! in this client, it sets default value separately. Are we okay to swap to paddleocr
here?
e.g.
class ProcessedText(ProcessedMarkerText, ProcessedMaskText, ProcessedSyntheticText): default_type = "MARKER" valid_types = ["MARKER", "MASK", "SYNTHETIC"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@letmerecall if the default is paddle then the docs need to be updated.
Yes it'll be handled by pydantic in the deid api but the other models have defaults set so I'd say follow the same pattern.
If down the road we want to give the client a redesign then we can determine how we want to match the container's spec at that point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the defeault OCR. Reference: https://github.com/privateai/deid/blob/acb817f62fe1b0e2dfee9d9b34fe9534713fdb95/app/models/request.py#L619
@a-guiducci which docs? docs.private-ai.com?
IMO defaulting to paddle make sense, as we ship it with the container and user can start with with with least friction.
@yuya-kitani should we bring this up in dev channel/meeting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@letmerecall I have posted a question on dev channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya definitely agree that paddle makes more sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor request, otherwise looks good if test are passing 🚀
Let's wait until the default ocr system figured out before merging though
@@ -512,6 +512,41 @@ def fromdict(cls, values: dict): | |||
) | |||
|
|||
|
|||
class OCROptions(BaseRequestObject): | |||
default_ocr_system = "azure_computer_vision" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@letmerecall if the default is paddle then the docs need to be updated.
Yes it'll be handled by pydantic in the deid api but the other models have defaults set so I'd say follow the same pattern.
If down the road we want to give the client a redesign then we can determine how we want to match the container's spec at that point
def _ocr_system_validator(self, var): | ||
if var not in self.VALID_OCR_SYSTEM: | ||
raise ValueError( | ||
f"OCROptions.ocr_system must be one of {self.VALID_OCR_SYSTEM}, but got {var}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you format this to not include the brackets? ', '.join(self.VALID_OCR_SYSTEM)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a-guiducci I addressed the comment and the test passed.
So I updated the default to paddleocr. Can you please re-check? @a-guiducci @letmerecall |
Looks good on my end 👍 |
What's Changed?
ocr_options handling are added.
PR Checklist