Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DEID-2740] add control of ocr options #54

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

yuya-kitani
Copy link
Contributor

@yuya-kitani yuya-kitani commented Nov 10, 2024

What's Changed?

ocr_options handling are added.

PR Checklist

  • Updated unit tests
  • Ran unit tests

@yuya-kitani yuya-kitani marked this pull request as ready for review November 11, 2024 01:43
Copy link
Member

@letmerecall letmerecall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick look, do we need to set default ocr option here??

I feel it'll override the default set by deid container.

@@ -512,6 +512,41 @@ def fromdict(cls, values: dict):
)


class OCROptions(BaseRequestObject):
default_ocr_system = "azure_computer_vision"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this is paddleocr in deid 🤔

Do we need to set the default here though? It'll be handled by the pydantic in deid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Girish! in this client, it sets default value separately. Are we okay to swap to paddleocr here?

e.g.
class ProcessedText(ProcessedMarkerText, ProcessedMaskText, ProcessedSyntheticText): default_type = "MARKER" valid_types = ["MARKER", "MASK", "SYNTHETIC"]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@letmerecall if the default is paddle then the docs need to be updated.

Yes it'll be handled by pydantic in the deid api but the other models have defaults set so I'd say follow the same pattern.

If down the road we want to give the client a redesign then we can determine how we want to match the container's spec at that point

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the defeault OCR. Reference: https://github.com/privateai/deid/blob/acb817f62fe1b0e2dfee9d9b34fe9534713fdb95/app/models/request.py#L619

@a-guiducci which docs? docs.private-ai.com?

IMO defaulting to paddle make sense, as we ship it with the container and user can start with with with least friction.

@yuya-kitani should we bring this up in dev channel/meeting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@letmerecall I have posted a question on dev channel.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya definitely agree that paddle makes more sense

Copy link
Collaborator

@a-guiducci a-guiducci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor request, otherwise looks good if test are passing 🚀

Let's wait until the default ocr system figured out before merging though

@@ -512,6 +512,41 @@ def fromdict(cls, values: dict):
)


class OCROptions(BaseRequestObject):
default_ocr_system = "azure_computer_vision"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@letmerecall if the default is paddle then the docs need to be updated.

Yes it'll be handled by pydantic in the deid api but the other models have defaults set so I'd say follow the same pattern.

If down the road we want to give the client a redesign then we can determine how we want to match the container's spec at that point

def _ocr_system_validator(self, var):
if var not in self.VALID_OCR_SYSTEM:
raise ValueError(
f"OCROptions.ocr_system must be one of {self.VALID_OCR_SYSTEM}, but got {var}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you format this to not include the brackets? ', '.join(self.VALID_OCR_SYSTEM)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-guiducci I addressed the comment and the test passed.

@yuya-kitani
Copy link
Contributor Author

So I updated the default to paddleocr. Can you please re-check? @a-guiducci @letmerecall

@a-guiducci
Copy link
Collaborator

So I updated the default to paddleocr. Can you please re-check? @a-guiducci @letmerecall

Looks good on my end 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants