Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: PDF to Image Transformation for Comprehensive Image Processing #4656

Open
1 task done
franperic opened this issue Nov 7, 2024 · 1 comment
Open
1 task done
Labels
enhancement New feature or request

Comments

@franperic
Copy link

What features would you like to see added?

I would like the ability to upload PDF files and have them automatically transformed into images. These images should then be processed by the integrated vision models, utilizing their capabilities for both OCR and general image processing. This approach can substantially increase the accuracy and efficiency of handling various content types within PDFs, including text, images, and graphics.

Currently, there is a cumbersome workaround: creating snapshots of PDFs and uploading them directly as images. Processing the PDFs automatically as images would be a huge boost in user experience.

More details

  1. PDF to Image Conversion: Upon uploading a PDF, each page should be converted into an image. This step ensures that all content within the document, including complex layouts, images, and non-selectable text, is accurately captured.

  2. Vision Model Processing: The resulting images would then be attached to the prompt and processed by the existing vision models, leveraging their advanced image understanding capabilities. This will enable the application to handle a wide variety of document types and content layouts very effectively.

Which components are impacted by your request?

General, Endpoints

Pictures

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@franperic franperic added the enhancement New feature or request label Nov 7, 2024
@franperic
Copy link
Author

I believe there are different understandings in this issue: #2755 .
Hence, I created a new issue for the automatic pdf to image transformation.

Furthermore, there is a pull request regarding my feature request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant