Table image extraction for reference #755

Open
rajuptvs opened this issue Jan 15, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@rajuptvs

Requested feature

Background

Currently, the system supports referencing images using URIs in markdown formatting, which has proven valuable for many data pipeline implementations. For example:

[image]

Proposed Enhancement

I propose extending this URI reference functionality to table images as well. This would provide more flexibility in document handling, particularly in cases where the generated markdown tables may not be correct.

Technical Implementation

I've already prototyped similar functionality using the following approach:

  1. Store the image data in item.image and its URI in item.image.uri, using item.get_image().

  2. Implement reference handling through the existing image processing pipeline:

elif image_mode == ImageRefMode.REFERENCED:
    new_doc = self._with_pictures_refs(
        image_dir=artifacts_dir, reference_path=reference_path
    )

I think this would make pipelines built on docling more extensible, and it would be very beneficial for various kinds of post-processing on table images.
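To make the two steps above concrete, here is a minimal, library-independent sketch of the idea: write each table image to an artifacts directory and record a URI that the markdown output can reference. It does not use docling's API. `TableSketch`, `with_table_refs`, and `table_ref_markdown` are hypothetical names for illustration only, and the `table_000000.png` file naming is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class TableSketch:
    """Hypothetical stand-in for a table item; not docling's TableItem."""
    image_bytes: Optional[bytes]  # rendered table image, e.g. PNG bytes
    uri: Optional[str] = None     # set once the image has been written to disk


def with_table_refs(
    tables: List[TableSketch],
    image_dir: Path,
    reference_path: Optional[Path] = None,
) -> List[TableSketch]:
    """Write each table image into image_dir and store a URI pointing at it."""
    image_dir.mkdir(parents=True, exist_ok=True)
    for index, table in enumerate(tables):
        if table.image_bytes is None:
            continue  # no rendered image for this table; leave it untouched
        target = image_dir / f"table_{index:06d}.png"  # assumed naming scheme
        target.write_bytes(table.image_bytes)
        if reference_path is not None:
            # Relative URI, so the markdown file and its images can move together
            table.uri = Path(os.path.relpath(target, reference_path)).as_posix()
        else:
            table.uri = target.as_posix()
    return tables


def table_ref_markdown(table: TableSketch, alt: str = "Table") -> str:
    """Render a markdown image reference for a table that already has a URI."""
    return f"![{alt}]({table.uri})"
```

A downstream step could then call `with_table_refs(tables, artifacts_dir, reference_path)` and emit `table_ref_markdown(table)` in place of, or next to, the generated markdown table, which leaves the original table image available for later post-processing.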

...

Alternatives

...

@rajuptvs rajuptvs added the enhancement New feature or request label Jan 15, 2025
@wcool1

wcool1 commented Jan 16, 2025

Hello. Could you explain why referencing images using URIs in markdown formatting has proven valuable for data pipeline implementations? Why are URIs better than the text/table output that OCR produces in markdown? Could you share some example cases or evidence?
