parameter labels in export_to_markdown() does not work #112

Closed
adder opened this issue Dec 17, 2024 · 3 comments · Fixed by #113

Comments

@adder
adder commented Dec 17, 2024

I want to export selected parts from the docling document to markdown. (Titles and paragraphs, but NO footers, headers, ...)

I tried to do this by calling doc.export_to_markdown(labels={"title", "paragraph"})

But this does not work. For example, tables are still returned, but paragraphs are not.

@dolfim-ibm
Contributor

If you simply want to skip headers and footers, nothing should be needed, because it is the default behavior.

It is indeed true that the labels filter is currently applied only to "simple text" items; other structures are included regardless of it. We should fix that.

If you are looking for a way to discard the more complex structures (figures, tables, etc.), you can also use the parameter strict_text=True.
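
To make the behavior described above concrete, here is a minimal toy sketch. It does not use docling and is not its implementation: the Item class and both export functions are hypothetical stand-ins, assumed only for illustration. It models a labels filter that is applied to text items only, so tables slip through unless strict_text=True, next to a variant that applies the filter to every item. It does not attempt to reproduce the missing paragraphs from the original report.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical document item; not a docling type."""
    label: str            # e.g. "title", "paragraph", "table", "page_footer"
    text: str
    is_text: bool = True  # False for complex structures such as tables


def export_reported(items, labels, strict_text=False):
    """Toy model of the reported behavior: labels filter text items only."""
    out = []
    for item in items:
        if item.is_text:
            if item.label in labels:
                out.append(item.text)
        elif not strict_text:
            # Non-text structures bypass the labels filter entirely.
            out.append(item.text)
    return "\n\n".join(out)


def export_filtered(items, labels, strict_text=False):
    """Toy variant: the labels filter applies to every item."""
    out = []
    for item in items:
        if item.label not in labels:
            continue
        if strict_text and not item.is_text:
            continue
        out.append(item.text)
    return "\n\n".join(out)


doc = [
    Item("title", "# Report"),
    Item("paragraph", "Some body text."),
    Item("table", "| a | b |", is_text=False),
    Item("page_footer", "Page 1"),
]
wanted = {"title", "paragraph"}

print(export_reported(doc, wanted))                    # table still present
print(export_reported(doc, wanted, strict_text=True))  # table dropped
print(export_filtered(doc, wanted))                    # table dropped
```

In this toy model, strict_text=True acts as a workaround for the first function, while the second one only emits a table when "table" is explicitly part of the requested labels.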

@adder
Author

adder commented Dec 17, 2024 via email

@dolfim-ibm
Contributor

Here is a small fix for it: #113.

On the other hand, the case you describe looks more like a false prediction by the layout model, which is responsible for identifying the page footers and headers.
