PDF L1/L2 headers? #304

aaronsteers · 2025-01-25T07:49:38Z

The library doesn't seem to emit any headers when converting PDF documents to Markdown. Without headers, nothing delineates the sections of the document, and section boundaries are important for LLM chunking.

Any plans to add L1/L2 headers to markitdown?
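To illustrate why headers matter for chunking, here is a minimal sketch of a header-based splitter. It is not part of markitdown; the function name `split_on_headers` and the `max_level` parameter are made up for this example. Without L1/L2 headers in the output, a splitter like this returns the whole document as a single chunk.

```python
import re


def split_on_headers(markdown: str, max_level: int = 2) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) chunks at headings up to max_level.

    Hypothetical helper for illustration; deeper headings stay inside the body.
    """
    # Matches "# Title" or "## Title" (for max_level=2), but not "### Title".
    pattern = re.compile(rf"^(#{{1,{max_level}}})\s+(.*)$")
    chunks: list[tuple[str, str]] = []
    heading, body = "", []

    def flush() -> None:
        # Emit the current chunk unless it is completely empty.
        if heading or any(line.strip() for line in body):
            chunks.append((heading, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = pattern.match(line)
        if match:
            flush()
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Text that precedes the first heading, or a document with no headings at all, comes back under an empty heading string.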

Alternate implementations for PDF parsing

It seems that pdfplumber and/or PyMuPDF might have better semantic awareness and might be better at preserving headers and similar structure. Would there be any interest in exploring alternative libraries?
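As a rough sketch of what this could look like, the snippet below infers L1/L2 headers from font sizes. This is an assumption about one possible approach, not how markitdown or either library works: the function names and the ratio thresholds are hypothetical, and font size alone will misclassify some documents. The `extract_spans` helper assumes PyMuPDF is installed and relies on the per-span `size` and `text` fields returned by `page.get_text("dict")`.

```python
from collections import Counter


def spans_to_markdown(spans, h1_ratio=1.6, h2_ratio=1.25):
    """Convert (text, font_size) pairs to Markdown with inferred headers.

    Heuristic: the body size is the size covering the most characters;
    noticeably larger text becomes an L1 or L2 header. The thresholds
    are illustrative guesses, not tuned values.
    """
    sizes = Counter()
    for text, size in spans:
        sizes[round(size, 1)] += len(text)
    if not sizes:
        return ""
    body_size = sizes.most_common(1)[0][0]

    lines = []
    for text, size in spans:
        text = text.strip()
        if not text:
            continue
        ratio = size / body_size
        if ratio >= h1_ratio:
            lines.append(f"# {text}")
        elif ratio >= h2_ratio:
            lines.append(f"## {text}")
        else:
            lines.append(text)
    return "\n\n".join(lines)


def extract_spans(pdf_path):
    """Collect one (text, max_font_size) pair per text line using PyMuPDF."""
    import fitz  # PyMuPDF, third-party; imported lazily so the rest runs without it

    spans = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                # Image blocks have no "lines" key, so default to an empty list.
                for line in block.get("lines", []):
                    line_spans = line["spans"]
                    if line_spans:
                        text = "".join(s["text"] for s in line_spans)
                        size = max(s["size"] for s in line_spans)
                        spans.append((text, size))
    return spans
```

With PyMuPDF available, `spans_to_markdown(extract_spans("report.pdf"))` would produce Markdown with inferred headers, where `report.pdf` is a placeholder path.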
