The library doesn't seem to emit any headers when converting PDF documents to markdown. Without headers, nothing delineates the sections of the document, and section boundaries are important for LLM chunking.
Any plans to add L1/L2 headers to markitdown?
Alternate implementations for PDF parsing
It seems that pdfplumber and/or PyMuPDF might have better semantic awareness and might be better at preserving headers and other document structure. Would there be any interest in exploring alternative libraries?
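To make the idea concrete, here is a rough sketch of one way headers could be inferred from font sizes. It is not how markitdown works today. The function names (`spans_to_markdown`, `pdf_lines`) and the ratio thresholds are my own assumptions for illustration, and the heuristic (treat the most common font size as body text and promote noticeably larger lines to `#`/`##`) will misfire on many real PDFs. The extraction helper uses PyMuPDF's `page.get_text("dict")`, which reports a font size for each text span.

```python
from collections import Counter


def spans_to_markdown(lines, h1_ratio=1.6, h2_ratio=1.25):
    """Turn (text, font_size) pairs into markdown with inferred headers.

    `lines` is a list of (text, font_size) tuples in reading order.
    The ratios are hypothetical thresholds, not tuned values.
    """
    if not lines:
        return ""
    # Treat the font size that covers the most characters as body text.
    sizes = Counter()
    for text, size in lines:
        sizes[round(size, 1)] += len(text)
    body = sizes.most_common(1)[0][0]

    out = []
    for text, size in lines:
        text = text.strip()
        if not text:
            continue
        if size >= body * h1_ratio:
            out.append(f"# {text}")
        elif size >= body * h2_ratio:
            out.append(f"## {text}")
        else:
            out.append(text)
    return "\n\n".join(out)


def pdf_lines(path):
    """Yield (text, largest_font_size) for each text line in a PDF."""
    import fitz  # PyMuPDF, imported lazily so the function above has no dependency

    with fitz.open(path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                # Image blocks carry no "lines" key, so skip them.
                for line in block.get("lines", []):
                    spans = line["spans"]
                    text = "".join(s["text"] for s in spans)
                    if text.strip():
                        yield text, max(s["size"] for s in spans)
```

With PyMuPDF installed, usage would look like `spans_to_markdown(list(pdf_lines("report.pdf")))`, where `report.pdf` is a placeholder path. A real implementation would probably also need to look at bold flags, numbering patterns, and the PDF outline, since font size alone is a weak signal.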