Code:
try:
    result = md.convert(str(pdf_file))
except Exception as e:
    log.error(f"MarkItDown conversion failed for {pdf_file.name}: {e}")
    print(f"DEBUG: Exception caught in conversion - {e}")
Error:
Traceback (most recent call last):
  File "<python_environment>/Lib/site-packages/markitdown/_markitdown.py", line 1239, in _convert
    res = converter.convert(local_path, **_kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/markitdown/_markitdown.py", line 490, in convert
    text_content = pdfminer.high_level.extract_text(local_path),
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/high_level.py", line 176, in extract_text
    interpreter.process_page(page)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 997, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 1014, in render_contents
    self.init_resources(resources)
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 387, in init_resources
    colorspace = get_colorspace(resolve1(spec))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/pdfinterp.py", line 370, in get_colorspace
    return PDFColorSpace(name, stream_value(spec[1])["N"])
                               ~~~~~~~~~~~~~~~~~~~~~^^^^^
  File "<python_environment>/Lib/site-packages/pdfminer/pdftypes.py", line 263, in __getitem__
    return self.attrs[name]
           ~~~~~~~~~~^^^^^^
KeyError: 'N'
The PDF is corrupted, so it's fine that an exception is thrown. But the exception is not being caught by the except block, so I can't handle it.
I'm using markitdown = "^0.0.1a3" on python = "^3.11"
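One way to check whether the handler is really being skipped is to reproduce the same pattern without markitdown. The sketch below is only an illustration under stated assumptions: safe_convert and broken_convert are hypothetical names, and broken_convert merely imitates the KeyError: 'N' from the traceback. Since KeyError derives from Exception, a handler like the one in the code above should catch it whenever it is raised inside the try block.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger(__name__)


def safe_convert(convert: Callable[[str], object], path: str) -> Optional[object]:
    """Call convert(path) and return its result, or None if it raises."""
    try:
        return convert(path)
    except Exception as e:  # KeyError is a subclass of Exception
        log.error("conversion failed for %s: %r", path, e)
        return None


def broken_convert(path: str) -> object:
    # Hypothetical stand-in for md.convert; it fails the way the traceback does.
    raise KeyError("N")


result = safe_convert(broken_convert, "corrupted.pdf")
print(result)  # None
```

If this behaves as expected but the traceback still shows up with the real library, the text may be reaching the output some other way (for example, embedded in another exception's message or printed by a logger) rather than propagating uncaught. That is a guess, not something the traceback confirms.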