UnstructuredPDFLoader - No module named 'pdf2image' #29253

Open
5 tasks done
Magnuti opened this issue Jan 16, 2025 · 1 comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

Magnuti commented Jan 16, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

# main.py
from langchain_community.document_loaders import UnstructuredPDFLoader

loader = UnstructuredPDFLoader("example.pdf")
docs = loader.load()

$ pip install -U langchain-community unstructured
$ python3 main.py

Error Message and Stack Trace (if applicable)

  File "/Users/.../env/lib/python3.13/site-packages/unstructured/partition/pdf.py", line 25, in <module>
    import pdf2image
ModuleNotFoundError: No module named 'pdf2image'

Description

I am following this guide on UnstructuredPDFLoader.

With only langchain-community and unstructured installed, calling .load() raises ModuleNotFoundError: No module named 'pdf2image'.

Expected: Running python3 main.py should work.

Actual: ModuleNotFoundError: No module named 'pdf2image'

System Info

Python: 3.13.0
langchain-community==0.3.14
unstructured==0.11.8

@langcarl langcarl bot added the investigate Flagged for investigation. label Jan 16, 2025
@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 16, 2025
@ccurme ccurme removed the investigate Flagged for investigation. label Jan 16, 2025
Collaborator

ccurme commented Jan 16, 2025

You likely need to install "unstructured[pdf]".
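To confirm that the missing extra is the cause before reinstalling, a quick check like the one below may help. It is only a sketch: the helper name missing_modules is hypothetical, and the single-entry module list is an assumption taken from the traceback, so it may not cover everything the PDF partitioner imports.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# pdf2image is the module named in the traceback. Assumption: other PDF
# dependencies of unstructured may also be missing and are not checked here.
missing = missing_modules(["pdf2image"])
if missing:
    print(f'Missing modules: {missing}. Try: pip install "unstructured[pdf]"')
else:
    print("pdf2image is importable.")
```

If the check reports pdf2image as missing, installing the extra with pip install "unstructured[pdf]" and rerunning python3 main.py is the next thing to try.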

I'd suggest using the langchain-unstructured package, as dependencies are managed there (vs. the community integration, which requires separate pip installs).
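A rough sketch of what switching to langchain-unstructured might look like follows. It assumes the package is installed with its local-partitioning dependencies (the install command in the error message is an assumption about how the extras are named), and load_pdf is a hypothetical helper, not part of either library.

```python
from pathlib import Path


def load_pdf(path):
    """Load a PDF into LangChain documents via langchain-unstructured."""
    # Fail early with a clear message if the file itself is the problem.
    if not Path(path).is_file():
        raise FileNotFoundError(f"No such PDF: {path}")

    # Import lazily so a missing package produces an actionable hint.
    try:
        from langchain_unstructured import UnstructuredLoader
    except ImportError as exc:
        # Assumption: the "local" extra pulls in the partitioning dependencies.
        raise ImportError(
            'Try: pip install "langchain-unstructured[local]"'
        ) from exc

    loader = UnstructuredLoader(file_path=str(path))
    return loader.load()
```

Calling load_pdf("example.pdf") would then stand in for the UnstructuredPDFLoader lines in main.py. PDF-specific dependencies may still need to be present for local parsing.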
