How can I send a PDF document, like an invoice, to get its metadata? #199

monkiki · 2024-10-15T10:38:00Z


I've used ChatRequest in the past to send an image and get the extracted text, and it works great. But when I try to send a PDF, I get an error because only images are allowed: Invalid MIME type. Only image types are supported

Indeed, the ContentPart class seems to work only with images, judging by the ContentPartImageUrl class.

ChatRequest chatRequest = ChatRequest.builder()
	.model("gpt-4o")
	.messages(List.of(
		ChatMessage.UserMessage.of(
			List.of(
				ContentPart.ContentPartText.of("extract metadata from this invoice"),
				ContentPart.ContentPartImageUrl.of(loadImageAsBase64(mimeType, content))
			)
		)
	))
	.responseFormat(ResponseFormat.jsonSchema(ResponseFormat.JsonSchema.builder()
		.name(InvoiceMetadata.class.getSimpleName())
		.schemaClass(InvoiceMetadata.class)
		.build()))
	.build();

Any ideas?

Answered by sashirestela

sashirestela · 2024-10-15T20:31:26Z

sashirestela
Oct 15, 2024
Maintainer

Hi @monkiki, thanks for using simple-openai!

You are talking about the Vision functionality, which is part of the Chat Completion API. According to the OpenAI documentation, Vision supports only the following file types: png, jpg, jpeg, webp, gif. So you cannot use PDF files directly; however, you could convert them to image files using external tools.

Another option is to use the Assistants API with the File Search tool, which accepts PDF files and lets you ask questions about their content. I invite you to read the documentation about it and use simple-openai for that purpose.
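
To make the image-only restriction concrete, here is a minimal sketch of a guard you could run before building the request. It is not part of simple-openai: the class name VisionImageSupport and its methods are hypothetical, and the allowed list simply mirrors the formats named above. It fails fast on a PDF instead of waiting for the API error, and it builds the base64 data URL for each image (for example, each page of the invoice once an external tool has rendered it to PNG).

```java
import java.util.Base64;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper, not part of simple-openai: validates the MIME type
 * and builds a base64 data URL for an image destined for Vision.
 */
final class VisionImageSupport {

    // Formats named in the answer above: png, jpg/jpeg, webp, gif.
    private static final Set<String> SUPPORTED_MIME_TYPES =
            Set.of("image/png", "image/jpeg", "image/webp", "image/gif");

    private VisionImageSupport() {
    }

    /** Returns true when the MIME type is one of the supported image types. */
    static boolean isSupported(String mimeType) {
        return mimeType != null
                && SUPPORTED_MIME_TYPES.contains(mimeType.toLowerCase(Locale.ROOT));
    }

    /** Builds "data:<mime>;base64,<payload>" or rejects non-image content. */
    static String toDataUrl(String mimeType, byte[] content) {
        if (!isSupported(mimeType)) {
            throw new IllegalArgumentException(
                    "Unsupported MIME type for Vision: " + mimeType
                            + ". Convert the file to an image first.");
        }
        String encoded = Base64.getEncoder().encodeToString(content);
        return "data:" + mimeType.toLowerCase(Locale.ROOT) + ";base64," + encoded;
    }
}
```

Calling VisionImageSupport.toDataUrl("application/pdf", bytes) throws an IllegalArgumentException, while an image type returns the data URL string. How that string is wrapped into a ContentPartImageUrl depends on the simple-openai version you use, so that step is left out here.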

1 reply

monkiki Oct 16, 2024
Author

Ok, thanks for the info.
