Google Drive - Presentation to Markdown #3073

that-dom · 2025-01-06T20:13:59Z

Problem Description

When using the Google Drive integration, presentations are ingested as undifferentiated blobs of text. When these are used as a RAG source, the LLM can struggle to tell which pieces of content belong together. Visual elements are also lost.

Proposed Solution

Use a vision model to extract text and meaning from each slide's visuals.
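A minimal sketch of the per-slide idea follows. It assumes the slides have already been exported as images, and it treats `describe_slide` as a hypothetical callable that wraps whatever vision model is chosen. No specific model or API is implied. Putting each slide under its own heading lets a downstream chunker keep one slide's content together.

```python
from typing import Callable, Sequence


def slides_to_markdown(
    slide_images: Sequence[bytes],
    describe_slide: Callable[[bytes], str],
    title: str,
) -> str:
    """Render one Markdown section per slide image.

    `describe_slide` is a hypothetical stand-in for a vision-model call that
    returns the slide's text plus a description of its visual elements.
    """
    lines = [f"# {title}", ""]
    for number, image in enumerate(slide_images, start=1):
        # One heading per slide so each slide's content stays in one block.
        lines.append(f"## Slide {number}")
        lines.append("")
        lines.append(describe_slide(image).strip())
        lines.append("")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


def fake_describe(image: bytes) -> str:
    # Illustrative stub only: a real implementation would call a vision model.
    return f"Slide image of {len(image)} bytes."


if __name__ == "__main__":
    print(slides_to_markdown([b"abc", b"de"], fake_describe, "Q3 Review"))
```

Injecting the vision step as a callable keeps the Markdown layout independent of the model provider. You can swap in a different model without touching the formatting logic.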

Alternatives

Additional Context

seanstory · 2025-01-06T20:42:02Z

Hi @that-dom, thanks for filing.

It's on our radar that formats like Markdown are much more useful for LLM consumption. Today, we're primarily relying on Apache Tika (either through the Data Extraction Service or through the Elasticsearch Attachment Ingest plugin) to get textual data from non-text file formats, and this is known to lose formatting.

Until we're able to integrate with different tooling, one option is to pre-process your files into Markdown and then skip running them through "binary content extraction". This would let you preserve your original formatting.
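The sketch below illustrates the pre-processing route by turning an already-extracted slide outline into Markdown. The `Slide` structure (title, bullets, speaker notes) is a hypothetical intermediate format. How that outline is obtained from Google Slides is left open. The Markdown output is the kind of text you would index with binary content extraction skipped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slide:
    # Hypothetical intermediate representation of one slide.
    title: str
    bullets: List[str] = field(default_factory=list)
    notes: str = ""


def outline_to_markdown(slides: List[Slide]) -> str:
    """Convert slides to Markdown, preserving titles, bullet lists and notes."""
    parts = []
    for slide in slides:
        section = [f"## {slide.title}"]
        if slide.bullets:
            section.append("")
            section.extend(f"- {bullet}" for bullet in slide.bullets)
        if slide.notes:
            # Speaker notes often carry context that the slide text lacks.
            section.append("")
            section.append(f"Speaker notes: {slide.notes}")
        parts.append("\n".join(section))
    # A blank line between slides keeps each one a distinct Markdown block.
    return "\n\n".join(parts) + "\n"
```

For example, `outline_to_markdown([Slide("Roadmap", ["Q1: beta", "Q2: GA"], "Dates are tentative.")])` yields a `## Roadmap` heading, a two-item bullet list, and the speaker notes line. The bullets stay grouped under their slide title rather than being flattened into one run of text.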

that-dom added the enhancement (New feature or request) label Jan 6, 2025

that-dom mentioned this issue Jan 6, 2025

Feature: Google Drive - Presentation to Markdown / Audio|Video Transcribed (10MB Limit) #3075 (Closed, 13 tasks)
