Problem Description
When using the Google Drive integration, presentations are ingested as undifferentiated blobs of text. When such a document is used as a RAG source, the LLM can struggle to tell which pieces of content belong together. Visual elements are also lost.
Proposed Solution
Use a vision model to extract text and meaning from each slide's visuals.
Alternatives
Additional Context
It's on our radar that formats like Markdown are much more useful for LLM consumption. Today, we're primarily relying on Apache Tika (either through the Data Extraction Service or through the Elasticsearch Attachment Ingest plugin) to get textual data from non-text file formats, and this is known to lose formatting.
Until we're able to integrate with different tooling, one option is to pre-process your files into Markdown yourself and then skip "binary content extraction" for those files. This would let you preserve your original formatting.
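As a rough illustration of the pre-processing idea above, here is a minimal sketch that turns a presentation into Markdown with one section per slide. It assumes the presentation has already been exported to `.pptx`, and it reads the slide XML directly with the Python standard library. The helper name `pptx_to_markdown`, the "first paragraph is the title" heuristic, and ordering slides by file number are all assumptions made for this example, not part of the connector. It does not recover visual elements.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text paragraphs and runs inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def pptx_to_markdown(data: bytes) -> str:
    """Convert .pptx bytes to Markdown, one section per slide (hypothetical helper)."""
    sections = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        slide_names = [
            name for name in archive.namelist()
            if re.fullmatch(r"ppt/slides/slide\d+\.xml", name)
        ]
        # Assumption: file numbering approximates slide order. The real order
        # is defined in ppt/presentation.xml and may differ.
        slide_names.sort(key=lambda n: int(re.search(r"(\d+)\.xml$", n).group(1)))

        for index, name in enumerate(slide_names, start=1):
            root = ET.fromstring(archive.read(name))
            paragraphs = []
            for para in root.iter(f"{A_NS}p"):
                # Join all text runs of a paragraph so a sentence stays together.
                text = "".join(t.text or "" for t in para.iter(f"{A_NS}t")).strip()
                if text:
                    paragraphs.append(text)

            # Heuristic: treat the first non-empty paragraph as the slide title.
            title = paragraphs[0] if paragraphs else "(no text)"
            lines = [f"## Slide {index}: {title}"]
            lines += [f"- {p}" for p in paragraphs[1:]]
            sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

Keeping each slide under its own heading gives a downstream chunker a natural boundary, so text from the same slide is more likely to stay in the same chunk.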