[Bug] Crawler reads binary content of ePub #1153

rafalzawadzki · 2025-02-09T21:21:04Z

Describe the Bug
When crawling a website, Firecrawl also follows and scrapes URLs that point to files such as ePub. Because Firecrawl does not support these formats, the binary content is read as if it were a page and the results contain garbage.

To Reproduce
Steps to reproduce the issue:

1. Run /crawl with url: "https://www.gutenberg.org/ebooks/100"
2. Request /crawl/{jobId} and get all pages
3. Observe that results include content from unparsed files

Expected Behavior
I would expect either an option to exclude files from crawling or proper support for various file formats.
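
Until one of those exists, a client-side filter is one possible workaround. The following is a minimal sketch, not Firecrawl code: the names `SKIP_EXTENSIONS`, `has_skipped_extension`, `looks_binary`, and `should_keep` are hypothetical, and the extension list is an assumption. It relies only on the fact that an ePub file is a ZIP container and therefore starts with the bytes `PK\x03\x04`.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical deny-list of file extensions to skip (an assumption, not a Firecrawl setting).
SKIP_EXTENSIONS = {".epub", ".mobi", ".zip"}

# ePub files are ZIP containers, so they begin with the ZIP local-file-header signature.
ZIP_MAGIC = b"PK\x03\x04"


def has_skipped_extension(url: str) -> bool:
    """Return True if the URL path ends in a deny-listed extension."""
    path = urlparse(url).path  # drops the query string and fragment
    ext = posixpath.splitext(path)[1].lower()
    return ext in SKIP_EXTENSIONS


def looks_binary(body: bytes) -> bool:
    """Heuristic: a ZIP signature or a NUL byte in the first 1 KiB suggests a binary file."""
    head = body[:1024]
    return head.startswith(ZIP_MAGIC) or b"\x00" in head


def should_keep(url: str, body: bytes) -> bool:
    """Keep a crawled result only if neither the URL nor the content looks like a binary file."""
    return not has_skipped_extension(url) and not looks_binary(body)
```

The content check matters because download links do not always end in a recognizable extension, so filtering on the URL alone can let binary files through.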

Environment

OS: Any
Firecrawl Version: 1.16.0
Node.js Version: 22

rafalzawadzki added the "bug" label on Feb 9, 2025
