Describe the Bug
When crawling websites, URLs to files like ePub are also scraped, resulting in garbage results because such formats are not supported by Firecrawl.
To Reproduce
Steps to reproduce the issue:
Run /crawl with url: "https://www.gutenberg.org/ebooks/100"
Call /crawl/{jobId} and get all pages
Observe that results include content from unparsed files
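For anyone who wants to script the steps above, here is a minimal sketch using only the Python standard library. The base URL (`http://localhost:3002/v1`), the bearer-token header, and the helper names are assumptions for illustration; adjust them to your own deployment. Only the `/crawl` and `/crawl/{jobId}` paths and the `url` field come from this report.

```python
import json
import urllib.request

# Assumption: placeholder base URL for a self-hosted instance; change as needed.
API_BASE = "http://localhost:3002/v1"


def _headers(api_key: str) -> dict:
    """Common headers; the Authorization header is only sent if a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def build_crawl_request(url: str, api_base: str = API_BASE, api_key: str = "") -> urllib.request.Request:
    """Step 1: build the POST /crawl request for the given start URL."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{api_base}/crawl", data=body, headers=_headers(api_key), method="POST"
    )


def build_status_request(job_id: str, api_base: str = API_BASE, api_key: str = "") -> urllib.request.Request:
    """Step 2: build the GET /crawl/{jobId} request that returns the crawled pages."""
    return urllib.request.Request(
        f"{api_base}/crawl/{job_id}", headers=_headers(api_key), method="GET"
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To reproduce, pass `build_crawl_request("https://www.gutenberg.org/ebooks/100")` to `send`, read the job id from the response, then call `send(build_status_request(job_id))` until the crawl finishes and inspect the returned pages for entries that came from ePub or similar file URLs.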
Expected Behavior
I would expect either an option to exclude files from crawling or proper support for various file formats.
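To make the first option concrete, here is a rough sketch of the kind of filter I have in mind. It is not how Firecrawl works today; the function names and the list of formats are hypothetical. It checks every dot-separated token after the file name, because download links on sites like Project Gutenberg may not end in the format name (for example a path such as `/ebooks/100.epub3.images`, which is my assumption about the link shape).

```python
from urllib.parse import urlparse

# Hypothetical list of formats the crawler cannot parse; extend as needed.
UNSUPPORTED_FORMATS = {"epub", "epub3", "mobi", "kf8", "zip"}


def is_unsupported_file(url: str, unsupported: set = UNSUPPORTED_FORMATS) -> bool:
    """Return True if the last path segment names an unsupported file format."""
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    # Skip the base name; any later token may carry the format (e.g. "100.epub3.images").
    tokens = last_segment.lower().split(".")[1:]
    return any(token in unsupported for token in tokens)


def filter_crawl_urls(urls: list) -> list:
    """Keep only the URLs that do not point at unsupported file formats."""
    return [url for url in urls if not is_unsupported_file(url)]
```

A filter like this would run on discovered links before they are queued for scraping, so the book page itself is still crawled while the ePub and Kindle download links are skipped.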
Screenshots
Environment (please complete the following information):
OS: Any
Firecrawl Version: 1.16.0
Node.js Version: 22