Add ignore_http_error_status_codes and additional_http_error_status_codes arguments to PlaywrightCrawler #953
Comments
In
    crawler = PlaywrightCrawler(
        ...,
        http_client=HttpxHttpClient(
            ignore_http_error_status_codes={403},
        ),
    )
...which is less bad than touching protected attributes. But still, this does something different than the
To handle HTTP status codes received during navigation, we'd have to implement this separately for
PlaywrightCrawler already handles status codes received during navigation, but in a somewhat non-obvious way
Where it inherits _is_session_blocked_status_code from BasicCrawler, that looks into (I can even imagine a use case where
I see. Sorry for lying then! I think that this code deserves some serious refactoring, making Also, it looks like this is kinda related to #830.
So navigation would have a different set of "blocked status codes" than
That is up for discussion. I can come up with an imaginary scenario for it, but maybe it is purely theoretical and there is no actual need for it. So maybe it is better to make them the same initially and separate them only if users require it.
Let's not deal with that now.
👍
Currently, the arguments that change how different HTTP status codes are handled are available only to static HTTP-based crawlers. Those arguments can be passed to the crawler's __init__, but they are not available in PlaywrightCrawler. If someone wants to, for example, ignore a 403 error:
but in PlaywrightCrawler they have to do something like this:
That is very confusing, and users will hardly even know about it. The PlaywrightCrawler behavior should be aligned with the other crawlers, and these arguments should be settable in __init__.
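To make the requested semantics concrete, here is a minimal, library-independent sketch of how the two arguments could be applied to a status code. The helper name `is_http_error`, the 4xx/5xx default, and the rule that an "additional" code wins when a code appears in both sets are all assumptions for illustration, not Crawlee's actual implementation.

```python
from __future__ import annotations

from collections.abc import Iterable


def is_http_error(
    status_code: int,
    *,
    ignore_http_error_status_codes: Iterable[int] = (),
    additional_http_error_status_codes: Iterable[int] = (),
) -> bool:
    """Decide whether a response status should be treated as an error.

    Hypothetical helper: the defaults and precedence below are assumptions.
    """
    ignored = set(ignore_http_error_status_codes)
    additional = set(additional_http_error_status_codes)

    # Codes listed as additional errors are always treated as errors
    # (assumed to take precedence over the ignore set).
    if status_code in additional:
        return True
    # Codes listed as ignored are never treated as errors.
    if status_code in ignored:
        return False
    # Assumed default: any 4xx or 5xx status counts as an error.
    return 400 <= status_code <= 599


if __name__ == "__main__":
    # A 403 is an error by default, but not once it is explicitly ignored.
    print(is_http_error(403))
    print(is_http_error(403, ignore_http_error_status_codes={403}))
```

If PlaywrightCrawler accepted the same two arguments in __init__, a check of this shape could be applied to the status code of the navigation response, so users would not need to configure the HTTP client or touch protected attributes.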