Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/browser (BrowserCrawler)

Issue description
Hooks and navigation handlers are called with hard-coded gotoOptions: https://github.com/apify/crawlee/blob/master/packages/browser-crawler/src/internals/browser-crawler.ts#L529
Or is there some hidden magic that I'm missing?

Code sample
n.A.

Package version
3.1.0

Node.js version
irrelevant

Operating system
irrelevant

Apify platform

Priority this issue should have
Medium (should be fixed soon)

I have tested this on the
Replies: 3 comments
Workaround is to subclass the crawler class and override with
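Hi! Sorry that we didn't reply earlier.

If you are interested in calling page.goto(url, options) in your request handler (= scraping code), that is not affected by Crawlee at all - the call goes directly to Puppeteer/Playwright.

The code that you linked to affects navigation, when Crawlee opens the target URL in the browser to allow you to scrape it. What exactly is your use-case that you want to modify gotoOptions there? But a cleaner workaround that I can suggest is using preNavigationHooks when creating your Crawler, for example like this:

    import { PlaywrightCrawler } from 'crawlee';

    // `router` is your own request handler, defined elsewhere.
    const crawler = new PlaywrightCrawler({
        requestHandler: router,
        headless: true,
        preNavigationHooks: [
            // Runs before each navigation; changes to gotoOptions apply to that navigation.
            async ({ request }, gotoOptions) => {
                gotoOptions.referer = request.userData.customReferer;
            },
        ],
    });

As for the points you mentioned as confusing, you say

Yes, they make sense when navigating - but that is done by Puppeteer/Playwright under the hood. The pre- and post-navigation hooks exist to perform some actions before or after that happens - for example, to customize the gotoOptions, as I showed above. Does this answer your questions, or did you already solve your issues in some other way? Can we close this issue?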
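To make the point about hooks concrete, here is a minimal sketch of the flow described above: one options object is created, passed to each pre-navigation hook, and then used for the navigation itself. This is a simplified model, not Crawlee's actual implementation. The names `navigate`, `fakeGoto` and `setReferer`, the trimmed-down `GotoOptions` and `Context` types, and the default timeout value are all assumptions made up for illustration.

```typescript
// Hypothetical, simplified model of the hook/navigation flow (not Crawlee's real code).
interface GotoOptions {
    timeout?: number;
    referer?: string;
}

interface Context {
    request: { url: string; userData: Record<string, unknown> };
}

type PreNavigationHook = (ctx: Context, gotoOptions: GotoOptions) => Promise<void> | void;

// Stand-in for page.goto(); it only reports which options it was called with.
async function fakeGoto(url: string, options: GotoOptions): Promise<GotoOptions & { url: string }> {
    return { url, ...options };
}

async function navigate(ctx: Context, hooks: PreNavigationHook[]): Promise<GotoOptions & { url: string }> {
    // The crawler creates the options object (the default here is invented for the example).
    const gotoOptions: GotoOptions = { timeout: 60000 };
    // Every hook receives the same object and may mutate it.
    for (const hook of hooks) {
        await hook(ctx, gotoOptions);
    }
    // Navigation then uses whatever the hooks left in the object.
    return fakeGoto(ctx.request.url, gotoOptions);
}

// Example hook, mirroring the referer customization from the reply above.
const setReferer: PreNavigationHook = (ctx, gotoOptions) => {
    const referer = ctx.request.userData['customReferer'];
    if (typeof referer === 'string') {
        gotoOptions.referer = referer;
    }
};
```

Calling `navigate` with `setReferer` in the hooks array and a request whose `userData.customReferer` is set resolves to an object that contains both the default `timeout` and the hook-supplied `referer`. That is why the options look hard-coded at the call site but can still be customized.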
There is nothing more to discuss here as far as I am concerned; I moved away from Crawlee anyway. I had failed to understand at the time of writing that the gotoOptions in hooks are reused in the navigation-related code.
You closed and locked the issue 😆