How to configure with Scrapy CrawlSpider #344
Comments
Closing, read through the docs (which could be organized better).
Hi, I've got the same issue. I looked through the docs but couldn't find an answer. Can you help me, @villeristi? How did you solve the problem, and can you provide a link to the documentation explaining the issue? Thanks in advance.
Hi, same issue here. @mautini, did you happen to figure it out already?
The idea is that Scrapy shouldn't be scheduling any links, only parsing and extracting. All the scheduling logic should be implemented in the crawling strategy. Example:
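A minimal sketch of the "parse and extract only" half of that idea, using nothing but the standard library (the `LinkCollector`/`extract_links` names are made up for illustration; in a real spider you would use Scrapy's `LinkExtractor` and yield `scrapy.Request` objects from `parse`, while Frontera's crawling strategy decides what actually gets scheduled):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """The spider's only job: parse the page and extract links.

    There is no start_urls list and no scheduling here. In a Scrapy
    spider each returned URL would simply be yielded as a Request;
    whether and when it is fetched is up to Frontera's strategy.
    """
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

For example, `extract_links('<a href="/next">next</a>', 'http://example.com/')` returns `['http://example.com/next']`; the spider hands those URLs off and does nothing else with them.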
Hi @pdeboer, I finally found a solution. As @sibiryakov mentioned, you must not provide links to Scrapy directly, so start by removing start_urls from your spiders. Next, you must configure a backend that allows Frontera to send Scrapy the URLs to fetch: change the backend in your Frontera settings. Now, generate the database using the add_seeds script (step 6 here: https://frontera.readthedocs.io/en/latest/topics/quick-start-single.html?highlight=add%20seeds). You can then start the crawler; it should work!
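For illustration, a sketch of what the relevant Frontera settings fragment might look like. The backend class path and engine string below are assumptions (they differ between Frontera versions), so check the backends page of the docs for your version before copying this:

```python
# Hypothetical Frontera settings fragment -- verify names against your
# Frontera version's documentation before use.

# Use a persistent backend instead of the default in-memory one, so the
# add_seeds script can populate the queue that Frontera feeds to Scrapy.
# NOTE: this class path is an assumption; newer versions organize
# backends differently.
BACKEND = 'frontera.contrib.backends.sqlalchemy.FIFO'

# Where the SQLAlchemy backend stores its state (assumed SQLite file
# here; any SQLAlchemy engine URL should do).
SQLALCHEMYBACKEND_ENGINE = 'sqlite:///frontera.db'

# Keep previously added seeds between runs.
SQLALCHEMYBACKEND_DROP_ALL_TABLES = False
```

With a persistent backend configured, the add_seeds step writes the seed URLs into that database, and the crawler picks them up from there instead of from start_urls.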
Hi all! I am getting the same error as @villeristi did initially. Quick setup explanation: I mostly followed the distributed quick-start setup and config, running Scrapy with Frontera, and I am trying to use scrapy-selenium with it. In line with @sibiryakov's example, the spider is only yielding requests in the parse function; however, we use `SeleniumRequest`, and requests are also yielded in `start_requests`. Are we also meant to avoid yielding requests in the start_requests function? Or could it be the SeleniumRequest causing it? Or is there more in the configuration/settings that is crucial for this? More details in #401 (opened as I did not find this issue before; happy to move or close it). Thanks for all reactions and input! :)
The instructions in the official documentation about using Frontera with Scrapy throw an exception with CrawlSpider.
Spider code:
Exception thrown:
So, how would one use Frontera properly with an existing Scrapy project?
Cheers, this definitely looks awesome!