Best Approach to Assigning User Agent and Proxy Based on Data List in Crawlee? #2801
ferdysopian asked this question in Q&A
I'm working on a scraping task with a list of workers, each with its own cookies, user agent, and proxy. I want each worker to scrape up to 5 URLs.
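The per-worker assignment described above can be reduced to building one flat list of requests, each carrying its worker's settings in `userData`. This is a minimal sketch under assumptions: `WorkerProfile`, `assignUrls`, and the field names `workerId`, `userAgent`, `proxyUrl`, and `cookies` are hypothetical. The `{ url, uniqueKey, userData }` objects it returns have the shape of the request options that Crawlee's `crawler.run()` and `crawler.addRequests()` accept, so a single crawler and request queue could serve all workers.

```typescript
// Hypothetical worker record; the field names are assumptions, not a Crawlee type.
interface WorkerProfile {
  id: string;
  userAgent: string;
  proxyUrl: string;
  cookies: { name: string; value: string; domain: string; path: string }[];
}

// Worker settings carried on each request's userData.
interface WorkerUserData {
  workerId: string;
  userAgent: string;
  proxyUrl: string;
  cookies: WorkerProfile["cookies"];
}

// Same shape as the { url, uniqueKey, userData } request options Crawlee accepts.
interface RequestSpec {
  url: string;
  uniqueKey: string;
  userData: WorkerUserData;
}

const URLS_PER_WORKER = 5;

// Give each worker the next consecutive chunk of up to URLS_PER_WORKER URLs.
// URLs left over once every worker is full are returned in `unassigned`.
function assignUrls(
  workers: WorkerProfile[],
  urls: string[],
): { requests: RequestSpec[]; unassigned: string[] } {
  const requests: RequestSpec[] = [];
  let next = 0;
  for (const worker of workers) {
    const chunk = urls.slice(next, next + URLS_PER_WORKER);
    next += chunk.length;
    for (const url of chunk) {
      requests.push({
        url,
        // Prefixing the worker id means a URL is only deduplicated within one
        // worker's batch, not across workers.
        uniqueKey: `${worker.id}:${url}`,
        userData: {
          workerId: worker.id,
          userAgent: worker.userAgent,
          proxyUrl: worker.proxyUrl,
          cookies: worker.cookies,
        },
      });
    }
  }
  return { requests, unassigned: urls.slice(next) };
}
```

Folding the worker id into `uniqueKey` is a design choice; dropping it would let the queue collapse identical URLs across workers into a single fetch.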
Current Problem:
I can currently set the user agent and proxy globally, but I need each request to use the user agent and proxy of the worker it belongs to. Rather than creating a separate RequestQueue, ProxyConfiguration, Configuration, and PlaywrightCrawler for each worker, I would like to define them once and then read the worker-specific settings (such as user agent and proxy) from request.userData for each request. How can I achieve this?
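For the proxy, one option is a single `ProxyConfiguration` whose `newUrlFunction` reads the proxy from the request instead of rotating through a global list. The function below is a hypothetical, dependency-free sketch: `RequestLike` stands in for Crawlee's `Request`, and `proxyUrl` is an assumed `userData` field. It assumes a Crawlee 3.x release in which `newUrlFunction` is called as `(sessionId, { request })`; older releases may pass only the session id, so the installed version's `ProxyConfiguration` reference is worth checking.

```typescript
// Minimal stand-in for the part of Crawlee's Request this sketch reads.
interface RequestLike {
  userData: Record<string, unknown>;
}

// newUrlFunction-style callback: use the proxy stored on the request's userData,
// or return null (intended as "no proxy") when the field is missing or empty.
function proxyUrlForRequest(
  _sessionId: string | number,
  options?: { request?: RequestLike },
): string | null {
  const proxyUrl = options?.request?.userData["proxyUrl"];
  return typeof proxyUrl === "string" && proxyUrl.length > 0 ? proxyUrl : null;
}
```

Under that assumption it could be passed as `new ProxyConfiguration({ newUrlFunction: proxyUrlForRequest })`, keeping one proxy configuration for all workers.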
My Current Approach:
I'm looping through the list of workers, assigning 5 URLs per worker. Here’s my current example code:
Question:
Is there a better or more efficient way to assign user agents and proxies to each worker dynamically, instead of setting them globally or using the current per-worker loop?
Maybe something like this:
Is there a better approach to implement dynamic handling of proxies and user agents based on workers or requests? I'm open to suggestions!
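The user agent and cookies could be applied per request in the same spirit, from a `preNavigationHooks` entry on the single `PlaywrightCrawler`. The helper below is a hypothetical sketch that only derives the settings from `userData` (assumed fields `userAgent` and `cookies`, with cookies already in Playwright's `{ name, value, domain, path }` form), so it can be checked without launching a browser.

```typescript
// Playwright-style cookie shape (domain + path variant).
interface CookieSpec {
  name: string;
  value: string;
  domain: string;
  path: string;
}

interface NavigationSettings {
  extraHeaders: Record<string, string>;
  cookies: CookieSpec[];
}

// Build the headers and cookies to apply before navigation from a request's userData.
// A missing or non-string userAgent, or a missing or non-array cookies field,
// yields empty settings instead of throwing.
function navigationSettingsFor(userData: Record<string, unknown>): NavigationSettings {
  const extraHeaders: Record<string, string> = {};
  const userAgent = userData["userAgent"];
  if (typeof userAgent === "string" && userAgent.length > 0) {
    extraHeaders["User-Agent"] = userAgent;
  }
  const rawCookies = userData["cookies"];
  // Trusts that array entries already have the CookieSpec shape (not validated here).
  const cookies = Array.isArray(rawCookies) ? (rawCookies as CookieSpec[]) : [];
  return { extraHeaders, cookies };
}
```

Inside a hook such as `async ({ page, request }) => { ... }`, the result could be applied with Playwright's `page.setExtraHTTPHeaders(settings.extraHeaders)` and `page.context().addCookies(settings.cookies)`. Two caveats to verify: overriding the `User-Agent` header does not change the `navigator.userAgent` that page scripts see, and unless each page gets its own browser context (for example via `launchContext: { useIncognitoPages: true }`), cookies added for one worker may be visible to pages of another.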