Enhancing Web Automation with Nodriver: An Alternative to Selenium and Playwright #551
Discussion Overview: Inspired by this repo 📦. This topic explores the advantages of Nodriver, a Python package for web scraping and browser automation. By using the Chrome DevTools Protocol (CDP) directly, it removes the reliance on WebDriver, offering a powerful, undetectable, and asynchronous alternative to traditional tools like Selenium and Playwright Chromium.
Questions: How do these features redefine automation workflows, and how could similar enhancements benefit the Crawl4AI repository?
Let me know 🤠
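For readers wondering what "using CDP directly" looks like on the wire, here is a minimal sketch, not taken from Nodriver or Crawl4AI. CDP commands are JSON objects with an `id`, a `method`, and `params`, sent over the browser's WebSocket; replies echo the `id`, while events carry no `id`. The helper names `cdp_command` and `is_reply_to` are hypothetical and exist only for illustration; `Page.navigate` is a real CDP method.

```python
import itertools
import json

# Monotonic counter so every command gets a unique id (illustrative only).
_ids = itertools.count(1)


def cdp_command(method, params=None):
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


def is_reply_to(raw_message, command_id):
    """Return True if a raw incoming message is the reply to command_id.

    Events pushed by the browser have a "method" but no "id", so they never match.
    """
    return json.loads(raw_message).get("id") == command_id


# Example: the text frame that asks the current page to navigate.
navigate = cdp_command("Page.navigate", {"url": "https://example.com"})
```

A WebDriver-based tool would instead send an HTTP request to a separate driver process, which then translates it for the browser; that extra hop is what CDP-based tools skip.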
Replies: 1 comment
@prokopis3 Thanks for your interest in Crawl4AI and for such a good suggestion. Actually, Crawl4AI already supports direct CDP communication; we just use Playwright as a thin WebSocket wrapper. Here's what's possible right now:

import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig

# Direct CDP connection using existing browser
browser_config = BrowserConfig(
    use_managed_browser=True,  # This enables CDP mode
    debugging_port=9222,  # Default CDP port
    user_data_dir="/path/to/chrome/profile",  # Real user profile
    headless=False,  # For real browser
)

# Or let crawl4ai launch browser in CDP mode
# (this assignment replaces the one above; keep only the variant you need)
browser_config = BrowserConfig(
    use_managed_browser=True,
    browser_type="chromium",
    user_data_dir="~/.config/google-chrome",
    debugging_port=9222,
    headless=False,
)

async def main():
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://example.com")

asyncio.run(main())

Crawl4AI will soon support connecting to remote browsers via CDP, letting you use CDP endpoints anywhere in your network. This will give you:
The key difference from tools like Nodriver is that we're still using real browsers, just communicating with them directly via CDP rather than through WebDriver. By the way, stay tuned for our minimalist browser release (secret) ;) haha - it's going to be a game changer for web scraping and data extraction! 🚀
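As a rough illustration of what a debugging port exposes (a sketch using only the standard library, not Crawl4AI's internal code): a Chromium-based browser started with remote debugging on port 9222 serves a small HTTP endpoint at `/json/version`, whose JSON includes a `webSocketDebuggerUrl` field that CDP clients connect to. The function names `parse_ws_url` and `discover_ws_url` are hypothetical, and the host and port are assumptions matching the config above.

```python
import json
from urllib.request import urlopen


def parse_ws_url(version_json):
    """Extract the browser-level CDP WebSocket URL from a /json/version payload."""
    info = json.loads(version_json)
    return info["webSocketDebuggerUrl"]


def discover_ws_url(host="localhost", port=9222, timeout=5.0):
    """Ask a running browser's debugging port for its CDP WebSocket URL.

    Assumes a browser is already listening on host:port; raises
    urllib.error.URLError if nothing is there.
    """
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return parse_ws_url(resp.read().decode("utf-8"))
```

With a browser running, `discover_ws_url()` returns a `ws://` address of the form `ws://localhost:9222/devtools/browser/<id>`, which any CDP client can then open.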