Enhancing Web Automation with Nodriver: An Alternative to Selenium and Playwright #551

prokopis3 · 2025-01-23T04:16:45Z

Discussion Overview:

Inspired by this repo 📦 (tap to open)

This topic explores the advantages of Nodriver, a Python package for web scraping and browser automation. By talking to the browser directly over the Chrome DevTools Protocol (CDP), it removes the reliance on WebDriver and offers an asynchronous, hard-to-detect alternative to traditional tools like Selenium and Playwright Chromium. How do these features change automation workflows, and how could similar enhancements benefit the Crawl4AI repository?

Questions

Can we stop using Playwright or Selenium entirely? With no driver and no Selenium, we could get better resistance against web application firewalls along with a large performance boost.
Can we design an intuitive API layer inspired by Nodriver’s tab.find() for element interactions?
Should Crawl4AI incorporate a smart lookup system with text for enhanced flexibility?
Could Crawl4AI incorporate CDP-based undetectable scraping to bypass bot-detection systems?
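
To make the text-lookup question concrete, here is a minimal sketch of what a "find by text" helper could look like. It is only loosely modeled on the idea behind tab.find(); the `Element` class, the `find_by_text` function, and the closest-text-length heuristic are all hypothetical and are not part of Nodriver or Crawl4AI.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a DOM node; not a Crawl4AI or Nodriver type."""
    tag: str
    text: str


def find_by_text(elements, query, best_match=True):
    """Return an element whose text contains `query` (case-insensitive).

    With best_match=True, prefer the candidate whose text length is closest
    to the query length, so a button labelled "Login" wins over a paragraph
    that merely mentions the word. Returns None when nothing matches.
    """
    needle = query.strip().lower()
    candidates = [el for el in elements if needle in el.text.lower()]
    if not candidates:
        return None
    if not best_match:
        # Without ranking, fall back to document order.
        return candidates[0]
    return min(candidates, key=lambda el: abs(len(el.text.strip()) - len(needle)))


# Made-up page content for illustration.
page = [
    Element("p", "Please login to continue reading this article."),
    Element("button", "Login"),
    Element("a", "Sign up"),
]

match = find_by_text(page, "login")
print(match.tag if match else None)
```

The design point is that a caller describes what they see on the page ("login") instead of writing a selector, and the ranking step decides which of several matching nodes is most likely the intended one.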

Let me know 🤠

Answered by unclecode

unclecode · 2025-01-24T06:20:33Z

Maintainer

@prokopis3 Thanks for your interest in Crawl4AI and for such a good suggestion. Actually, Crawl4AI already supports direct CDP communication; we just use Playwright as a thin WebSocket wrapper. Here's what's possible right now:

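import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig

# Direct CDP connection using an existing browser profile
browser_config = BrowserConfig(
    use_managed_browser=True,  # This enables CDP mode
    debugging_port=9222,  # Default CDP port
    user_data_dir="/path/to/chrome/profile",  # Real user profile
    headless=False,  # For a real, visible browser
)

# Or let crawl4ai launch the browser in CDP mode
# (this assignment replaces the config above; keep only the one you need)
browser_config = BrowserConfig(
    use_managed_browser=True,
    browser_type="chromium",
    user_data_dir="~/.config/google-chrome",
    debugging_port=9222,
    headless=False,
)


async def main():
    # `async with` must run inside a coroutine, so wrap the crawl in main()
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun("https://example.com")
        print(result.markdown)


asyncio.run(main())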

Crawl4AI will soon support connecting to remote browsers via CDP, letting you use CDP endpoints anywhere in your network. This will give you:

Connections to any remote CDP-enabled browser
Full proxy support
Custom launch configs
Remote browser management
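
For context on what "connecting to a remote CDP endpoint" involves at the protocol level, here is a minimal sketch using only the standard library. A Chromium-based browser started with remote debugging serves a `/json/version` HTTP endpoint whose response includes a `webSocketDebuggerUrl` field. The helper names, the sample host, and the sample payload below are made up for illustration; this is not the upcoming Crawl4AI API.

```python
import json
from urllib.parse import urlunsplit


def version_url(host, port=9222):
    """Build the HTTP discovery URL served by a browser with remote debugging on."""
    return urlunsplit(("http", f"{host}:{port}", "/json/version", "", ""))


def websocket_endpoint(version_payload):
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    info = json.loads(version_payload)
    return info["webSocketDebuggerUrl"]


# Sample response body for illustration; the host and browser ID are made up.
sample = json.dumps({
    "Browser": "Chrome/120.0.0.0",
    "Protocol-Version": "1.3",
    "webSocketDebuggerUrl": "ws://192.168.1.50:9222/devtools/browser/example-id",
})

print(version_url("192.168.1.50"))
print(websocket_endpoint(sample))
```

In a real setup the payload would come from an HTTP GET against the discovery URL, and the returned WebSocket URL is what a CDP client then connects to.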

The key difference from tools like Nodriver is that we're still using real browsers, just communicating with them directly via CDP rather than through WebDriver.
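
As a rough illustration of what "communicating directly via CDP" means on the wire: each command is a JSON message with an `id`, a `method`, and `params`, sent over the browser's WebSocket, and the reply carries the same `id`. The sketch below only builds and inspects such messages; `cdp_command` and `is_reply_to` are hypothetical helper names, and no browser connection is made.

```python
import itertools
import json

# Each command needs a unique id so its reply can be matched later.
_ids = itertools.count(1)


def cdp_command(method, **params):
    """Serialize a CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def is_reply_to(raw_message, command_id):
    """True if the incoming message is the reply to the given command id.

    Events pushed by the browser (e.g. Page.loadEventFired) carry no id,
    so they never match.
    """
    message = json.loads(raw_message)
    return message.get("id") == command_id


navigate = cdp_command("Page.navigate", url="https://example.com")
print(navigate)
```

A wrapper such as Playwright, or a WebDriver-free client, ultimately exchanges messages of this shape with the browser; the difference between tools is mostly in what sits on top of that channel.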

By the way, stay tuned for our minimalist browser release (secret) ;) haha. It's going to be a game changer for web scraping and data extraction! 🚀
