[Bug]: CORS (Cross-Origin Resource Sharing) error when trying to use Crawl4AI to connect to Twitter. #647
-
crawl4ai version: 0.4.248

Expected Behavior: I am encountering a CORS (Cross-Origin Resource Sharing) error when trying to use Crawl4AI to connect to Twitter. Crawl4AI fails to load essential scripts from Twitter's asset domain (abs.twimg.com), which prevents the page from loading properly. This is the console error message I consistently see in the logs:

[CONSOLE]. ℹ Console: Access to script at 'https://abs.twimg.com/responsive-web/client-web/vendor.c4b9145a.js' from origin 'https://twitter.com' has been blocked by CORS policy: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource.

Current Behavior: The program stops with error logs.

Is this reproducible? Yes

Inputs Causing the Bug: URLs: https://www.x.com
NB: I tried many configurations; I am posting the most minimal one.

Steps to Reproduce: Run the Python code in a terminal and watch the console output.

Code snippets:
site_url = "https://www.x.com"
import asyncio
import nest_asyncio
from crawl4ai import AsyncWebCrawler, CacheMode, BrowserConfig, CrawlerRunConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
import logging

# Logging configuration
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger('twitter_crawler')

# Apply nest_asyncio to allow nested event loops
nest_asyncio.apply()

async def main():
    # Browser configuration for Twitter
    browser_conf = BrowserConfig(
        headless=False,  # Visible mode for debugging
    )
    crawler_config = CrawlerRunConfig(
        cache_mode=CacheMode.DISABLED,  # cache_mode=CacheMode.BYPASS/DISABLED,
        log_console=True,  # Enable browser console logs
    )
    try:
        async with AsyncWebCrawler(
            config=browser_conf,
            verbose=True,
        ) as crawler:
            result = await crawler.arun(
                url=site_url,
                config=crawler_config
            )
            if result.success:
                logger.info("Captured HTML length: %d", len(result.html or ''))
    except Exception as e:
        logger.error(f"General error: {str(e)}")
        raise e

if __name__ == "__main__":
    asyncio.run(main())

OS: Windows

Python version: Python 3.11.9

Browser: chromium

Browser version: Version 133.0.6943.16 (Official Build) (64-bit)

Error logs & Screenshots (if applicable)
Replies: 4 comments
-
The same problem occurs while crawling Facebook.
-
Check out: microsoft/playwright#17631
-
@alvaro562003 Like @Tauvic suggested, we should add '--disable-web-security' to ignore CORS errors. You can pass this flag through the extra_args key in BrowserConfig as follows:

browser_conf = BrowserConfig(
    headless=False,
    extra_args=['--disable-web-security']
)

Converting this to Forums, so that others may find this information easily.
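
For anyone landing here later, this is a minimal sketch of how the suggested flag could be combined with the configuration from the original post. The helper `build_browser_kwargs` and the wrapper `crawl` are hypothetical names introduced only for illustration; the `BrowserConfig`, `CrawlerRunConfig` and `AsyncWebCrawler` arguments are the ones already used in this thread. Whether the flag is enough for a given site is not confirmed here.

```python
# Chromium switch named in the reply above; it turns off same-origin checks,
# so it is best limited to crawling sessions.
DISABLE_WEB_SECURITY = "--disable-web-security"


def build_browser_kwargs(headless: bool = False, disable_web_security: bool = True) -> dict:
    """Hypothetical helper: build keyword arguments for crawl4ai's BrowserConfig."""
    extra_args = [DISABLE_WEB_SECURITY] if disable_web_security else []
    return {"headless": headless, "extra_args": extra_args}


async def crawl(url: str) -> int:
    """Hypothetical wrapper: crawl one URL and return the length of the captured HTML."""
    # Imported lazily so the helper above can be used without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig

    browser_conf = BrowserConfig(**build_browser_kwargs())
    run_conf = CrawlerRunConfig(cache_mode=CacheMode.DISABLED, log_console=True)
    async with AsyncWebCrawler(config=browser_conf) as crawler:
        result = await crawler.arun(url=url, config=run_conf)
        return len(result.html or "") if result.success else 0
```

Running `asyncio.run(crawl("https://www.x.com"))` would then launch the browser with the flag applied. Keeping the argument list in a small helper makes it easy to switch the flag off again once it is no longer needed.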
-
Hi aravindkarnam, |