Fails to load certain start_urls in headless mode #311

korbinian-hoermann · 2025-01-30T15:34:38Z

Hi! I want to collect trajectories of an LLM browsing the internet.
Since I am running this on a cluster, I use the headless=True flag.

I initialize my env as:

env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.reddit.com"},
    wait_for_user_message=False,
    headless=True,
    viewport={"width": viewport_width, "height": viewport_height},
    timeout=20000,
    action_mapping=agent.action_set.to_python_code,
)

While this works for the start_url "https://www.google.com", it fails for Reddit and Amazon.
When I run the same code locally with headless=False, it works for all three.

Is there a specific reason for it or a fix?

This is the output for reddit:

  File "/home/hpc/b232dd/b232dd14/browsergym_playground.py", line 62, in <module>
    obs, info = env.reset()
                ^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/wrappers/common.py", line 400, in reset
    return super().reset(seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/core.py", line 328, in reset
    return self.env.reset(seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/wrappers/common.py", line 293, in reset
    return env_reset_passive_checker(self.env, seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/utils/passive_env_checker.py", line 185, in env_reset_passive_checker
    result = env.reset(**kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/browsergym/core/env.py", line 303, in reset
    task_goal, task_info = self.task.setup(page=self.page)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/browsergym/core/task.py", line 95, in setup
    page.goto(self.start_url, timeout=10000)
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 9006, in goto
    self._sync(
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 115, in _sync
    return task.result()
           ^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_page.py", line 551, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 145, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.goto: Timeout 10000ms exceeded.
Call log:
  - navigating to "https://www.reddit.com/", waiting until "load"
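The traceback shows that the call that timed out is `page.goto(self.start_url, timeout=10000)` in `browsergym/core/task.py`, waiting for the `load` event, so the limit that fired was 10000 ms and not the 20000 passed to `gym.make`. One way to narrow this down is to repeat the navigation with plain Playwright in headless mode while varying the wait condition and the user agent. The following is only a diagnostic sketch, not BrowserGym code: `DESKTOP_UA`, `build_trials`, `run_trial` and `main` are hypothetical names, the user agent string is an arbitrary example, and the idea that the site reacts to the headless user agent is an untested assumption. It also assumes Chromium was installed with `playwright install`.

```python
from itertools import product

# Hypothetical desktop user agent string (an arbitrary example), used only to
# test the assumption that the default headless user agent matters.
DESKTOP_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_trials():
    """Return every combination of wait condition and user agent to try."""
    wait_conditions = ["load", "domcontentloaded"]
    user_agents = [None, DESKTOP_UA]  # None keeps Playwright's default
    return [
        {"wait_until": wait, "user_agent": ua}
        for wait, ua in product(wait_conditions, user_agents)
    ]


def run_trial(url, wait_until, user_agent, timeout_ms=10000):
    """Navigate once in headless Chromium; return 'ok' or the error class name."""
    # Imported lazily so build_trials() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            context = browser.new_context(user_agent=user_agent)
            page = context.new_page()
            page.goto(url, timeout=timeout_ms, wait_until=wait_until)
            return "ok"
        except Exception as exc:  # e.g. Playwright's TimeoutError
            return type(exc).__name__
        finally:
            browser.close()


def main(url="https://www.reddit.com"):
    """Run all trials against one URL and print one result line per trial."""
    for trial in build_trials():
        result = run_trial(url, **trial)
        custom_ua = trial["user_agent"] is not None
        print(f"wait_until={trial['wait_until']} custom_ua={custom_ua} -> {result}")
```

Calling `main()` on the cluster prints one line per combination. If only the `load` trials time out, the page is probably reachable but slow to fire `load`. If every trial fails, the problem is more likely network access or the site rejecting the browser.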


gasse · 2025-02-05T15:31:18Z

Hi there. I tried to reproduce this on my Mac with a fresh conda environment (python=3.13.1), and it worked with both headless=True and headless=False.

import gymnasium as gym
import browsergym.core

env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.reddit.com"},
    wait_for_user_message=False,
    headless=False,
    timeout=20000,
)
env.reset()

gasse · 2025-02-05T15:32:16Z

Maybe this is a firewall issue on your cluster?
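A quick way to test the firewall hypothesis from a cluster node, without involving a browser, is to check whether a plain TCP connection to each host can be opened. This is a minimal sketch using only the Python standard library. The helper names `host_and_port`, `can_connect` and `check_urls` are hypothetical, and the Amazon URL in the usage note is an assumed example, since the thread does not give the exact one.

```python
import socket
from urllib.parse import urlparse


def host_and_port(url):
    """Extract (host, port) from a URL, defaulting to 443 for https, else 80."""
    parsed = urlparse(url)
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def can_connect(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def check_urls(urls):
    """Map each URL to whether a plain TCP connection to its host succeeds."""
    return {url: can_connect(*host_and_port(url)) for url in urls}
```

For example, `check_urls(["https://www.google.com", "https://www.reddit.com", "https://www.amazon.com"])` run on the cluster returns a dict of booleans. A `False` for Reddit or Amazon alongside `True` for Google would point toward network filtering. `True` for all three would not rule out other causes, since a site can still treat a headless browser differently after the connection is established.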

gasse added the bug label on Feb 5, 2025
