Fails to load certain start_urls in headless mode #311

Open
korbinian-hoermann opened this issue Jan 30, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@korbinian-hoermann

Hi! I want to collect trajectories of an LLM browsing the internet.
Because I am running this on a cluster, I use the headless=True flag.

I initialize my env as:

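import gymnasium as gym
import browsergym.core  # registers the "browsergym/openended" task

# viewport_width, viewport_height, and agent are defined elsewhere in the script (not shown).
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.reddit.com"},
    wait_for_user_message=False,
    headless=True,
    viewport={"width": viewport_width, "height": viewport_height},
    timeout=20000,
    action_mapping=agent.action_set.to_python_code,
)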

While this works for the start_url "https://www.google.com", it fails for Reddit and Amazon.
When I run the same code locally with headless=False, it works for all three.

Is there a specific reason for this, or a fix?

This is the output for Reddit:

  File "/home/hpc/b232dd/b232dd14/browsergym_playground.py", line 62, in <module>
    obs, info = env.reset()
                ^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/wrappers/common.py", line 400, in reset
    return super().reset(seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/core.py", line 328, in reset
    return self.env.reset(seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/wrappers/common.py", line 293, in reset
    return env_reset_passive_checker(self.env, seed=seed, options=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/gymnasium/utils/passive_env_checker.py", line 185, in env_reset_passive_checker
    result = env.reset(**kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/browsergym/core/env.py", line 303, in reset
    task_goal, task_info = self.task.setup(page=self.page)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/browsergym/core/task.py", line 95, in setup
    page.goto(self.start_url, timeout=10000)
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/sync_api/_generated.py", line 9006, in goto
    self._sync(
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_sync_base.py", line 115, in _sync
    return task.result()
           ^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_page.py", line 551, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_frame.py", line 145, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 61, in send
    return await self._connection.wrap_api_call(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hpc/b232dd/b232dd14/.local/lib/python3.12/site-packages/playwright/_impl/_connection.py", line 528, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
playwright._impl._errors.TimeoutError: Page.goto: Timeout 10000ms exceeded.
Call log:
  - navigating to "https://www.reddit.com/", waiting until "load"
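
The traceback above ends in Page.goto timing out after 10000 ms while waiting for the "load" event. One way to narrow this down outside BrowserGym is to repeat the navigation with Playwright directly, varying only the wait condition and the user agent. The sketch below is illustrative, not a confirmed diagnosis: `probe_plan`, `run_probes`, `ProbeResult`, and the `DESKTOP_UA` string are hypothetical names introduced here, and the idea that a site may respond differently to a headless browser's default user agent is an assumption, not something established in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical desktop user agent, used only as a comparison point (assumption).
DESKTOP_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
)


@dataclass
class ProbeResult:
    url: str
    wait_until: str
    user_agent: Optional[str]  # None means the browser's default user agent
    ok: bool
    detail: str


def probe_plan(urls):
    """Return every (url, wait_until, user_agent) combination to try."""
    return [
        (url, wait_until, ua)
        for url in urls
        for wait_until in ("load", "domcontentloaded")
        for ua in (None, DESKTOP_UA)
    ]


def run_probes(urls, timeout_ms=10000):
    """Navigate to each URL in headless Chromium and record the outcome."""
    # Imported lazily so probe_plan() can be used without Playwright installed.
    from playwright.sync_api import Error as PlaywrightError
    from playwright.sync_api import sync_playwright

    results = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        for url, wait_until, ua in probe_plan(urls):
            context = browser.new_context(user_agent=ua)
            page = context.new_page()
            try:
                response = page.goto(url, timeout=timeout_ms, wait_until=wait_until)
                status = response.status if response else "no response"
                results.append(ProbeResult(url, wait_until, ua, True, f"HTTP {status}"))
            except PlaywrightError as exc:
                # Playwright's TimeoutError is a subclass of Error; keep the first line only.
                first_line = str(exc).split("\n", 1)[0]
                results.append(ProbeResult(url, wait_until, ua, False, first_line))
            finally:
                context.close()
        browser.close()
    return results
```

Calling `run_probes(["https://www.google.com", "https://www.reddit.com"])` on the cluster and printing each result would show which combinations time out. If only the "load" probes fail, the page is probably reachable but slow to finish loading. If every probe for a site fails, the cause is more likely the network path or the site's handling of the client.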
@gasse
Collaborator

gasse commented Feb 5, 2025

Hi there. I tried to reproduce this on my Mac with a fresh conda environment (python=3.13.1), and it worked with both headless=True and headless=False.

import gymnasium as gym
import browsergym.core

env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.reddit.com"},
    wait_for_user_message=False,
    headless=False,
    timeout=20000,
)
env.reset()

@gasse
Collaborator

gasse commented Feb 5, 2025

Maybe this is a firewall issue on your cluster?
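
A quick way to test the firewall hypothesis is to request the same URLs from a cluster node without any browser involved. The following is a minimal sketch using only the Python standard library. `check_reachable` and its `opener` parameter are hypothetical names introduced for illustration, and a successful plain HTTP request does not guarantee that a full browser session would behave the same way.

```python
import urllib.error
import urllib.request


def check_reachable(url, timeout_s=10.0, opener=urllib.request.urlopen):
    """Return (ok, detail) for a plain HTTP(S) request made outside the browser."""
    request = urllib.request.Request(url, headers={"User-Agent": "connectivity-check"})
    try:
        with opener(request, timeout=timeout_s) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, so the network path works even if the status is an error.
        return True, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, proxy problem, or timeout.
        return False, f"{type(exc).__name__}: {exc}"
```

Running `check_reachable` for "https://www.google.com", "https://www.reddit.com", and the Amazon URL on the cluster would show whether the failing sites are blocked at the network level (a `False` result) or reachable, in which case the cause lies elsewhere.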

gasse added the bug label Feb 5, 2025

2 participants