Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ElementHandle fails to find node by id #75

Open
pixeleet opened this issue Jun 12, 2024 · 4 comments
Open

ElementHandle fails to find node by id #75

pixeleet opened this issue Jun 12, 2024 · 4 comments
Labels
question Further information is requested

Comments

@pixeleet
Copy link
Contributor

Deno v.1.44.1
Astral 0.4.2

deno run -A --unstable playground/elementhandle_bug.ts
⚠️  The `--unstable` flag is deprecated and will be removed in Deno 2.0. Use granular `--unstable-*` flags instead.
Learn more at: https://docs.deno.com/runtime/manual/tools/unstable_flags
clicked
loaded
error: Uncaught (in promise) Error: Unable to get stable box model to click on
    if (!model) throw new Error("Unable to get stable box model to click on");
                      ^
    at ElementHandle.click (https://jsr.io/@astral/astral/0.4.2/src/element_handle.ts:180:23)
    at eventLoopTick (ext:core/01_core.js:168:7)
    at async file:///Users/DP/workspace/github.com/VojAI/etl/playground/elementhandle_bug.ts:32:7
    at async file:///Users/DP/workspace/github.com/VojAI/etl/playground/elementhandle_bug.ts:24:16

reproduction:

import { ElementHandle, launch } from "jsr:@astral/[email protected]";

const browser = await launch({ headless: false });
const page = await browser.newPage();

await page.goto("https://www.google.com/search?true&source=lnms&tbm=isch&sa=X&tbs=isz:l&hl=en&q=Barceloneta%2C+Beach");

await page.waitForSelector('div[role="dialog"]', { timeout: 30_000 });

const [accept_cookies] = await page.$$("div[role='dialog'] button")
  .then((buttons) =>
    Promise.all(
      buttons.map((b, i) => b.innerHTML().then((it): [ElementHandle, string] => [buttons[i], it])),
    )
  )
  .then((it) => it.filter(([_, html]) => html.toLowerCase().includes("accept")))
  .then((it) => it.map(([el, _]) => el));

if (accept_cookies) {
  await accept_cookies?.click?.();
  await page.waitForNavigation({ waitUntil: "none" });
}

const images = await page.$$("g-img")
  .then((els) => Promise.all(els.map((el) => Promise.all([el, el.evaluate((it: HTMLElement) => !!it.className)]))))
  .then((it) => it.filter(([_, className]) => className))
  .then((it) => it.slice(0, 20))
  .then((it) => it.map(([el]) => el))
  .then(async (els) => {
    const imgs = [];
    for (const el of els) {
      await el.click();
      console.log("clicked");
      await page.waitForNavigation({ waitUntil: "none" });
      console.log("loaded");
      await page.waitForSelector('img[aria-hidden="false"]', { timeout: 30_000 });
      console.log("getting image");
      const image = await page.evaluate(() => {
        const img = document.querySelector('img[aria-hidden="false"]')! as HTMLImageElement;
        if (img) {
          return {
            title: img.closest("c-wiz")?.querySelector("h1")?.innerText!,
            url: img.src,
            source: (img.parentElement as HTMLLinkElement).href,
          };
        }
        throw new Error("Image not found");
      });
      console.log("got image", image);
      imgs.push(image);
    }
  });

console.log(images);

Happy to open a PR given some pointers what to fix.

@lino-levan
Copy link
Owner

Okay I sat down and investigated and I think this is sort of a logic issue with this code. Something that's a bit unfortunate about element handles the way they work right now is that if the page transforms, it's likely that those handles are completely different. Essentially, here when you call el.click(), the page reconfigures itself and previous references become invalid. There is a universe that with locators this wouldn't happen but the locator API isn't fully fleshed out yet so I won't recommend it for now.

The correct way of implementing this would actually be requesting the nth child of some parent div in a while loop. Something like this should work just fine:

import { ElementHandle, launch } from "jsr:@astral/[email protected]";

const browser = await launch({ headless: false });
const page = await browser.newPage();

await page.goto("https://www.google.com/search?true&source=lnms&tbm=isch&sa=X&tbs=isz:l&hl=en&q=Barceloneta%2C+Beach");

await page.waitForSelector('div[role="dialog"]', { timeout: 30_000 });

const [accept_cookies] = await page.$$("div[role='dialog'] button")
  .then((buttons) =>
    Promise.all(
      buttons.map((b, i) => b.innerHTML().then((it): [ElementHandle, string] => [buttons[i], it])),
    )
  )
  .then((it) => it.filter(([_, html]) => html.toLowerCase().includes("accept")))
  .then((it) => it.map(([el, _]) => el));

if (accept_cookies) {
  await accept_cookies?.click?.();
  await page.waitForNavigation({ waitUntil: "none" });
}

const images = [];
let i = 0;
while(true) {
  const pageImages = await page.$$("g-img");
  const cur = pageImages[i++];
  const className = await cur.evaluate((it: HTMLElement) => !!it.className);
  if (!className) continue;
  await cur.click();
  await page.waitForNavigation({ waitUntil: "none" });
  await page.waitForSelector('img[aria-hidden="false"]', { timeout: 30_000 });
  const image = await page.evaluate(() => {
    const img = document.querySelector('img[aria-hidden="false"]')! as HTMLImageElement;
    if (img) {
      return {
        title: img.closest("c-wiz")?.querySelector("h1")?.innerText!,
        url: img.src,
        source: (img.parentElement as HTMLLinkElement).href,
      };
    }
    throw new Error("Image not found");
  });
  images.push(image);

  if(images.length === 20) {
    break;
  }
}

console.log(images);

@lino-levan lino-levan added the question Further information is requested label Jun 12, 2024
@pixeleet
Copy link
Contributor Author

Going to give it a try and let you know if this approach worked out, thank you so much for your response.

@pixeleet
Copy link
Contributor Author

This code has essentially been moved from puppeteer to astral, so while this works, for this specific script, I bet we're breaking a lot of unwritten rules / expectations when element handles are lost in page transitions (if I understand correctly)

@lino-levan
Copy link
Owner

lino-levan commented Jun 12, 2024

We're trying to move everything to locators (#2) ASAP, so hopefully this will not be a problem soon. Puppeteer has these unwritten rules but they do a lot of work to make them not show up too often. Unfortunately, when they do show up, it's basically impossible to debug due to the weird hacks they use.

TLDR; locators fix this elegantly, they're just not quite ready yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants