"Not in cart, trying again" #28

Open
Diablozone opened this issue Jun 8, 2021 · 56 comments
Labels
bug Something isn't working

Comments

@Diablozone

I am running this on Windows without Docker, and I got the bot to work. It opens the browser, selects the size, and then gets stuck in an endless loop saying "not in cart, trying again".
I am trying a test run on the Nike website, and in the browser it gives me an error saying:

"We had an issue with your request. If you continue experiencing issues, try refreshing the page.

[ Code: 9F502A89 ]"

Thanks in advance...

@samc621
Owner

samc621 commented Jun 8, 2021

Hi @Diablozone, thanks for opening this issue. I have seen this error before; however, I've also seen the product add to cart (and the bot continue to checkout) in the presence of this error. Are you seeing the same?

@Diablozone
Author

No, the product doesn't add to cart. Should I keep trying again and again?

@Diablozone
Author

What happens is that the page keeps refreshing and this error keeps looping. Eventually the Chromium tab closes automatically.

@Diablozone
Author

Can you give some insight into why this error occurs, or which part of the code it's coming from, so that I can look into it further?

@samc621
Owner

samc621 commented Jun 9, 2021

@Diablozone seems I am seeing the same behaviour, now. There is a 429 error coming from the API when clicking any of the DOM elements for style, size, or ATC button. I am going to need to look into this further and will circle back on it ASAP.

@Diablozone
Author

Diablozone commented Jun 9, 2021 via email

@samc621
Owner

samc621 commented Jun 10, 2021

@Diablozone just updating this here, the error on the nike integration appears to be a CORS issue. You will see some of the requests failing in the Chrome devtools network tab. I've started looking into it but haven't gotten to the bottom of it, yet. It just started happening afaik.

@Diablozone
Author

Diablozone commented Jun 10, 2021 via email

@samc621
Owner

samc621 commented Jun 15, 2021

@Diablozone it's definitely worth a try.

@Diablozone
Author

Diablozone commented Jun 15, 2021 via email

@samc621
Owner

samc621 commented Jun 16, 2021

@Diablozone can you provide a sample Task object for me to test with? I was recently testing on Kith and some others and it was working fine.

@labboy0276
Contributor

@samc621 I am seeing this with your API example:

{
    "site_id": 1,
    "url": "https://www.nike.com/t/lebron-18-black-white-basketball-shoe-M6DgN2/CQ9283-100",
    "style_index": 1,
    "size": "8.5",
    "shipping_speed_index": 0,
    "billing_address_id": 1,
    "shipping_address_id": 1,
    "notification_email_address": "myemail"
}

Side question, what is style_index and is it just the style like Best 10-18 or Best 1-9 in the example url above?

@samc621
Owner

samc621 commented Jul 6, 2021

Hi @labboy0276, yes I noted this issue a couple of weeks ago. I believe that it is a CORS error, and I haven't gotten around to addressing it yet.

@samc621
Owner

samc621 commented Jul 10, 2021

@labboy0276 just copying this over here for reference:
https://user-images.githubusercontent.com/38767335/125101214-fa64c380-e0a7-11eb-92dd-cbdb50f2e127.png

@labboy0276
Contributor

I am going to throw this here, but haven't tried it yet.

Have you thought about using a header like this @samc621

    headers: {
        'user-agent': userAgent,
        'sec-fetch-dest': 'none',
        'accept': '*/*',
        'sec-fetch-site': 'cross-site',
        'sec-fetch-mode': 'cors',
        'accept-language': 'en-US'
    }

I have seen this in other bots for scraping the interwebz.

@samc621
Owner

samc621 commented Jul 17, 2021

@labboy0276 have you tested this with the Nike issue? Curious if it might help.

@labboy0276
Contributor

Negative, @samc621. I'm just putting it there, as I haven't had time to test it out yet. If someone else can, that would be helpful.

@samc621
Owner

samc621 commented Jul 18, 2021

@labboy0276 I added this:

await page.setExtraHTTPHeaders({
  'user-agent': `${userAgent}`,
  'sec-fetch-dest': 'none',
  accept: '*/*',
  'sec-fetch-site': 'cross-site',
  'sec-fetch-mode': 'cors',
  'accept-language': 'en-US'
});

But it doesn't seem to be doing the trick. I took a look in the console and it looks like a 429 error (blank response) on this API.
[Screenshot: Screen Shot 2021-07-18 at 9 20 26 AM]
[Screenshot: Screen Shot 2021-07-18 at 9 20 43 AM]

@samc621
Owner

samc621 commented Jul 18, 2021

There is a new captcha on the footsites, I've never repro'd it in the browser but I'm curious if this might help reduce bot detection so that we don't hit it there. I'll give it a go in a bit.

@samc621
Owner

samc621 commented Jul 18, 2021

So it looks like, testing with these headers on the footsites, I got blocked. It took me to a Terms of Service page. Testing without them, I hit the captcha I was expecting.

All in all, I don't think setting these headers is helping much. I will need another approach to both the Nike issue and the footsites captcha.

@Kohlsen

Kohlsen commented Aug 4, 2021

Any update on this? @samc621

@Kohlsen

Kohlsen commented Aug 4, 2021

I did some research on the 429 status code, and from what I found, a 429 is a rate-limiting error, meaning the client sent too many requests in a short period. Could this be fixed by adding some timeouts to Puppeteer?

@samc621
Owner

samc621 commented Aug 4, 2021

@Kohlsen yes, that is indeed the common meaning for a 429 response code. I have tried that, among numerous other things, but to no avail. I keep looking at it when I get a chance, and I think others are too. Let me know if you get anything to work!
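
For anyone who wants to repeat the experiment, the "add some timeouts" idea could be sketched as a retry with exponential backoff. This is only an illustration: `withBackoff`, `attempt`, and the option names are hypothetical and not part of this project, and as noted above, delays alone did not resolve the Nike case.

```javascript
// Sketch only: `withBackoff`, `attempt`, and the option names are
// hypothetical helpers, not part of the bot's codebase.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Calls `attempt()` until it resolves with a status other than 429 or the
 * retry budget runs out. `attempt` must resolve to an object that has a
 * numeric `status` property.
 */
async function withBackoff(attempt, { retries = 4, baseMs = 1000, maxMs = 30000 } = {}) {
  for (let i = 0; ; i += 1) {
    const result = await attempt();
    if (result.status !== 429 || i >= retries) {
      return result; // success, a different error, or out of retries
    }
    // Double the wait ceiling on each retry and add random jitter so the
    // retries are not evenly spaced.
    const ceiling = Math.min(maxMs, baseMs * 2 ** i);
    await sleep(ceiling / 2 + Math.random() * (ceiling / 2));
  }
}
```

With the defaults, a request that keeps returning 429 is attempted five times in total, with waits that grow from roughly one second up to roughly eight seconds.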

This was referenced Aug 8, 2021
@jhgeluk

jhgeluk commented Sep 21, 2021

I'm also experiencing this issue on nike.com. When replicating the "clickthrough/search" behaviour on my own browser it doesn't occur

@samc621
Owner

samc621 commented Sep 21, 2021

@jhgeluk yes, I think it has something to do with the browser fingerprint and/or the speed of the navigations. I have some potential solutions for this, including:

  1. Randomizing all of the window.navigator properties.
  2. Using puppeteer to emit mousemovement events when clicking a DOM node.
  3. Using more delay in between certain actions.

I just haven't had time to implement this, but if someone is willing to give it a try, please feel free!
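
As a starting point for items 2 and 3, here is a minimal sketch that moves the mouse to an element in several steps, pauses, and then clicks. `humanClick` and `randomBetween` are hypothetical names, and the timing ranges are arbitrary guesses; the calls on `page` (`page.$`, `boundingBox()`, `page.mouse.move` with `steps`, `page.mouse.click` with `delay`) are standard Puppeteer methods. Whether this is enough to avoid the 429 is untested.

```javascript
// Sketch only: `humanClick` and `randomBetween` are hypothetical helpers.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const randomBetween = (min, max) => min + Math.random() * (max - min);

async function humanClick(page, selector) {
  const handle = await page.$(selector);
  if (!handle) throw new Error(`No element matches ${selector}`);

  const box = await handle.boundingBox();
  if (!box) throw new Error(`${selector} is not visible`);

  // Aim at a random point inside the element rather than its exact centre.
  const x = box.x + randomBetween(0.25, 0.75) * box.width;
  const y = box.y + randomBetween(0.25, 0.75) * box.height;

  // `steps` makes Puppeteer emit intermediate mousemove events.
  await page.mouse.move(x, y, { steps: 20 });

  // Pause before clicking (arbitrary range, in milliseconds).
  await sleep(randomBetween(150, 600));

  // `delay` is the time between mousedown and mouseup, in milliseconds.
  await page.mouse.click(x, y, { delay: Math.round(randomBetween(40, 120)) });
}
```

In a task this would replace a direct `page.click(selector)` call, for example `await humanClick(page, sizeSelector)`, where `sizeSelector` stands for whatever selector the task already uses.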

@samc621
Owner

samc621 commented Sep 23, 2021

@13ROY could you take a look at this when you get a chance?

@samc621
Owner

samc621 commented Sep 30, 2021

@jhgeluk yes that makes a lot of sense, can you try to identify what libraries Nike might be using? That way we can reverse engineer to a solution. This is commonly how I solve issues like this.

@samc621 samc621 added the bug Something isn't working label Apr 1, 2022
@abhingupta

Hi, I've been trying to resolve this but to no avail. I've tried inputting custom headers and disabling web security for now. Sam, any thoughts on how we can fix this?

@samc621
Owner

samc621 commented Apr 12, 2022

@abhingupta there's a lot of things we can try here:

  • We are already randomizing the user agent, but I would try Playwright with another browser like Firefox.
  • I would try adding delay between the interactions.
  • I would make sure that the browser emits mouse movement and mouse down events.
  • I would check the network requests tab to see if there are any requests that are failing (other than the 429 error).
  • I would evaluate their JS code to look for any kind of fingerprinting library (if there is one, we must figure out how to emulate or reverse engineer it).

These are starting points. There's more we can do from here.

@bklynate
Contributor

bklynate commented Apr 16, 2022

[Screenshot: Screen Shot 2022-04-16 at 12 15 31 PM]

Is this error related to this bug as well?

Here is the task I am using to get this bug...

{
    "site_id": 1,
    "url": "https://www.nike.com/t/air-max-97-se-mens-shoes-3l919x/DN1893-001",
    "style_index": 1,
    "size": "9.5",
    "shipping_speed_index": 0,
    "billing_address_id": 2,
    "shipping_address_id": 1,
    "notification_email_address": "[email protected]"
}

@samc621
Owner

samc621 commented Apr 16, 2022

@bklynate yes it's the same issue. If you look in the Network tab, you should see some 429 errors.
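
To capture the same information without opening devtools, the responses can be logged from the script itself. A minimal sketch, assuming a Puppeteer `page`; `watchForRateLimits` is a hypothetical helper name, while `page.on('response', ...)`, `response.status()`, `response.url()`, and `response.request().method()` are standard Puppeteer calls.

```javascript
// Sketch only: `watchForRateLimits` is a hypothetical helper.
// Records every response with HTTP status 429 seen by the page.
function watchForRateLimits(page, onHit = console.warn) {
  const hits = [];
  page.on('response', (response) => {
    if (response.status() === 429) {
      const hit = {
        url: response.url(),
        method: response.request().method(),
      };
      hits.push(hit);
      onHit(`429 from ${hit.method} ${hit.url}`);
    }
  });
  // The array fills up as responses arrive; inspect it after the task runs.
  return hits;
}
```

Calling `const hits = watchForRateLimits(page)` before `page.goto(...)` should show which endpoints are being rate limited and at which step of the task.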

@bklynate
Contributor

RANDOM OBSERVATION
I've noticed even with PARALLEL_TASKS=1 Chrome opens two windows when beginning a task, why is that? And could that be the source of the issue?

@samc621
Owner

samc621 commented Apr 16, 2022

@bklynate I'm pretty sure that is because of how puppeteer-cluster works. It always has an extra page/browser (depending on the concurrency context you choose). It's similar to how puppeteer opens a blank page when it starts up.

I think the issue is more likely to be related to the browser fingerprint. There are a lot of things that anti-bot software uses to detect bots, but one of the easiest ones I've seen is the use of Chromium or another developer-friendly browser. The fix might be as simple as switching it out with Firefox. See more of my suggestions above.

@bklynate
Contributor

I've tried changing browsers (Firefox) and forcing the use of Chrome instead of Chromium, but none of that has worked thus far.

@ethanlaj

Is this still an active problem?

@samc621
Owner

samc621 commented May 30, 2022

> Is this still an active problem?

@ethanlaj Yes I believe so, I haven't seen a PR to fix it. If you get around to it, feel free to open one and I'll happily review it.

@elManto

elManto commented Jul 5, 2022

Question: have you experienced this issue with ALL websites or just with the Nike website? That way I at least know where to start my investigation into which component performs the actual fingerprinting.

@samc621
Owner

samc621 commented Jul 5, 2022

@elManto only on Nike.

@elManto

elManto commented Jul 7, 2022

Update, so that anyone working on this doesn't have to reinvent the wheel.

  1. I used https://amiunique.org/fp to compare the fingerprint of my own browser with one managed by Puppeteer, both headless and with a GUI. In the GUI case, I'd say the only difference is probably a missing plugin, which I don't think is enough to block our requests.
  2. I had a quick look at the Nike page, and the boomerang library (https://github.com/bluesmoon/boomerang) looks interesting. It does user profiling, officially for data monitoring, but it may also be used to block bots. I haven't reversed it yet; I'm not a web guy and it will take some time, so I'd prefer to rule out other avenues first.
  3. What is interesting is that if I start a naive Puppeteer request (headless == false) to a product URL on the Nike website from a separate script, without any customization, user agent randomization, etc., it lets me connect without blocking anything. Maybe one of the flags that @samc621 enabled when running Chrome is a bad one. I'll investigate in the next few days.

@aditya-pushkar

aditya-pushkar commented Jul 22, 2022

@samc621 I managed to get around this problem with Puppeteer, but the solution was not scalable, so I decided to try Playwright, and it worked. Here is some basic code I wrote.

const { webkit } = require('playwright');

(async () => {
    // Launch WebKit in headful mode
    const browser = await webkit.launch({ headless: false });
    const page = await browser.newPage();

    const url = "https://www.nike.com/in/t/air-force-1-07-lv8-shoes-V6SkWv/DR9866-100";
    await page.goto(url, {
        waitUntil: 'networkidle'
    });

    console.log("Selecting the size !");
    await page.locator('text=UK 9').first().click();

    console.log("Clicking on Add to cart !");
    await page.locator('text=Add to Bag').click();

    // Give the add-to-cart request time to finish, then open the cart page
    await page.waitForTimeout(3000);
    await page.goto("https://www.nike.com/in/cart");
})();

However, Playwright has some major issues with Nike.com.

The Chromium browser is not working on Nike.com, so I have to use Firefox and Safari (WebKit) for testing.

According to the documentation of "puppeteer-extra", we can use "puppeteer-extra-plugin-stealth" with "playwright-extra". Here is the doc.
But the moment I used the "playwright-extra" plugin, the program started throwing some errors.

 typeError: Cannot read properties of undefined (reading 'userAgent')
 at Proxy.<anonymous> (/Users/{username}/Desktop/playground/bot/node_modules/playwright-extra/dist/index.cjs.js:270:33)
 at async Plugin.onPageCreated (/Users/{username}/Desktop/playground/bot/node_modules/puppeteer-extra-plugin-stealth/evasions/user-agent-override/index.js:69:8)

But the main point is that Nike.com started throwing 429 errors again after using extra plugins. Here is the code.

const { firefox } = require('playwright-extra');

// Register the stealth plugin with the Firefox launcher
const stealth = require('puppeteer-extra-plugin-stealth')();
firefox.use(stealth);

// Generate a random user agent
const UserAgent = require('user-agents');
const userAgent = new UserAgent();

(async () => {
    const browser = await firefox.launch({
        headless: false
    });
    const context = await browser.newContext({
        userAgent: `${userAgent.userAgent}`
    });
    const page = await context.newPage();

    const url = "https://www.nike.com/in/t/air-force-1-07-lv8-shoes-V6SkWv/DR9866-100";
    await page.goto(url, {
        waitUntil: 'networkidle'
    });

    await page.waitForTimeout(5000);

    console.log("Selecting the size !");
    await page.locator('text=UK 9').first().click();

    console.log("Clicking on Add to cart !");
    await page.locator('text=Add to Bag').click();

    // Give the add-to-cart request time to finish, then open the cart page
    await page.waitForTimeout(3000);
    await page.goto("https://www.nike.com/in/cart");
})();

Here is my thought.

Nike is somehow able to detect the "Puppeteer" and "puppeteer-extra" plugins.

For Nike.com, "Playwright" without any plugins should be sufficient, but we must modify the User Agent and Fingerprints if we want to scale the bot.

  • I tested this bot on Nike India store.

Article about tracking of Puppeteer.

Open for feedback.

@samc621
Owner

samc621 commented Jul 23, 2022

@elManto @aditya-pushkar thank you both for your work here, and sorry for the delay in the response as I've gotten very busy. It seems to me that there are many potential solutions so I think the best step forward is to identify the constraints and then agree on the best solution within those constraints.

  1. I have recently seen more websites that are blocking just on the basis of detecting Chromium. I'm 100% fine with switching this out with another browser. Playwright can do this trivially, but Puppeteer can do this too. You can launch Puppeteer with a custom executablePath to Chrome or even another browser. I also think you can specify a product with one of chrome or firefox.
  2. It has also come to my attention that some of the launch args I added for Docker "headful" support (e.g. --no-sandbox) might be interfering here. I'm fine with removing them as long as the support remains. The goal here was the ability to run the headful mode on a server and then access it via VNC client.
  3. In theory, this bot should work regardless of whether we are using headless true/false. Running in "headful" mode should be an option, not a requirement.
  4. I'd also like to continue to use puppeteer-cluster if possible. AFAIK, there isn't anything Playwright can do that Puppeteer can't.

We are already randomizing the UA but we can also look into loading a custom profile with the userDataDir argument.

I'd be very surprised if the above isn't enough to get unblocked. If it isn't, I'll need to find time for a closer look.
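
One way to try these options independently is to build the launch options from the environment, so each one can be toggled without code changes. This is a sketch under assumptions: `buildLaunchOptions` and the environment variable names (`HEADLESS`, `CHROME_PATH`, `USER_DATA_DIR`, `IN_DOCKER`) are hypothetical and do not exist in this project, while `headless`, `executablePath`, `userDataDir`, and `args` are standard Puppeteer launch options.

```javascript
// Sketch only: `buildLaunchOptions` and the environment variable names
// are hypothetical, not existing project configuration.
function buildLaunchOptions(env = process.env) {
  const options = {
    // Headful unless HEADLESS=true, so both modes stay available.
    headless: env.HEADLESS === 'true',
    args: [],
  };

  // Point Puppeteer at an installed Chrome instead of bundled Chromium.
  if (env.CHROME_PATH) {
    options.executablePath = env.CHROME_PATH;
  }

  // Reuse a persistent browser profile between runs.
  if (env.USER_DATA_DIR) {
    options.userDataDir = env.USER_DATA_DIR;
  }

  // Add --no-sandbox only where it is needed, e.g. inside Docker.
  if (env.IN_DOCKER === 'true') {
    options.args.push('--no-sandbox');
  }

  return options;
}
```

The result would be passed straight to `puppeteer.launch(buildLaunchOptions())`, which makes it easy to compare a run with and without each flag.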

@aditya-pushkar

@samc621 Just an update I tried puppeteer Chrome with a custom executablePath and saved the browser session with userDataDir, but the bot is still detectable.

The bot is not able to pass the detection test on Creepjs.

I think we have to modify the fingerprint.

@samc621
Owner

samc621 commented Jul 25, 2022

> @samc621 Just an update I tried puppeteer Chrome with a custom executablePath and saved the browser session with userDataDir, but the bot is still detectable.
>
> The bot is not able to pass the detection test on Creepjs.
>
> I think we have to modify the fingerprint.

@aditya-pushkar what version of Chrome did you use? Did you try removing any of the launch args? And did you launch in headless or headful? Some info will help me verify from my end.

@aditya-pushkar

@samc621

  • Version: 103.0.5060.134
  • --user-agent arg is used
  • Tested on Headful

@aditya-pushkar

@samc621

Can you help me with this? I have a use case in which multiple users can request different tasks at the same time, and for each task, a new puppeteer instance should start running immediately without getting queued.

However, the single-threaded nature of Node is the issue. Whenever we want to run a CPU-intensive task, it processes a single request at a time, and other tasks get queued.

For example, I set up an Express server with Puppeteer, and when I send more than one request/task at a time, each task gets queued until the previous request is completed. Is there a way around this?

Is there something I'm missing?

Or can this problem be solved by serverless?

If there is any good resource you can point me to, It will be very helpful.

@samc621
Owner

samc621 commented Aug 3, 2022

@aditya-pushkar not sure I understand your issue. I also think this might belong on a separate issue, but I'll try to help anyways.

So is your issue starting multiple tasks from the API in parallel? Is it a Puppeteer issue or a Node issue?

On the Puppeteer side, this shouldn't be a problem as long as you set an appropriate maxConcurrency for Puppeteer Cluster (you can use the PARALLEL_TASKS env var for this). Keep in mind the resource constraints of your machine.

On the Node side, I don't see what Node/Express has to do with this. They're async requests so they will not block the thread thanks to the Node.js event loop. Your Express server should be able to handle 1000 concurrent requests, or more, without issue. So I'm not sure what's causing the blocking from your end.
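
To illustrate both points, here is a dependency-free sketch of running async tasks on a single thread with a cap on how many run at once. `runWithLimit` is a hypothetical stand-in for the idea behind `maxConcurrency`; it is not puppeteer-cluster's actual implementation.

```javascript
// Sketch only: `runWithLimit` is a hypothetical illustration of a
// concurrency cap, not code from puppeteer-cluster.
// `tasks` is an array of functions that each return a promise.
async function runWithLimit(tasks, maxConcurrency) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps pulling the next unstarted task until none are left.
  // While one worker awaits its task, the event loop runs the others.
  async function worker() {
    while (next < tasks.length) {
      const index = next;
      next += 1;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(maxConcurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `maxConcurrency` set to 2 and five tasks, two tasks are in flight at any moment and the rest start as earlier ones finish. If tasks appear to run strictly one at a time, it is worth checking whether the effective limit is 1 or whether something synchronous is blocking the event loop.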

I might be missing some context. Can you explain your scenario (how you are testing/implementing this) in more detail?

@aditya-pushkar

@samc621 Thank you very much.

@jorishaenseler15

Are there any updates on this, @samc621? Thanks for the help.

@samc621
Owner

samc621 commented Jan 12, 2023

@jorishaenseler15 I haven't gotten around to properly debugging the issue, just haven't found the time. Please feel free to submit a PR if you do.

@Kaherdin

Still no update ?

@samc621
Owner

samc621 commented Dec 14, 2023

> Still no update ?

Hi @Kaherdin, I haven't been actively maintaining this project for a little while now. Please feel free to submit a PR if you are able to contribute a fix. Thanks!

@joacoarana

Nothing yet? I have been getting the same error with a basic selenium script...

@samc621
Owner

samc621 commented Mar 10, 2024

> Nothing yet? I have been getting the same error with a basic selenium script...

Hi @joacoarana I haven't been actively maintaining this project for a little while now. Please feel free to submit a PR if you are able to contribute a fix. Thanks!
