You need to knock this shit off. #2

Closed
dogweather opened this issue Aug 30, 2022 · 6 comments

Comments

@dogweather

This is likely breaking the law of my jurisdiction and yours. Can I send you an invoice for the rate-limiting fees I incurred because of you? What's your business address?

[Attached images: Screen Shot 2022-08-30 at 5 51 08 PM, Screen Shot 2022-08-30 at 5 52 18 PM, Screen Shot 2022-08-30 at 5 53 31 PM]

@tb0hdan
Owner

tb0hdan commented Sep 1, 2022

Hey @dogweather, there's no need to be this rude.

Idun is a simple, open-source crawler that collects Internet domains. While I'm doing my best to make it work as efficiently as possible, incidents do happen.

There are several ways you can prevent things like this from happening in the future:

  1. Ask your ISP to return HTTP 429 (Too Many Requests) responses
  2. Implement rate limiting on the web server
  3. Add proper filtering on Cloudflare
  4. Add Idun to robots.txt (yes, that works)
  5. Last but not least, ban the offending IP
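
As a rough illustration of items 2 and 4 in the list above, here is a minimal Python sketch. It is not Idun's code and not how Cloudflare implements rate limiting. The user-agent token `Idun`, the `example.com` URL, the helper names (`is_allowed`, `TokenBucket`, `status_for`), and the rate and capacity values are all assumptions made for the example; the real user-agent string should be taken from your own logs.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "Idun" is the crawler's user-agent token. Confirm it in your logs.
ROBOTS_TXT = """\
User-agent: Idun
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class TokenBucket:
    """Simple token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the last request.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def status_for(bucket: TokenBucket, now: float) -> int:
    """Hypothetical helper: 200 if the request is admitted, 429 otherwise."""
    return 200 if bucket.allow(now) else 429


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Idun", "https://example.com/page"))
    bucket = TokenBucket(rate=1.0, capacity=2)
    print([status_for(bucket, 0.0) for _ in range(3)])
```

In practice one bucket would be kept per client IP or per user agent, and timestamps would come from a monotonic clock. They are passed in explicitly here only to keep the sketch deterministic. Note that robots.txt is advisory, so it helps only with crawlers that choose to honor it.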

Now, in order for me to fully understand your complaint, please attach the access log from your web server here. I'll work on a fix.

Thanks and take care.

@dogweather
Author

dogweather commented Sep 1, 2022

Why the copy and paste reply? If you look at my screenshots you'll see that I already performed all these steps.

I'm not a user of your software. I'm a third party who's injured by it.

I have to assume you're negligently abusing websites around the world.

@tb0hdan
Owner

tb0hdan commented Sep 1, 2022

Your assumptions are wrong. I haven't pasted a single character.

Again, regarding your complaint - I'm just trying to get to the bottom of it. Like I did here: tb0hdan/domains#15

Please try and be constructive. Add access.log from your web server.

@dogweather
Author

dogweather commented Sep 1, 2022

It looks like this has been a problem for a while: https://webmasters.stackexchange.com/questions/132737/bot-calling-my-php-script-too-fast

I'm not running a technology that produces a legacy-style access.log. Those screenshots are a small portion of my Cloudflare WAF logs. You can see the multiple requests per second, the rate limiting, the timestamps, and the host and url that you were hitting.

@tb0hdan
Owner

tb0hdan commented Sep 1, 2022

The link you've posted describes an issue that was resolved long ago. I will, however, investigate further and try to reproduce it. Meanwhile, I have added your website to the no-crawl list.

@dogweather
Author

> I have added your website to the no-crawl list.

Thank you.

@tb0hdan tb0hdan closed this as completed Sep 6, 2022
tb0hdan added a commit that referenced this issue Sep 21, 2022