feature: network/host rate limiting #1413

Open

atomGit opened this issue Feb 18, 2025 · 7 comments

atomGit commented Feb 18, 2025

Liferea suffers from the same problem some other readers do: it updates feeds too quickly, which can cause various problems when a given host is hit with multiple requests in quick succession.

bitchute.com is one such example: if there are more than x requests in n seconds (and I don't know what x and n are), feed fetching is temporarily blocked.

I had the same problem in a script I wrote to check for broken hyperlinks on a website. I got around it by first shuffling the array of URLs, then keeping a rolling list of queried URLs with a timestamp for each; if the same domain had been checked less than x seconds earlier, the next URL to be checked was dropped to the bottom of the array.
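
As an illustration, here is a minimal Python sketch of that reshuffle-and-requeue approach (the `fetch` callback, the function names, and the 10-second cooldown are assumptions, not the original script):

```python
import random
import time
from collections import deque
from urllib.parse import urlparse

MIN_DOMAIN_INTERVAL = 10.0  # hypothetical cooldown between hits to one domain

def check_urls(urls, fetch):
    """Shuffle the URLs, then requeue any URL whose domain was
    contacted less than MIN_DOMAIN_INTERVAL seconds ago."""
    queue = deque(random.sample(urls, len(urls)))  # shuffle first
    last_hit = {}  # domain -> monotonic timestamp of the last request
    while queue:
        url = queue.popleft()
        domain = urlparse(url).netloc
        elapsed = time.monotonic() - last_hit.get(domain, float("-inf"))
        if elapsed < MIN_DOMAIN_INTERVAL:
            if any(urlparse(u).netloc != domain for u in queue):
                queue.append(url)  # too soon: drop it to the bottom
                continue
            time.sleep(MIN_DOMAIN_INTERVAL - elapsed)  # only this domain left
        last_hit[domain] = time.monotonic()
        fetch(url)
```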

rich-coe (Contributor) commented

While working on bug#1376 I noticed that the code is aggressive about fetching an icon for the feed: it fires off a number of requests into the high-priority queue searching for an icon. In pr#1398 I restructured the fetching of these icon requests.

This does not address the case where more than one feed hits the same server: it is still possible for that server to be contacted several times in a small time window. I noticed this with, I think, feedproxy.com, one of a number of sites used for serving feeds. Occasionally it would return a 4xx if the number of requests exceeded a limit.

With or without pr#1398, it should be possible in simple cases to defer a query to a server unless X seconds have elapsed since it was last contacted. It may be difficult to implement if the same server has one or more aliases that are serviced by a single network blocking device.
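
A minimal sketch of that simple-case deferral, in Python for illustration (the `HostThrottle` name and the 30-second default are assumptions; it does not attempt to solve the alias problem):

```python
import time

class HostThrottle:
    """Allow a request to a host only if min_interval seconds have
    elapsed since that host was last contacted."""

    def __init__(self, min_interval=30.0):
        self.min_interval = min_interval
        self._last_contact = {}  # host -> monotonic timestamp

    def try_acquire(self, host):
        """Return True if the host may be contacted now; on False the
        caller should defer the request and retry later."""
        now = time.monotonic()
        last = self._last_contact.get(host)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_contact[host] = now
        return True
```

Grouping aliases would need some shared key per physical server, e.g. resolved IP addresses, which CDNs and load balancers make unreliable.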

atomGit (Author) commented Feb 22, 2025

> It may be difficult to implement if the same server has one or more aliases ...

The other thing that could be done, in addition to what I proposed, is to add an option to limit the number of simultaneous requests. This isn't a perfect answer in the case of a server having aliases, but it would help.

lwindolf (Owner) commented

@atomGit When bitchute.com blocks, does it send HTTP 429? If so, support for reacting to HTTP 429 was added with the previous release.

atomGit (Author) commented Feb 23, 2025

I'd have to create another test case because I don't recall whether the response was 429. But if that is the response, isn't it too late to do anything about it by that time?

How did your fix address that?

lwindolf (Owner) commented

@atomGit I just implemented standard HTTP 429 handling. After the first HTTP 429 the client makes no further requests to the domain until a given time. That time is either a default back-off interval of 5 minutes or the interval specified by the webserver in the Retry-After header.

With this in place, the server in the worst case gets only one unwanted request every 5 minutes.
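
Roughly what such handling looks like, as a hedged Python sketch rather than Liferea's actual implementation (for simplicity only the seconds form of Retry-After is parsed; anything else falls back to the 5-minute default):

```python
import time
import urllib.request
from urllib.error import HTTPError
from urllib.parse import urlparse

DEFAULT_BACKOFF = 5 * 60  # default back-off of 5 minutes, as described
_blocked_until = {}  # domain -> monotonic time before which we stay away

def fetch(url):
    domain = urlparse(url).netloc
    if time.monotonic() < _blocked_until.get(domain, float("-inf")):
        return None  # still backing off: skip the request entirely
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except HTTPError as err:
        if err.code == 429:
            try:
                delay = float(err.headers.get("Retry-After"))
            except (TypeError, ValueError):
                delay = DEFAULT_BACKOFF  # header missing or in HTTP-date form
            _blocked_until[domain] = time.monotonic() + delay
            return None
        raise
```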

atomGit (Author) commented Feb 23, 2025

Ah, OK. That wouldn't work for me, since I fetch all feeds in one go once or twice a day; I keep auto-updating disabled.

I think if you shuffled the URLs to query and had an option to limit the number of simultaneous requests, that might work. Or, if feeds are grouped by domain (alphabetically, for example), adding a configurable delay between multiple requests to the same domain might work. Or maybe some combination thereof.

PS: openrss.org is another domain that might be very finicky about rapid, consecutive requests (OpenRSS can generate feeds for some sites that don't offer them).
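
A sketch of how those suggestions could combine: a small worker pool caps simultaneous requests, and a per-domain lock enforces a configurable delay between hits to the same host. All names and values here are hypothetical, not proposed Liferea behaviour:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

MAX_PARALLEL = 4    # hypothetical cap on simultaneous requests
DOMAIN_DELAY = 5.0  # hypothetical delay between requests to one domain

_domain_locks = {}
_last_hit = {}
_registry_lock = threading.Lock()

def _lock_for(domain):
    with _registry_lock:
        return _domain_locks.setdefault(domain, threading.Lock())

def polite_fetch(url, fetch):
    domain = urlparse(url).netloc
    with _lock_for(domain):  # serialize requests to one domain
        elapsed = time.monotonic() - _last_hit.get(domain, float("-inf"))
        if elapsed < DOMAIN_DELAY:
            time.sleep(DOMAIN_DELAY - elapsed)
        _last_hit[domain] = time.monotonic()
    return fetch(url)

def fetch_all(urls, fetch):
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        return list(pool.map(lambda u: polite_fetch(u, fetch), urls))
```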

lwindolf (Owner) commented

@atomGit I understand your thinking, but I do not see how to automatically choose the right rate limit, and I do not want to maintain more complex network stack logic.

I'll think a bit about a maximum-rate-per-domain logic, especially for background requests.
