Periods of lag observed on 7th November 2024 #12

Open
godfrey-altmetric opened this issue Nov 8, 2024 · 2 comments

Comments

@godfrey-altmetric

godfrey-altmetric commented Nov 8, 2024

Hi @mackuba.

We've been running a connection to the bsky.network firehose with this gem for a few weeks now, and our consumer has been keeping up with the firehose events in near real time. Yesterday, 7th Nov 2024, we noticed at least two periods where our consumer started lagging and fell behind, with a peak delay of ~9600s (about 2.7 hours):

[Image: Screenshot 2024-11-08 at 14 23 05]

So far today we have not observed any delay, so this may have been a one-time event, as Bluesky have noted increased activity in the last few days since the election. However, Bluesky have advised us that they did not detect any significant delays in the relays or the firehose on their side.

Is this something that you, or other users of the gem, have also experienced?

Update: Are there perhaps any external API dependencies (besides the main firehose) that are called within the gem that could help explain this?
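
For reference, a delay figure like the one above can be measured roughly as follows. This is only a sketch: it assumes each event carries an ISO 8601 timestamp string, and `lag_seconds` and `format_lag` are hypothetical helper names, not part of the gem. The timestamps in the example are illustrative, not taken from our logs.

```ruby
require 'time'

# Hypothetical helper (not part of the gem): seconds elapsed between the
# timestamp carried by an event and the moment we process it.
def lag_seconds(event_time, now: Time.now)
  now - Time.iso8601(event_time)
end

# Formats a lag in seconds together with the equivalent in hours.
def format_lag(seconds)
  format('%ds (%.1f h)', seconds, seconds / 3600.0)
end

# Illustrative values only: an event stamped 2h40m before it is processed.
processed_at = Time.iso8601('2024-11-07T17:00:00Z')
lag = lag_seconds('2024-11-07T14:20:00Z', now: processed_at)
puts format_lag(lag)  # => 9600s (2.7 h)
```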

@mackuba
Owner

mackuba commented Nov 23, 2024

Hey, sorry I forgot to reply…

I did actually have some issues on that day, and someone else did too, see thread here: https://bsky.app/profile/mackuba.eu/post/3laeiuzz2fw26 - but I don't know if they also use this library (although their avatar says "Ruby" :)

Generally there have been tons of issues in the past couple of weeks, but I think everyone had those, since the relay as a whole was crashing… I'm hoping that things will get better now. I didn't have many disconnection issues with this setup until the Brazil wave in September, and then only on days with very high traffic (though now that means all days…).

Two things that could help:

  1. There's an optional "heartbeat" feature, which I'd been testing for months on a branch and which was released in 0.4. It monitors whether any new events have arrived in some time and forces a reconnect if not. For me this generally solves, or at least counteracts, most issues that the normal auto-reconnect doesn't handle, except whatever has been happening with the relay this week, but hopefully that's fixed now. It needs to be enabled using a check_heartbeat flag, see the latest docs here: https://github.com/mackuba/skyfall?tab=readme-ov-file#reconnection-logic. I think this would have reconnected in this case, since if I read this chart correctly, you just weren't receiving anything since ~14:20.

  2. Switching to a Jetstream source (support was added in 0.5). Jetstream uses less bandwidth and I think it's generally been more stable.

I don't know exactly what the source of this issue is, where the events just stop coming but the socket doesn't disconnect: whether it's something in the server's implementation, one of the client libraries I'm using, some incompatibility between the two, or just something that websockets do sometimes… But it's been happening occasionally, which is why I added this heartbeat thing. During the spring and summer it was happening so rarely that it was hard to test, because sometimes it wasn't triggered for a month or more. It only started happening more often with the increase in traffic in the autumn.
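
To illustrate the general idea of a heartbeat check, here is a minimal, library-independent sketch. It is not the gem's actual implementation: `HeartbeatWatchdog` and its methods are hypothetical names, and the timeout value is arbitrary. The point is only that the time of the last event is recorded, and a periodic check triggers a reconnect when the stream has been silent for too long even though the socket still looks open.

```ruby
# Hypothetical helper, not part of the gem: remembers when the last event
# arrived and reports a stall once nothing has been seen for `timeout` seconds.
class HeartbeatWatchdog
  attr_reader :last_event_at

  # `clock` is injectable so the logic can be exercised without sleeping;
  # by default a monotonic clock is used so wall-clock changes don't matter.
  def initialize(timeout:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @timeout = timeout
    @clock = clock
    @last_event_at = @clock.call
  end

  # Call this from the message handler every time an event arrives.
  def touch
    @last_event_at = @clock.call
  end

  # True when the stream has been silent for longer than the timeout.
  def stalled?
    @clock.call - @last_event_at > @timeout
  end

  # Runs the block (e.g. a reconnect) if the stream looks stalled, then
  # resets the timer so the block doesn't fire again on the next check.
  def check
    return false unless stalled?
    yield if block_given?
    touch
    true
  end
end
```

In a real consumer, `touch` would be called for every received message and `check` from a periodic timer, with the block closing and reopening the connection.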

@godfrey-altmetric
Author

No worries @mackuba, thanks for the above info. It does correlate with what we observed, but since that day we've not seen anything of the same magnitude in terms of delays and disconnections.
