This repository has been archived by the owner on May 26, 2022. It is now read-only.

send notifications synchronously #295

Open

marten-seemann wants to merge 1 commit into master
Conversation

@marten-seemann (Contributor) commented Dec 9, 2021

notifyAll creates a lot of allocations (and therefore pressure on the GC) by notifying asynchronously. This shows up in scenarios where we receive a lot of connections / streams at the same time.

-gcflags=-m confirms why notifyAll is pretty allocation-heavy:

./swarm.go:508:7: leaking param: s
./swarm.go:508:27: leaking param: notify
./swarm.go:514:11: leaking param: f
./swarm.go:509:6: moved to heap: wg
./swarm.go:514:6: func literal escapes to heap

Note that there's been some discussion about this a long time ago: #104 (comment). I'd argue though that if there's one misbehaving consumer, we have a problem no matter whether we send the notifications sync or async, as notifyAll still waits for all notifications to be processed.
The real question therefore is whether the cost of spawning the goroutines is less than what we gain from handling these notifications concurrently.
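
For concreteness, the synchronous variant is roughly the following (a minimal sketch based on the current shape of notifyAll in swarm.go, not necessarily the exact diff in this commit): the per-notifiee goroutine and the sync.WaitGroup go away, and each notifiee is invoked inline under the read lock.

func (s *Swarm) notifyAll(notify func(network.Notifiee)) {
        s.notifs.RLock()
        for f := range s.notifs.m {
                notify(f)
        }
        s.notifs.RUnlock()
}

With no WaitGroup or func literal escaping to the heap, the wg and closure allocations flagged by -gcflags=-m above go away.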

Thanks to @mvdan for his help debugging this.

@vyzo (Contributor) left a comment

The other issue is that this is serial as opposed to parallel; I'm not sure that's what we want.

In practice, however, all notifiees spawn their own goroutine, so we might be fine.
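
(To illustrate that pattern, purely as a hedged sketch and not code from this repo: a notifiee that hands events off to its own goroutine returns from the callback almost immediately, so even a synchronous notifyAll would not block on it.)

// asyncNotifiee is an illustrative consumer; its events channel is drained
// by the notifiee's own goroutine, keeping the callback itself cheap.
type asyncNotifiee struct {
        events chan network.Conn
}

func (n *asyncNotifiee) Connected(_ network.Network, c network.Conn) {
        select {
        case n.events <- c:
        default: // consumer is behind; drop or handle backpressure elsewhere
        }
}

// (remaining network.Notifiee methods omitted for brevity)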

Still, I'd like @Stebalien's opinion here.

@mvdan commented Dec 9, 2021

> I'd argue though that if there's one misbehaving consumer, we have a problem no matter whether we send the notifications sync or async, as notifyAll still waits for all notifications to be processed.

I agree with that reasoning.

If we are worried about misbehaving consumers, perhaps we should add a context.Context and use a timeout per notification callback. It would still be possible for a callback to ignore the context and hang forever, but hopefully it would be a harder mistake to make.
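
A hypothetical sketch of that idea (the helper name, the changed callback signature, and the 10-second timeout are all made up for illustration; the real network.Notifiee interface takes no context today):

func (s *Swarm) notifyAllCtx(notify func(context.Context, network.Notifiee)) {
        s.notifs.RLock()
        defer s.notifs.RUnlock()
        for f := range s.notifs.m {
                // Each callback gets its own deadline; a well-behaved notifiee
                // checks ctx and returns early, a misbehaving one can still hang.
                ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
                notify(ctx, f)
                cancel()
        }
}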

> Thanks to @mvdan for his help debugging this.

For some numerical context on why this is important: when a libp2p node deals with a large number of new streams, this particular bit of code accounts for about a third of all allocations, which adds a significant amount of pressure on the runtime:

         .          .    509:func (s *Swarm) notifyAll(notify func(network.Notifiee)) {
     98305      98305    510:   var wg sync.WaitGroup
         .          .    511:
         .      10923    512:   s.notifs.RLock()
         .          .    513:   wg.Add(len(s.notifs.m))
         .          .    514:   for f := range s.notifs.m {
   5051867    5051867    515:           go func(f network.Notifiee) {
         .          .    516:                   defer wg.Done()
         .          .    517:                   notify(f)
         .          .    518:           }(f)
         .          .    519:   }
         .          .    520:
         .      27309    521:   wg.Wait()
         .          .    522:   s.notifs.RUnlock()

The above is from a go tool pprof -alloc_objects profile spanning 10s. That goroutine spawn allocates ~5M objects, making it the number one cause of allocations by a large margin. The next entries in the top 10 allocate on the order of tens to hundreds of thousands of objects, orders of magnitude less.
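
An easy way to see the same effect outside of go-libp2p is a standalone micro-benchmark (illustrative only, not code from this PR) comparing goroutine fan-out with serial calls; go test -bench=. -benchmem should report allocations per callback for the fan-out version and essentially none for the serial one.

package notify_test

import (
        "sync"
        "testing"
)

// noop stands in for a notification callback.
func noop() {}

// fanOut mirrors the current notifyAll shape: one goroutine per callback,
// joined with a WaitGroup.
func fanOut(fns []func()) {
        var wg sync.WaitGroup
        wg.Add(len(fns))
        for _, fn := range fns {
                go func(fn func()) {
                        defer wg.Done()
                        fn()
                }(fn)
        }
        wg.Wait()
}

// serial mirrors the proposed change: call each callback inline.
func serial(fns []func()) {
        for _, fn := range fns {
                fn()
        }
}

func callbacks(n int) []func() {
        fns := make([]func(), n)
        for i := range fns {
                fns[i] = noop
        }
        return fns
}

func BenchmarkFanOut(b *testing.B) {
        fns := callbacks(8)
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                fanOut(fns)
        }
}

func BenchmarkSerial(b *testing.B) {
        fns := callbacks(8)
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
                serial(fns)
        }
}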

@vyzo (Contributor) commented Dec 9, 2021

These arguments are quite convincing, and I pretty much agree.

Still, this is something Steven might feel very strongly about, so let's wait on him.

@mvdan commented Dec 9, 2021

For sure, happy to wait for the original authors of the code :) Just wanted to add a bit more context.

@Stebalien (Member) left a comment

You're absolutely correct that we should fix the slow services. Unfortunately, this isn't one of those "change it and see what breaks" cases.

We already know what'll break. The issue here is bitswap. If we make this change, other services will have their disconnect/connect notifications massively delayed by bitswap on the gateways, preload nodes, etc.

Now, I don't actually know if that's a huge issue. But I just don't know.

If we're going to make this change, we need to either:

  1. Fix bitswap. That should be significantly easier now that we actually have true peer "disconnect" and "connect" events. Unfortunately, bitswap is a bit of a mess.
  2. Test this change on our infra to see what breaks.
