Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access node crashes with a panic log #6964

Open
vishalchangrani opened this issue Feb 1, 2025 · 1 comment
Open

Access node crashes with a panic log #6964

vishalchangrani opened this issue Feb 1, 2025 · 1 comment

Comments

@vishalchangrani
Copy link
Contributor

🐞 Bug Report

Node operator reported the following issue on the access node:

Panic: interface conversion: interface {} is , not string

goroutine 2377489 [running]:
github.com/onflow/flow-go/network/p2p/logging/internal.(*PeerIdCache).PeerIdString(0xc00009c710, {0xc141740900, 0x22})
        /app/network/p2p/logging/internal/peerIdCache.go:35 +0x105
github.com/onflow/flow-go/network/p2p/logging.PeerId(...)
        /app/network/p2p/logging/logging.go:26
github.com/onflow/flow-go/network/p2p/connection.(*ConnGater).InterceptPeerDial(0xc0009ae000, {0xc141740900, 0x22})
        /app/network/p2p/connection/connection_gater.go:75 +0x1d2
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).dialPeer(0xc000ef6488, {0x307dcf8, 0xc0bdfd73b0}, {0xc141740900, 0x22})
        /go/pkg/mod/github.com/libp2p/[email protected]/p2p/net/swarm/swarm_dial.go:256 +0x233
github.com/libp2p/go-libp2p/p2p/net/swarm.(*Swarm).DialPeer(0x0?, {0x307dcf8?, 0xc0bdfd73b0?}, {0xc141740900?, 0xc00182b5ff?})
        /go/pkg/mod/github.com/libp2p/[email protected]/p2p/net/swarm/swarm_dial.go:228 +0x25
github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).dialPeer(0xc000994600, {0x307dcf8, 0xc0bdfd73b0}, {0xc141740900, 0x22})
        /go/pkg/mod/github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:740 +0x12c
github.com/libp2p/go-libp2p/p2p/host/basic.(*BasicHost).Connect(0xc000994600, {0x307dcf8, 0xc0bdfd73b0}, {{0xc141740900, 0x22}, {0x0, 0x0, 0x0}})
        /go/pkg/mod/github.com/libp2p/[email protected]/p2p/host/basic/basic_host.go:733 +0x105
github.com/libp2p/go-libp2p/p2p/discovery/backoff.(*BackoffConnector).Connect.func1({{0xc141740900, 0x22}, {0x0, 0x0, 0x0}})
        /go/pkg/mod/github.com/libp2p/[email protected]/p2p/discovery/backoff/backoffconnector.go:82 +0xcb
created by github.com/libp2p/go-libp2p/p2p/discovery/backoff.(*BackoffConnector).Connect in goroutine 2851
        /go/pkg/mod/github.com/libp2p/[email protected]/p2p/discovery/backoff/backoffconnector.go:78 +0x4d6

What is the severity of this bug?

Important:

Critical - Urgent: We can't do anything if this isn't actioned immediately (product doesn't function without this, it's blocking us or users, or it resolves a high severity security issue). Whole team should drop what they're doing and work on this.

Critical: We can't do anything if this isn't actioned immediately (product doesn't function without this, it's blocking us or users, or it resolves a high severity security issue). One person should look at this right now.

Important: * We have to do this before we ship, but it can wait until the next sprint (product or feature won't function without it, but it's not blocking us or users right now). Team should do this in the next sprint.

Should have: * It would be better if we do this before we ship, but it's OK if we don't (product functions without this, but it's a better user experience). Consider adding to a future sprint.

Could have: It really doesn't matter if we do this (product functions without this, impact to user is minimal).

Reproduction steps

Expected behaviour

Specifications

current image version gcr.io/flow-container-registry/access:v0.37.26-rc.3

Additional context

@AlexHentschel
Copy link
Member

The PeerIdCache, which is panicking here (line 35) is an optimization that we have added to the networking layer. Its very very odd that line 35 panics, because
the cache lives in-memory only. We can rule out anything related to an old data model that might be stored in the data base or so. The only relevant problem could be a mismatching type during the runtime.
the only place where we ever add something to the underlying cache is lines 38-39, which guarantees that only strings are ever added to the cache.

also took at look at the PeerIdCache test and extended the test a bit, by adding the following sub-tests

t.Run("attempting to re-put already cached peer ID", func(t *testing.T) {
  pid := unittest.PeerIdFixture(t)
  pidString := pid.String()
  pidStr := cache.PeerIdString(pid) // on first call, we cache
  assert.NotEmpty(t, pidStr)
  assert.Equal(t, pidString, pidStr)

  s := strings.Clone(string(pid))
  var pidClone peer.ID = peer.ID(s)
  pidStr = cache.PeerIdString(pidClone) // second call attempting to put an already cached ID
  assert.NotEmpty(t, pidStr)
  assert.Equal(t, pid.String(), pidStr)

  gotPidStr, ok := cache.ByPeerId(pid)
  assert.True(t, ok, "expected pid to be in the cache")
  assert.Equal(t, pid.String(), gotPidStr)
})

t.Run("caching empty peer ID and attempting to re-put already cached peer ID", func(t *testing.T) {
  pid := peer.ID("")
  pidString := pid.String()
  pidStr := cache.PeerIdString(pid) // on first call, we cache
  assert.Equal(t, pidString, pidStr)

  s := strings.Clone(string(pid))
  var pidClone peer.ID = peer.ID(s)
  pidStr = cache.PeerIdString(pidClone) // second call attempting to put an already cached ID
  assert.Equal(t, pid.String(), pidStr)

  gotPidStr, ok := cache.ByPeerId(pid)
  assert.True(t, ok, "expected pid to be in the cache")
  assert.Equal(t, pid.String(), gotPidStr)
})

they all succeed. Given the simplicity of the code together with the testing and analysis, I really have a hard time understanding how this crash could be possible. I am wondering if maybe their container is corrupted?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants