VReplication tablet picker: vttablet panics if a source shard doesn't have a primary #17571

Closed
rohit-nayak-ps opened this issue Jan 18, 2025 · 1 comment

@rohit-nayak-ps

Overview of the Issue

When a VReplication workflow is streaming binlogs and a target primary is selecting a tablet to stream from, vttablet can panic if a source shard is temporarily without a primary, for example during a PRS/ERS or a tablet roll.

Reproduction Steps

This new unit test reproduces this issue.

func TestPickNoPrimary(t *testing.T) {
	defer utils.EnsureNoLeaks(t)
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()

	te := newPickerTestEnv(t, ctx, []string{"cell", "otherCell"})
	want := addTablet(ctx, te, 100, topodatapb.TabletType_PRIMARY, "cell", true, true)
	defer deleteTablet(t, te, want)
	ctx, cancel = context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()
	// Clear the shard's primary alias to simulate a shard that is temporarily without a primary.
	_, err := te.topoServ.UpdateShardFields(ctx, te.keyspace, te.shard, func(si *topo.ShardInfo) error {
		si.PrimaryAlias = nil
		return nil
	})
	require.NoError(t, err)

	tp, err := NewTabletPicker(ctx, te.topoServ, []string{"otherCell"}, "cell", te.keyspace, te.shard, "primary", TabletPickerOptions{})
	require.NoError(t, err)

	ctx2, cancel2 := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel2()
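	_, err = tp.PickForStreaming(ctx2)
	// Expect an error rather than a panic. require.Errorf only asserts that err is
	// non-nil; the string is the failure message, not a pattern matched against err.
	require.Errorf(t, err, "No healthy serving tablet")
}

Running the test panics:

=== RUN   TestPickNoPrimary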
I0118 13:07:00.547838   20753 locks.go:164] Locking keyspace ks for action CreateShard with options: {lockType:0 ttl:0}
I0118 13:07:00.548009   20753 locks.go:210] Unlocking keyspace ks for successful action CreateShard
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x28 pc=0x105780b4c]

goroutine 31 [running]:
vitess.io/vitess/go/vt/topo.(*Server).GetTablet(0x14000064730?, {0x106227e68, 0x140000646e0}, 0x0)
	/Users/rohit/vitess/go/vt/topo/tablet.go:177 +0x3c
vitess.io/vitess/go/vt/topo.(*Server).GetTabletMap.func1(0x0)
	/Users/rohit/vitess/go/vt/topo/tablet.go:506 +0x274
created by vitess.io/vitess/go/vt/topo.(*Server).GetTabletMap in goroutine 30
	/Users/rohit/vitess/go/vt/topo/tablet.go:493 +0x1bc
exit status 2
FAIL	vitess.io/vitess/go/vt/discovery	1.126s

Binary Version

main

Operating System and Environment details

.

Log Fragments

@dbussink

This has been fixed.
