Bug in robustness test history patching #19303

Open
4 tasks
serathius opened this issue Jan 29, 2025 · 1 comment

@serathius
Member

Bug report criteria

What happened?

Test https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-etcd-robustness-release35-amd64/1883794260398968832 failed with panic: interface conversion: interface {} is nil, not model.EtcdRequest

The issue is reproducible locally from the report, which shows a failed linearization.

However, when I disabled history patching I got linearization success, implying that there is a bug in history patching.

What did you expect to happen?

Robustness test validation should not panic

How can we reproduce it (as minimally and precisely as possible)?

Follow instructions https://github.com/etcd-io/etcd/tree/main/tests/robustness#re-evaluate-existing-report on artifact from https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-etcd-robustness-release35-amd64/1883794260398968832

Anything else we need to know?

No response


@joshuazh-x
Contributor

The panic is caused by a patched porcupine.Operation whose invocation timestamp is later than its response timestamp. This breaks the causality assumption made when building the linearization visualization.
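A check along the following lines could help spot such operations before they reach visualization. This is only a sketch: `timedOp` and `findCausalityViolations` are hypothetical names, and `timedOp` merely stands in for the call and return timestamps of an operation rather than the real porcupine type.

```go
package main

import "fmt"

// timedOp is a hypothetical stand-in for an operation's timestamps;
// it is not the real porcupine.Operation type.
type timedOp struct {
	Name   string
	Call   int64 // invocation timestamp
	Return int64 // response timestamp
}

// findCausalityViolations returns every operation whose response
// timestamp is earlier than its invocation timestamp.
func findCausalityViolations(ops []timedOp) []timedOp {
	var bad []timedOp
	for _, op := range ops {
		if op.Return < op.Call {
			bad = append(bad, op)
		}
	}
	return bad
}

func main() {
	// Illustrative timestamps only, not taken from the failing report.
	ops := []timedOp{
		{Name: "put a=1", Call: 20, Return: 9},
		{Name: "put b=2", Call: 5, Return: 12},
	}
	for _, op := range findCausalityViolations(ops) {
		fmt.Printf("%s returns at %d before its call at %d\n", op.Name, op.Return, op.Call)
	}
}
```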

The root cause is the adjustment of put return times when multiple WAL entries contain the same put request (same key and value). In this specific case the request is Put("compact_rev_key", "1055"). Because persisted requests are iterated in reverse order, the last occurrence of the put is adjusted using the earliest observed client return time, which actually belongs to the first occurrence. This skews the subsequent calculation and can push another request's return time to before its invocation time.

for i := len(persistedRequests) - 1; i >= 0; i-- {
	request := persistedRequests[i]
	switch request.Type {
	case model.Txn:
		lastReturnTime--
		for _, op := range request.Txn.OperationsOnSuccess {
			if op.Type != model.PutOperation {
				continue
			}
			kv := keyValue{Key: op.Put.Key, Value: op.Put.Value}
			returnTime, ok := earliestReturnTime[kv]
			if ok {
				lastReturnTime = min(returnTime, lastReturnTime)
				earliestReturnTime[kv] = lastReturnTime
			}
		}
	case model.LeaseGrant:
	case model.LeaseRevoke:
	case model.Compact:
	default:
		panic(fmt.Sprintf("Unknown request type: %q", request.Type))
	}
}
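The effect of the loop above can be sketched with a stripped-down model. This is an assumption-laden simplification, not the etcd code: every persisted request is reduced to a single put, `assignReturnTimes` is a hypothetical helper, and all timestamps are made up for illustration.

```go
package main

import "fmt"

type keyValue struct {
	Key, Value string
}

// assignReturnTimes is a hypothetical, simplified version of the reverse
// pass: each persisted request is one put, and the function records the
// return time assigned to each request.
func assignReturnTimes(persisted []keyValue, earliestReturnTime map[keyValue]int64, maxReturnTime int64) []int64 {
	assigned := make([]int64, len(persisted))
	lastReturnTime := maxReturnTime
	for i := len(persisted) - 1; i >= 0; i-- {
		lastReturnTime--
		kv := persisted[i]
		if returnTime, ok := earliestReturnTime[kv]; ok {
			if returnTime < lastReturnTime {
				lastReturnTime = returnTime
			}
			earliestReturnTime[kv] = lastReturnTime
		}
		assigned[i] = lastReturnTime
	}
	return assigned
}

func main() {
	compact := keyValue{Key: "compact_rev_key", Value: "1055"}
	other := keyValue{Key: "a", Value: "1"}

	// The same put is persisted twice, with an unrelated put in between.
	persisted := []keyValue{compact, other, compact}

	// Made-up client observations: the first compact put returned at 10,
	// the unrelated put was called at 20 and returned at 30.
	earliest := map[keyValue]int64{compact: 10, other: 30}
	const otherCallTime = 20

	assigned := assignReturnTimes(persisted, earliest, 50)
	fmt.Println(assigned) // [8 9 10]
	if assigned[1] < otherCallTime {
		fmt.Println("unrelated put now returns before it was called")
	}
}
```

In this toy run the last duplicate takes return time 10, which belongs to the first occurrence, so the put persisted between them is pushed down to 9 even though it was invoked at 20.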
