Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix auto-retry behavior with autocommit_before_ddl #141116

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Feb 11, 2025

When the autocommit_before_ddl setting was enabled, we were
sending a notice to the client without any buffering. This prevents
auto-retry logic for retriable errors from kicking in, since if results
have already sent to the client, it's not safe to retry the current
statement.

The fix is to buffer the notice for sending instead. It will get sent
whenever results are flushed back to the client. In order to achieve
this, I introduced a new BufferClientNoticeInConnection function. The
existing BufferClientNotice function is not sufficient since that only
buffers notices in the command result, and that buffer is discarded due
to how the connExecutor state transitions are defined. Namely, the
connExecutor will execute the schema change command twice: once to
autocommit the current transaction, and another time to run the schema
change. To avoid losing the notice from the autocommit, it must be
buffered all the way into the client connection, and not just the
command result.

Running with this patch makes logic tests less flaky, since schema
changes that encounter retryable errors are way more likely to be able
to be automatically retried now. This allows us to remove the testing
knob that overrode the transaction liveness threshold. I verified the
flakiness is gone by using:

./dev testlogic ccl --config=3node-tenant --stress --ignore-cache

informs #133180
Release note (bug fix): Fixed a bug that prevented transaction retry
errors encountered during implicit transactions from being automatically
retried internally if the autocommit_before_ddl session variable was
enabled and the statement was a schema change.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rafiss rafiss force-pushed the auto-retry-autocommit-before-ddl branch 2 times, most recently from 88f705e to 48ddae9 Compare February 11, 2025 20:03
@rafiss rafiss marked this pull request as ready for review February 11, 2025 20:30
@rafiss rafiss requested review from a team as code owners February 11, 2025 20:30
Copy link
Collaborator

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Subtle bug, nice find!

Reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)

@rafiss rafiss added the backport-25.1.x Flags PRs that need to be backported to 25.1 label Feb 11, 2025
When the autocommit_before_ddl setting was enabled, we were
sending a notice to the client without any buffering. This prevents
auto-retry logic for retriable errors from kicking in, since if results
have already sent to the client, it's not safe to retry the current
statement.

The fix is to buffer the notice for sending instead. It will get sent
whenever results are flushed back to the client. In order to achieve
this, I introduced a new BufferClientNoticeInConnection function. The
existing BufferClientNotice function is not sufficient since that only
buffers notices in the command result, and that buffer is discarded due
to how the connExecutor state transitions are defined. Namely, the
connExecutor will execute the schema change command twice: once to
autocommit the current transaction, and another time to run the schema
change. To avoid losing the notice from the autocommit, it must be
buffered all the way into the client connection, and not just the
command result.

Running with this patch makes logic tests less flaky, since schema
changes that encounter retryable errors are way more likely to be able
to be automatically retried now. This allows us to remove the testing
knob that overrode the transaction liveness threshold. I verified the
flakiness is gone by using:
```
./dev testlogic ccl --config=3node-tenant --stress --ignore-cache
```

Release note (bug fix): Fixed a bug that prevented transaction retry
errors encountered during implicit transactions from being automatically
retried internally if the autocommit_before_ddl session variable was
enabled and the statement was a schema change.
@rafiss rafiss force-pushed the auto-retry-autocommit-before-ddl branch from 48ddae9 to 2b9d377 Compare February 11, 2025 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-25.1.x Flags PRs that need to be backported to 25.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants