Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crosscluster/logical: enable automatic bidi replication #138297

Merged
merged 5 commits into from
Jan 13, 2025

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Jan 6, 2025

This patch enables a user to create a bidirectional replication stream with one
command. When the user runs CREATE LOGICALLY REPLICATED TABLE tabB FROM tabA ON uriA WITH BIDIRECTIONAL ON uriB, the following will happen:

  • on cluster B, during job planning, construct the reverse stream cmd, while we
    still have access to original LDR cmd. This command will have the following
    form: CREATE LOGICAL REPLICATION STREAM FROM TABLE tabB ON 'uriB' INTO TABLE tabA WITH CURSOR = $1, PARENT = '{og job id}';
  • on cluster B, create table tabB and begin an offline scan, as normal
  • on cluster B, once the offline scan completes, but before tabB is published,
    set up the reverse stream from B to A at a cursor time after the initial scan
    completes but before the tabB has come online. This cursor time prevents data
    looping and ensures the replication of all future foreground writes.
  • on cluster B, steady state replication begins.

Epic: none

Release note (sql change): when a user runs CREATE LOGICALLY REPLICATED TABLE,
they must specify one of the following options:

  • UNIDIRECTIONAL: setups a unidirectional stream with fast initial scan
  • BIDIRECTIONAL ON {dest uri}: sets up a bidirectional stream from the original
    dest to the original source.

@msbutler msbutler self-assigned this Jan 6, 2025
Copy link

blathers-crl bot commented Jan 6, 2025

It looks like your PR touches SQL parser code but doesn't add or edit parser tests. Please make sure you add or edit parser tests if you edit the parser.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler msbutler force-pushed the butler-bidi-3 branch 3 times, most recently from 62474e0 to 84d69ee Compare January 8, 2025 21:17
msbutler added a commit to msbutler/cockroach that referenced this pull request Jan 8, 2025
…ine tables

This patch is required for cockroachdb#138297, where an LDR stream is created from a
source that is offline. The subsequent PR will add test coverage to this.

Epic: none

Release note: none
msbutler added a commit to msbutler/cockroach that referenced this pull request Jan 8, 2025
…ine tables

This patch is required for cockroachdb#138297, where an LDR stream is created from a
source that is offline. The subsequent PR will add test coverage to this.

Epic: none

Release note: none
@msbutler msbutler force-pushed the butler-bidi-3 branch 2 times, most recently from 3eee766 to a254d02 Compare January 8, 2025 23:37
msbutler added a commit to msbutler/cockroach that referenced this pull request Jan 8, 2025
…ine tables

This patch is required for cockroachdb#138297, where an LDR stream is created from a
source that is offline. The subsequent PR will add test coverage to this.

Epic: none

Release note: none
@msbutler msbutler changed the title begin crosscluster/logical: enable automatic bidi replication Jan 8, 2025
@msbutler
Copy link
Collaborator Author

msbutler commented Jan 8, 2025

First commit getting reviewed in #138680

@msbutler
Copy link
Collaborator Author

msbutler commented Jan 9, 2025

The stress race failure is likely caused by admission control. In the logs, i see a lot of:

W250108 23:57:31.213985 551 kv/kvclient/kvcoord/dist_sender.go:2772 ⋮ [T1,Vsystem,n1,ts-poll] 691  slow replica RPC: have been waiting 10.30s (0 attempts) for RPC Merge [/System/tsd/‹cr.node.rpc.method.heartbeattxn.recv›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.errored.sql-kv-response.bulk-normal-pri›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.kvadmission.flow_token_dispatch.coalesced_elastic›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.admitted.sql-kv-response.normal-pri›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.errored.sql-kv-response.normal-pri›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.admitted.sql-sql-response.normal-pri›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.elastic_cpu.acquired_nanos›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.tenant.consumption.request_units›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.rpc.method.linkexternalsstable.recv›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.rpc.method.refresh.recv›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-max›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-p99.999›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-p99.99›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-p99.9›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-p99›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-p90›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-p75›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-p50›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-avg›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.node.admission.wait_durations.kv.normal-pri-count›/‹10s›/2025-01-08T23:00:00Z/‹1›],... 3309 skipped ..., Merge [/System/tsd/‹cr.store.queue.gc.info.abortspanconsidered›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.store.storage.secondary-cache.write-back-failures›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.store.storage.iterator.category-crdb-unknown.block-load.latency-sum›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.store.storage.iterator.category-scan-regular.block-load.latency-sum›/‹10s›/2025-01-08T23:00:00Z/‹1›], Merge [/System/tsd/‹cr.store.admission.errored.kv-stores.user-high-pri›/‹10s›/2025-01-08T23:00:00Z/‹1›] to replica (n1,s1):1; resp: ‹(err: <nil>), *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, ... 3309 skipped ..., *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse, *kvpb.MergeResponse›

msbutler added a commit to msbutler/cockroach that referenced this pull request Jan 9, 2025
…ine tables

This patch is required for cockroachdb#138297, where an LDR stream is created from a
source that is offline. The subsequent PR will add test coverage to this.

Epic: none

Release note: none
@msbutler msbutler marked this pull request as ready for review January 9, 2025 18:58
@msbutler msbutler requested review from a team as code owners January 9, 2025 18:58
@msbutler msbutler requested review from dt and jeffswenson and removed request for a team January 9, 2025 18:58
msbutler added a commit to msbutler/cockroach that referenced this pull request Jan 10, 2025
…tables

This patch is required for cockroachdb#138297, where an LDR stream is created from a
source that is offline. The subsequent PR will add test coverage to this.

Epic: none

Release note: none
@@ -406,6 +406,26 @@ func (p *partitionedStreamClient) CreateForTables(
return spec, nil
}

func (p *partitionedStreamClient) SetupReverseStream(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like SetupReverseStream is a bit of an odd man out compared to the other methods on the client. The other methods could be conceivably implemented over something like GRPC. SetupReverseStream on the other hand is accepting a sql command and is basically conn.Exec with a traced name.

I think this would match the pattern better if it took a struct that described the desired stream and internally constructed the statement.

Alternatively, I think this should be named something that makes it clear what it really is: like ExecStatement or even expose the underlying pg connection object and run the statement on that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good callout. I'll refactor this to be ExecStatement.

I prefer that over your first suggestion as it's much easier to construct the reverse stream statement during the planning of the original ldr stmt, where we access to the OG parsed statement. Then, since this needs to be called during job execution, i merely need to persist the cmd instead of all the components of command.

jobID := p.ExecCfg().JobRegistry.MakeJobID()
var reverseStreamCmd string
if stmt.CreateTable && options.BidirectionalURI() != "" {
// TODO: validate URI.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validating the URI is a bit tricky. We really need the peer cluster to validate it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need the peer cluster to validate it.

Good point. I can plan to pass it as an extra arg to CreateForTables, so the og source can validate it. Sounds reasonable?

craig bot pushed a commit that referenced this pull request Jan 10, 2025
138765: rowexec: add simple benchmark for index backfills r=rafiss a=rafiss

This benchmark is not super realistic, since it runs in memory, but it still should be useful for tracking changes over time.

Epic: CRDB-42901
Release note: None

138786: sql: add gauge for object count  r=annrpom a=annrpom

### sql: remove redundant metrics
Epic: none

Release note (backwards-incompatible change): Several metrics are redundant
and have been removed. Below is a mapping from removed metrics to
an existing, identical metric:
```
[REMOVED] sql.schema_changer.running
----
jobs.schema_change.currently_running

[REMOVED] sql.schema_changer.successes
----
jobs.schema_change.resume_completed

[REMOVED] sql.schema_changer.retry_errors
----
jobs.schema_change.resume_retry_error

[REMOVED] sql.schema_changer.permanent_errors
----
jobs.schema_change.resume_failed

```

---

### sql: add gauge for object count
This patch introduces a metric, `sql.schema_changer.object_count`, that
keeps track of the count of descriptors in the cluster.

Epic: none
Fixes: #134740

Release note (ops change): This patch introduces a metric,
`sql.schema_changer.object_count`, that keeps track of the count of
objects in the cluster.

138823: sql/externalcatalog: allow ExtractInternalCatalog to include offline tables r=fqazi a=msbutler

This patch is required for #138297, where an LDR stream is created from a source that is offline. The subsequent PR will add test coverage to this.

Epic: none

Release note: none

Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Annie Pompa <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
This patch introduces two options to CREATE LOGICALLY REPLICATED TABLES:
- UNIDIRECTIONAL: to specify a unidectional ldr stream with a fast offline scan
- BIDIRECTIONAL ON {uri}: which customers will use to set up a bidirectional
  stream with one command. The uri provided by the customer connects to the
source of the reverse stream, aka the destination of the main stream.

The PARENT option will be used internally to set up the reverse stream.
Specifically, when the destination sets up the reverse stream, it will pass its
jobID as the parent for the reverse stream, which the reverse stream will use
to prevent creating duplicate reverse streams and to configure a stream that
replicates from an offline source. Customers should not directly use it.

Epic: none

Release note: none
This destination will use this api to set up a reverse stream from a cursor.

Epic: none

Release note: none
For mysterious resons, the resolver cannot fetch offline descriptors unless the
mutable flag is passed.

Epic: none

Release note: none
This patch allows a user to create an LDR stream that replicates from an
offline table.

Epic: none

Release note: none
This patch enables a user to create a bidirectional replication stream with one
command. When the user runs `CREATE LOGICALLY REPLICATED TABLE tabB FROM tabA
ON uriA  WITH BIDIRECTIONAL ON uriB`, the following will happen:
- on cluster B, during job planning, construct the reverse stream cmd, while we
	still have access to original LDR cmd. This command will have the following
form: `CREATE LOGICAL REPLICATION STREAM FROM TABLE tabB ON 'uriB' INTO TABLE
tabA WITH CURSOR = $1, PARENT = '{og job id}';`
- on cluster B, create table tabB and begin an offline scan, as normal
- on cluster B, once the offline scan completes, but before tabB is published,
  set up the reverse stream from B to A at a cursor time after the initial scan
completes but before the tabB has come online. This cursor time prevents data
looping and ensures the replication of all future foreground writes.
- on cluster B, steady state replication begins.

Epic: none

Release note (sql change): when a user runs CREATE LOGICALLY REPLICATED TABLE,
they must specify one of the following options:
- UNIDIRECTIONAL: setups a unidirectional stream with fast initial scan
- BIDIRECTIONAL ON {dest uri}: sets up a bidirectional stream from the original
  dest to the original source.
Copy link
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@msbutler
Copy link
Collaborator Author

TFTR!

bors r=jeffswenson

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants