add internal demo saga #6281

Merged: 6 commits, Aug 12, 2024

@davepacheco (Collaborator) commented Aug 10, 2024

This PR adds a "demo" saga to Nexus: a one-node saga that sits and waits for a request (on the internal API) telling it to finish. The point of this saga is to help with testing saga recovery and #6215. It's easiest to see with an omdb example.

First, I started omicron-dev run-all:

$ cargo run --bin=omicron-dev run-all
   Compiling omicron-nexus v0.1.0 (/home/dap/omicron-merge/nexus)
   Compiling omicron-dev v0.1.0 (/home/dap/omicron-merge/dev-tools/omicron-dev)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2m 56s
     Running `target/debug/omicron-dev run-all`
omicron-dev: setting up all services ... 
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.19549.0.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.19549.0.log"
DB URL: postgresql://root@[::1]:64409/omicron?sslmode=disable
DB address: [::1]:64409
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.19549.2.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.19549.2.log"
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.19549.3.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.19549.3.log"
omicron-dev: services are running.
omicron-dev: nexus external API:    127.0.0.1:12220
omicron-dev: nexus internal API:    [::1]:12221
omicron-dev: cockroachdb pid:       19558
omicron-dev: cockroachdb URL:       postgresql://root@[::1]:64409/omicron?sslmode=disable
omicron-dev: cockroachdb directory: /dangerzone/omicron_tmp/.tmpNUCfme
omicron-dev: internal DNS HTTP:     http://[::1]:45884
omicron-dev: internal DNS:          [::1]:51490
omicron-dev: external DNS name:     oxide-dev.test
omicron-dev: external DNS HTTP:     http://[::1]:49886
omicron-dev: external DNS:          [::1]:58543
omicron-dev:   e.g. `dig @::1 -p 58543 test-suite-silo.sys.oxide-dev.test`
omicron-dev: management gateway:    http://[::1]:59318 (switch0)
omicron-dev: management gateway:    http://[::1]:36934 (switch1)
omicron-dev: silo name:             test-suite-silo
omicron-dev: privileged user name:  test-privileged

Then I started a second copy of Nexus using the instructions, just so I could more easily control its execution:

$ cargo run --bin=nexus -- config-second.toml 
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.96s
     Running `target/debug/nexus config-second.toml`
Aug 10 00:03:57.594 INFO setting up nexus server, name: a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f, file: nexus/src/lib.rs:86
...
Aug 10 00:03:57.772 INFO listening, local_addr: [::1]:12223, component: dropshot_internal, name: a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f, file: /home/dap/.cargo/git/checkouts/dropshot-a4a923d29dccc492/52d900a/dropshot/src/server.rs:205
...

As part of this, I also added omdb subcommands to list the sagas that a Nexus instance knows about. (This is not a general command for listing all sagas: it lists only the sagas that this instance knows about in memory, not any that ran in a past life of this Nexus or in another Nexus.) Start by listing them:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas list
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.98s
     Running `target/debug/omdb -w nexus sagas list`
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID STATE 
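The "in-memory only" caveat is easiest to picture as a per-process map that nothing else writes to. The sketch below is hypothetical and is not omdb's or Nexus's code: `InMemorySagas`, `SagaState`, `record`, and `list` are illustrative names, and it assumes each Nexus simply records the sagas it starts or recovers. Two instances with separate maps show why a saga run by another Nexus never shows up in the list.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical saga states, named after the STATE values omdb prints.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SagaState {
    Running,
    Succeeded,
}

impl SagaState {
    fn label(self) -> &'static str {
        match self {
            SagaState::Running => "running",
            SagaState::Succeeded => "succeeded",
        }
    }
}

/// Hypothetical per-process view: only sagas this Nexus has started or
/// recovered since it last came up. Nothing here is read from the database.
#[derive(Default)]
struct InMemorySagas {
    sagas: BTreeMap<String, SagaState>,
}

impl InMemorySagas {
    fn record(&mut self, id: &str, state: SagaState) {
        self.sagas.insert(id.to_string(), state);
    }

    /// Render a SAGA_ID / STATE table, padding ids to the widest one.
    fn list(&self) -> String {
        let width = self
            .sagas
            .keys()
            .map(|id| id.len())
            .chain(std::iter::once("SAGA_ID".len()))
            .max()
            .unwrap_or(0);
        let mut out = String::new();
        writeln!(out, "{:<width$} STATE", "SAGA_ID", width = width).unwrap();
        for (id, state) in &self.sagas {
            writeln!(out, "{:<width$} {}", id, state.label(), width = width).unwrap();
        }
        out
    }
}

fn main() {
    // Two Nexus processes, each with its own in-memory view.
    let mut nexus_a = InMemorySagas::default();
    let nexus_b = InMemorySagas::default();

    nexus_a.record("24e9169b-bfb4-469d-8c85-7b23feac2ceb", SagaState::Running);
    print!("{}", nexus_a.list());

    // nexus_b never ran this saga, so its list has only the header.
    print!("{}", nexus_b.list());
}
```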

Now we can create a demo saga:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas demo-create
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.98s
     Running `target/debug/omdb -w nexus sagas demo-create`
note: using Nexus URL http://[::1]:12223
saga id:      24e9169b-bfb4-469d-8c85-7b23feac2ceb
demo saga id: e797f237-53fd-4f6c-9e89-60d58044c112 (use this with `demo-complete`)

We can see it in the list of sagas now. It's running:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas list
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.96s
     Running `target/debug/omdb -w nexus sagas list`
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE   
24e9169b-bfb4-469d-8c85-7b23feac2ceb running 

and it will stay running indefinitely until we run demo-complete. Let's do that:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas demo-complete e797f237-53fd-4f6c-9e89-60d58044c112
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.99s
     Running `target/debug/omdb -w nexus sagas demo-complete e797f237-53fd-4f6c-9e89-60d58044c112`
note: using Nexus URL http://[::1]:12223

and then list sagas again:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas list
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.03s
     Running `target/debug/omdb -w nexus sagas list`
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE     
24e9169b-bfb4-469d-8c85-7b23feac2ceb succeeded 
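The create/wait/complete loop above can be modeled with an ordinary channel. This is a minimal sketch under stated assumptions, not the actual demo saga: it assumes the saga's single node registers itself under its demo saga id and blocks until a completion request sends on its channel. `DemoSagaRegistry`, `register`, and `complete` are hypothetical names, and the ids are `u64` rather than UUIDs. In this model only the demo saga id matters for completion, consistent with the `demo-complete` hint in the `demo-create` output.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical id type; the real demo saga ids are UUIDs.
type DemoSagaId = u64;

/// Hypothetical registry of demo sagas that are currently waiting to be
/// told to finish, keyed by demo saga id.
#[derive(Clone, Default)]
struct DemoSagaRegistry {
    waiters: Arc<Mutex<HashMap<DemoSagaId, Sender<()>>>>,
}

impl DemoSagaRegistry {
    /// Called by the saga's only node: register under `id` and get back a
    /// receiver to block on.
    fn register(&self, id: DemoSagaId) -> Receiver<()> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(id, tx);
        rx
    }

    /// Called when a "complete" request arrives (in Nexus, via the internal
    /// API). Fails if no saga with that id is waiting.
    fn complete(&self, id: DemoSagaId) -> Result<(), String> {
        let tx = self
            .waiters
            .lock()
            .unwrap()
            .remove(&id)
            .ok_or_else(|| format!("no demo saga {id} is waiting"))?;
        tx.send(())
            .map_err(|_| format!("demo saga {id} stopped waiting"))
    }
}

fn main() {
    let registry = DemoSagaRegistry::default();
    let id: DemoSagaId = 1;

    // Register before spawning so a fast `complete` cannot miss the waiter.
    let rx = registry.register(id);
    let node = thread::spawn(move || {
        // The single saga node: sit and wait until told to finish.
        rx.recv().expect("registry dropped the sender");
        "succeeded"
    });

    registry.complete(id).expect("saga should be waiting");
    println!("saga {id}: {}", node.join().unwrap());

    // A second completion finds nothing waiting.
    assert!(registry.complete(id).is_err());
}
```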

It works across saga recovery, too. You can go through the same loop again, but this time kill Nexus and restart it before completing the saga:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas demo-create
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.94s
     Running `target/debug/omdb -w nexus sagas demo-create`
note: using Nexus URL http://[::1]:12223
saga id:      b1dbcfd3-f0ab-4a8f-9b6f-60b109449f7c
demo saga id: 56179d38-e047-4d47-b54a-3b8218c93992 (use this with `demo-complete`)
$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas list
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.98s
     Running `target/debug/omdb -w nexus sagas list`
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE     
24e9169b-bfb4-469d-8c85-7b23feac2ceb succeeded 
b1dbcfd3-f0ab-4a8f-9b6f-60b109449f7c running   

After restarting Nexus, we no longer see the earlier saga, because it had already finished before this Nexus started. But we do see the one we created later, because it was recovered:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas list
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.94s
     Running `target/debug/omdb -w nexus sagas list`
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE   
b1dbcfd3-f0ab-4a8f-9b6f-60b109449f7c running 
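Which sagas reappear after a restart can be expressed as a filter over persisted saga records, following how the saga_recovery task describes the sagas it finds ("in-progress, assigned to this Nexus"). This is a hypothetical sketch: `PersistedSaga`, its fields, and `sagas_to_recover` are illustrative names, not Nexus's schema or code. The saga and Nexus ids come from the session above; the third record is invented to represent a saga owned by some other Nexus.

```rust
/// Hypothetical shape of a saga record as persisted in the database.
#[derive(Debug)]
struct PersistedSaga {
    id: &'static str,
    finished: bool,
    /// Which Nexus instance the saga is assigned to (hypothetical field).
    assigned_nexus: &'static str,
}

/// The sagas a freshly started Nexus would load into memory: those that are
/// still in progress and assigned to it. Finished sagas are never loaded.
fn sagas_to_recover(all: &[PersistedSaga], this_nexus: &str) -> Vec<&'static str> {
    all.iter()
        .filter(|s| !s.finished && s.assigned_nexus == this_nexus)
        .map(|s| s.id)
        .collect()
}

fn main() {
    let this_nexus = "a4ef738a-1fb0-47b1-9da2-4919c7ec7c7f";
    let persisted = [
        // Completed before the restart: not recovered, so not listed.
        PersistedSaga {
            id: "24e9169b-bfb4-469d-8c85-7b23feac2ceb",
            finished: true,
            assigned_nexus: this_nexus,
        },
        // Still waiting for demo-complete: recovered.
        PersistedSaga {
            id: "b1dbcfd3-f0ab-4a8f-9b6f-60b109449f7c",
            finished: false,
            assigned_nexus: this_nexus,
        },
        // Hypothetical in-progress saga owned by another Nexus: ignored.
        PersistedSaga {
            id: "hypothetical-other-saga",
            finished: false,
            assigned_nexus: "hypothetical-other-nexus",
        },
    ];
    for id in sagas_to_recover(&persisted, this_nexus) {
        println!("recovering {id}");
    }
}
```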

Side note: the saga_recovery background task confirms that it was recovered:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus background-tasks show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.02s
     Running `target/debug/omdb -w nexus background-tasks show`
note: using Nexus URL http://[::1]:12223
...
task: "saga_recovery"
  configured period: every 10m
  currently executing: no
  last completed activation: iter 1, triggered by a periodic timer firing
    started at 2024-08-10T00:07:53.867Z (54s ago) and ran for 86ms
    since Nexus started:
        sagas recovered:           1
        sagas recovery errors:     0
        sagas observed started:    0
        sagas inferred finished:   0
        missing from SEC:          0
        bad state in SEC:          0
    last pass:
        found sagas:   1 (in-progress, assigned to this Nexus)
        recovered:     1 (successfully)
        failed:        0
        skipped:       0 (already running)
        removed:       0 (newly finished)
    recently recovered sagas (1):
        TIME                 SAGA_ID                              
        2024-08-10T00:07:53Z b1dbcfd3-f0ab-4a8f-9b6f-60b109449f7c 
    no saga recovery failures
...
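The "last pass" counters suggest a simple classification of each saga the task finds. A hypothetical sketch of that bookkeeping, assuming a saga is skipped when the SEC (saga execution coordinator) is already running it and is otherwise counted as recovered or failed depending on whether recovery succeeds. `recovery_pass`, `PassStats`, and the `recover` callback are illustrative names, and the "removed (newly finished)" counter is not modeled.

```rust
use std::collections::HashSet;

/// Hypothetical counters mirroring part of the "last pass" section of the
/// omdb output ("removed (newly finished)" is not modeled here).
#[derive(Debug, Default, PartialEq)]
struct PassStats {
    found: usize,
    recovered: usize,
    failed: usize,
    skipped: usize,
}

/// Hypothetical recovery pass: for each in-progress saga assigned to this
/// Nexus, skip it if the SEC is already running it; otherwise try to recover
/// it and count the outcome.
fn recovery_pass<F>(found: &[&str], running: &HashSet<String>, mut recover: F) -> PassStats
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut stats = PassStats { found: found.len(), ..Default::default() };
    for &id in found {
        if running.contains(id) {
            stats.skipped += 1;
        } else if recover(id).is_ok() {
            stats.recovered += 1;
        } else {
            stats.failed += 1;
        }
    }
    stats
}

fn main() {
    // Right after the restart, nothing is running yet and one saga is found.
    let found = ["b1dbcfd3-f0ab-4a8f-9b6f-60b109449f7c"];
    let running = HashSet::new();
    let stats = recovery_pass(&found, &running, |id| {
        println!("recovering {id}");
        Ok(())
    });
    println!("{stats:?}");
}
```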

Now we can complete that saga:

$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas demo-complete 56179d38-e047-4d47-b54a-3b8218c93992
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.96s
     Running `target/debug/omdb -w nexus sagas demo-complete 56179d38-e047-4d47-b54a-3b8218c93992`
note: using Nexus URL http://[::1]:12223
$ OMDB_NEXUS_URL=http://[::1]:12223 cargo run --bin=omdb -- -w nexus sagas list
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.98s
     Running `target/debug/omdb -w nexus sagas list`
note: using Nexus URL http://[::1]:12223
NOTE: This command only reads in-memory state from the targeted Nexus instance.
Sagas may be missing if they were run by a different Nexus instance or if they
finished before this Nexus instance last started up.
SAGA_ID                              STATE     
b1dbcfd3-f0ab-4a8f-9b6f-60b109449f7c succeeded 

I considered including this only in development builds or something similar, but I don't think there's any harm in having it present all the time, and it could conceivably even be useful in production (during support).

@davepacheco requested a review from @hawkw August 10, 2024 00:10
@karencfv (Contributor) left a comment:

This is really cool! What do you think about copying the PR's description you wrote with examples into a doc? It'd be useful for people just getting started

@hawkw (Member) left a comment:

Overall, this looks good to me!

I left a few comments on potential future additions to the omdb nexus sagas commands you've added, but I think it would probably be better to save those for a later branch, unless you really want to add them now?

Also, I left a comment on the assert!, but I will freely admit that it's probably not actually a big deal to have it there.

Resolved review threads: dev-tools/omdb/src/bin/omdb/nexus.rs (3), nexus/src/app/sagas/demo.rs (2, one on an outdated revision)
@davepacheco (Collaborator, Author) replied to @karencfv's suggestion to copy the PR description into a doc:

Good idea. I went ahead and added this as docs/demo-saga.adoc. I'm not sure it's very discoverable, but in the future I hope we can organize the docs into a more explicit "Omicron developer guide" book.

@davepacheco enabled auto-merge (squash) August 12, 2024 20:39
@davepacheco merged commit 914f5fd into main Aug 12, 2024
22 of 23 checks passed
@davepacheco deleted the dap/demo-saga branch August 12, 2024 21:49