feat(MR): [MR-649] Support incremental rollout of best-effort calls #3688

alin-at-dfinity · 2025-01-30T12:29:42Z

Replace the on-off flag with an enum of 4 progressive rollout stages:

* Stage 0: Trap on related API calls; and reject best-effort requests when routing (status quo).
* Stage 1: Silently ignore `ic0_call_with_best_effort_response()` calls, falling back to guaranteed response calls; and reject best-effort requests when routing.
* Stage 2: On system subnets, silently ignore `ic0_call_with_best_effort_response()` calls, falling back to guaranteed response calls; and reject best-effort requests that should be routed to system subnets.
* Stage 3: Fully enable API calls; always route best-effort requests.

Rollbacks (incremental or direct) are also supported, but it's probably a bad idea to roll back to stage 0, as any canisters already using best-effort calls would break. Badly.
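The four stages can be sketched as a Rust enum plus two queries: what the System API does with `ic0_call_with_best_effort_response()` on a given subnet, and whether Message Routing may route a best-effort request to a given destination. This is a minimal sketch under stated assumptions, not the PR's code: `DisabledByTrap` and `FallBackToGuaranteedResponse` are names taken from this thread, while `ApplicationSubnetsOnly`, `Enabled`, `SubnetKind`, `ApiBehavior` and both method names are hypothetical, and the subnet-local routing exception discussed in the review is deliberately left out.

```rust
/// Hypothetical, simplified subnet classification (the real code has more subnet types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SubnetKind {
    System,
    Application,
}

/// Sketch of the rollout stages; only the first two variant names come from the PR thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BestEffortResponsesFeature {
    /// Stage 0: trap on the API call, reject best-effort requests when routing.
    DisabledByTrap,
    /// Stage 1: API call is a no-op (guaranteed response), reject when routing.
    FallBackToGuaranteedResponse,
    /// Stage 2 (hypothetical name): enabled everywhere except on system subnets.
    ApplicationSubnetsOnly,
    /// Stage 3 (hypothetical name): fully enabled.
    Enabled,
}

/// Hypothetical: what `ic0_call_with_best_effort_response()` does on a subnet.
#[derive(Debug, PartialEq, Eq)]
pub enum ApiBehavior {
    /// The canister traps.
    Trap,
    /// The call is silently ignored; the request keeps a guaranteed response.
    Ignore,
    /// The best-effort timeout is applied to the request.
    Apply,
}

impl BestEffortResponsesFeature {
    /// Behavior of the System API call for a canister on a subnet of kind `own`.
    pub fn api_behavior(self, own: SubnetKind) -> ApiBehavior {
        match (self, own) {
            (Self::DisabledByTrap, _) => ApiBehavior::Trap,
            (Self::FallBackToGuaranteedResponse, _) => ApiBehavior::Ignore,
            (Self::ApplicationSubnetsOnly, SubnetKind::System) => ApiBehavior::Ignore,
            (Self::ApplicationSubnetsOnly, SubnetKind::Application) => ApiBehavior::Apply,
            (Self::Enabled, _) => ApiBehavior::Apply,
        }
    }

    /// Whether a best-effort request may be routed to a subnet of kind `dst`
    /// (ignoring the subnet-local exception).
    pub fn may_route_best_effort_to(self, dst: SubnetKind) -> bool {
        match self {
            Self::DisabledByTrap | Self::FallBackToGuaranteedResponse => false,
            Self::ApplicationSubnetsOnly => dst != SubnetKind::System,
            Self::Enabled => true,
        }
    }
}
```

Because each stage is just a variant, a rollback amounts to configuring an earlier variant. The sketch also makes the stage 0 risk visible: canisters already relying on best-effort calls would start trapping.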

stiegerc

LGTM at first glance, but I wasn't present when this was discussed.

rs/config/src/embedders.rs

Co-authored-by: stiegerc <[email protected]>

alin-at-dfinity · 2025-01-30T15:56:30Z

rs/messaging/src/routing/stream_builder.rs

+                        // Best-effort request to unsupported subnet. Always route subnet-local requests
+                        // for consistency with scheduler routing.
+                        //
+                        // TODO(MR-649): Drop this once best-effort calls are fully deployed.
+                        RequestOrResponse::Request(req)
+                            if dst_subnet_id != self.subnet_id


AFAICT this is the only potentially contentious decision.

I could have gone the other way (also prevent routing in the scheduler), but that would have required even more code and more tests. Alternatively, Message Routing could reject local best-effort calls while the scheduler routes them, but that seems confusing and potentially problematic.
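The stream builder check in the hunk above boils down to a single predicate. A minimal sketch, assuming a hypothetical `route_request` helper and simplified `SubnetId`/`RoutingDecision` types (the real code matches on `RequestOrResponse` inside the stream builder), and assuming `dst_supports_best_effort` already reflects the current rollout stage for the destination:

```rust
/// Hypothetical stand-in for the real `SubnetId` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SubnetId(pub u64);

/// Hypothetical outcome of the routing check.
#[derive(Debug, PartialEq, Eq)]
pub enum RoutingDecision {
    Route,
    Reject,
}

/// Sketch of the stream builder decision for a single request.
pub fn route_request(
    own_subnet: SubnetId,
    dst_subnet: SubnetId,
    is_best_effort: bool,
    dst_supports_best_effort: bool,
) -> RoutingDecision {
    if is_best_effort && !dst_supports_best_effort && dst_subnet != own_subnet {
        // Best-effort request to an unsupported remote subnet: reject it.
        RoutingDecision::Reject
    } else {
        // Guaranteed-response requests, supported destinations and all
        // subnet-local requests (consistent with scheduler routing) are routed.
        RoutingDecision::Route
    }
}
```

Keeping the subnet-local case on the "route" side is what avoids the Message Routing vs. scheduler inconsistency described above.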

crusso

LGTM

derlerd-dfinity

Thanks a lot. Left some initial comments.

derlerd-dfinity · 2025-01-30T16:16:36Z

rs/config/src/embedders.rs

+    /// Stage 1: Feature is disabled, `ic0_call_with_best_effort_response` API is a
+    /// no-op, silently falling back to a guaranteed response.
+    FallBackToGuaranteedResponse,
+


It would be nice to have an additional step where we could enable it only for a selected list of subnets as a first rollout step, maybe something like `ListedSubnetsOnly(Vec<SubnetId>)`?

Nice to have, I agree. But is it actually necessary? It would add hundreds more lines of test code (and chances of breaking things).

I suppose separating out VerifiedApplication subnets from plain Application subnets would be a quick and dirty option (I'm hoping, although I don't know for sure, that OpenChat subnets are VerifiedApplication subnets).

All that being said, are we actually concerned about the small amount of reasonably well tested extra code breaking all application subnets?
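For illustration only, the suggested extra stage could look roughly like this. The PR does not add it; `BestEffortRollout`, `enabled_on`, the stand-in `SubnetId` and the `Enabled` variant are hypothetical, and only `FallBackToGuaranteedResponse` and `ListedSubnetsOnly(Vec<SubnetId>)` come from this thread.

```rust
/// Hypothetical stand-in for the real `SubnetId` type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SubnetId(pub u64);

/// Hypothetical rollout enum including the reviewer's suggested stage.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum BestEffortRollout {
    /// Fall back to guaranteed responses everywhere.
    FallBackToGuaranteedResponse,
    /// Enable best-effort calls only on the listed subnets (suggested, not implemented).
    ListedSubnetsOnly(Vec<SubnetId>),
    /// Enable best-effort calls everywhere (hypothetical name).
    Enabled,
}

impl BestEffortRollout {
    /// Whether best-effort calls made by canisters on `subnet` are honored.
    pub fn enabled_on(&self, subnet: SubnetId) -> bool {
        match self {
            Self::FallBackToGuaranteedResponse => false,
            Self::ListedSubnetsOnly(subnets) => subnets.contains(&subnet),
            Self::Enabled => true,
        }
    }
}
```

Carrying a `Vec` makes the enum `Clone` but no longer `Copy`, and presumably both the System API and routing would have to consult the list, which is where the extra test burden mentioned above would come from.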

derlerd-dfinity · 2025-01-30T16:20:29Z

rs/config/src/embedders.rs

@@ -124,7 +150,7 @@ impl FeatureFlags {
            write_barrier: FlagStatus::Disabled,
            wasm_native_stable_memory: FlagStatus::Enabled,
            wasm64: FlagStatus::Enabled,
-            best_effort_responses: FlagStatus::Disabled,
+            best_effort_responses: BestEffortResponsesFeature::FallBackToGuaranteedResponse,


Shouldn't we keep it at `DisabledByTrap` here and make the behavior switch in a separate, explicit PR?

I don't think it makes any meaningful difference: you still can't make best-effort calls with either of them. I was even considering dropping `DisabledByTrap` altogether, since we're obviously never going back to it.

derlerd-dfinity · 2025-01-30T16:30:00Z

rs/system_api/src/lib.rs

@@ -3445,14 +3451,36 @@ impl SystemApi for SystemApiImpl {
                }),
                Some(request) => {
                    if request.is_timeout_set() {
-                        Err(HypervisorError::ToolchainContractViolation {
+                        return Err(HypervisorError::ToolchainContractViolation {


If we return here, we never reach the `trace_syscall` below, so I think it needs to remain an `if`/`else`.

It does, thanks for the catch.
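The control-flow issue is easy to reproduce in isolation: the syscall trace must run on every path, so the result is computed with an `if`/`else` expression and only returned after tracing. A minimal sketch with a hypothetical `Api` type and a `String` error in place of `HypervisorError::ToolchainContractViolation`; it is not the actual `SystemApiImpl` code.

```rust
/// Hypothetical stand-in for the System API state relevant here.
pub struct Api {
    /// Best-effort timeout of the call under construction, if already set.
    pub timeout_seconds: Option<u32>,
    /// Recorded syscall traces.
    pub trace: Vec<String>,
}

impl Api {
    fn trace_syscall(&mut self, name: &str, result: &Result<(), String>) {
        self.trace.push(format!("{name}: {result:?}"));
    }

    pub fn call_with_best_effort_response(&mut self, timeout_seconds: u32) -> Result<(), String> {
        // No early `return`: both branches produce a value, so the trace
        // below runs on the error path as well as the success path.
        let result = if self.timeout_seconds.is_some() {
            Err("timeout already set".to_string())
        } else {
            self.timeout_seconds = Some(timeout_seconds);
            Ok(())
        };
        self.trace_syscall("ic0_call_with_best_effort_response", &result);
        result
    }
}
```

With an early `return Err(...)` in the first branch, the second call below would not be traced.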

alin-at-dfinity requested review from stiegerc, derlerd-dfinity and oggy-dfin January 30, 2025 12:29

alin-at-dfinity requested review from a team as code owners January 30, 2025 12:29

github-actions bot added the feat label Jan 30, 2025

Fix test compilation. Make clippy happy.

01c6d79

stiegerc reviewed Jan 30, 2025

rs/config/src/embedders.rs (outdated)

github-actions bot added @execution @consensus @languages labels Jan 30, 2025

Apply review suggestion.

7e53dd8

Co-authored-by: stiegerc <[email protected]>

adambratschikaye approved these changes Jan 30, 2025

alin-at-dfinity commented Jan 30, 2025

eichhorl approved these changes Jan 30, 2025

crusso approved these changes Jan 30, 2025

derlerd-dfinity reviewed Jan 30, 2025
