Add docs best effort responses #3865

Open
wants to merge 41 commits into
base: master
Changes from 29 commits
f61fe5a
WIP: adjust the docs to best-effort responses
oggy-dfin Nov 29, 2024
5f89f7e
Adjust the safe retries doc
oggy-dfin Dec 3, 2024
44806de
Adjust security best practices
oggy-dfin Dec 5, 2024
410e042
Improve the terminology in the message execution properties doc
oggy-dfin Dec 6, 2024
2916a32
Revamp the overview doc for inter-canister calls
oggy-dfin Dec 6, 2024
0613d32
Fix some typos
oggy-dfin Dec 6, 2024
d344cd9
Some more improvements
oggy-dfin Dec 10, 2024
ddf5818
Apply suggestions from code review
oggy-dfin Dec 11, 2024
e213314
Address David's and Jessie's comments
oggy-dfin Dec 12, 2024
7d82156
Update docs/references/message-execution-properties.mdx
oggy-dfin Dec 12, 2024
d545ab8
Update docs/references/message-execution-properties.mdx
oggy-dfin Dec 12, 2024
0132c93
Update docs/references/message-execution-properties.mdx
oggy-dfin Dec 12, 2024
9eb0ce2
Update docs/references/message-execution-properties.mdx
oggy-dfin Dec 12, 2024
7f4f09b
Update docs/references/message-execution-properties.mdx
oggy-dfin Dec 12, 2024
d626bb4
Address Alin's comments (WIP)
oggy-dfin Dec 12, 2024
7da5a43
Address Alin's comments
oggy-dfin Dec 12, 2024
dacf8bc
Minor improvements
oggy-dfin Dec 12, 2024
7a4f424
Update docs/developer-docs/smart-contracts/advanced-features/async-co…
oggy-dfin Dec 13, 2024
2cd1f77
Andy's comments
oggy-dfin Dec 23, 2024
bf6cb13
Merge branch 'master' into oggy/best-effort-responses
oggy-dfin Dec 23, 2024
2ad10b6
Improve the messaging properties doc a bit
oggy-dfin Dec 23, 2024
f7f7e1a
Alin's comment on users vs applications
oggy-dfin Dec 23, 2024
6b0944d
Improve description of asynchronous calls
oggy-dfin Jan 28, 2025
922e0c2
Merge branch 'master' into oggy/best-effort-responses
oggy-dfin Jan 29, 2025
9f30199
Merge branch 'master' into oggy/best-effort-responses
oggy-dfin Jan 31, 2025
d6ef0c0
Revamp the docs on inter-canister calls in Rust
oggy-dfin Jan 31, 2025
2f1fc25
Polish the Rust inter-canister calls page a bit
oggy-dfin Jan 31, 2025
4425de4
Use more positive language on the call properties
oggy-dfin Feb 3, 2025
03b0f8f
fix CSP
jessiemongeon1 Feb 3, 2025
56f84fe
Fix typo
oggy-dfin Feb 4, 2025
856cf74
Panic instead of unreachable on ICP ledger errors
oggy-dfin Feb 4, 2025
349a221
Update docs/developer-docs/backend/rust/intercanister.mdx
oggy-dfin Feb 4, 2025
6971edd
Update docs/developer-docs/backend/rust/intercanister.mdx
oggy-dfin Feb 4, 2025
17c59d3
Update intercanister.mdx
oggy-dfin Feb 4, 2025
f18c6fb
Revamp the inter-canister messaging tutorial completely
oggy-dfin Feb 7, 2025
022c1e2
Add the cycles example
oggy-dfin Feb 10, 2025
0d7cd05
Touch up some text
oggy-dfin Feb 10, 2025
8a0f049
Start the tutorial with a very basic example.
oggy-dfin Feb 10, 2025
5e40227
Move to the new proposed CDK call error API and restructure again
oggy-dfin Feb 12, 2025
419649a
Fix the page (stop from crashing)
oggy-dfin Feb 12, 2025
d18f034
Add summaries in info boxes
oggy-dfin Feb 12, 2025
175 changes: 39 additions & 136 deletions docs/developer-docs/backend/rust/intercanister.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -8,169 +8,72 @@ import { MarkdownChipRow } from "/src/components/Chip/MarkdownChipRow";

<MarkdownChipRow labels={["Beginner", "Rust", "Tutorial"]} />

Just like users can call canisters, canisters can also call other canisters. This document shows how to use these inter-canister calls in Rust. To fully understand calls, their properties, and common pitfalls and security issues, refer to the section on [inter-canister calls](/docs/current/developer-docs/smart-contracts/advanced-features/async-code).
Contributor

Suggested change
Just like users can call canisters, canisters can also call other canisters. This document shows how to use these inter-canister calls in Rust. To fully understand calls, their properties, and common pitfalls and security issues, refer to the section on [inter-canister calls](/docs/current/developer-docs/smart-contracts/advanced-features/async-code).
Just like users can call canisters, canisters can also call other canisters. This document shows how to use these inter-canister calls in Rust. To fully understand calls, their properties, common pitfalls, and for an overview of things that should be considered for security, refer to the section on [inter-canister calls](/docs/current/developer-docs/smart-contracts/advanced-features/async-code).

Member Author

Use "security considerations"; remove the double "and"

oggy-dfin marked this conversation as resolved.

Our examples center on tokens. We will write a simple wallet canister that holds tokens on behalf of its owner and allows the owner to transfer them. We'll first show how to interact with the ICP ledger, and then with any ledger that supports the ICRC-1 standard. Finally, we will allow the wallet to determine the exchange rate between supported tokens using the exchange rate canister.

Inter-canister calls can be used to update information between two or more canisters.
## Dependencies and imports
Contributor

I think we might want to skip dependencies and imports... maybe can instead point to the full code somewhere and say this is where you can get a working example and then only focus on the important snippets here?

Member Author

Let's try to fold them by default if possible, if not, we can push them to the end of the doc and link from the beginning.


To demonstrate these inter-canister calls, you'll use an example project called "PubSub".
We start by listing the dependencies used in this example as specified in `Cargo.toml`.

A common problem in both distributed and decentralized systems is keeping separate services (or canisters) synchronized with one another. While there are many potential solutions to this problem, a popular one is the **publisher/subscriber** pattern, or "PubSub". PubSub is an especially valuable pattern on ICP as its primary drawback, message delivery failures, does not apply.

## Prerequisites

Before getting started, ensure you have set up your developer environment according to the instructions in the [developer environment guide](./dev-env.mdx).
```rust reference
https://github.com/oggy-dfin/icc_rust_docs/blob/34f59ddae9fcc70173fc21927a4279757f93c51a/src/icc_rust_docs_backend/Cargo.toml#L11-L16
```

Then, download the sample project's files with the commands:
Next, here are the imports used:

```bash
git clone https://github.com/dfinity/examples/
cd examples/rust/pub-sub/
```rust reference
https://github.com/oggy-dfin/icc_rust_docs/blob/34f59ddae9fcc70173fc21927a4279757f93c51a/src/icc_rust_docs_backend/src/lib.rs#L1-L8
```

## Viewing the canister code

This project is comprised of two canisters: publisher and subscriber.

The **subscriber** canister contains a record of topics. The **publisher** canister uses inter-canister calls to add topics to the record within the subscriber canister.

Let's take a look at the `src/lib.rs` file for each of these canisters.

```rust title="src/publisher/src/lib.rs"
use candid::{CandidType, Principal};
use ic_cdk::update;
use serde::Deserialize;
use std::cell::RefCell;
use std::collections::BTreeMap;

type SubscriberStore = BTreeMap<Principal, Subscriber>;

thread_local! {
    static SUBSCRIBERS: RefCell<SubscriberStore> = RefCell::default();
}

#[derive(Clone, Debug, CandidType, Deserialize)]
struct Counter {
    topic: String,
    value: u64,
}

#[derive(Clone, Debug, CandidType, Deserialize)]
struct Subscriber {
    topic: String,
}

#[update]
fn subscribe(subscriber: Subscriber) {
    let subscriber_principal_id = ic_cdk::caller();
    SUBSCRIBERS.with(|subscribers| {
        subscribers
            .borrow_mut()
            .insert(subscriber_principal_id, subscriber)
    });
}

#[update]
async fn publish(counter: Counter) {
    SUBSCRIBERS.with(|subscribers| {
        // This example is explicitly ignoring the error.
        for (k, v) in subscribers.borrow().iter() {
            if v.topic == counter.topic {
                let _call_result: Result<(), _> =
                    ic_cdk::notify(*k, "update_count", (&counter,));
            }
        }
    });
}
```
Furthermore, for simplicity we'll hardcode the owner of the wallet. If you want to test this example interactively, you can set it to your own principal that you can obtain using `dfx identity get-principal`.

In this code, you can see two inter-canister update calls: `fn subscribe(subscriber: Subscriber)` and `async fn publish(counter: Counter)`. The first method allows for the subscriber canister to make a call to the publisher canister and subscribe to topics. The second method allows the publisher canister to publish information on a topic in the subscriber canister.

```rust title="src/subscriber/src/lib.rs"
use candid::{CandidType, Principal};
use ic_cdk::{update, query};
use serde::Deserialize;
use std::cell::Cell;

thread_local! {
    static COUNTER: Cell<u64> = Cell::new(0);
}

#[derive(Clone, Debug, CandidType, Deserialize)]
struct Counter {
    topic: String,
    value: u64,
}

#[derive(Clone, Debug, CandidType, Deserialize)]
struct Subscriber {
    topic: String,
}

#[update]
async fn setup_subscribe(publisher_id: Principal, topic: String) {
    let subscriber = Subscriber { topic };
    let _call_result: Result<(), _> =
        ic_cdk::call(publisher_id, "subscribe", (subscriber,)).await;
}

#[update]
fn update_count(counter: Counter) {
    COUNTER.with(|c| {
        c.set(c.get() + counter.value);
    });
}

#[query]
fn get_count() -> u64 {
    COUNTER.with(|c| {
        c.get()
    })
}
```rust reference
https://github.com/oggy-dfin/icc_rust_docs/blob/34f59ddae9fcc70173fc21927a4279757f93c51a/src/icc_rust_docs_backend/src/lib.rs#L10-L11
```

In this code, there are three main methods: two inter-canister update methods and a query method.

The first method, `async fn setup_subscribe(publisher_id: Principal, topic: String)` provides functionality for the publisher canister to subscribe to topics within the `subscriber` canister. This function is called by the publisher canister.
## Basic ICP ledger transfer: unbounded wait calls

The second method, `fn update_count(counter: Counter)` updates the counter record for each published value in a topic within the subscriber canister.
The simplest way to interact with the ICP ledger is to use calls where the caller is willing to wait for the response for an unbounded amount of time. These calls can still fail before reaching the ledger, or even while the ledger is processing the call, but the ledger's response is guaranteed to be delivered to the caller, which is why we also refer to these calls as *guaranteed response* calls.
Contributor

I would not start off with unbounded wait here but rather informally say what unbounded wait means.

In the second part I think it doesn't become fully clear that the property is that the system is guaranteed to deliver the globally unique response.

I made a suggestion.

Suggested change
The simplest way to interact with the ICP ledger is to use calls where the caller is willing to wait for the response for an unbounded amount of time. These calls can still fail before reaching the ledger, or even while the ledger is processing the call, but the ledger's response is guaranteed to be delivered to the caller, which is why we also refer to these calls as *guaranteed response* calls.
The simplest way to interact with the ICP ledger is to use calls where the caller is willing to wait as long as it takes the receiver canister and/or the system to respond. Note that unbounded wait calls can still fail before reaching the ledger, or even while the ledger is processing the call. The property these calls guarantee is that there is globally only one response to each request and that this response will be delivered to the caller, which is why these requests are sometimes also called *guaranteed response* calls. So, for example, if the system produces a reject because the request can't be delivered for some reason, one can conclude that the canister never saw the message.

Member Author

Start the explanation of guaranteed response calls with "The caller is guaranteed to learn the outcome of its call", and then move on to failure modes of requests.


The third method, `fn get_count() -> u64` allows the `Counter` value to be queried and returned in a call.
```rust reference
https://github.com/oggy-dfin/icc_rust_docs/blob/34f59ddae9fcc70173fc21927a4279757f93c51a/src/icc_rust_docs_backend/src/lib.rs#L14-L81
```

## Deploying the canisters
The guaranteed response delivery property ensures that a `StateUnknown` error can only be caused by canister behavior (mismatched expectations on the return types, or the caller panicking). However, unbounded wait calls also have a downside: a canister can only be upgraded safely when it has no pending calls, and unbounded wait calls provide no bound on when a call will return, so the caller may be prevented from upgrading safely. This is particularly problematic when calling untrusted canisters, such as an arbitrary ledger.

Now that you've taken a look at your canisters, let's deploy them.
We will next show how to use *bounded wait* calls instead in such cases. These calls don't guarantee that the response will be delivered, which is why we also refer to them as *best-effort response* calls. In return, they don't block the caller from upgrading, and they tolerate high system load better than unbounded wait calls. See the section on [inter-canister calls](/docs/current/developer-docs/smart-contracts/advanced-features/async-code) for more information on best-effort vs. guaranteed response calls.
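
To make the upgrade trade-off concrete, here is a minimal standalone model, not CDK code. The names `PendingCall` and `stop_time_bound` are hypothetical and chosen only for illustration, and the model assumes that a bounded wait call resolves by its deadline at the latest (in practice, shortly after it).

```rust
/// A pending outgoing call as seen by the caller (hypothetical model type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PendingCall {
    /// Unbounded wait (guaranteed response): no upper bound on when it returns.
    UnboundedWait,
    /// Bounded wait (best-effort response): assumed to resolve by `deadline_secs`,
    /// either with the callee's response or with an "unknown" outcome.
    BoundedWait { deadline_secs: u64 },
}

/// Returns a time by which all pending calls are certain to have resolved,
/// or `None` if no such bound exists because an unbounded wait call is pending.
fn stop_time_bound(pending: &[PendingCall], now_secs: u64) -> Option<u64> {
    let mut bound = now_secs;
    for call in pending {
        match call {
            PendingCall::UnboundedWait => return None,
            PendingCall::BoundedWait { deadline_secs } => bound = bound.max(*deadline_secs),
        }
    }
    Some(bound)
}
```

In this model, a caller that only ever issues bounded wait calls always gets a finite bound, while a single unbounded wait call to a stalling callee removes it.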
oggy-dfin marked this conversation as resolved.

Open a terminal window on your local computer, if you don’t already have one open.
## ICRC-1 transfers: bounded wait calls

Then run the commands:
We will now allow our wallet to transfer tokens on an arbitrary ICRC-1 ledger instead of just the ICP ledger. Since we can't, in general, trust an arbitrary ICRC-1 ledger, and we want to ensure that our canister can always be upgraded, we will use bounded wait calls.
Contributor

I think an untrusted ledger may not be a very good example. If I store my tokens in a ledger I have to trust it.

One option could be that we imagine a wallet app where users can register ledgers, meaning that -- while the users registering them might be perfectly fine with trusting the ledgers -- the app itself might not be. But this might already be too complex of an example.


```bash
dfx start --clean --background
dfx deploy
```
Ledgers generally charge fees for transfers. While this fee is fixed for the ICP ledger, it may vary for other ledgers. Thus, we start with an example of how to determine the required fee.

## Making inter-canister calls
### Learning the transfer fee

First, let's subscribe to a topic. For example, to subscribe to the "Apples" topic, use the command:
Querying the transfer fee does not change the ledger state. Thus, the call is simple to retry if it fails, and the code below implements basic retries.

```bash
dfx canister call subscriber setup_subscribe '(principal "<INSERT_PUBLISHER_PRINCIPAL_HERE>", "Apples")'
```rust reference
https://github.com/oggy-dfin/icc_rust_docs/blob/34f59ddae9fcc70173fc21927a4279757f93c51a/src/icc_rust_docs_backend/src/lib.rs#L83-L134
```
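
The retry idea for read-only calls can be sketched independently of the CDK. The helper below is a hypothetical, synchronous illustration (the referenced tutorial code is asynchronous and uses the CDK's own call API); `retry_read_only` is not a library function.

```rust
/// Calls `op` up to `max_attempts` times and returns the first success,
/// or the last error if every attempt failed. `op` receives the attempt number,
/// starting at 1. Only safe for operations without side effects, such as
/// reading a transfer fee.
fn retry_read_only<T, E>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt == max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```

Because the operation does not change the ledger state, repeating it cannot cause harm; the same helper would be unsafe for a transfer unless the callee deduplicates requests.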

Then, to publish a record to the "Apples" topic, use the command:
As noted in the example, the code after the first call executes in a different callback. See the sections on [inter-canister calls and async code](/docs/current/developer-docs/smart-contracts/advanced-features/async-code), [properties of call execution](/docs/current/references/message-execution-properties) and [security best practices](/docs/current/developer-docs/security/security-best-practices/inter-canister-calls) to understand potential security implications for your application when using inter-canister calls.

```bash
dfx canister call publisher publish '(record { "topic" = "Apples"; "value" = 2 })'
```
### Transferring tokens

Then, you can query and receive the subscription record value with the command:
When transferring tokens (or performing other updates) using bounded wait messages, we need to handle the unknown state case. For ICRC-1 transfers, we can make use of the built-in deduplication feature of ICRC-1 ledgers and retry the call.
Contributor

Should we maybe link to an explanation of the semantics of this deduplication? Or should we maybe briefly explain it here?


```bash
dfx canister call subscriber get_count
```rust reference
https://github.com/oggy-dfin/icc_rust_docs/blob/34f59ddae9fcc70173fc21927a4279757f93c51a/src/icc_rust_docs_backend/src/lib.rs#L136-L211
```
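
As a rough illustration of why retrying is safe with deduplication, here is a toy in-memory model. It is not the ICRC-1 ledger implementation: `ToyLedger`, `Transfer`, and `block_index_of` are hypothetical names, and the model ignores the time window within which a real ledger deduplicates. The assumption is that the caller fixes `created_at_time` once and reuses it on every retry, so a retried transfer is recognized and answered with a `Duplicate` error that points at the original block.

```rust
use std::collections::HashMap;

/// The fields that identify a transfer for deduplication in this model.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Transfer {
    to: String,
    amount: u64,
    /// Set once by the caller and reused unchanged on every retry.
    created_at_time: u64,
}

#[derive(Debug, PartialEq)]
enum TransferError {
    /// The same transfer was already recorded at block `duplicate_of`.
    Duplicate { duplicate_of: u64 },
}

/// A toy ledger that remembers every transfer it has executed.
#[derive(Default)]
struct ToyLedger {
    seen: HashMap<Transfer, u64>, // transfer -> block index
    next_block: u64,
}

impl ToyLedger {
    fn transfer(&mut self, t: &Transfer) -> Result<u64, TransferError> {
        if let Some(&block) = self.seen.get(t) {
            return Err(TransferError::Duplicate { duplicate_of: block });
        }
        let block = self.next_block;
        self.next_block += 1;
        self.seen.insert(t.clone(), block);
        Ok(block)
    }
}

/// Caller side: a `Duplicate` answer to a retry means the transfer did happen.
fn block_index_of(result: Result<u64, TransferError>) -> u64 {
    match result {
        Ok(block) => block,
        Err(TransferError::Duplicate { duplicate_of }) => duplicate_of,
    }
}
```

If the first response is lost and the outcome is unknown, the caller can resend the identical transfer: either it executes now, or the ledger reports the earlier block, and in both cases the tokens move exactly once.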

The output should resemble the following:
## Exchange rate canister: attaching cycles

```bash
(2 : nat64)
For our final example, we will use the [exchange rate canister](/docs/current/developer-docs/defi/exchange-rate-canister/) (XRC) to determine the exchange rate between assets, including tokens, but also currencies. The XRC uses [HTTP outcalls](/docs/current/developer-docs/smart-contracts/advanced-features/https-outcalls/https-outcalls-overview) to determine the exchange rate. Similar to ledgers charging transfer fees, the XRC charges a fee to the caller to determine the exchange rate. However, since the XRC doesn't have a token of its own, the XRC fee is paid in cycles rather than a token. The caller has to attach cycles to such a call.

```rust reference
https://github.com/oggy-dfin/icc_rust_docs/blob/34f59ddae9fcc70173fc21927a4279757f93c51a/src/icc_rust_docs_backend/src/lib.rs#L213-L247
```

As noted in the example, for transferring larger amounts of cycles, switch to using unbounded wait calls. Bounded wait calls run the risk of losing cycles; see the section on [inter-canister calls](/docs/current/developer-docs/smart-contracts/advanced-features/async-code) for more details.
Original file line number Diff line number Diff line change
Expand Up @@ -309,29 +309,33 @@ Finally, note that the same guard can be used in several methods to restrict par

### Security concern

As stated by the [Property 6](/docs/current/references/message-execution-properties#message-execution-properties), inter-canister calls can fail in which case they result in a **reject**. See [reject codes](/docs/current/references/ic-interface-spec#reject-codes) for more detail. The caller must correctly deal with the reject cases, as they can happen in normal operation, because of insufficient cycles on the sender or receiver side, or because some data structures like message queues are full.
As stated by [Property 6](/docs/current/references/message-execution-properties#message-execution-properties), inter-canister calls can fail, in which case they result in a **reject**. See [reject codes](/docs/current/references/ic-interface-spec#reject-codes) for more detail. The caller must correctly deal with the reject cases, as they can happen in normal operation because of insufficient cycles on the sender or receiver side, or even for reasons outside of the sender's or receiver's control, such as the system (Internet Computer) being under heavy load (e.g., message queues becoming full).

Not handling the error cases correctly is risky: For example, if a ledger transfer results in an error, the callback dealing with that error must interpret it correctly. That is, it must be interpreted as "the transfer did not happen".
Not handling the reject cases correctly is risky. For example, if a ledger transfer results in a reject, the callback dealing with that reject must interpret it correctly. That is, it should be interpreted as "the transfer did not happen", unless:

1. the call was issued as a best-effort response call, and the system responded with a `SYS_UNKNOWN` reject code. In this case, the caller cannot be a priori sure whether the call took effect or not.
2. the system responded with a `CANISTER_ERROR` reject code. This indicates a bug in the ledger canister. In this case, it is still possible that the call had a partial effect on the ledger canister.
3. the system responded with a `CANISTER_REJECT` reject code. This means that the call was explicitly rejected by the ledger canister. Normally, this indicates that the transfer didn't happen, but this depends on the ledger canister. The ICP ledger canister for example never rejects calls explicitly.
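
The three exceptions above can be summarized as a small decision function. This is a sketch with hypothetical types, not the CDK's error API: the `RejectCode` variants mirror the reject code names in the interface specification, and `TransferStatus` and `interpret_reject` are made up for illustration.

```rust
/// Reject codes, named after the codes in the IC interface specification.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RejectCode {
    SysFatal,
    SysTransient,
    DestinationInvalid,
    CanisterReject,
    CanisterError,
    SysUnknown,
}

/// What the caller may conclude about a ledger transfer that was rejected.
#[derive(Debug, PartialEq)]
enum TransferStatus {
    /// The transfer did not happen.
    NotExecuted,
    /// Best-effort call timed out: the transfer may or may not have happened.
    Unknown,
    /// The ledger trapped or misbehaved: a partial effect is possible.
    PossiblyPartial,
    /// Explicit reject by the ledger: the meaning depends on the ledger.
    DependsOnLedger,
}

fn interpret_reject(code: RejectCode) -> TransferStatus {
    match code {
        RejectCode::SysUnknown => TransferStatus::Unknown,
        RejectCode::CanisterError => TransferStatus::PossiblyPartial,
        RejectCode::CanisterReject => TransferStatus::DependsOnLedger,
        RejectCode::SysFatal | RejectCode::SysTransient | RejectCode::DestinationInvalid => {
            TransferStatus::NotExecuted
        }
    }
}
```

Matching exhaustively on the code, rather than treating every reject as "not executed", forces the caller to decide explicitly what to do in the uncertain cases.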

### Recommendation

When making inter-canister calls, always handle the error cases (rejects) correctly. These errors imply that the message has not been successfully executed.
When making inter-canister calls, always handle the error cases (rejects) correctly. Other than the `SYS_UNKNOWN` error code, these errors imply that the message has not been successfully executed. For `SYS_UNKNOWN`, follow the guidelines in the [safe retries & idempotency](/docs/current/developer-docs/smart-contracts/best-practices/idempotency) document to handle this scenario correctly.

## Be aware of the risks involved in calling untrustworthy canisters

### Security concern

- If inter-canister calls are made to potentially malicious canisters, this can lead to DoS issues or there could be issues related to candid decoding. Also, the data returned from a canister call could be assumed to be trustworthy when it is not.

- When another canister is called with a callback being registered, and the receiver stalls the response indefinitely by not responding, the result would be a DoS. Additionally, that canister can no longer be upgraded if it has callbacks registered. Recovery would require wiping the state of the canister by reinstalling it. Note that even a trustworthy canister could have a bug causing it to stall indefinitely. However, such a bug seems rather unlikely to occur.
- When a canister `C1` calls a canister `C2` using a guaranteed-response inter-canister call, and `C2` stalls the response indefinitely by not responding, the result would be a DoS on `C1`. Additionally, since the call registers a callback on `C1`, `C1` can no longer be stopped because of the outstanding callback, and thus can no longer be cleanly upgraded. Recovery would require wiping the state of the canister by reinstalling it. Note that even if `C2` were trustworthy, it could still stall indefinitely. This could happen due to a bug in `C2` (which is rather unlikely to occur). Other possible causes are a stall of the subnet hosting `C2` (assuming that `C1` and `C2` are on different subnets), or `C2` making a downstream call to an untrusted canister `C3`.

- In summary, this can DoS a canister, consume an excessive amount of resources, or lead to logic bugs if the behavior of the canister depends on the inter-canister call response.

### Recommendation

- Making inter-canister calls to trustworthy canisters is safe, except for the rather unlikely case that there is a bug in the callee that makes it stall forever.
- Making inter-canister calls to trustworthy canisters is safe, except for the rather unlikely case that there is a bug in the callee or its subnet that makes it stall forever.
Contributor

See above.

I would leave out "a bug in [...] its subnet", unless you mean "a temporarily stalled subnet".


- Interacting with untrustworthy canisters is still possible by using a state-free proxy canister which could easily be re-installed if it is attacked as described above and is stuck. When the proxy is reinstalled, the caller obtains an error response to the open calls.
- Interacting with untrustworthy canisters is still possible by using best-effort response calls, which cannot be stalled by the recipient. In particular, when using calls that do not change the callee's state (e.g., just fetching information), prefer using best-effort response calls. Another option is using guaranteed response calls through a state-free proxy canister which could easily be re-installed if it is attacked as described above and is stuck. When the proxy is reinstalled, the caller obtains an error response to the open calls.
Contributor

At this point the proxy canister is a poor man's best-effort call. Which is I guess what it always was. I don't know whether it's still worth mentioning as an option.


- Sanitize data returned from inter-canister calls.

Expand All @@ -348,7 +352,7 @@ Loops in the call graph (e.g. canister A calling B, B calling C, C calling A) ma

### Recommendation

- Avoid such loops.
- Avoid such loops, or rely on best-effort response calls instead, since these provide timeouts.

- For more information, see [current limitations of the Internet Computer](https://wiki.internetcomputer.org/wiki/Current_limitations_of_the_Internet_Computer), section "Loops in call graphs".
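
One way to check a design for such loops, assuming the set of canisters and their outgoing calls is known up front, is a depth-first search over the call graph. The sketch below is a hypothetical offline check (`has_call_loop` is not part of any ICP tooling), with canisters identified by plain names.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if the call graph (caller -> callees) contains a loop.
fn has_call_loop<'a>(calls: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    fn visit<'a>(
        node: &'a str,
        calls: &HashMap<&'a str, Vec<&'a str>>,
        on_path: &mut HashSet<&'a str>,
        done: &mut HashSet<&'a str>,
    ) -> bool {
        if done.contains(node) {
            return false; // already fully explored, no loop through here
        }
        if !on_path.insert(node) {
            return true; // node is already on the current call path: loop found
        }
        if let Some(callees) = calls.get(node) {
            for &callee in callees {
                if visit(callee, calls, on_path, done) {
                    return true;
                }
            }
        }
        on_path.remove(node);
        done.insert(node);
        false
    }

    let mut on_path = HashSet::new();
    let mut done = HashSet::new();
    calls
        .keys()
        .any(|&start| visit(start, calls, &mut on_path, &mut done))
}
```

For the example above (A calling B, B calling C, C calling A), the search reaches A again while A is still on the current path and reports a loop.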
