Account ID network identifier and checksum #1024

PhilippGackstatter · 2024-12-17T09:36:33Z

As a follow-up to the AccountID changes in #140, we can consider adding a user-facing encoding for account IDs. The purpose of this encoding should be

Network Identifier: to identify the network (e.g. mainnet, testnet, ...) the ID belongs to.
Checksum: include a checksum to protect against typos when typed manually or spoken aloud.

The reason for not including these in the account ID directly is to keep the ID as small as it can be, since it will be used within the VM. Moreover, these two properties can be ensured at some point higher in the stack (e.g. in RPC APIs). Here the encoded ID can be verified to be for the right network, that its checksum is correct and then unwrapped into the actual account ID. From that point on, only the ID is passed around. This layered approach seems better and sufficient to keep the lower layers network agnostic.

Bech32

One option is to use bech32:

Bech32 is an encoding scheme that is easy to use for humans and efficient to encode in QR codes.
A Bech32 string consists of a human-readable part (HRP), a separator (the character '1'), and a data part. A checksum at the end of the string provides error detection to prevent mistakes when the string is written off or read out loud.
https://docs.rs/bech32/0.11.0/bech32/

Bitcoin previously used base58 encoding and then moved to bech32, though I think both are still in use.

The motivation for using bech32 over base58 is described in the motivation section of the BIP: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#motivation

The following examples use the bech32 crate with the latest Bech32m checksum algorithm from https://github.com/bitcoin/bips/blob/master/bip-0350.mediawiki. So bech32 has the Checksum property built in.

Its human-readable part gives us the Network Identifier. For example, using "miden" as the HRP for mainnet, an account ID would look like so:

account id hex: 0x7f6918607b80ee900000b3b8dbc713
bech32 encoded: miden1jrpduyc7nxd6cjjrvnlvgqqqwl6ux5

This is the printed account ID encoded in bech32.

One optional thing we could do is prepend an additional byte at the beginning of the byte encoded account ID. This first byte could be defined as account_id_version * 8. Choosing a multiple of 8 means the first character after the 1 separator is different. This means one can tell apart different account ID versions by looking at this first character.

Each version would then start with a unique character:

bech32 IDv0: miden1qzgrm...
bech32 IDv1: miden1pzgrm...
bech32 IDv2: miden1zzgrm...
bech32 IDv3: miden1rzgrm...
...

This might be nice to have, but again, optional.

Overall bech32 gives us the checksum, a network identifier of our choice and an implementation via the bech32 crate which seems well-maintained and with significant usage (i.e. little risk of loosing maintenance). Bech32 appears well thought through and matches our use case very closely (ID/address encoding).

Base 58 Check

There is a base58 check implementation which appears maintained: https://docs.rs/base58ck/0.2.0/base58ck/
Although with the arguments given in the above motivation for bech32 I'm not sure it's worth using. We'd add a dependence on SHA-256 with this. On the plus side the encoded address is shorter.

account id: 0x38d4f3f0c2abee900000a78b93c417
base58ck encoded: 54BCnWfUUAFiYBtW3DCaT

Hex Encoding with checksum

We can also make our own encoding in hex and add a byte for the network identifier and a custom checksum (e.g. using the crc crate), but we'd have to do more significant engineering work to ensure its correctness and error detection capability over a more battle-tested solution.

Other Options

Are there other good options I've missed? If you have any thoughts or opinions on this, feel free to comment.

The text was updated successfully, but these errors were encountered:

bobbinth · 2025-01-29T17:37:33Z

I think we should probably go either directly with bech32 or maybe with some slightly modified variant thereof.

One optional thing we could do is prepend an additional byte at the beginning of the byte encoded account ID. This first byte could be defined as account_id_version * 8. Choosing a multiple of 8 means the first character after the 1 separator is different. This means one can tell apart different account ID versions by looking at this first character.

I like this, but we could probably just move the whole last byte of the prefix to the front (and maybe pad with two zeros) - this way, for a given version/type/storage mode, the first two characters will remain stable.

Its human-readable part gives us the Network Identifier. For example, using "miden" as the HRP for mainnet, an account ID would look like so:
account id hex: 0x7f6918607b80ee900000b3b8dbc713
bech32 encoded: miden1jrpduyc7nxd6cjjrvnlvgqqqwl6ux5

It would be great for the human-readable part to include two things:

network - e.g., mainnet, testnet, devent
entity type - account ID vs. something else.

Regarding the second point: right now we deal only with account IDs, but there could be other types of "addresses" in the future that we may want to encode. For example, we could have a pay-to-Falcon-public-key (P2FK) note which could be consumed by anyone who can provide a signature against some Falcon public key.

At the same time, we probably don't want to make the HRP too long. One option is to make it very compact. For example, use m for mainnet, t for testnet, d for devent. And for entity type use ai for account ID, and in the future, pk for public key. So, then we'd end up with something like:

mai1...  # account ID on mainnet
tai1...  # account ID on testent
mpk1...  # public key on mainnet (this would also be longer - i.e., more than 32 bytes)

PhilippGackstatter · 2025-01-30T14:47:48Z

That's a good point. It would be great to make it extensible so we can add more address types in the future.

One of the issues with using a very compact hrp is that it may easily conflict with one of the registered HRPs (https://github.com/satoshilabs/slips/blob/master/slip-0173.md) and I think we should try to avoid that. To be fair, it looks like none of your suggestions currently collide, but with this compact scheme it seems likely this could happen as more variants are added. Also, if we want to register ours in that registry (which we might want to do), then we'd have a lot of them to register.

Since the address is one of the most end-user-facing things there is, I think simplicity and readability are the most important thing. Ideally it should be very easy for a user to determine whether an address is for Miden (mainnet) and not any other blockchain or Miden testnet/devnet. For that reason, I think we should avoid encoding things into the hrp as it won't have any meaning for users and because it doesn't seem to be intended for that use case. Instead we can encode this in the data part of bech32.

So I would still prefer "miden" for mainnet and some other ones for testnet and devnet, because that would give users some confidence that they have the correct address. I would be totally fine with "miden_test" or "miden_dev" or something in that direction, because it would be very clear at first glance that this is not a mainnet address.

Address Type in Data Part

For the address type we could prepend it to the data part, again making it a multiple of 8, for example:

#[repr(u8)]
pub enum AddressType {
    AccountId = 0,
    FalconKey = 8,
}

and then an account ID will have a q as the first character after the 1 and a falcon key will have a p:

bech32 account id: miden1qq8h6emx7jxvfgqqqpkgke4h5qyhgu68
bech32 falcon pk:  miden1pqgze6kkq7c9pnvgadj7xkc2ahe49a6rk4y220s8770l3cddhjhl5e7j9e7

That way, those who know (devs) can still distinguish the address types but for end-users it'll be just part of the address.
Contrary to the original suggestion I would then leave the account ID exactly as it is and not move the metadata byte anywhere.

Is there a reason why end-users would have to know what type of an address it is and therefore why it perhaps should be part of the HRP?

Address Type in HRP

Something I would find less appealing but could also be fine with is if we append a single character after the network identifier to encode address types. The hrp is part of the checksum, so we don't loose any safety there. This could look like this:

bech32 account id: midena1pa7kweh53nz2qqqqdj9kddaqw8pryv
bech32 falcon pk:  midenf1zqkw44s8kpgvmz8tvh34kzhd7df0wsa4fzjnuplhnluwrtdu4laqx8trsx

But this would be pretty much the same as encoding it in the data part, and it looks less clean to me because it reads "midena" rather than "miden".

bobbinth · 2025-01-30T21:50:26Z

One of the issues with using a very compact hrp is that it may easily conflict with one of the registered HRPs (https://github.com/satoshilabs/slips/blob/master/slip-0173.md) and I think we should try to avoid that. To be fair, it looks like none of your suggestions currently collide, but with this compact scheme it seems likely this could happen as more variants are added. Also, if we want to register ours in that registry (which we might want to do), then we'd have a lot of them to register.

Ah - I didn't know about this registry, but it seems like it still may be OK to use a compact identifier. For example, Bitcoin uses bc and there are a bunch of others using 2 - 3 letter symbols. Also, not sure if we need to register testnet/devent prefixes - probably mainnet should be fine.

So, I'm thinking for HRP we could use:

mm for mainnet.
mts for testnet.
mdv for devent.

As far as I can tell, none of these are currently used.

Since the address is one of the most end-user-facing things there is, I think simplicity and readability are the most important thing. Ideally it should be very easy for a user to determine whether an address is for Miden (mainnet) and not any other blockchain or Miden testnet/devnet. For that reason, I think we should avoid encoding things into the hrp as it won't have any meaning for users and because it doesn't seem to be intended for that use case. Instead we can encode this in the data part of bech32.

Yes, good point - let's move the "address type" into the data portion. I also like the idea of using a single letter to define address type - i.g., p for account ID and q for public key (the type of the key could be encoded separately).

Contrary to the original suggestion I would then leave the account ID exactly as it is and not move the metadata byte anywhere.

I would maybe still move the byte upfront (to go right after the address type character).

So, putting this all together, we could have:

mm1q...   # account ID on mainnet
mm1p...   # public key on mainnet
mts1q...  # account ID on testent

One other thing that came to mind - is there actually high enough value in distinguishing between mainnet/testnet/devent address? It seems to me that distinguishing between them may come with significant code complexity. Maybe just using mm everywhere will be fine?

Mirko-von-Leipzig · 2025-01-31T07:11:42Z

One other thing that came to mind - is there actually high enough value in distinguishing between mainnet/testnet/devent address? It seems to me that distinguishing between them may come with significant code complexity. Maybe just using mm everywhere will be fine?

I think there is user value. People do often cross-contaminate addresses so removing a common mistake could be worth it.

We should be able to find a nice abstraction for this on the node side I think. Is there actually a way to identify the node's network? Right now the node doesn't know what network it is. Since this is protocol level this means the protocol needs to be somewhat network aware then?

@PhilippGackstatter would this scheme allow for custom networks? As in, how do I run a local network, or even a public unofficial one? Would it have to masquerade as a main/test/devnet? Or is there some mapping function given some name/network string?

bobbinth · 2025-01-31T07:18:12Z

I think there is user value. People do often cross-contaminate addresses so removing a common mistake could be worth it.

Agreed there is value. I was thinking more of whether it outweighs the complexities on the code side. For example, i'm imagining that we'd add something like AccountId::to_bech32() function. For this function to produce different addresses depending on the network, we have 2 options:

Pass an enum into the function. This will probably be messy as it would require any other code calling this function to be aware of the current network.
Use a feature flag so that we could specify the network we are compiling the code for (with default perhaps being the mainnet). This is a bit better than the first option, I think, but could also lead to some miss-configurations.

PhilippGackstatter · 2025-01-31T07:44:43Z

I think there is user value. People do often cross-contaminate addresses so removing a common mistake could be worth it.

Agreed as well. There are cases where people sent funds to testnet addresses and lost them.

@PhilippGackstatter would this scheme allow for custom networks? As in, how do I run a local network, or even a public unofficial one? Would it have to masquerade as a main/test/devnet? Or is there some mapping function given some name/network string?

I assumed we would eventually need a ProtocolConfig anyway, and the HRP would be one thing that goes into this. I assume we need such a config to specify the current protocol version, and down the road, enable coordinated upgrades of the protocol to new versions.
Basically at least two categories of things go in there:

Any parameter we want to be able to evolve over time, like the proof security level we use for proof verification.
Anything that is different from mainnet/testnet/devnet would go in there, like the hrp or "network name".

While we run in a centralized setting, 1. is probably not essential, since we have full control so using constants is fine, as we can just recompile. But eventually you'd want to be able to specify two protocol versions at the same time and define a block heigh at which you switch to the new one.
Even now, for 2. it would be good to have this.

For a given network, say mainnet or a "localhost network" we would initialize the node with a protocol config. The client would need to fetch this from the node and use it in various places, one of them being when encoding/decoding an AccountId to/from bech32. It would definitely add complexity, but I think the node and client need to be parameterized over certain things anyway, and this is one way to do it. Every protocol I know of has one of these so not sure there's a way around in the long-run.

2. Use a feature flag so that we could specify the network we are compiling the code

I think this wouldn't work for the webClient and any other bindings afaict, as typically, users can't change the feature flags. Plus that would imply they need a rust toolchain.

So unfortunately, I think we need to make this a runtime configuration.

Mirko-von-Leipzig · 2025-01-31T07:47:03Z

I think there is user value. People do often cross-contaminate addresses so removing a common mistake could be worth it.

Agreed there is value. I was thinking more of whether it outweighs the complexities on the code side.
Yeah that is fair. It will add network enum into every dApp as well. I would argue that at least for the node we already want things like this. e.g. we should verify that the components all have the same network ID by comparing against the store, and the store comparing against the database on startup.

Starknet does this slightly differently by adding the network ID into tx hashes. To facilitate this, the RPC also has a get_chain_id method for dApps to pull from. This probably doesn't help us directly though since account IDs can still just be wrong.

I can't really evaluate the trade-off.

For example, i'm imagining that we'd add something like AccountId::to_bech32() function. For this function to produce different addresses depending on the network, we have 2 options:

Pass an enum into the function. This will probably be messy as it would require any other code calling this function to be aware of the current network.

Use a feature flag so that we could specify the network we are compiling the code for (with default perhaps being the mainnet). This is a bit better than the first option, I think, but could also lead to some miss-configurations.

I would definitely prefer option (1) 😅 Controlling execution via feature flags is almost never the right thing imo. The testing feature controlling difficulty was already an issue.

I imagine to_bech32 would only be called at the extremes of a program? So e.g. in the RPC component outputs. And similarly for from_bech32 where it would need to verify the network ID.

bobbinth added this to the v0.8 milestone Jan 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Account ID network identifier and checksum #1024

Account ID network identifier and checksum #1024

PhilippGackstatter commented Dec 17, 2024

bobbinth commented Jan 29, 2025 •

edited

Loading

PhilippGackstatter commented Jan 30, 2025

bobbinth commented Jan 30, 2025

Mirko-von-Leipzig commented Jan 31, 2025 •

edited

Loading

bobbinth commented Jan 31, 2025

PhilippGackstatter commented Jan 31, 2025

Mirko-von-Leipzig commented Jan 31, 2025

Account ID network identifier and checksum #1024

Account ID network identifier and checksum #1024

Comments

PhilippGackstatter commented Dec 17, 2024

Bech32

Base 58 Check

Hex Encoding with checksum

Other Options

bobbinth commented Jan 29, 2025 • edited Loading

PhilippGackstatter commented Jan 30, 2025

bobbinth commented Jan 30, 2025

Mirko-von-Leipzig commented Jan 31, 2025 • edited Loading

bobbinth commented Jan 31, 2025

PhilippGackstatter commented Jan 31, 2025

Mirko-von-Leipzig commented Jan 31, 2025

bobbinth commented Jan 29, 2025 •

edited

Loading

Mirko-von-Leipzig commented Jan 31, 2025 •

edited

Loading