Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD-0178: SBPF Static Syscalls #178

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
146 changes: 146 additions & 0 deletions proposals/0178-static-syscalls.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
---
simd: '0178'
title: SBPF Static Syscalls
authors:
- Alessandro Decina
- Alexander Meißner
- Lucas Steuernagel
category: Standard
type: Core
status: Review
created: 2024-09-27
---

## Summary

This SIMD introduces a new instruction syscall in the SBPF instruction set to
represent syscalls. Such a change aims to remove relocations when resolving
syscalls and simplify the instruction set, allowing for the straightforward
differentiation between external and internal calls. In addition, it proposes
a new `return` instruction to supersede the `exit` instruction.

## Motivation

The resolution of syscalls during ELF loading requires relocating addresses,
which is a performance burden for the validator. Relocations require an entire
copy of the ELF file in memory to either relocate addresses we fetch from the
symbol table or offset addresses to after the start of the virtual machine’s
memory. Moreover, relocations pose security concerns, as they allow the
arbitrary modification of program headers and programs sections. A new
separate opcode for syscalls modifies the behavior of the ELF loader, allowing
us to resolve syscalls without relocations.
Comment on lines +24 to +31

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉


## New Terminology

None.

## Detailed Design

The following must go into effect if and only if a program indicates the SBPF
version `0x03` or higher in its ELF header e_flags field, according to the
specification of SIMD-0161.

### New syscall instruction

We introduce a new instruction in the SBPF instruction set, which we call
`syscall`. It must be associated with all syscalls in the SBPF format. Its
encoding consists of an opcode `0x95` and an immediate, which must refer to a
previously registered syscall. For more reference on the SBF ISA format, see
the
[spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md).

For simplicity, syscalls must be represented as a natural number greater than
zero, so that they can be organized in a lookup table. This choice allows for
quick retrieval of syscall information from integer indexes. An instruction
`syscall 2` must represent a call to the function registered at position two
in the lookup table.

Consequently, system calls in the Solana SDK and in any related compiler tools
must be registered as function pointers, whose address is a natural number
greater than zero, representing their position in a syscall lookup table. The
verifier must enforce that the immediate of a syscall instruction points to a
valid syscall, and throw `VerifierError::InvalidSyscall` otherwise.

This new instruction comes together with modifications in the verification
phase. `call imm` (opcode `0x85`) instructions must only refer to internal
calls and its immediate field must only be interpreted as a relative address
to jump from the program counter.
Comment on lines +65 to +67

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean there is no longer a need to hash the immediates?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does and is the intention.


### New return instruction

The opcode `0x9D` must represent the return instruction, which supersedes the
`exit` instruction. The opcode (opcode `0x95`), previously assigned to the
`exit` instruction, must now be interpreted as the new syscall instruction.
Comment on lines +71 to +73
Copy link

@topointon-jump topointon-jump Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind changing this?

Copy link

@topointon-jump topointon-jump Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, changing the name from exit to return when it is the same instruction could be confusing. I have already seen this confused in other SIMDs.

Copy link

@topointon-jump topointon-jump Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note - we should bundle large sets of proposed ISA changes together into the same SBPF version upgrade, so that clients don't have to support a mis-mash of ISAs based on feature flags. I believe this is the intent of #161, but just re-iterating 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motivation is that exit was occupying the slot in the instruction class for controlflow with immediate values and it does not take an immediate value. The new syscall opcode however does, so it took its place.


The verifier must detect an SBPF V1 program containing the `0x9D` opcode and
throw a `VerifierError::UnknowOpCode`. Likewise, if, by any means, a V1
program reaches the execution stage containing the `0x9D` opcode, an
`EbpfError::UnsupportedInstruction` must be raised.

### Syscall numbering convention
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer is not a great idea. SVM will continue to diverge more and more as non-mainnet SVM chains and L2s develop. If they wish to push their own syscalls, not only will they need to write their own tooling to handle it, if Solana mainnet adds another syscall that clashes and that same binary is shipped to another chain, or vice-versa, people could lose money. It would make way more sense to use the Murmur3 hash. If they decide to launch a hash collision, at least we tried to stop them.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Murmur3 is what the original implementation did, and I was against switching to indexes, but then I got busy on other stuff (I just noticed that master was switched to indexes).

Is the best argument for indexes that lookup is faster? And if that's the argument, isn't it moot considering that we JIT?

Copy link
Author

@LucasSte LucasSte Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @deanmlittle,
Thanks for your input. I didn't follow why teams would have to develop their own tooling. On the Solana SDK, changing from consecutive integers to a murmur hash is simply assigning another constant to the function pointers. The compiler toolchain can deal with both cases without changes.

On the other hand, I partially agree that using consecutive numbers may hinder the development of SVM chains. In Agave, we considered that using a contiguous array would add to much complexity to the code just to handle inactive syscalls or deprecated ones, so we are still using a BTree, as there are only 40 syscalls to lookup for.

Agave's implementation does not prevent SVM external users from calculating the murmur32 hash for their own syscalls, as any 32-bit integer can be used for indexing. The numbering convention they use does not need to match ours, provided that the numbers don't coincide.

Using consecutive numbers was a request from Firedancer. I believe either @topointon-jump, @ripatel-fd or @0x0ece can elaborate more on the reasons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LucasSte The main argument is to optimize for bytecode decode efficiency. Interpreting a syscall instruction with a hash requires at least two memory accesses, having an index requires only one. That's a significant cost saving for an instruction that may be executed up to 1 billion times per second in the future.

@deanmlittle There's no need to lose ABI security protections or break cross-SVM program compatibility. This SIMD makes no mention of removal of the symbol table. Currently, the dynamic symbol table of each program maps syscall names to the syscall hash. It would make sense to redefined st_value to carry the new syscall ID in this SIMD.

Then, the ELF loader can trivially reject programs that have an unknown syscall name or mismatching ID. And the bytecode verifier should reject syscall invocations that weren't verified via the symbol table. ABIv2 proposed previously by Anza similarly moves checks to bytecode verification from later stages, so this wouldn't be out of line.

The other concern you brought up is compatibility. A public GH repo listing IDs and their users is a common way to solve the enum problem. (Examples of other projects doing this: https://github.com/multiformats/multicodec/blob/master/table.csv https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml)

@LucasSte Could you clarify in the SIMD whether static syscalls are verified during ELF loading or bytecode verification? If we won't have these measures in place to support enums, I agree with Dean that we should keep the hash instead.

Is the best argument for indexes that lookup is faster? And if that's the argument, isn't it moot considering that we JIT?

@alessandrod There is a case for allowing zero-copy execution out of a bytecode buffer (i.e. interpreter). With direct mapping, we've seen that the average per-instruction overhead for mainnet executions is so high that a JIT barely outperforms Firedancer's interpreter even when the compiled program is in program cache. Bytecode translation is more susceptible to DoS due to the high cost of allocating memory and JIT compiling when spam invoking cold programs. FWIW, we are beginning mainnet testing of the full Firedancer client too, which is interpreter-only.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alessandrod There is a case for allowing zero-copy execution out of a bytecode buffer (i.e. interpreter).

you don't have to fixup in place right?

I understand your perspective. From the anza POV tho, there would be no overhead in having hashes since we always JIT. From the devs POV, I think we can agree that having SVM compatibility is desirable if it doesn't come at a huge cost?

So is it worth making a protocol compromise to accommodate for fd's current implementation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify in the SIMD whether static syscalls are verified during ELF loading or bytecode verification? If we won't have these measures in place to support enums, I agree with Dean that we should keep the hash instead.

As @Lichtso, syscalls are no longer supposed to appear in the symbol table. During the loader initialization, we register every valid syscall and the verifier checks the program bytecode against them. This procedure is briefly described in lines 58 to 63.

I think we should do the central syscall registry repo regardless.

The registry repo is exactly what I've proposed we should do internally at Anza. An initial sketch of it is here: https://github.com/anza-xyz/agave/blob/master/sdk/define-syscall/src/codes.rs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for going back to murmur32 since it makes it easier to interoperate with "svm extensions"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a hard preference, so I fine reverting the spec to mumur32. @ripatel-fd and @Lichtso are you OK with it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, who are these alternative SVMs, L2s and syscalls?
(and btw, there's no requirement for the indexes to be continuous, an L2 could for example start from 50 to avoid collisions. Or they could also add the syscall to the L1 SVM since this is Solana and not Ethereum...)

Copy link
Contributor

@deanmlittle deanmlittle Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To name a few: Soon, SonicSVM, Atlas, Eclipse, and redacted. There are many use cases not well-served by mainnet-beta due to protocol development grinding to a halt for the past 2 years for obvious reasons. People are starting to innovate elsewhere, and that's actually a very good thing, as if any of these experiments gain adoption or unlock new use cases, we have the ability to adopt them later after being tested in the wild. Making an absolute mess of the enum by padding the indices may sound like a great idea today, but the 20 different versions of the enum depending upon the chain and the inevitable 50th mainnet syscall breaking everything will eventually prove otherwise. If cross-SVM compatibility is of any importance, the Murmur hash is a much better solution for avoiding collisions.


Syscalls must be represented by a unique integer to maintain a dense lookup
table data structure for indexing and dispatch. For a clear correlation
between the existing syscalls and their respective identification number,
syscalls must strictly follow the numbering below.

| Syscall name | Number |
|------------------------------------------|----------|
| abort | 1 |
| sol_panic_ | 2 |
| sol_memcpy_ | 3 |
| sol_memmove_ | 4 |
| sol_memset_ | 5 |
| sol_memcmp_ | 6 |
| sol_log_ | 7 |
| sol_log_64_ | 8 |
| sol_log_pubkey | 9 |
| sol_log_compute_units_ | 10 |
| sol_alloc_free_ | 11 |
| sol_invoke_signed_c | 12 |
| sol_invoke_signed_rust | 13 |
| sol_set_return_data | 14 |
| sol_get_return_data | 15 |
| sol_log_data | 16 |
| sol_sha256 | 17 |
| sol_keccak256 | 18 |
| sol_secp256k1_recover | 19 |
| sol_blake3 | 20 |
| sol_poseidon | 21 |
| sol_get_processed_sibling_instruction | 22 |
| sol_get_stack_height | 23 |
| sol_curve_validate_point | 24 |
| sol_curve_group_op | 25 |
| sol_curve_multiscalar_mul | 26 |
| sol_curve_pairing_map | 27 |
| sol_alt_bn128_group_op | 28 |
| sol_alt_bn128_compression | 29 |
| sol_big_mod_exp | 30 |
| sol_remaining_compute_units | 31 |
| sol_create_program_address | 32 |
| sol_try_find_program_address | 33 |
| sol_get_sysvar | 34 |
| sol_get_epoch_stake | 35 |
| sol_get_clock_sysvar | 36 |
| sol_get_epoch_schedule_sysvar | 37 |
| sol_get_last_restart_slot | 38 |
| sol_get_epoch_rewards_sysvar | 39 |
| sol_get_fees_sysvar | 40 |
| sol_get_rent_sysvar | 41 |
|------------------------------------------|----------|

## Alternatives Considered

None.

## Impact

The changes proposed in this SIMD are transparent to dApp developers. The
compiler toolchain will emit correct code for the specified SBF version.
Static syscalls obviate relocations for call instructions and move the virtual
machine closer to eliminating relocations altogether, which can bring
considerable performance improvements.

## Security Considerations

None.
Loading