What I am proposing is to introduce an "array-call" calling convention; the name is just what I thought would fit it best. The main goal of this feature is to make dynamically typed languages easier to implement and more performant. The proposed feature should allow users of Cranelift to check the number of arguments passed to a function on the callee side and to fetch arguments dynamically, much like indexing into an array. It could look like this:
function u0:0(...) array_call {
v0 = get_argument_count.i32
v1 = icmp.lt v0, 2
brif v1, block0, block1
block0:
v2 = get_argument.i64 0 // "constant" fetch of argument, index is known so we can just map it to `rdi` on x64 for example.
v3 = iconst.i32 1
v4 = get_argument.i64 v3 // also allow to fetch arguments dynamically
block1:
...
}
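To make the intended callee-side semantics concrete, here is a plain-Rust model of what `get_argument_count` and `get_argument` would observe. This is only a sketch and not Cranelift code: `Args`, `count`, `get` and `add_all` are hypothetical names, and how an out-of-range index should behave is an assumption (modeled here as `None`).

```rust
/// Hypothetical model of the incoming arguments of an `array_call` function.
/// Not a Cranelift API; it only mirrors the two proposed instructions.
struct Args<'a> {
    values: &'a [i64],
}

impl<'a> Args<'a> {
    /// Models `get_argument_count.i32`.
    fn count(&self) -> i32 {
        self.values.len() as i32
    }

    /// Models `get_argument.i64 idx` for a constant or dynamic index.
    /// Assumption: an out-of-range index yields `None` instead of garbage.
    fn get(&self, index: i32) -> Option<i64> {
        if index < 0 {
            return None;
        }
        self.values.get(index as usize).copied()
    }
}

/// A callee in the style of a variadic Scheme `+` that insists on at least
/// two arguments: check the count first, then fetch each argument by index.
fn add_all(args: &Args) -> Result<i64, String> {
    if args.count() < 2 {
        return Err(format!("expected at least 2 arguments, got {}", args.count()));
    }
    let mut sum = 0i64;
    for i in 0..args.count() {
        sum += args.get(i).expect("index is below the argument count");
    }
    Ok(sum)
}
```

Calling `add_all` with `Args { values: &[1, 2, 3] }` sums the three arguments, while a single-argument call is rejected by the arity check before any argument is read.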
Benefit
This allows fast and efficient calls when compiling dynamically typed languages, where function arguments and function signatures are not known in advance. One might argue that you can implement the same behavior by passing argc: usize, argv: *mut Value, but that won't work for tail calls and is also not as performant as passing arguments directly in registers when the opportunity presents itself.
One example where such a calling convention can be used is Scheme compilers. There are many Scheme compilers in the wild that produce binaries and have to rely on their own backends simply because no other backend has an "arraycall"-like calling convention. In my own implementation I have to pass arguments on a runstack that is stored in TLS state (it does use the pinned_reg feature to access the runstack), and that is a huge performance hit compared to doing calls with an arraycall-like calling convention in my baseline backend, which compiles Scheme to assembly directly from bytecode.
Implementation
The implementation could be based on the existing SystemV and TailCall calling conventions.
The following implementation is just an example and uses x64 as a base:
RAX can be used to pass the argument count (or return value count) to the callee/caller.
The RDI-R8 registers can be used for arguments.
For arguments that do not fit into registers, stack space should be allocated and then cleaned up by the callee on exit/return.
If an argument is larger than the platform word size, we might disallow the use of this convention, as I am unsure how to mix different argument sizes well with it.
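The register/stack split described above can be sketched as a small lookup from argument index to location. This is a sketch under assumptions: it reads "RDI-R8" as the first five System V integer argument registers (RDI, RSI, RDX, RCX, R8), treats every argument as one machine word, and uses hypothetical names (`ArgLoc`, `arg_location`, `stack_slots_to_clean`). The argument count in RAX is not modeled.

```rust
/// Where the callee would find argument `index` under the example x64 scheme.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ArgLoc {
    /// Passed in a general-purpose register.
    Reg(&'static str),
    /// Passed in the n-th word-sized slot of the stack argument area.
    StackSlot(usize),
}

/// Assumption: "RDI-R8" means these five registers, in System V order.
const ARG_REGS: [&'static str; 5] = ["rdi", "rsi", "rdx", "rcx", "r8"];

/// Maps an argument index to a register or a stack slot.
fn arg_location(index: usize) -> ArgLoc {
    if index < ARG_REGS.len() {
        ArgLoc::Reg(ARG_REGS[index])
    } else {
        ArgLoc::StackSlot(index - ARG_REGS.len())
    }
}

/// Number of word-sized stack slots the callee has to pop on return,
/// given the argument count it received.
fn stack_slots_to_clean(argc: usize) -> usize {
    argc.saturating_sub(ARG_REGS.len())
}
```

With a constant index the lookup can be resolved at compile time (index 0 is simply `rdi`); with a dynamic index the generated code would need a runtime branch or a spill of the register arguments, which is one of the costs a real implementation would have to weigh.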
Alternatives
One alternative is passing argc: usize, argv: *mut Value. It works really well and is easy to implement from the user side. But this approach has the downsides of not allowing tail calls and of degraded performance, since we are now always forced to pass everything on the stack.
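For reference, a minimal sketch of this first alternative in plain Rust. `Value` is a stand-in for a tagged Scheme value and `Callee`, `sum_args` and `call_add_three` are hypothetical names; the point is only to show where the arguments live.

```rust
/// Stand-in for a tagged Scheme value (hypothetical).
type Value = i64;

/// Every compiled function shares this one native signature.
type Callee = unsafe extern "C" fn(argc: usize, argv: *mut Value) -> Value;

/// Example callee: sums however many arguments it was given.
unsafe extern "C" fn sum_args(argc: usize, argv: *mut Value) -> Value {
    // SAFETY: the caller guarantees `argv` points at `argc` initialized values.
    let args = unsafe { std::slice::from_raw_parts(argv, argc) };
    args.iter().sum()
}

/// Caller side: the arguments are written into an array in the caller's own
/// stack frame, so that frame has to stay alive until `f` returns. A real
/// tail call would release the frame before the callee reads `argv`.
fn call_add_three(f: Callee, a: Value, b: Value, c: Value) -> Value {
    let mut argv = [a, b, c];
    // SAFETY: `argv` holds exactly `argv.len()` values and outlives the call.
    unsafe { f(argv.len(), argv.as_mut_ptr()) }
}
```

Every argument takes a round trip through memory here, even when there are only one or two of them, which is the performance cost mentioned above.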
A second alternative is a heap-allocated buffer stored somewhere in the VM context and accessed through the pinned_reg feature. It is similar to the first alternative but allows tail calls: the stack is no longer used across calls, so arguments are not overwritten during tail calls. The downside of degraded performance still remains.
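A rough model of this second alternative, again in plain Rust. Since Rust has no guaranteed tail calls, the sketch substitutes a trampoline loop for the real tail-call jump; `VmContext`, `Step`, `sum_to` and `run` are hypothetical names, and in generated code the context pointer would sit in the pinned register rather than being passed explicitly.

```rust
/// Hypothetical VM context holding the shared argument buffer.
struct VmContext {
    /// Heap-allocated buffer that every call reads its arguments from.
    args: Vec<i64>,
}

/// What a function does when it finishes: return a value, or ask for a
/// tail call (the trampoline stands in for a real tail-call jump).
enum Step {
    Return(i64),
    TailCall(fn(&mut VmContext) -> Step),
}

/// Adds n + (n - 1) + ... + 1 to an accumulator via self tail calls.
/// Arguments: args[0] = n, args[1] = accumulator.
fn sum_to(ctx: &mut VmContext) -> Step {
    let (n, acc) = (ctx.args[0], ctx.args[1]);
    if n == 0 {
        return Step::Return(acc);
    }
    // The outgoing arguments overwrite the buffer in place. Nothing lives in
    // this function's stack frame, so giving the frame up is harmless.
    ctx.args.clear();
    ctx.args.extend_from_slice(&[n - 1, acc + n]);
    Step::TailCall(sum_to)
}

/// Trampoline: keeps calling until some function returns a value.
fn run(ctx: &mut VmContext, mut f: fn(&mut VmContext) -> Step) -> i64 {
    loop {
        match f(ctx) {
            Step::Return(v) => return v,
            Step::TailCall(next) => f = next,
        }
    }
}
```

The native stack stays flat no matter how many tail calls are chained, but every argument is still stored to and loaded from memory.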
playX18 changed the title from "Introduce ArrayCall calling convention" to "Cranelift: introduce ArrayCall calling convention" on Nov 1, 2024
Can you clarify how a ptr/len pair won't work for tail calls?
The main idea and benefit here is that you are using registers, and not actually an array? (So array-call would be a bit of a misnomer.)
You should be able to implement this calling convention on top of CLIF as it is today (albeit with a shadow stack rather than the native stack) by making every function have the following CLIF signature:
(i64, i64, i64, i64, i64, i64) -> i64 tail
Where the CLIF arguments are:
arg count
first argument, if present or else undefined
second argument, if present or else undefined
third argument, if present or else undefined
fourth argument, if present or else undefined
pointer into a shadow stack, containing additional arguments if necessary
The return value is the number of results, which themselves are pushed onto the shadow stack.
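As an illustration, the convention above can be modeled in plain Rust without touching Cranelift. This is only a sketch: the `Compiled` type stands in for the CLIF signature, `sum_args` and `call` are hypothetical, "undefined" register slots are modeled as 0, and the shadow stack layout (extra arguments start at the pointer, results overwrite them from the same position) is an assumption, since it is not spelled out above.

```rust
/// Rust stand-in for `(i64, i64, i64, i64, i64, i64) -> i64 tail`:
/// arg count, four register arguments, shadow stack pointer -> result count.
type Compiled = unsafe fn(argc: i64, a0: i64, a1: i64, a2: i64, a3: i64, shadow: *mut i64) -> i64;

/// Example callee: sums all of its arguments and pushes one result.
unsafe fn sum_args(argc: i64, a0: i64, a1: i64, a2: i64, a3: i64, shadow: *mut i64) -> i64 {
    let in_regs = [a0, a1, a2, a3];
    let argc = argc as usize;
    let mut sum = 0i64;
    for i in 0..argc {
        sum += if i < 4 {
            in_regs[i]
        } else {
            // SAFETY: the caller stored arguments 4.. on the shadow stack.
            unsafe { *shadow.add(i - 4) }
        };
    }
    // SAFETY: the caller always provides room for at least one result.
    unsafe { *shadow = sum };
    1 // number of results now on the shadow stack
}

/// Caller side: the first four arguments go in "registers", the rest on the
/// shadow stack; the returned count says how many results to read back.
fn call(f: Compiled, args: &[i64]) -> Vec<i64> {
    let mut regs = [0i64; 4]; // slots for absent arguments are modeled as 0
    for (slot, arg) in regs.iter_mut().zip(args) {
        *slot = *arg;
    }
    let mut shadow: Vec<i64> = args.iter().skip(4).copied().collect();
    if shadow.is_empty() {
        shadow.push(0); // leave room for at least one result
    }
    // SAFETY: `shadow` holds the extra arguments and has at least one slot.
    let nresults = unsafe {
        f(args.len() as i64, regs[0], regs[1], regs[2], regs[3], shadow.as_mut_ptr())
    };
    shadow.truncate(nresults as usize);
    shadow
}
```

Calls with four or fewer arguments never read argument data from memory in this model; only the overflow and the results go through the shadow stack.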
This approach has the following benefits:
You don't need to extend Cranelift at all, it can work today.
We don't need to maintain a new calling convention (that is additionally very different from our existing calling conventions and therefore won't be able to share too much code).
We don't need to introduce new CLIF instructions that most Cranelift users don't need.
You can tune the calling convention as you please without needing to go through updating upstream Cranelift.