Investigate better bitpacking for Operand and Use #5

Open · cfallin opened this issue Sep 1, 2021 · 5 comments
cfallin (Member) commented Sep 1, 2021

Two core data-structure elements, Operand and Use, are both designed to fit a relatively large amount of information in one u32. This is a performance optimization that we have found to be relatively impactful; expanding even to a u64 has a measurable impact (of at least a few percent) on compilation time.

Unfortunately, the scarcity of bits means that certain limits are lower than we would prefer. For example, we support only a 5-bit index for physical registers in each register class (so 32 integer registers and 32 float/vector registers), which may not be enough for some use cases (though it suffices for aarch64 and x64 at least). This also limits the VReg count to 1M (2^20).

We should investigate ways of, for example, moving infrequently used information (such as fixed-PReg constraints) out of line, to raise the limits on VRegs, PRegs, instruction count, and the like, and to provide enough headroom for any reasonably imaginable use case.
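For illustration, here is a minimal sketch of this style of u32 packing. The field widths are assumptions chosen to match the limits described above (20-bit vreg index, 5-bit preg index), not regalloc2's actual layout:

```rust
/// Illustrative only: one way to pack an operand into a u32, assuming
/// vreg:20 | class:1 | preg:5 | constraint:3 | kind:2 | pos:1.
#[derive(Clone, Copy)]
struct PackedOperand(u32);

impl PackedOperand {
    fn new(vreg: u32, class: u32, preg: u32, constraint: u32, kind: u32, pos: u32) -> Self {
        debug_assert!(vreg < (1 << 20) && class < 2 && preg < 32);
        debug_assert!(constraint < 8 && kind < 4 && pos < 2);
        PackedOperand(
            vreg
                | (class << 20)
                | (preg << 21)
                | (constraint << 26)
                | (kind << 29)
                | (pos << 31),
        )
    }

    fn vreg(self) -> u32 {
        self.0 & 0xFFFFF // low 20 bits: vreg index, hence the 1M limit
    }

    fn preg(self) -> u32 {
        (self.0 >> 21) & 0x1F // 5 bits: 32 physical registers per class
    }
}
```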

Amanieu (Contributor) commented Sep 3, 2021

When we remove non-SSA support (#4), we can shrink OperandKind to 1 bit instead of 2. Also, we can change the encoding of constraint + preg to take only 6 bits instead of 8:

- `1xxxxx` => Fixed(preg)
- `01xxxx` => Reuse(index)
- `000000` => Any
- `000001` => Reg
- `000010` => Stack
- `_` => free for future use

This bumps the limit for the VReg index to 8M (2^23). However, I don't see how we can fit explicit stack slots into this scheme.
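A rough sketch of how that 6-bit prefix encoding could be implemented (variant names and payload widths follow the list above; this is illustrative only, not a proposed patch):

```rust
/// Constraint variants, mirroring the prefix code sketched above.
#[derive(Debug, PartialEq)]
enum Constraint {
    Fixed(u8), // preg index, 0..32
    Reuse(u8), // reused-operand index, 0..16
    Any,
    Reg,
    Stack,
}

/// Encode a constraint into 6 bits using the 1xxxxx / 01xxxx / 0000xx scheme.
fn encode(c: &Constraint) -> u8 {
    match *c {
        Constraint::Fixed(preg) => 0b100000 | (preg & 0x1F),
        Constraint::Reuse(idx) => 0b010000 | (idx & 0x0F),
        Constraint::Any => 0b000000,
        Constraint::Reg => 0b000001,
        Constraint::Stack => 0b000010,
    }
}

fn decode(bits: u8) -> Constraint {
    if bits & 0b100000 != 0 {
        Constraint::Fixed(bits & 0x1F)
    } else if bits & 0b010000 != 0 {
        Constraint::Reuse(bits & 0x0F)
    } else {
        match bits {
            0b000000 => Constraint::Any,
            0b000001 => Constraint::Reg,
            0b000010 => Constraint::Stack,
            _ => unreachable!("reserved encoding"),
        }
    }
}
```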

To go even further, we would need to store auxiliary Operand information in a separate u8. By moving preg/index into the second word, we would leave 26 bits for the VReg index, which allows for up to 64M VRegs (the remaining 6 bits are: kind, pos, class, and constraint:3). The size of Use wouldn't increase since we have a padding byte to spare. This also allows us to have up to 256 PRegs per class instead of 32, 256 explicit stack slots, etc.

```rust
/// Out-of-line per-operand data: preg index, reuse index, or stack slot,
/// depending on the operand's constraint.
struct OperandExtra(u8);

trait Function {
    /// Both slices must have the same length.
    fn inst_operands(&self, insn: Inst) -> (&[Operand], &[OperandExtra]);
}
```
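Under this split, the main u32 might be laid out roughly as vreg:26 | kind:1 | pos:1 | class:1 | constraint:3, with the payload (preg index, reuse index, or stack slot) living in the OperandExtra byte. A hedged sketch of unpacking such a word; the exact bit positions here are an assumption, not an implemented format:

```rust
/// Assumed layout for the sketch: bits 0..26 vreg, bit 26 kind, bit 27 pos,
/// bit 28 class, bits 29..32 constraint tag.
fn vreg_index(main: u32) -> u32 {
    main & 0x03FF_FFFF // low 26 bits: up to 64M vregs
}

fn constraint_tag(main: u32) -> u32 {
    (main >> 29) & 0x7 // top 3 bits: constraint kind
}
```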

cfallin (Member, Author) commented Sep 7, 2021

Allowing even a second u32, tagged with an instruction index, seems like it could be a reasonable approach to providing more information in the "rare" cases (stackslots, fixed pregs). At this point we're really designing a variable-length compression scheme, so we should be somewhat data-driven on that question, of course.

The same approach could be used for Uses as for Operands, though any tag on the extra word that associates it with an index in the main array would have to be re-encoded to refer to per-LiveRange use indices.
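One hedged way to realize the "tagged extra word" idea is a sorted side table keyed by operand (or use) index, so the common case carries nothing extra and the rare entries are found by binary search; a hypothetical sketch:

```rust
/// Hypothetical side table for the rare "extra word" case: most operands
/// need no extra data, so extras are stored out of line, tagged with the
/// index of the operand they extend, and looked up by binary search.
struct OperandExtras {
    /// (operand index, extra payload), kept sorted by operand index.
    entries: Vec<(u32, u32)>,
}

impl OperandExtras {
    fn get(&self, operand_index: u32) -> Option<u32> {
        self.entries
            .binary_search_by_key(&operand_index, |&(idx, _)| idx)
            .ok()
            .map(|i| self.entries[i].1)
    }
}
```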

As a separate thought: it could be worth measuring the runtime overhead of a fixed number of extra bits instead. It's certainly the preferable approach if cheap enough; split data structures will complicate everything else. IIRC, I did try a middle design point of 48 bits per Use -- a u32 and a u16 in a packed struct -- and that still had a measurable impact at one point, but possibly things have changed. (I don't have time to do this at the moment, but figured I should note it for when this is considered next.)
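For reference, the 48-bit middle point mentioned above can be expressed as a packed struct so no padding is inserted; this is a sketch of the shape only, with placeholder field meanings, not the actual Use encoding:

```rust
/// Illustrative 48-bit layout: a u32 plus a u16 in a packed struct, so
/// there is no padding and the whole thing occupies 6 bytes.
#[repr(C, packed)]
struct Use48 {
    main: u32,  // e.g. vreg index plus a few flag bits
    extra: u16, // e.g. constraint / preg / slot payload
}

fn main() {
    assert_eq!(std::mem::size_of::<Use48>(), 6);
}
```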

cfallin (Member, Author) commented Sep 7, 2021

Ah, I found it -- I did some measurements here and found a 3% overhead when using a 64-bit Operand. That's not great but it does upper-bound things (it's the simplest solution here, everything else we've proposed is more compact). Maybe someone could measure what a 48-bit Operand would do...

Amanieu (Contributor) commented May 23, 2023

I ran a benchmark to estimate the overhead of 64-bit Operand by adding a pad: u32 field to Operand. I was not able to find any measurable performance difference, although my results were a bit noisy.

Could you give this a try with Cranelift? If you also see no measurable difference, then I will prepare a PR to increase the Operand size to 64-bit, which will also resolve #131.

cfallin (Member, Author) commented May 23, 2023

@Amanieu I'm a bit swamped at the moment, but you might be able to run this yourself? I would suggest tackling the noise issue also, at least for this decision: we've had false results before where a 1% shift hides within noise, or appears out of nowhere (i.e., false positives and negatives), on systems where we didn't explicitly isolate a core, pin frequency, and steer interrupts away. @jameysharp wrote up a good doc here on what we do. You don't have to figure out Sightglass necessarily; I'd consider `hyperfine './wasmtime.base compile file.wasm' './wasmtime.new compile file.wasm'` adequate for this.

I will say that I did see a shift of several percent when I tried this... 1.5 years ago now?... and so while a lot has changed since then and it could be "free-ish" now, the onus of extra-careful proof is on the proposed change here.

One other experiment that would be useful, to evaluate the measurement setup, is to try to swing the needle further and see what the sensitivity is: add extra padding to Operand to make it 12 bytes, 16 bytes, etc., up to something absurd (64 bytes?). If the measurement setup does start to show, e.g., 1% at 16 bytes, then that gives better confidence of a small impact at 8 bytes (64 bits).
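A hypothetical way to wire up such a sensitivity sweep is a const-generic pad on the operand type, so each padded size can be benchmarked from the same code; the type and parameter here are made up for the sketch:

```rust
/// Hypothetical sensitivity experiment: pad a u32-sized operand out to
/// 8, 16, ... 64 bytes and benchmark each size. `PAD` is the number of
/// extra bytes beyond the real 4-byte payload.
#[derive(Clone, Copy)]
struct PaddedOperand<const PAD: usize> {
    bits: u32,
    _pad: [u8; PAD],
}

fn main() {
    assert_eq!(std::mem::size_of::<PaddedOperand<4>>(), 8);
    assert_eq!(std::mem::size_of::<PaddedOperand<12>>(), 16);
    assert_eq!(std::mem::size_of::<PaddedOperand<60>>(), 64);
}
```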
