Investigate better bitpacking for Operand and Use #5
When we remove non-SSA support (#4), we can shrink
This bumps the limit for the VReg index to 8M (2^23). However, I don't see how we can fit explicit stack slots into this scheme. To go even further, we would need to store auxiliary per-operand data out of line:

```rust
/// Auxiliary per-operand data, stored out of line.
struct OperandExtra(u8);

trait Function {
    // Both slices must have the same length.
    fn inst_operands(&self, insn: Inst) -> (&[Operand], &[OperandExtra]);
}
```
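To make the parallel-slice idea concrete, here is a minimal sketch of how a client might store operands and their extra bytes side by side. ToyFunc, push_inst, and the stand-in Operand, OperandExtra, and Inst types are hypothetical. They only model the storage layout and are not regalloc2's real types or its Function trait.

```rust
// Hypothetical stand-ins: only the storage layout is modeled here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Operand(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct OperandExtra(u8);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Inst(usize);

/// Toy function body: all operands are stored contiguously, with a parallel
/// vector of extras and a (start, end) range per instruction.
#[derive(Default)]
struct ToyFunc {
    operands: Vec<Operand>,
    extras: Vec<OperandExtra>,
    ranges: Vec<(usize, usize)>,
}

impl ToyFunc {
    /// Appends an instruction. Both vectors grow in lockstep, so any
    /// range indexes them identically.
    fn push_inst(&mut self, ops: &[(Operand, OperandExtra)]) -> Inst {
        let start = self.operands.len();
        for &(op, extra) in ops {
            self.operands.push(op);
            self.extras.push(extra);
        }
        self.ranges.push((start, self.operands.len()));
        Inst(self.ranges.len() - 1)
    }

    /// Same shape as the proposed Function::inst_operands.
    fn inst_operands(&self, insn: Inst) -> (&[Operand], &[OperandExtra]) {
        let (start, end) = self.ranges[insn.0];
        (&self.operands[start..end], &self.extras[start..end])
    }
}

fn main() {
    let mut f = ToyFunc::default();
    let i0 = f.push_inst(&[(Operand(1), OperandExtra(0)), (Operand(2), OperandExtra(7))]);
    let i1 = f.push_inst(&[(Operand(3), OperandExtra(1))]);
    for insn in [i0, i1] {
        let (ops, extras) = f.inst_operands(insn);
        debug_assert_eq!(ops.len(), extras.len());
        // Every consumer has to walk the two slices together.
        for (op, extra) in ops.iter().zip(extras) {
            println!("{:?}: {:?} {:?}", insn, op, extra);
        }
    }
}
```

The trade-off is that every producer and consumer must keep the two vectors in sync.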
Allowing a second … The same approach could be used for …

As a separate thought: it could be worth measuring the runtime overhead of a fixed number of extra bits instead. It's certainly the preferable approach if cheap enough; split data structures will complicate everything else. IIRC, I did try a middle design point of 48 bits per Use --
Ah, I found it -- I did some measurements here and found a 3% overhead when using a 64-bit
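One way to reach the 48-bit middle design point without paying for a full 64 bits is to store the word as three u16 chunks. The type then keeps 2-byte alignment and occupies 6 bytes, whereas a u32 plus u16 pair would be padded to 8. This is a hypothetical sketch (Use48 and NaiveUse are not regalloc2 types), and it is not necessarily how the earlier experiment was implemented.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical 48-bit packed word stored as three u16 chunks, low chunk first.
/// repr(transparent) guarantees the layout of [u16; 3]: 6 bytes, 2-byte aligned.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Use48([u16; 3]);

/// For comparison: a u32 + u16 pair is rounded up to 8 bytes by u32 alignment.
#[allow(dead_code)]
#[repr(C)]
struct NaiveUse {
    lo: u32,
    hi: u16,
}

impl Use48 {
    const MASK: u64 = (1 << 48) - 1;

    fn new(bits: u64) -> Self {
        assert!(bits <= Self::MASK, "value does not fit in 48 bits");
        Use48([bits as u16, (bits >> 16) as u16, (bits >> 32) as u16])
    }

    /// Reassembles the 48-bit value from its three chunks.
    fn bits(self) -> u64 {
        let [lo, mid, hi] = self.0;
        u64::from(lo) | (u64::from(mid) << 16) | (u64::from(hi) << 32)
    }
}

fn main() {
    assert_eq!(size_of::<Use48>(), 6);
    assert_eq!(align_of::<Use48>(), 2);
    assert_eq!(size_of::<NaiveUse>(), 8);

    let u = Use48::new(0xABCD_1234_5678);
    assert_eq!(u.bits(), 0xABCD_1234_5678);
    println!("{} bytes per use", size_of::<Use48>());
}
```

Reassembling the value costs a few shifts and ORs on every access, which any benchmark of this layout would be measuring alongside the memory savings.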
I ran a benchmark to estimate the overhead of 64-bit … Could you give this a try with Cranelift? If you also see no measurable difference, then I will prepare a PR to increase the
@Amanieu I'm a bit swamped at the moment, but you might be able to run this yourself? I would suggest also tackling the noise issue, at least for this decision: we've had false results before, where a 1% shift hides within noise or appears out of nowhere (i.e., false positives and negatives), on systems where we didn't explicitly isolate a core, pin the frequency, and steer interrupts away. @jameysharp wrote up a good doc here on what we do. You don't have to figure out Sightglass necessarily; I'd consider a …

I will say that I did see a shift of several percent when I tried this... 1.5 years ago now?... and so while a lot has changed since then and it could be "free-ish" now, the onus of extra-careful proof is on the proposed change here.

One other experiment that would be useful, to evaluate the measurement setup, is to try to swing the needle further and see what the sensitivity is: add extra padding to
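The padding experiment suggested above could be prototyped by making the padding a const generic parameter, so the same code compiles with 0, 4, or 12 extra bytes per operand. This sketch only builds and traverses vectors of a hypothetical PaddedOperand type and reports their footprint. The real experiment would apply the padding to the actual Operand and Use types and re-run the compile-time benchmarks on a quiet, pinned machine.

```rust
use std::mem::size_of;

/// Hypothetical operand word with PAD unused bytes appended.
/// repr(C) keeps the field order, so the size is 4 + PAD rounded up to 4.
#[repr(C)]
struct PaddedOperand<const PAD: usize> {
    bits: u32,
    // Never read: exists only to inflate the in-memory footprint.
    _pad: [u8; PAD],
}

fn make_ops<const PAD: usize>(n: usize) -> Vec<PaddedOperand<PAD>> {
    (0..n)
        .map(|i| PaddedOperand { bits: i as u32, _pad: [0; PAD] })
        .collect()
}

/// Touches every element so the whole padded array moves through the cache.
fn sum_bits<const PAD: usize>(ops: &[PaddedOperand<PAD>]) -> u64 {
    ops.iter().map(|op| u64::from(op.bits)).sum()
}

fn footprint<const PAD: usize>(count: usize) -> usize {
    count * size_of::<PaddedOperand<PAD>>()
}

fn main() {
    let n = 1_000_000;
    let a = make_ops::<0>(n);
    let b = make_ops::<4>(n);
    let c = make_ops::<12>(n);
    // Same logical contents, different memory footprint.
    assert_eq!(sum_bits(a.as_slice()), sum_bits(b.as_slice()));
    assert_eq!(sum_bits(b.as_slice()), sum_bits(c.as_slice()));

    for (label, bytes) in [
        ("no padding", footprint::<0>(n)),
        ("+4 bytes", footprint::<4>(n)),
        ("+12 bytes", footprint::<12>(n)),
    ] {
        println!("{label:>10}: {bytes} bytes for {n} operands");
    }
}
```

If a 2x or 4x footprint increase produces a clearly visible slowdown while the 1.5x step does not, that says something about the sensitivity of the measurement setup.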
Two core data-structure elements, Operand and Use, are both designed to fit a relatively large amount of information in one u32. This is a performance optimization that we have found to be relatively impactful; expanding even to a u64 has a measurable impact (of at least a few percent) on compilation time.

Unfortunately, the scarcity of bits means that certain limits are lower than we would prefer. For example, we support only a 5-bit index for physical registers in each register class (so 32 integer registers and 32 float/vector registers), which may not be enough for some use cases (though it works for aarch64 and x64 at least). The bit budget also limits the VReg count to 1M (2^20).
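To make the bit budget concrete, here is a minimal sketch of packing operand fields into a single u32. The field order and widths (20 + 1 + 1 + 1 + 4 + 5 = 32 bits) and the PackedOperand name are hypothetical, chosen only so that the 20-bit VReg and 5-bit PReg limits described above fall out. regalloc2's actual Operand layout is not reproduced here.

```rust
/// Hypothetical 32-bit operand layout, for illustration only:
///   bits  0..20  VReg index     (20 bits: up to 2^20 VRegs)
///   bit   20     register class (int or float/vector)
///   bit   21     kind           (def or use)
///   bit   22     position       (early or late)
///   bits 23..27  constraint tag (4 bits)
///   bits 27..32  PReg index     (5 bits: 32 PRegs per class)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedOperand(u32);

const VREG_BITS: u32 = 20;
const CONSTRAINT_BITS: u32 = 4;
const PREG_BITS: u32 = 5;

impl PackedOperand {
    /// Returns None if any field is too wide for its slot.
    fn new(vreg: u32, class: u32, kind: u32, pos: u32, constraint: u32, preg: u32) -> Option<Self> {
        if vreg >= (1 << VREG_BITS)
            || class > 1
            || kind > 1
            || pos > 1
            || constraint >= (1 << CONSTRAINT_BITS)
            || preg >= (1 << PREG_BITS)
        {
            return None;
        }
        Some(PackedOperand(
            vreg | (class << 20) | (kind << 21) | (pos << 22) | (constraint << 23) | (preg << 27),
        ))
    }

    fn vreg(self) -> u32 {
        self.0 & ((1 << VREG_BITS) - 1)
    }

    fn class(self) -> u32 {
        (self.0 >> 20) & 1
    }

    fn constraint(self) -> u32 {
        (self.0 >> 23) & ((1 << CONSTRAINT_BITS) - 1)
    }

    fn preg(self) -> u32 {
        self.0 >> 27
    }
}

fn main() {
    // The largest VReg index and the largest PReg index both fit...
    let op = PackedOperand::new((1 << VREG_BITS) - 1, 1, 0, 1, 3, 31).unwrap();
    assert_eq!(op.vreg(), 1_048_575);
    assert_eq!(op.class(), 1);
    assert_eq!(op.constraint(), 3);
    assert_eq!(op.preg(), 31);
    // ...but the next VReg index and a 33rd PReg do not.
    assert!(PackedOperand::new(1 << VREG_BITS, 0, 0, 0, 0, 0).is_none());
    assert!(PackedOperand::new(0, 0, 0, 0, 0, 32).is_none());
    println!("packed word: {:#010x}", op.0);
}
```

Every field competes for the same 32 bits, so widening the PReg index to 6 bits, for example, would have to take a bit from the VReg index or another field.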
We should investigate ways of, e.g., out-of-lining infrequently used information (such as fixed-PReg constraints) to raise the limits on VRegs, PRegs, instruction count, and the like, and to provide enough headroom for any reasonably imaginable use case.
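As one possible shape for out-of-lining, the sketch below reserves a single flag bit in the operand word and keeps the rare fixed-PReg constraints in a side table keyed by operand index. The common case stays a plain u32, and PReg indices are no longer limited to 5 bits. Operands, FIXED_FLAG, and the HashMap side table are hypothetical choices for illustration, not regalloc2's design.

```rust
use std::collections::HashMap;

/// Hypothetical flag bit: set when the operand has an out-of-line fixed-PReg constraint.
const FIXED_FLAG: u32 = 1 << 31;

/// Operand word: low 31 bits hold the VReg index, top bit is FIXED_FLAG.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Operand(u32);

impl Operand {
    fn vreg(self) -> u32 {
        self.0 & !FIXED_FLAG
    }

    fn has_fixed(self) -> bool {
        (self.0 & FIXED_FLAG) != 0
    }
}

/// Flat operand array plus a side table for the rare fixed constraints.
#[derive(Default)]
struct Operands {
    ops: Vec<Operand>,
    // Operand index -> PReg index (16 bits here instead of 5).
    fixed: HashMap<usize, u16>,
}

impl Operands {
    fn push(&mut self, vreg: u32, fixed_preg: Option<u16>) {
        assert!(vreg < FIXED_FLAG, "VReg index must fit in 31 bits");
        let idx = self.ops.len();
        match fixed_preg {
            Some(preg) => {
                self.fixed.insert(idx, preg);
                self.ops.push(Operand(vreg | FIXED_FLAG));
            }
            None => self.ops.push(Operand(vreg)),
        }
    }

    /// The common case reads only the u32; the side table is consulted
    /// only when the flag bit is set.
    fn fixed_preg(&self, idx: usize) -> Option<u16> {
        if self.ops[idx].has_fixed() {
            self.fixed.get(&idx).copied()
        } else {
            None
        }
    }
}

fn main() {
    let mut ops = Operands::default();
    ops.push(5, None);
    ops.push(7, Some(40)); // PReg 40 would not fit in a 5-bit field.
    assert_eq!(ops.ops[1].vreg(), 7);
    assert_eq!(ops.fixed_preg(0), None);
    assert_eq!(ops.fixed_preg(1), Some(40));
    println!("{} operands, {} fixed", ops.ops.len(), ops.fixed.len());
}
```

A sorted Vec or a per-instruction array would avoid hashing; HashMap is used here only to keep the sketch short.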