Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SHA256 SIMD intrinsics on x86 #3752

Merged
merged 1 commit into from
Aug 20, 2024
Merged

Conversation

Kixunil
Copy link
Contributor

@Kixunil Kixunil commented Jul 17, 2024

Disclaimer: this is my first contribution to miri's code. It's quite possible I'm missing something. This code works but may not be the cleanest/best possible.

It'd be useful to be able to verify code implementing SHA256 using SIMD since such code is a bit more complicated and at some points requires use of pointers. Until now miri didn't support x86 SHA256 intrinsics. This commit implements them.

@Kixunil
Copy link
Contributor Author

Kixunil commented Jul 17, 2024

How do I properly convey that I'm sure side effects don't happen? Just allow the lint?

@RalfJung
Copy link
Member

Thanks for the PR! There are some other PRs we have to review before we get to this one, so unfortunately it may take a bit -- sorry for that.

From a quick glance, it seems like this PR actually implements sha256 in Miri. Ideally I'd like to avoid being responsible for cryptographic code. Is there a small crate, akin to aes, that has a suitable license for us to be able to use it here as a sha256 implementation?

If we do carry a sha256 implementation, I would prefer one that was not written by ChatGPT, but by a human that actually understands sha256. Maybe there's a larger Rust crypto crate that we can copy an implementation from.

@apoelstra
Copy link
Contributor

@RalfJung there is the bitcoin_hashes crate which you're welcome to steal code from (it is CC0 licensed) which is written by professional human cryptographers. You probably don't want to depend on it because it does a bunch of other stuff (and you'll get complaints about having bitcoin in a dependency name).

There is also the sha2 crate written by a different set of professional human cryptographers, which is MIT/Apache licensed. And you could use it as a direct dependency (there is a good chance it is already in your dep tree somewhere) since it's narrowly focused and maintained. Though again, it does more than SHA256 which is really the only thing you want.

@apoelstra
Copy link
Contributor

I'll also add, that while your aversion to carrying/maintaining crypto code is well-placed, hashes are a low-burden thing to maintain for a few reasons. By their nature you don't need to worry about constant-timeness so you can write them once and forget about the code forever (especially in something like Miri where you aren't going to want to go microoptimizing this). And also by their nature it's super hard to have subtle correctness bugs; pretty-much any bug will result in every single output being visibly totally wrong. Finally hash functions are pretty small self-contained pure algorithms.

@RalfJung
Copy link
Member

Thanks, those are helpful comments!

@Kixunil
Copy link
Contributor Author

Kixunil commented Jul 18, 2024

There are some other PRs we have to review before we get to this one, so unfortunately it may take a bit -- sorry for that.

No problem! There's no rush, we can run my fork for now.

it seems like this PR actually implements sha256 in Miri

Not really, just the three primitives that need to be used together with a bunch of other code to actually implement sha256. I also suspect it's not trivial to just pull out these from existing impls - at least I tried and the code looked so different that I couldn't do it. That's why I gave up and translated the Intel's psueudocode which literally describe how the specific instructions work.

Ideally I'd like to avoid being responsible for cryptographic code.

Isn't miri only supposed to be used for testing (or rather sanitizing) the code? I don't believe anyone would ever run this in production where it would actually matter. The SHA256 tests that take under a second to run in debug mode run several minutes under miri. Further we can add a big fat warning about this.

Anyway, everything @apoelstra said is true. The code is constant-time and any single tiny mistake would just completely break it - as I experienced this first hand. It also happens to make debugging the code incredibly hard. (I had to put dbg! statements both into miri and into the tested code to figure out what was wrong.)

For reference this is what the actual sha256 implementation using SIMD looks like: https://github.com/rust-bitcoin/rust-bitcoin/blob/8804fa63b49dc5ee639e768b73650b46a5e0a8ab/hashes/src/sha256.rs#L494-L722

@RalfJung
Copy link
Member

Not really, just the three primitives that need to be used together with a bunch of other code to actually implement sha256. I also suspect it's not trivial to just pull out these from existing impls - at least I tried and the code looked so different that I couldn't do it. That's why I gave up and translated the Intel's psueudocode which literally describe how the specific instructions work.

So the sha2 crate wouldn't be useful? That seems like the closest thing.
Cc @newpavlov -- could I ask for your help with this PR?

Isn't miri only supposed to be used for testing (or rather sanitizing) the code? I don't believe anyone would ever run this in production where it would actually matter. The SHA256 tests that take under a second to run in debug mode run several minutes under miri. Further we can add a big fat warning about this.

Yes, but that doesn't change the fact that Miri has a small set of maintainers, and we can't just ship code we all have no clue about -- that's not living up to our responsibility as maintainers. It's not about performance, it's about the fact that I can't check whether your code makes any sense and I don't have the time to become an expert in every problem domain that is covered by an Intel intrinsic.

"Code was translated by Chat GPT" also raises some very big alarm bells, since it seems to indicate that you also didn't check whether the code makes any sense. At the very least I would expect a link to the Intel docs, and a statement by you as PR author that you are sure that the Chat GPT translation is correct. Someone from the Miri team will have to double-check this before we can land this (if we can't find an alternative to carrying this code ourselves), but it's not reasonable to ask us to be the first humans to check this.

And of course the PR needs to come with some tests.

src/shims/x86/sha.rs Outdated Show resolved Hide resolved
src/shims/x86/sha.rs Outdated Show resolved Hide resolved
@newpavlov
Copy link
Contributor

newpavlov commented Jul 22, 2024

We probably could handle emulation of cryptographic intrinsics on the RustCrypto side. For example, we have a similar request for AES: RustCrypto/block-ciphers#389

But, personally, I would prefer to do it in separate crates, since having an emulation of SHA2 hardware intrinsics in the sha2 crate is not worthwhile in my opinion.

@tarcieri WDYT?

@Kixunil
Copy link
Contributor Author

Kixunil commented Jul 22, 2024

So the sha2 crate wouldn't be useful?

Looking at its code it does actually have the same primitives inside so copying them would be arguably better but they still need a bit of wiring code - the same there is at the beginning that just shuffles values in arrays.

I can't check whether your code makes any sense and I don't have the time to become an expert in every problem domain that is covered by an Intel intrinsic.

OK, fair. How can we solve it then? The code should exactly (bug-for-bug) emulate the Intel's intrinsic and the intrinsic has documented pseudo code implementation should just comparing the implementation be enough? Or is there anything else I can do?

"Code was translated by Chat GPT" also raises some very big alarm bells, since it seems to indicate that you also didn't check whether the code makes any sense.

I did test it by running bitcoin_hashes test suite in miri which passed perfectly and due to the property of hashes that any tiny mistake causes all the tests to reliably fail I'm confident it has the same behavior as the real instructions. The only thing I didn't yet check is if there are performance optimizations left. I thought this is not urgent.

If this is not sufficient please suggest specific steps I could do and I'll do it.

but it's not reasonable to ask us to be the first humans to check this.

That's absolutely something I wouldn't make you do, I'm sorry if it looked like I did. That's what I meant by "this code works". The disclaimer is about the style of the code. The things I'm unsure about is the miri part - does the way I do loads and stores make sense? I have no clue about miri internals, I just looked how other intrinsics do it and copied it. This is something I need guidance in.

And of course the PR needs to come with some tests.

Of course, I first wanted to see if my approach makes sense so that I wouldn't need to refactor the test (running SHA256 is the ultimate test). I'm also not sure what kind of testing strategy would be most appropriate. I kinda wish to just run the real instructions on real hardware with random inputs and compare the results but I don't know if this is a viable solution. Or just bake in the real SHA256 and run it on a few inputs. Of course I can also grab a bunch of random vectors and bake them in.

@RalfJung
Copy link
Member

Looking at its code it does actually have the same primitives inside so copying them would be arguably better but they still need a bit of wiring code - the same there is at the beginning that just shuffles values in arrays.

Sounds like a good idea.

OK, fair. How can we solve it then? The code should exactly (bug-for-bug) emulate the Intel's intrinsic and the intrinsic has documented pseudo code implementation should just comparing the implementation be enough? Or is there anything else I can do?

Manually comparing Intel's pseudocode with the Miri code would be a way to get additional confidence on top of the testing.

The things I'm unsure about is the miri part - does the way I do loads and stores make sense? I have no clue about miri internals, I just looked how other intrinsics do it and copied it. This is something I need guidance in.

We don't usually load things into arrays on the host side, we just process them element-by-element, but here that doesn't seem to make sense so what you did looks reasonable.

Of course, I first wanted to see if my approach makes sense so that I wouldn't need to refactor the test (running SHA256 is the ultimate test). I'm also not sure what kind of testing strategy would be most appropriate. I kinda wish to just run the real instructions on real hardware with random inputs and compare the results but I don't know if this is a viable solution. Or just bake in the real SHA256 and run it on a few inputs. Of course I can also grab a bunch of random vectors and bake them in.

We should definitely have some test vectors directly checking the intrinsic, yes. If there are any corner cases, they should be covered here.

How big is a (not performance-oriented) sha256 implementation on top of these intrinsics? If that's small enough to put in a test, then we can also have that in addition.

src/shims/x86/sha.rs Outdated Show resolved Hide resolved
@Kixunil
Copy link
Contributor Author

Kixunil commented Jul 22, 2024

OK, I'll look into it later, I have currently some urgent things to do.

@RalfJung
Copy link
Member

@rustbot author
based on the last comment

Regarding the decision about how to implement these intrinsics; @rust-lang/miri see the discussion above -- what do you think?

@rustbot rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label Aug 17, 2024
@Kixunil Kixunil force-pushed the simd-sha256 branch 3 times, most recently from de228a3 to 9f737d0 Compare August 17, 2024 15:18
@Kixunil
Copy link
Contributor Author

Kixunil commented Aug 17, 2024

@rustbot ready

Funny how you've looked at it at around the same time I did. :)

I've pushed an updated version that copies stuff from RustCrypto. I have addressed all your review comments including adding tests. The tests are somewhat large so LMK if I should remove anything.

@rustbot rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Aug 17, 2024
@RalfJung
Copy link
Member

Thanks!

Regarding the tests, it seems I should have been more clear -- what I meant is tests in tests/pass that invoke the intrinsics from a Rust program that is interpreted by Miri. We want to know that the intrinsic actually works, after all. :) Unit tests inside src are not really necessary.

@Kixunil
Copy link
Contributor Author

Kixunil commented Aug 17, 2024

I felt like something is off... Anyway, it's moved now.

@saethlin
Copy link
Member

@rust-lang/miri see the discussion above -- what do you think?

I am similarly opposed to checking in any code that comes from ChatGPT. In addition to the issues with its accuracy and our ability to read the code we are checking in, it seems like the project has no regard for the licensing of its training set.

As to reviewing the implementation, I also cannot vouch for its correctness. I might be able to, given some time to learn SHA256.

That being said, I think there's a pretty low chance that the implementation here is subtly broken if it passes a few tests. The implementation has no branches and I don't see any edge cases, so just a handful of test cases should provide full coverage.

@Kixunil
Copy link
Contributor Author

Kixunil commented Aug 20, 2024

I am similarly opposed to checking in any code that comes from ChatGPT.

This is no longer relevant, the implementation now copies sha2 crate.

As to reviewing the implementation, I also cannot vouch for its correctness.

Since it's a copy of sha2 we can just rely on them.

That being said, I think there's a pretty low chance that the implementation here is subtly broken if it passes a few tests.

It's pretty much impossible to write an incorrect sha256 since any mistake will produce a completely different output. It's the same property as for the input.

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just got some nits. :)

src/shims/x86/sha.rs Outdated Show resolved Hide resolved
add(v0, sigma0x4(sha256load(v0, v1)))
}

fn sha256msg2_epu32(v4: [u32; 4], v3: [u32; 4]) -> [u32; 4] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to sha256msg2.

src/shims/x86/sha.rs Outdated Show resolved Hide resolved
tests/pass/shims/x86/intrinsics-sha.rs Outdated Show resolved Hide resolved
It'd be useful to be able to verify code implementing SHA256 using SIMD
since such code is a bit more complicated and at some points requires
use of pointers. Until now `miri` didn't support x86 SHA256 intrinsics.
This commit implements them.
@Kixunil
Copy link
Contributor Author

Kixunil commented Aug 20, 2024

Addressed the nits.

@RalfJung
Copy link
Member

Looks good, thanks. :)

@bors r+

@bors
Copy link
Contributor

bors commented Aug 20, 2024

📌 Commit 7949e92 has been approved by RalfJung

It is now in the queue for this repository.

@bors
Copy link
Contributor

bors commented Aug 20, 2024

⌛ Testing commit 7949e92 with merge dcd2112...

@bors
Copy link
Contributor

bors commented Aug 20, 2024

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing dcd2112 to master...

@bors bors merged commit dcd2112 into rust-lang:master Aug 20, 2024
8 checks passed
@Kixunil Kixunil deleted the simd-sha256 branch August 20, 2024 14:19
@Kixunil
Copy link
Contributor Author

Kixunil commented Aug 20, 2024

I can't find any information about the release schedule. Will this be available in the next nightly or will I have to wait longer?

@RalfJung
Copy link
Member

It will be available once Miri get synced into the rustc repo. I usually do this each weekend, though this weekend I am traveling so it might happen later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Waiting for a review to complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants