Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rv_core_ibex, top_earlgrey] Enable SecureIbex for CW340 #25146

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

nasahlpa
Copy link
Member

@nasahlpa nasahlpa commented Nov 14, 2024

As the SecureIbex configuration is the one we use for the Earl Grey ASIC, we also should enable it for the FPGA boards.

Due to resource constraints, this is only possible for the CW340 but not the CW310.

Report for CW340 with SecureIbex enabled:
Resource utilization stays still well below 70%:
image

And timing is met:
image

Report for CW340 with SecureIbex disabled:
image
image

Closes #25137

Furthermore, to test this feature, this PR also adds the CW340 exec. environment for the otbn_scramble_test to close #20119.

Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?

The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

% elif target["name"] == "cw310":
.KmacEnMasking(0),
.KmacSwKeyMasked(1),
.KeymgrKmacEnMasking(0),
.SecKmacCmdDelay(0),
.SecKmacIdleAcceptSwMsg(1'b0),
.RvCoreIbexSecureIbex(0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably also should add that for the cw305.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, done.

@nasahlpa
Copy link
Member Author

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?

The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

About that:

I get the same error on a local CW340 with the new bitstream:
Command result: SFDP header contains incorrect signature: 0xffffffff
as in CI.
I am a bit stuck here. Do you know how I can debug this?

@nasahlpa
Copy link
Member Author

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?

The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

I've updated the PR description with area and timing numbers before and after this PR.

@vogelpi
Copy link
Contributor

vogelpi commented Nov 15, 2024

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?
The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

About that:

I get the same error on a local CW340 with the new bitstream: Command result: SFDP header contains incorrect signature: 0xffffffff as in CI. I am a bit stuck here. Do you know how I can debug this?

Thanks for adding the area and timing numbers. I think the change is reasonable for the big board. About the SFDP header issue, I have no idea abou this. Maybe someone from the software team knows?

@a-will
Copy link
Contributor

a-will commented Nov 15, 2024

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?
The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

About that:
I get the same error on a local CW340 with the new bitstream: Command result: SFDP header contains incorrect signature: 0xffffffff as in CI. I am a bit stuck here. Do you know how I can debug this?

Thanks for adding the area and timing numbers. I think the change is reasonable for the big board. About the SFDP header issue, I have no idea abou this. Maybe someone from the software team knows?

That likely means bootstrapping failed at the very first step, and the chip gave no response to SPI. MISO was left high when the tester tried to read the SFDP table.

For the prior PR state and its associated CI jobs, I found no evidence any software had run on CW340's Ibex. The ROM did not seem to print any identification messages.

@nasahlpa
Copy link
Member Author

Thanks @nasahlpa for looking into this. Do you know by how much the resource utilization increases roughtly?
The changes look mostly good, but there are FPGA CI failures and we should make sure all tests pass before merging this.

About that:
I get the same error on a local CW340 with the new bitstream: Command result: SFDP header contains incorrect signature: 0xffffffff as in CI. I am a bit stuck here. Do you know how I can debug this?

Thanks for adding the area and timing numbers. I think the change is reasonable for the big board. About the SFDP header issue, I have no idea abou this. Maybe someone from the software team knows?

That likely means bootstrapping failed at the very first step, and the chip gave no response to SPI. MISO was left high when the tester tried to read the SFDP table.

For the prior PR state and its associated CI jobs, I found no evidence any software had run on CW340's Ibex. The ROM did not seem to print any identification messages.

Thanks @a-will. Do you know how I can debug this?

@GregAC
Copy link
Contributor

GregAC commented Nov 15, 2024

@nasahlpa what test failed? Can't find the older CI logs and still awaiting the run for the newer one.

At a guess the bootloader has immediately fallen over because we get an instant alert from Ibex, maybe because the lockstep hasn't come out of reset properly? There's some prims involved that may have FPGA specific implementations that could explain the behaviour difference.

@nasahlpa
Copy link
Member Author

@nasahlpa what test failed? Can't find the older CI logs and still awaiting the run for the newer one.

At a guess the bootloader has immediately fallen over because we get an instant alert from Ibex, maybe because the lockstep hasn't come out of reset properly? There's some prims involved that may have FPGA specific implementations that could explain the behaviour difference.

All of them. Not a single test on CW340 passed. Here are the old results from CI.

@nasahlpa
Copy link
Member Author

nasahlpa commented Nov 17, 2024

I investigated this and discovered the following:

When enabling the SecureIbex paramter for CW340, a major Ibex alert is triggered by a register file ECC check. Hence, the cores does not boot up and we cannot bootstrap software.

When either setting the RegFileECC = 1'b0 in the ibex_top module or forcing the ECC error signal to 0, the core boots up and I can successfully run SW on CW340 with the SecureIbex parameter on.

My assumption is that the ibex_register_file_fpga module does not correctly work with RegFileECC = 1'b1. I had a quick look into the FPGA register file - it though seems that we are correctly initializing the RF content with the ECC encoded 0.

In the long term, we should investigate this and fix it.

As a temporary solution, we could bypass the error:

  • Disable the RegFileECC option.
    • Requires decoupling this option from the SecureIbex parameter, i.e., changes in the Ibex repository as well as in top_earlgrey as well as chip_earlgrey_cw340 in the OpenTitan repository are needed.
  • Use the FF register file.
    • When the SecureIbex parameter is enabled for CW340, use the FF instead of the FPGA register file
    • Probably the easiest solution for now as we only need to modify chip_earlgrey_cw340 in the OpenTitan repository.

I tested both solutions and they work.

WDYT @GregAC?

Edit: ah, after discussing with @vogelpi about this I think the error is here:
https://github.com/lowRISC/ibex/blob/169785d0711335c94561a93146e069766eec138c/rtl/ibex_register_file_fpga.sv#L46
We should use here WordZeroVal instead of 0. I'll give it a try and report back.

@GregAC
Copy link
Contributor

GregAC commented Nov 18, 2024

@nasahlpa nice work on diagnosing the issue. I'd favour just using the FF register file in FPGA, assuming the FPGA implementation still works.

The reason to use the FPGA RF is efficient resource utilization I don't think switching to the FF version will have a meaningful effect given the full earl grey design is huge.

We should work out why the FPGA version is broken in this instance. I'm happy to take a look at this.

And having written that just seen the edit! If that's the quick fix then great go for it, otherwise switch to FF version would be my vote.

nasahlpa added a commit to nasahlpa/ibex that referenced this pull request Nov 18, 2024
We should use `WordZeroVal` instead of `0` for reads from register `x0` in the
FPGA register file.

This bug was discovered when enabling the `RegFileECC` parameter. When this is
enabled, the core performs ECC checks, expecting that `WordZeroVal` is returned
for `x0`. Else, we get a major alert.

Fixes lowRISC/opentitan#25146

Signed-off-by: Pascal Nasahl <[email protected]>
@nasahlpa
Copy link
Member Author

The error was indeed the zero value for x0.
This will be fixed once #2224 in the Ibex repository is merged.

Thanks all for helping me debugging this issue.

github-merge-queue bot pushed a commit to lowRISC/ibex that referenced this pull request Nov 18, 2024
We should use `WordZeroVal` instead of `0` for reads from register `x0` in the
FPGA register file.

This bug was discovered when enabling the `RegFileECC` parameter. When this is
enabled, the core performs ECC checks, expecting that `WordZeroVal` is returned
for `x0`. Else, we get a major alert.

Fixes lowRISC/opentitan#25146

Signed-off-by: Pascal Nasahl <[email protected]>
@nasahlpa nasahlpa reopened this Nov 19, 2024
Copy link
Contributor

@vogelpi vogelpi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nasahlpa nasahlpa marked this pull request as ready for review November 19, 2024 09:11
@nasahlpa nasahlpa added the CherryPick:earlgrey_1.0.0 This PR should be cherry-picked to earlgrey_1.0.0 label Nov 19, 2024
@nasahlpa nasahlpa force-pushed the secure_ibex_cw340 branch 2 times, most recently from 270a89d to 4e32e45 Compare November 19, 2024 12:50
As the `SecureIbex` configuration is the one we use for the
Earl Grey ASIC, we also should enable it for the FPGA boards.

Due to resource constraints, this is only possible for the CW340
but not the CW310.

Resource utilization stays still will below 70%.

Closes lowRISC#25137

Signed-off-by: Pascal Nasahl <[email protected]>
localparam bit RegFileECC = SecureIbex;
localparam bit RegFileECC = 1'b0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be removed before merging this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CherryPick:earlgrey_1.0.0 This PR should be cherry-picked to earlgrey_1.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[rv_core_ibex] Enable SecureIbex config for FPGA [chip-test] chip_sw_otbn_mem_scramble
4 participants