Async memory DPI #976

CircuitCoder · 2025-02-18T17:12:45Z

No description provided.

Avimitin

Symlinks seem to have been added accidentally; please remove them.

CircuitCoder · 2025-02-21T08:21:39Z

Symlinks seem to have been added accidentally; please remove them.

Should be fixed, thanks

sequencer

I didn't take a deep dive into the emulator code; I'll leave that to @FanShupei.
Please take care of some of my nitpicks:

Unifying the DPI calls from different channels into one DPI call makes me nervous. Please make sure the channel ID is easily extensible (we may add more indexed/stride AXI ports in the future).

Some things we need to consider separately (maybe in a future PR):

The address region should be explicitly configured. Either a configuration file or static hardcoding is OK; SW engineers need --help to get the address region.
How does DRAMsim3 affect our verification flow? We are using printf trace-based verification; maybe we should be able to swap the RTL with the sail model in the future?
Performance is affected. We still want to evaluate the theoretical performance with 0 latency as we originally did, and we can then evaluate the penalty of memory latency for uarch tuning.

sequencer · 2025-02-26T21:18:57Z

difftest/dpi_t1rocketemu/src/dpi.rs

@@ -374,11 +277,20 @@ unsafe extern "C" fn t1_cosim_init(

  let scope = SvScope::get_current().expect("failed to get scope in t1_cosim_init");

+  use std::io::Write;
+  let ds3_cfg = include_bytes!("dramsim3-config.ini");


Pass the string from SV plusargs.

cc @FanShupei

Done in 225ade2

difftest/dpi_t1rocketemu/src/dpi.rs

sequencer · 2025-02-26T21:20:33Z

difftest/dpi_t1rocketemu/src/dramsim3-config.ini

@@ -0,0 +1,68 @@
+# DDR4_8Gb_x8_3200 from DRAMsim3 upstream


How about placing this file somewhere better? It should be a configuration. cc @Avimitin

The planned argument for configuring DRAMsim3 takes three kinds of value:

+dramsim3_cfg = yes: use the embedded setting, which is this one

+dramsim3_cfg = no: disable DRAMsim3 and use the trivial memory latency model

+dramsim3_cfg = : use an external DRAMsim3 setting

The embedded configuration here is just a quality-of-life feature that makes CI easier to use.
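As a rough illustration of the three-way plusarg described above, the value could be mapped onto an enum before the memory model is constructed. This is only a sketch: `Dramsim3Cfg` and `parse_dramsim3_cfg` are hypothetical names, and since the third value is not spelled out in this thread, the sketch assumes that anything other than `yes` or `no` is a path to an external configuration file.

```rust
use std::path::PathBuf;

/// Hypothetical representation of the three planned `+dramsim3_cfg` values.
#[derive(Debug, PartialEq, Eq)]
enum Dramsim3Cfg {
  /// `yes`: use the configuration embedded in the binary.
  Embedded,
  /// `no`: disable DRAMsim3 and fall back to the trivial latency model.
  Disabled,
  /// Anything else: ASSUMED here to be a path to an external config file.
  External(PathBuf),
}

/// Map the raw plusarg value onto one of the three modes.
fn parse_dramsim3_cfg(value: &str) -> Dramsim3Cfg {
  match value.trim() {
    "yes" => Dramsim3Cfg::Embedded,
    "no" => Dramsim3Cfg::Disabled,
    other => Dramsim3Cfg::External(PathBuf::from(other)),
  }
}
```

With this shape, the caller can match on the result once and pick either the embedded bytes, the trivial model, or a file read, instead of re-inspecting the string in several places.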

difftest/dpi_t1rocketemu/src/drive.rs

sequencer · 2025-02-26T21:24:35Z

difftest/dpi_t1rocketemu/src/interconnect.rs

+pub const SRAM_BASE: u32 = 0x2000_0000;
+pub const SRAM_SIZE: u32 = 0xa000_0000;


How is the DDR region configured? We can statically configure a region for DDR and print it with --help from the emulator.

Actually these two constants should be named RAM_{BASE,SIZE} because they also represent the DRAM address mapping. To reuse the ELF across different memory latency models, the memories have to have the same mapping.

Done in 5bc77dd
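For reference, a minimal sketch of how a shared RAM window could be checked and reported, using the base and size quoted in this thread under the new `RAM_*` names. The helpers `in_ram` and `ram_region_help` are hypothetical and are not claimed to exist in the PR.

```rust
/// Base and size of the RAM window, using the values quoted in this thread.
pub const RAM_BASE: u32 = 0x2000_0000;
pub const RAM_SIZE: u32 = 0xa000_0000;

/// Returns true if the access `[addr, addr + len)` lies entirely inside RAM.
/// The arithmetic is done in u64 so that `addr + len` cannot wrap around.
pub fn in_ram(addr: u32, len: u32) -> bool {
  let start = addr as u64;
  let end = start + len as u64;
  let ram_start = RAM_BASE as u64;
  let ram_end = ram_start + RAM_SIZE as u64;
  len > 0 && start >= ram_start && end <= ram_end
}

/// Hypothetical one-line description that a `--help` handler could print.
pub fn ram_region_help() -> String {
  let end = RAM_BASE as u64 + RAM_SIZE as u64;
  format!("RAM region: [{:#010x}, {:#010x})", RAM_BASE, end)
}
```

Because both the DRAMsim3-backed and the trivial memory model would consult the same constants, an ELF linked against this window can be reused across latency models.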

t1rocketemu/src/AXI4SlaveAgent.scala

t1rocketemu/src/TestBench.scala

sequencer · 2025-02-26T21:41:38Z

The commit message tag may not need to be difftest; is just emu enough? cc @FanShupei

CircuitCoder · 2025-02-27T01:26:12Z

@sequencer Some of the questions you raised are answered in the inline reviews. For the others:

How does DRAMsim3 affect our verification flow? We are using printf trace-based verification; maybe we should be able to swap the RTL with the sail model in the future?

I don't think there is an inherent difference (w.r.t. offline simulation) between sail and spike, so I think it should also be applicable? This PR doesn't touch online difftest.

Performance is affected. We still want to evaluate the theoretical performance with 0 latency as we originally did, and we can then evaluate the penalty of memory latency for uarch tuning.

iirc @FanShupei did some profiling, and it shows that the increase in simulation time mostly comes from the increase in cycle count itself, not from the overhead of running DRAMsim3. If faster simulation is needed, we can simply use the trivial memory model once memory latency is configurable from plusargs.

Also, the fake cache is currently not implemented. Adding that should also help.
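To make the "trivial memory model" idea concrete, here is a small sketch of a fixed-latency model with the same push/pop shape that the quoted `RegularMemory` code uses. Everything in it is an assumption for illustration: `Ident`, `FixedLatencyModel`, and the explicit `tick` method are hypothetical and do not reflect the actual `MemoryModel` trait.

```rust
use std::collections::VecDeque;

/// Hypothetical request identity; the real one carries more fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ident {
  id: u16,
  is_write: bool,
}

/// Every request becomes ready exactly `latency` cycles after it is pushed.
struct FixedLatencyModel {
  latency: u64,
  now: u64,
  queue: VecDeque<(u64, Ident)>, // (cycle at which the response is ready, request)
}

impl FixedLatencyModel {
  fn new(latency: u64) -> Self {
    FixedLatencyModel { latency, now: 0, queue: VecDeque::new() }
  }

  /// Accept a request and stamp it with its ready cycle.
  fn push(&mut self, ident: Ident) {
    self.queue.push_back((self.now + self.latency, ident));
  }

  /// Advance simulated time by one cycle.
  fn tick(&mut self) {
    self.now += 1;
  }

  /// Return the oldest request if its latency has elapsed, in FIFO order.
  fn pop(&mut self) -> Option<Ident> {
    match self.queue.front() {
      Some(&(ready, _)) if ready <= self.now => self.queue.pop_front().map(|(_, ident)| ident),
      _ => None,
    }
  }
}
```

With `latency` set to 0 a pushed request can be popped in the same cycle, which corresponds to the zero-latency baseline discussed above; a nonzero value gives a simple way to study the latency penalty without running DRAMsim3.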

FanShupei · 2025-02-27T05:35:10Z

The commit message tag may not need to be difftest; is just emu enough?
@sequencer I'm OK with both. We've used difftest to tag any Rust code change under the difftest folder for a long time. No need to change it here.

@CircuitCoder The RTL fix is merged into master, so this PR needs rebasing. The following commits change nix dependencies. If these changes are intended, please separate them into another PR.

[difftest] Bump dependencies
[dependencies] Bump dep

The patch set is larger than usual, and reviewing may need some time. I plan to finish the review in one or two days. Stay tuned.

FanShupei · 2025-02-27T05:56:27Z

t1rocketemu/src/AXI4SlaveAgent.scala

+      // Invoke DPI at negedge
+      // NOTICE: this block CANNOT directly write any outside reg. Only write wires (e.g. here, only writes queue IO)
+      withClock(invClock) {


RawClockedNonVoidFunctionCall passes the clock explicitly. No need for withClock here.

Also, please separate the changes to AXI4SlaveAgent (from the changes to the Rust code) into their own commits, with an [axi4] ... commit message, to make the history cleaner.

Done in 3bfe39a

FanShupei · 2025-02-27T07:05:05Z

difftest/dpi_t1rocketemu/src/interconnect.rs

 // Caller is responsible to ensure the following conditions hold:
 //   addr.len > 0
 //   addr.len == data.len()
 //   addr.len == mask.len() (if mask present)
 // However, since the functions are safe,
 // even if the contracts are violated, implementations must not break memory safety,

These comments are now outdated. Remove them or move them to AddrInfo/MemReqPayload.

Done in 68d735c

FanShupei · 2025-02-27T07:29:19Z

difftest/dpi_t1rocketemu/src/interconnect.rs

+impl<M: MemoryModel + Send + Sync + 'static> Device for RegularMemory<M> {
+  fn req(&mut self, req: MemReq<'_>) -> bool {
+    // dbg!(&req);
+    let ident = MemIdent {
+      id: req.id,
+      req: req.addr,
+      is_write: req.payload.is_write(),
+    };
+    self.model.push(ident);
+
+    if let MemReqPayload::Write(data, mask) = req.payload {
+      self.execute_write(req.addr, data, mask);
+    }
+    true
+  }
+
+  fn resp(&mut self) -> Option<MemResp<'_>> {
+    let popped = self.model.pop()?;
+    // dbg!(&popped);
+
+    // Construct MemResp
+    let payload = if popped.is_write {
+      MemRespPayload::WriteAck
+    } else {


It seems this may reorder requests. W->W and W->R order is preserved. R->R reordering is harmless.

But it does allow reordering R->W into W->R. Since we are modeling AXI, such ordering shall be handled by managers. It seems we are exploiting this behavior.

Is my understanding correct?
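The R->W case can be reproduced with a toy model. The sketch below assumes what the quoted hunk suggests (the `resp` body is truncated, so this is an inference): writes are applied when the request is accepted, while read data is sampled only when the response is produced. `ToyMemory` and its methods are hypothetical names, not the PR's types.

```rust
use std::collections::VecDeque;

/// Toy single-word memory: writes take effect at request time,
/// reads sample the memory only when their response is produced.
struct ToyMemory {
  word: u32,
  pending_reads: VecDeque<u16>, // IDs of reads waiting for a response
}

impl ToyMemory {
  fn new(initial: u32) -> Self {
    ToyMemory { word: initial, pending_reads: VecDeque::new() }
  }

  /// Accept a read request; the data is not sampled yet.
  fn req_read(&mut self, id: u16) {
    self.pending_reads.push_back(id);
  }

  /// Accept a write request; the memory is updated immediately.
  fn req_write(&mut self, data: u32) {
    self.word = data;
  }

  /// Produce the oldest read response, sampling memory at this point.
  fn resp_read(&mut self) -> Option<(u16, u32)> {
    let id = self.pending_reads.pop_front()?;
    Some((id, self.word))
  }
}

/// Issue R then W to the same word and return what the read observes.
fn read_then_write(initial: u32, written: u32) -> u32 {
  let mut mem = ToyMemory::new(initial);
  mem.req_read(0);
  mem.req_write(written);
  mem.resp_read().map(|(_, data)| data).unwrap()
}
```

In this model a read accepted before a write to the same word still observes the written value, which is the R->W to W->R reordering in question.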

I might not correctly understand the concern here. If what you are asking is that "where lies the responsibility for ordering concurrent overlapping R&W requests", then yes, it's the manager. More concretely, it's the iterating order of incomplete_reads and incomplete_writes in src/drive.rs. The device handles the requests as-is.

My understanding is that this reordering is allowed by AXI. The DPI side only needs to guarantee some consistent order so that the simulation result is deterministic. I'm even thinking of switching back to HashMap with a consistent hasher to get a little more coverage of potential memory request orderings.
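A small sketch of why an ordered map gives the determinism mentioned here: `BTreeMap` always iterates in ascending key order, whereas `std::collections::HashMap` with its default hasher is seeded randomly per process. The `Inflight` type and both helpers are hypothetical stand-ins, not the actual `incomplete_reads`/`incomplete_writes` code.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an in-flight request, keyed by its AXI ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Inflight {
  addr: u32,
}

/// Build the in-flight table by inserting requests in the given arrival order.
fn track(arrivals: &[(u16, u32)]) -> BTreeMap<u16, Inflight> {
  let mut table = BTreeMap::new();
  for &(id, addr) in arrivals {
    table.insert(id, Inflight { addr });
  }
  table
}

/// IDs in the order a drain loop over the table would visit them.
/// BTreeMap iterates in ascending key order, independent of insertion order
/// and of any per-process hash seed.
fn service_order(table: &BTreeMap<u16, Inflight>) -> Vec<u16> {
  table.keys().copied().collect()
}
```

A `HashMap` built with a fixed-seed hasher would also be deterministic from run to run, just with a different (hash-dependent) visiting order, which is the extra ordering coverage alluded to above.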

OK, I just wanted to confirm this. I have no concerns. I agree that ensuring determinism is enough. I was just a bit puzzled since no comments mention it. A comment on MemReq such as "It follows AXI ordering rules. No ordering between reads and writes is guaranteed by devices" would help clarify it.

In the future, I am considering adding functions to detect R/W address overlaps from the same master, since such an overlap is very likely a bug in the manager. This is not a blocker for this PR.
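One possible shape for such a check, sketched under stated assumptions: each in-flight request is reduced to a start address and a byte length, and all reads and writes passed in belong to the same master. `Span`, `overlaps`, and `find_rw_overlaps` are hypothetical names, and the quadratic scan is chosen only for brevity.

```rust
/// Hypothetical description of an in-flight burst: start address and byte length.
#[derive(Debug, Clone, Copy)]
struct Span {
  addr: u32,
  len: u32,
}

/// True if the half-open byte ranges of `a` and `b` share at least one byte.
/// Computed in u64 so that `addr + len` cannot wrap around.
fn overlaps(a: Span, b: Span) -> bool {
  let a_end = a.addr as u64 + a.len as u64;
  let b_end = b.addr as u64 + b.len as u64;
  a.len > 0 && b.len > 0 && (a.addr as u64) < b_end && (b.addr as u64) < a_end
}

/// Return (read index, write index) for every overlapping in-flight pair.
fn find_rw_overlaps(reads: &[Span], writes: &[Span]) -> Vec<(usize, usize)> {
  let mut hits = Vec::new();
  for (ri, r) in reads.iter().enumerate() {
    for (wi, w) in writes.iter().enumerate() {
      if overlaps(*r, *w) {
        hits.push((ri, wi));
      }
    }
  }
  hits
}
```

Reporting the pairs rather than panicking would let the emulator log a warning while still following the as-is ordering that the device implements.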

CircuitCoder · 2025-02-27T15:42:46Z

The commit message tag may not need to be difftest; is just emu enough?
@sequencer I'm OK with both. We've used difftest to tag any Rust code change under the difftest folder for a long time. No need to change it here.

@CircuitCoder The RTL fix is merged into master, so this PR needs rebasing. The following commits change nix dependencies. If these changes are intended, please separate them into another PR.
* [difftest] Bump dependencies
* [dependencies] Bump dep

The patch set is larger than usual, and reviewing may need some time. I plan to finish the review in one or two days. Stay tuned.

These two commits have been split out into #981. This PR now depends on that one.

already resolved

CircuitCoder requested a review from FanShupei February 18, 2025 17:12

CircuitCoder marked this pull request as draft February 18, 2025 17:12

CircuitCoder force-pushed the async-mem branch from 9a54032 to 734e190 Compare February 18, 2025 20:07

CircuitCoder force-pushed the async-mem branch from 700a84f to b6a06e0 Compare February 20, 2025 13:02

CircuitCoder marked this pull request as ready for review February 20, 2025 13:03

CircuitCoder requested review from sequencer and Avimitin February 20, 2025 13:39

CircuitCoder force-pushed the async-mem branch from b6a06e0 to 159ea0a Compare February 20, 2025 13:44

Avimitin previously requested changes Feb 21, 2025

CircuitCoder force-pushed the async-mem branch from 869a8b2 to e4bd1b1 Compare February 21, 2025 08:21

CircuitCoder requested a review from Avimitin February 21, 2025 08:21

CircuitCoder force-pushed the async-mem branch from 67deecb to 67f4d1c Compare February 21, 2025 08:43

CircuitCoder changed the title ~~WIP: Async memory DPI~~ Async memory DPI Feb 21, 2025

CircuitCoder force-pushed the async-mem branch 4 times, most recently from 5b5a753 to c1aa256 Compare February 26, 2025 10:15

sequencer reviewed Feb 26, 2025

sequencer force-pushed the async-mem branch from 9872227 to 5e27855 Compare February 26, 2025 21:44

CircuitCoder requested a review from sequencer February 27, 2025 01:28

FanShupei requested changes Feb 27, 2025

[dependencies] Bump dependencies

d1aac8e

CircuitCoder mentioned this pull request Feb 27, 2025

[dependencies] Bump dependencies #981

Closed

[difftest] Initial async memory

3ab6c6c

CircuitCoder added 3 commits February 27, 2025 23:38

[difftest] Async mem: AW/W/AR interface, and duplicated inflight ID

65aa4c9

[difftest] Async mem: ordering DRAM requests

e6471b1

[difftest] Rust side of async memory DPI interface

13970bd

CircuitCoder and others added 14 commits February 27, 2025 23:48

[difftest] [axi4] RTL side of t1rocketemu async mem

f748353

[difftest] [axi4] Fixed DPI calling procedure

2c76426

[difftest] [axi4] Fixed DRAM response handling for consecutive R/W

7e8e73f

[difftest] cargo fmt dpi_t1rocketemu

824dad2

[difftest] [axi4] Fixed various async mem DPI bugs

3000b14

[difftest] [axi4] Format and change println!s to debug!s

1ff5d35

[difftest] Fixed wstrb handling in async memory write

02c0caf

[difftest] Fixing handling of reordering caused by DRAM

e00ecb5

[difftest] Using BTreeMap instead of HashMap inside incomplete_read/write to avoid non-determinism

5f901ef

[difftest] Added some documents to async memory

36f057f

[ci] update t1 test case cycle data

b74708d

[axi4] Remove withClock blocks around DPI calls in t1rocketemu

3bfe39a

[difftest] Rename SRAM_* into RAM_* in dpi_t1rocketemu

5bc77dd

[difftest] Corrected the document about address alignment for mem reqs

68d735c

CircuitCoder force-pushed the async-mem branch 3 times, most recently from 8fbb983 to 6c58d1b Compare February 27, 2025 21:21

[difftest] [vsrc] Added plusargs for DRAMsim3 configurations

4866263

CircuitCoder force-pushed the async-mem branch from 6c58d1b to 4866263 Compare February 27, 2025 21:25

FanShupei approved these changes Feb 28, 2025

FanShupei merged commit 1d9aa63 into master Feb 28, 2025
250 checks passed

FanShupei deleted the async-mem branch February 28, 2025 02:47
