-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async memory DPI #976
Async memory DPI #976
Conversation
9a54032
to
734e190
Compare
This comment was marked as outdated.
This comment was marked as outdated.
700a84f
to
b6a06e0
Compare
b6a06e0
to
159ea0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Symlinks seems to be accidentally added, please remove them.
869a8b2
to
e4bd1b1
Compare
Should be fixed, thanks |
67deecb
to
67f4d1c
Compare
5b5a753
to
c1aa256
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't take a deep dive into the emulator codes, leave it to @FanShupei.
need to take care of my some nitpicks:
- It made me nervous of unify the DPI calls from different channels with one DPI call, please make sure the channel ID can be simply extendable(we may add more indexed/stride AXI ports in the future)
Some thing we need to take into another consideration(maybe in the future PR):
- address region should be explicit configured, with a configuration file or with statically hardcode are both OK, SW engineers need
--help
to get the address region. - How does the dramsim affect our verification flow? We are using printf trace-based verification, maybe we should be able to swap the RTL with sail model in the future?
- The performance is affected, we still wanna evaluate the theoretical performance with 0-latency like what we originally did, and we can evaluate the penalty of memory latency for uarch tuning.
difftest/dpi_t1rocketemu/src/dpi.rs
Outdated
@@ -374,11 +277,20 @@ unsafe extern "C" fn t1_cosim_init( | |||
|
|||
let scope = SvScope::get_current().expect("failed to get scope in t1_cosim_init"); | |||
|
|||
use std::io::Write; | |||
let ds3_cfg = include_bytes!("dramsim3-config.ini"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass string from SV plusargs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @FanShupei
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 225ade2
@@ -0,0 +1,68 @@ | |||
# DDR4_8Gb_x8_3200 from DRAMsim3 upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about placing this file in a better place, it should be a configuration. cc @Avimitin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The planned argument for setting DRAMsim3 settings has three values:
- +dramsim3_cfg = yes: using the embedded setting, which is this one
- +dramsim3_cfg = no: disable DRAMsim3, use trivial memory latency model
- +dramsim3_cfg = : Use an external DRAMsim3 setting.
The embedded configuration here are just a quality-of-life feature, and makes CI easier to use.
pub const SRAM_BASE: u32 = 0x2000_0000; | ||
pub const SRAM_SIZE: u32 = 0xa000_0000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the DDR region configured? We can statically config a region for DDR, and print this region with --help
from the emulator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually these two constants should be named RAM_{BASE,SIZE}
because they also represent the DRAM address mapping. To reuse the ELF across different memory latency models, the memories have to have the same mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 5bc77dd
The commit msg may not be difftest, just emu is enough? cc @FanShupei |
@sequencer Some of the question you mentioned are answered in the inline reviews. For other questions:
I don't think there is an inherent difference (w.r.t. offline simulation) between sail and spike, so I think it should also be applicable? This PR doesn't touch online difftest.
iirc @FanShupei did some profiling, and it shows that the increase in simulation time mostly came from the increase in cycle count itself, not from the added performance penalty from using DRAMsim3. If a faster simulation time should be recovered, it can simply use the trivial memory model after we allow configurations of memory latency from plusargs. Also, the fake cache is currently not implemented. Adding that should also help. |
@CircuitCoder The RTL fix is merged into master, this PR needs rebasing. The following commits change nix dependenceis. If these changes are intended, please separate them into another PR.
The patch set is large than usual, and reviewing may need some time. I plan to finish the review in one or two days. Stay tuned. |
t1rocketemu/src/AXI4SlaveAgent.scala
Outdated
// Invoke DPI at negedge | ||
// NOTICE: this block CANNOT directly write any outside reg. Only write wires (e.g. here, only writes queue IO) | ||
withClock(invClock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RawCLockedNonVoidFunctionCall
pass clock explicitly. No need to withClock
here.
And please separate changes on AXI4SlaveAgent (from changes on rust code) to their own commits , with [axi4] ...
commit message , to make the history cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 3bfe39a
// Caller is reponsible to ensure the following conditions hold: | ||
// addr.len > 0 | ||
// addr.len == data.len() | ||
// addr.len == mask.len() (if mask present) | ||
// However, since the functions are safe, | ||
// even if contracts violate, implementions must not break memory safety, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments become outdated. Remove them or move them to AddrInfo
/MemReqPayload
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 68d735c
impl<M: MemoryModel + Send + Sync + 'static> Device for RegularMemory<M> { | ||
fn req(&mut self, req: MemReq<'_>) -> bool { | ||
// dbg!(&req); | ||
let ident = MemIdent { | ||
id: req.id, | ||
req: req.addr, | ||
is_write: req.payload.is_write(), | ||
}; | ||
self.model.push(ident); | ||
|
||
if let MemReqPayload::Write(data, mask) = req.payload { | ||
self.execute_write(req.addr, data, mask); | ||
} | ||
true | ||
} | ||
|
||
fn resp(&mut self) -> Option<MemResp<'_>> { | ||
let popped = self.model.pop()?; | ||
// dbg!(&popped); | ||
|
||
// Construct MemResp | ||
let payload = if popped.is_write { | ||
MemRespPayload::WriteAck | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to may reorder requests. W->W & W->R order is preserved. R->R reorder is harmless.
But it does allow reordering R->W to W->R. But we are modeling AXI, such order shall be handled by managers. Seems we are exploiting this behavior.
Is my understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might not correctly understand the concern here. If what you are asking is that "where lies the responsibility for ordering concurrent overlapping R&W requests", then yes, it's the manager. More concretely, it's the iterating order of incomplete_reads
and incomplete_writes
in src/drive.rs
. The device handles the requests as-is.
My understanding is that this reordering is allowed by AXI. The DPI side only need to guarantee any consistent order, s.t. the simulation result is deterministic. I'm even thinking of switch back to HashMap with a consistent hasher to get a little bit more coverage on potential memory request orderings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I just want to confirm this. I have no concerns. I agree ensuring determinism is enough. I'm just a bit puzzled since no comments mention it. Comments like "It follows AXI ordering rules. No ordering between reads & writes are guaranteed by devices " with MemReq
helps to clarify it.
In the future, I consider adding functions to detect R/W address overlaps from the same master, since it's very likely a bug of manager. This is not a blocker for this PR.
These two commits are splited into #981. This PR now depends on that one. |
…rite to avoid non-determinism
8fbb983
to
6c58d1b
Compare
6c58d1b
to
4866263
Compare
No description provided.