-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAMsys 2.0] Terapool DRAMsys Merge Request #93
Conversation
Hello Guys, this is the first draft for the DRAM work merge. Please check if everything is ok. I will keep update and fix your suggestions in next few days. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, Yichao and Chi. In general, it looks good to me, but I think we should discuss putting all the dram simulation stuff in its own repository and including it in systems that need it as a submodule. There are also some linting errors and missing newlines.
config/config.mk
Outdated
@@ -71,4 +66,4 @@ xpulpimg ?= 1 | |||
|
|||
# This parameter is only used for TeraPool configurations | |||
num_sub_groups_per_group ?= 1 | |||
remote_group_latency_cycles ?= 7 | |||
remote_group_latency_cycles ?= 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline
config/mempool.mk
Outdated
l2_banks ?= 4 | ||
|
||
# DRAM channel interleaving | ||
dram_axi_width_interleaved ?= 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are also missing a newline.
@@ -21,7 +21,17 @@ num_cores_per_tile ?= 4 | |||
banking_factor ?= 4 | |||
|
|||
# Radix for hierarchical AXI interconnect | |||
axi_hier_radix ?= 20 | |||
axi_hier_radix ?= 17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked, and this change should not impact the backend too much, so if it actually helps with the DRAM's performance, we can keep it that way. But it would be interesting to know why we can't get full performance with smaller bursts on the DRAM.
# Radix for hierarchical AXI interconnect | ||
axi_hier_radix ?= 2 | ||
|
||
# Number of AXI masters per group | ||
axi_masters_per_group ?= 1 | ||
|
||
# Number of DMA backends in each group | ||
dmas_per_group ?= 1 # Brust Length = 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In MinPool, the burst length will be 8, right?
hardware/dram_rtl_sim/Bender.yml
Outdated
# Licensed under the Apache License, Version 2.0, see LICENSE for details. | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
package: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like dram_rtl_sim
is a full dependency. I think it would make more sense to create a dedicated repository for it that can be included as a dependency instead of flattening it in MemPool. Especially since it's almost 200 files and more than 2 million lines of code. 😲 Is it also used in other projects or just here?
hardware/src/mempool_group.sv
Outdated
.AxReqFifoDepth (8 ), | ||
.TransFifoDepth (4 ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those changes really necessary to get the best performance? Those FIFOs are very expensive in hardware.
hardware/src/mempool_sub_group.sv
Outdated
.AxReqFifoDepth (8 ), | ||
.TransFifoDepth (4 ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here.
hardware/tb/mempool_tb.sv
Outdated
@@ -30,7 +30,7 @@ module mempool_tb; | |||
localparam BootAddr = 0; | |||
`endif | |||
|
|||
localparam ClockPeriod = 2ns; | |||
localparam ClockPeriod = 1ns; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
hardware/tb/mempool_tb.sv
Outdated
for (genvar bank = 0; bank < NumDrams; bank++) begin : gen_drams_init | ||
initial begin : l2_init | ||
automatic logic [L2BankWidth-1:0] mem_row; | ||
byte buffer []; | ||
addr_t address; | ||
addr_t length; | ||
string binary; | ||
|
||
// Initialize memories | ||
void'($value$plusargs("PRELOAD=%s", binary)); | ||
if (binary != "") begin | ||
// Read ELF | ||
read_elf(binary); | ||
$display("Loading %s", binary); | ||
while (get_section(address, length)) begin | ||
// Read sections | ||
automatic int nwords = (length + L2DramBeWidth - 1)/L2DramBeWidth; | ||
$display("Loading section %x of length %x", address, length); | ||
buffer = new[nwords * L2DramBeWidth]; | ||
void'(read_section(address, buffer)); | ||
if (address >= dut.L2MemoryBaseAddr) begin | ||
for (int i = 0; i < nwords * L2DramBeWidth; i++) begin //per byte | ||
automatic dram_ctrl_interleave_t dram_ctrl_info; | ||
dram_ctrl_info = getDramCTRLInfo(address + i - dut.L2MemoryBaseAddr); | ||
if (dram_ctrl_info.dram_ctrl_id == bank) begin | ||
dut.gen_drams[bank].i_axi_dram_sim.i_sim_dram.load_to_dram(dram_ctrl_info.dram_ctrl_addr, buffer[i]); | ||
end | ||
end | ||
end else begin | ||
$display("Cannot initialize address %x, which doesn't fall into the L2 DRAM region.", address); | ||
end | ||
end | ||
end | ||
end : l2_init | ||
end : gen_drams_init |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like most of this block is the same as for the L2. Could we merge them and have the ifdef only in the actual preloading in the innermost loops?
hardware/Makefile
Outdated
# Path to DRAMsyslib | ||
dramsys_resouces_path ?= $(MEMPOOL_DIR)/hardware/dram_rtl_sim/dramsys_lib/resources | ||
dramsys_simconfig_path ?= $(dramsys_resouces_path)/simulations/HBM2E-3600.json | ||
dramsys_lib_path ?= $(MEMPOOL_DIR)/hardware/dram_rtl_sim/dramsys_lib | ||
questa_args += +DRAMSYS_RES=$(dramsys_resouces_path) +DRAMSYS_CONF=$(dramsys_simconfig_path) | ||
questa_args += -sv_lib $(dramsys_lib_path)/libDRAMSysLibrary | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also work if dram is disabled, or should we only add those questa_args if dram
is defined?
…s to hide DRAM latency
…e open-sourced DRAMsys as a submodule
…igurations, and compiling dramsys dynamic library.
…le will patch to dram_sim_rtl submodule by makefile target
…better DRAM BW as the HBM2 support upto 3600Gbps DDR
…n on the verification
a1e6aa3
to
ef0c820
Compare
Summarize the changes concisely
Changelog
Added