Prune the compiled code to its minimum expression #639

Draft
wants to merge 34 commits into
base: dev_branch
66c7153
fix interleaved bus (#620)
davideschiavone Jan 29, 2025
96d5252
DMA cast support and new HW FIFO interface (#586)
LuigiGiuffrida98 Feb 19, 2025
e84ebcf
Added script to visualize memory usage after compilation (#636)
JuanSapriza Feb 21, 2025
9b7a170
Added minimal sw app. Also added --gc-sections to compiler flags to r…
JuanSapriza Feb 5, 2025
8fd461b
TEMPORARY SOLUTION: comment interrupt routines calling the weak imple…
JuanSapriza Feb 6, 2025
0ca09c0
Removed the parsing of the map file. Now parsing the output of readelf
JuanSapriza Feb 6, 2025
d048f61
Catch problems with flash_exec
JuanSapriza Feb 6, 2025
ca61762
added readelf to the CI
JuanSapriza Feb 6, 2025
a233731
added apt insall binutils
JuanSapriza Feb 6, 2025
9412377
Simply quit the script if readelf not available
JuanSapriza Feb 6, 2025
9f42062
removed all version
JuanSapriza Feb 6, 2025
04b0fad
Restored handlers
JuanSapriza Feb 6, 2025
885bd94
corrected flash load
JuanSapriza Feb 6, 2025
a39f78c
Added minimal configuration + changed the lower limit to 1 memory bank
JuanSapriza Feb 7, 2025
ae1d3fb
Replaced use of stdio for syscalls and exception printing
JuanSapriza Feb 7, 2025
d71b814
exposed _write to be used in handler. Also changed types. This might …
JuanSapriza Feb 7, 2025
9a94377
removed inclsuion of string.h
JuanSapriza Feb 7, 2025
a3fb02c
Removed the need for 64-bit division in the UART + now the NCO is not…
JuanSapriza Feb 7, 2025
ef2e3b3
restored -lgcc as it is not needed to remove it
JuanSapriza Feb 7, 2025
98ac4b1
Fixed differences in sw
JuanSapriza Feb 21, 2025
53fe69c
Restored the plic handlers that were crashing some apps
JuanSapriza Feb 21, 2025
ad4fb70
Added notes on how to minimize code size
JuanSapriza Feb 21, 2025
0abcfa9
Removed use of puts function in vector.S which was icnreasing code si…
JuanSapriza Feb 21, 2025
d2099da
Forced the inclusion of functions from syscalls that were being repla…
JuanSapriza Feb 21, 2025
e71924c
Sacrificed 0.1 kB to have a writestr function in syscalls to use inst…
JuanSapriza Feb 21, 2025
ee522f6
Fixed the _writestr definition
JuanSapriza Feb 21, 2025
a34e7ba
Replaced assert and handler functions that were using other syscalls …
JuanSapriza Feb 21, 2025
0e13225
added inclusions and declarations for the sake of cpp
JuanSapriza Feb 21, 2025
abedd5c
Corrected rebase
JuanSapriza Feb 24, 2025
ac5626d
included syscalls.h in main.c for compatibility with OHW compiler
JuanSapriza Feb 25, 2025
e753eed
Removed unnecessary jumps in vector.S
JuanSapriza Feb 25, 2025
53b3f91
Added message on ecall. Adds ~100 B
JuanSapriza Feb 25, 2025
82b15ce
one less file to diff
JuanSapriza Feb 25, 2025
e3da38c
Added ifdef to prevent hardcoding NCO and use the 64-bit division if …
JuanSapriza Feb 25, 2025
1 change: 1 addition & 0 deletions .github/workflows/sim-apps-job/test_apps.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ class BColors:
"example_spi_read",
"example_spidma_powergate",
"example_spi_write",
"example_dma_subaddressing",
]

app_list = [app for app in os.listdir("sw/applications")]
Expand Down
5 changes: 4 additions & 1 deletion Makefile
Expand Up @@ -108,8 +108,10 @@ environment.yml: python-requirements.txt
## Generates mcu files core-v-mini-mcu files and build the design with fusesoc
## @param CPU=[cv32e20(default),cv32e40p,cv32e40x,cv32e40px]
## @param BUS=[onetoM(default),NtoM]
## @param MEMORY_BANKS=[2(default) to (16 - MEMORY_BANKS_IL)]
## @param MEMORY_BANKS=[2(default)to(16-MEMORY_BANKS_IL)]
## @param MEMORY_BANKS_IL=[0(default),2,4,8]
## @param X_HEEP_CFG=[configs/general.hjson(default),<path-to-config-file> ]
## @param MCU_CFG_PERIPHERALS=[mcu_cfg.hjson(default),<path-to-config-file>]
mcu-gen:
$(PYTHON) util/mcu_gen.py --config $(X_HEEP_CFG) --cfg_peripherals $(MCU_CFG_PERIPHERALS) --pads_cfg $(PAD_CFG) --outdir hw/core-v-mini-mcu/include --cpu $(CPU) --bus $(BUS) --memorybanks $(MEMORY_BANKS) --memorybanks_il $(MEMORY_BANKS_IL) --external_domains $(EXTERNAL_DOMAINS) --external_pads $(EXT_PAD_CFG) --pkg-sv hw/core-v-mini-mcu/include/core_v_mini_mcu_pkg.sv.tpl
$(PYTHON) util/mcu_gen.py --config $(X_HEEP_CFG) --cfg_peripherals $(MCU_CFG_PERIPHERALS) --pads_cfg $(PAD_CFG) --outdir hw/core-v-mini-mcu/ --bus $(BUS) --memorybanks $(MEMORY_BANKS) --memorybanks_il $(MEMORY_BANKS_IL) --tpl-sv hw/core-v-mini-mcu/system_bus.sv.tpl
Expand Down Expand Up @@ -168,6 +170,7 @@ app: clean-app
echo "\033[0;31mI would start by checking b) if I were you!\033[0m"; \
exit 1; \
}
@python scripts/building/mem_usage.py

## Just list the different application names available
app-list:
Expand Down
24 changes: 24 additions & 0 deletions configs/minimal.hjson
@@ -0,0 +1,24 @@
{
ram_address: 0
bus_type: "onetoM",
ram_banks: {
code_and_data: {
num: 1
sizes: [32]
}
}

linker_sections:
[
{
name: code
start: 0
#minimum size for freeRTOS and clang
size: 0x000004000
},
{
name: data
start: 0x000004000
}
]
}
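As a quick sanity check of the layout above, the sketch below recomputes the region sizes. It assumes that `sizes: [32]` is expressed in KiB and that the `data` section extends to the end of the last bank; neither is stated in this diff. `linker_layout` is a hypothetical helper, not part of `util/mcu_gen.py`.

```python
# Values copied from configs/minimal.hjson as added in this PR.
BANK_SIZES_KIB = [32]  # assumption: the unit of `sizes` is KiB
CODE_START = 0x0
CODE_SIZE = 0x4000
DATA_START = 0x4000


def linker_layout(bank_sizes_kib, code_start, code_size, data_start):
    """Return the byte sizes of RAM, code and data (hypothetical helper)."""
    total = sum(bank_sizes_kib) * 1024
    if code_start + code_size > data_start:
        raise ValueError("code section overlaps the data section")
    if data_start >= total:
        raise ValueError("data section starts outside the RAM")
    # Assumption: data runs from its start address to the end of the RAM.
    return {"ram": total, "code": code_size, "data": total - data_start}


layout = linker_layout(BANK_SIZES_KIB, CODE_START, CODE_SIZE, DATA_START)
print({name: hex(size) for name, size in layout.items()})
```

Under these assumptions the single bank is split evenly between code and data.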
1 change: 1 addition & 0 deletions core-v-mini-mcu.core
Expand Up @@ -339,6 +339,7 @@ targets:
vsim_options:
- -sv_lib ../../../hw/vendor/lowrisc_opentitan/hw/dv/dpi/uartdpi/uartdpi
- -sv_lib ../../../hw/vendor/pulp_platform_pulpissimo/rtl/tb/remote_bitbang/librbs
- -voptargs=+acc=npr
vcs:
vcs_options:
- -override_timescale=1ns/1ps
Expand Down
48 changes: 45 additions & 3 deletions docs/source/Peripherals/DMA.md
Expand Up @@ -318,6 +318,8 @@ The previous parameters, including the register offsets, can be found at `sw/dev
- 0: _linear mode_
- 1: _circular mode_
- 2: _address mode_
- 3: _subaddress mode_
- 4: _hardware fifo mode_

<hr>

Expand Down Expand Up @@ -560,7 +562,7 @@ If senseless configurations are input to functions, assertions may halt the whol

#### Transaction modes

There are three different transaction modes:
There are five different transaction modes:

**Single Mode:** The default mode, where the DMA channel will perform the copy from the source target to the destination, and trigger an interrupt once done.

Expand All @@ -569,6 +571,14 @@ There are three different transaction modes:
**Address Mode:** Instead of using the destination pointer and increment to decide where to copy information, an _address list_ must be provided, containing addresses for each data unit being copied. It is only carried out in _single_ mode.
In this mode it's possible to perform only 1D transactions.

**Subaddress Mode:** In this mode, the DMA can be configured to transfer words, half words or bytes from Flash to the destination target via the SPI slot. This mode is particularly useful as it allows the DMA to sequentially read the half words or bytes composing the word retrieved from Flash, and forward them to the appropriate location in the destination target. The key difference between Subaddress Mode and Single Mode in terms of SPI-Flash interaction lies in how data is handled. In Single Mode, when the destination data type is set to `Half-Word` or `Byte`, the DMA writes only the least significant half-word or byte from the word fetched via SPI. In contrast, Subaddress Mode ensures that each half-word or byte within the fetched word is considered and transferred correctly to the destination.
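
The difference between the two modes can be illustrated with a small behavioural model in Python. This is only a sketch, not the RTL: the function names are hypothetical, and the order in which Subaddress Mode emits the sub-units (least significant first) is an assumption that the text above does not specify.

```python
WORD_BYTES = 4


def single_mode(words, unit_bytes):
    """Keep only the least significant half-word or byte of each fetched word."""
    mask = (1 << (8 * unit_bytes)) - 1
    return [w & mask for w in words]


def subaddress_mode(words, unit_bytes):
    """Emit every half-word or byte contained in each fetched word."""
    mask = (1 << (8 * unit_bytes)) - 1
    out = []
    for w in words:
        # Assumption: sub-units are forwarded least significant first.
        for i in range(WORD_BYTES // unit_bytes):
            out.append((w >> (8 * unit_bytes * i)) & mask)
    return out


fetched = [0x11223344]
print([hex(b) for b in single_mode(fetched, 1)])
print([hex(b) for b in subaddress_mode(fetched, 1)])
```

With a `Byte` destination, the single-mode model keeps one byte per fetched word, while the subaddress model produces all four.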

**Hardware Fifo Mode:** In this mode, the DMA fetches data from the source target and forwards it directly to an external accelerator tightly coupled with the DMA itself. The DMA exposes a dedicated interface composed of two ports, of type `hw_fifo_req_t` and `hw_fifo_resp_t` respectively. Through this interface, the DMA interacts with an external streaming accelerator via input/output FIFOs. Incoming data bypasses the DMA internal FIFOs and is forwarded directly to the accelerator, which is required to have two internal FIFOs. The first one, referred to as _hardware read fifo_, is filled with data coming from the source target through the `hw_fifo_req_t` port. The second, referred to as _hardware write fifo_, is used by the accelerator to store the results of its computation. Once data is written into the hardware read fifo, the accelerator is in charge of popping it and processing it, and must then push the results into the hardware write fifo. The DMA subsequently reads data from the hardware write fifo through the `hw_fifo_resp_t` port and stores it into the destination target. Figure 2 shows a block diagram of this DMA interface along with an external accelerator.

![hw fifo](/images/hw_fifo_mode.png)

<p align="center">Figure 2: External Streaming Accelerator tightly coupled with the DMA to be used in Hardware Fifo Mode </p>
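
The data flow described above can be sketched as a simple software model. It is not cycle-accurate and does not reflect the real handshake: the FIFO depth, the one-item-per-step scheduling and the `accelerator` callback are all assumptions made for illustration.

```python
from collections import deque


def hw_fifo_transfer(src, accelerator, depth=4):
    """Move `src` through read fifo -> accelerator -> write fifo -> destination."""
    if depth < 1:
        raise ValueError("FIFO depth must be at least 1")
    pending = deque(src)   # data still to be fetched from the source target
    read_fifo = deque()    # hardware read fifo (inside the accelerator)
    write_fifo = deque()   # hardware write fifo (inside the accelerator)
    dst = []               # destination target
    while pending or read_fifo or write_fifo:
        # DMA side: push a new item unless the read fifo is full.
        if pending and len(read_fifo) < depth:
            read_fifo.append(pending.popleft())
        # Accelerator side: pop, process, push unless the write fifo is full.
        if read_fifo and len(write_fifo) < depth:
            write_fifo.append(accelerator(read_fifo.popleft()))
        # DMA side: pop a result unless the write fifo is empty.
        if write_fifo:
            dst.append(write_fifo.popleft())
    return dst


# Hypothetical accelerator that doubles every sample.
print(hw_fifo_transfer([1, 2, 3], lambda x: 2 * x))
```

The `full` and `empty` conditions checked in the loop play the role of the flags that the accelerator reports back to the DMA.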



#### Windows
Expand Down Expand Up @@ -747,8 +757,9 @@ Here is a brief overview of the examples:
6) Matrix zero padding
7) Multichannel mem2mem transaction, focusing on the IRQ handler
8) Multichannel flash2mem transaction using the SPI FLASH
9) Single-channel flash2mem read transactions with different data widths (bytes, half-words and words) using the SPI FLASH

The complete code for these examples can be found in `sw/applications/example_dma`, `sw/applications/example_dma_2d`, `sw/applications/example_dma_multichannel` and `sw/applications/example_dma_sdk`. These applications offer both verification and performance estimation modes, enabling users to verify the DMA and measure the application's execution time.
The complete code for these examples can be found in `sw/applications/example_dma`, `sw/applications/example_dma_2d`, `sw/applications/example_dma_multichannel`, `sw/applications/example_dma_sdk` and `sw/applications/example_dma_subaddressing`. These applications offer both verification and performance estimation modes, enabling users to verify the DMA and measure the application's execution time.

The user is strongly encouraged to look at these applications, as well as any other application that employs the DMA, to gain insight into practical uses of this peripheral. Some aspects or specific use cases might not be covered in this guide and can be found in the applications.

Expand Down Expand Up @@ -1911,4 +1922,35 @@ int main()
}
}

```
```

### 9. Single-channel flash2mem read transactions with different data widths (bytes, half-words and words) using the SPI FLASH

The goal of this example is to exploit the DMA Subaddress Mode to transfer data from the SPI Flash to the X-HEEP internal memory in the following configurations:
- Using the _SPI Host 1_ configured to read at standard speed.
- Using the _SPI Host 1_ configured to read at quad speed.
- Using the _SPI Flash_ configured to read at standard speed.

For each configuration, five data transfers are performed, changing the source and destination data types as follows:
- Both source and destination data types are `Word`.
- Source data type is `Word`, destination data type is `Half-word`, and data is sign-extended before being written to the destination.
- Source data type is `Word`, destination data type is `Half-word`, and no sign extension is performed.
- Source data type is `Word`, destination data type is `Byte`, and data is sign-extended before being written to the destination.
- Source data type is `Word`, destination data type is `Byte`, and no sign extension is performed.

> :warning: This example can be executed only on QuestaSim or FPGA targets with the appropriate compilation flags.

#### Data to be transferred and golden outputs
File `buffer.h` contains input data for the DMA transfers, i.e. `original_128B` and `flash_only_buffer`. It also contains golden outputs for all the previously mentioned test cases.

#### Test Functions
Three test functions have been implemented to perform tests using the DMA in Subaddress Mode along with both _SPI Host 1_, configured to read at standard and quad speed, and _SPI Flash_ configured to read at standard speed:

- `test_read_dma` configures the source and destination targets of the DMA transfer, sets up the SPI host to read at standard speed, and launches the transfer. The source increment is set to 0, since the DMA must always read from the same location, while the destination increment is set to 1. The source data type is always `Word`, while the destination data type changes on each call to `test_read_dma`. This allows testing writes of `Word`, `Half-word` and `Byte` data to the destination target.

- `test_read_quad_dma` has the same structure as `test_read_dma`, but it configures the SPI host to read at quad speed.

- `test_read_flash_only_dma` has the same structure as `test_read_dma`, but it uses the _SPI Flash_ host to read from flash at standard speed.
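
The effect of the two increments can be sketched with a small address generator. This is an illustrative model only: the function name and the base addresses are hypothetical, and it assumes that increments are counted in units of the corresponding data type.

```python
def transfer_addresses(src_base, dst_base, n_units, dst_unit_bytes,
                       src_inc=0, dst_inc=1, src_unit_bytes=4):
    """Return the (source, destination) address pair of every written unit."""
    pairs = []
    for i in range(n_units):
        # Assumption: increments are expressed in data units, not in bytes.
        src = src_base + i * src_inc * src_unit_bytes
        dst = dst_base + i * dst_inc * dst_unit_bytes
        pairs.append((src, dst))
    return pairs


# Hypothetical addresses: a fixed peripheral location and a half-word buffer.
for src, dst in transfer_addresses(0x1000, 0x2000, 3, dst_unit_bytes=2):
    print(hex(src), "->", hex(dst))
```

With a source increment of 0, every pair reads from the same location, while the destination advances by one data unit per write.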

#### Result Comparison and Correctness Checks
The `check_result` function is called after each transfer to check that the destination target has been filled with the correct data. It compares the destination buffer with the corresponding golden output contained in `buffer.h`.
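
A golden-output comparison of this kind can be sketched as follows. This is a Python model of the idea, not the C implementation of `check_result`; `find_mismatches` and the sample buffers are hypothetical.

```python
def find_mismatches(dst, golden):
    """Return the indices at which the destination differs from the golden output."""
    if len(dst) != len(golden):
        raise ValueError("buffers must have the same length")
    return [i for i, (d, g) in enumerate(zip(dst, golden)) if d != g]


# Hypothetical buffers: the third element was not transferred correctly.
errors = find_mismatches([0x44, 0x33, 0x00, 0x11], [0x44, 0x33, 0x22, 0x11])
print("PASS" if not errors else f"FAIL at indices {errors}")
```

Returning the mismatching indices, rather than a single pass/fail flag, makes it easier to see which data units were written incorrectly.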
Binary file added docs/source/images/hw_fifo_mode.png
5 changes: 5 additions & 0 deletions hw/core-v-mini-mcu/ao_peripheral_subsystem.sv
Expand Up @@ -76,6 +76,9 @@ module ao_peripheral_subsystem
output logic dma_done_intr_o,
output logic dma_window_intr_o,

output hw_fifo_pkg::hw_fifo_req_t [core_v_mini_mcu_pkg::DMA_CH_NUM-1:0] hw_fifo_req_o,
input hw_fifo_pkg::hw_fifo_resp_t [core_v_mini_mcu_pkg::DMA_CH_NUM-1:0] hw_fifo_resp_i,

// External PADs
output reg_req_t pad_req_o,
input reg_rsp_t pad_resp_i,
Expand Down Expand Up @@ -409,6 +412,8 @@ module ao_peripheral_subsystem
.dma_write_resp_i,
.dma_addr_req_o,
.dma_addr_resp_i,
.hw_fifo_req_o,
.hw_fifo_resp_i,
.global_trigger_slot_i(dma_global_trigger_slots),
.ext_trigger_slot_i(dma_ext_trigger_slots),
.ext_dma_stop_i(ext_dma_stop_i),
Expand Down
5 changes: 5 additions & 0 deletions hw/core-v-mini-mcu/core_v_mini_mcu.sv
Expand Up @@ -304,6 +304,9 @@ module core_v_mini_mcu
output obi_req_t [core_v_mini_mcu_pkg::DMA_NUM_MASTER_PORTS-1:0] ext_dma_addr_req_o,
input obi_resp_t [core_v_mini_mcu_pkg::DMA_NUM_MASTER_PORTS-1:0] ext_dma_addr_resp_i,

output hw_fifo_pkg::hw_fifo_req_t [core_v_mini_mcu_pkg::DMA_CH_NUM-1:0] hw_fifo_req_o,
input hw_fifo_pkg::hw_fifo_resp_t [core_v_mini_mcu_pkg::DMA_CH_NUM-1:0] hw_fifo_resp_i,

input logic [core_v_mini_mcu_pkg::DMA_CH_NUM-1:0] ext_dma_stop_i,

output reg_req_t ext_peripheral_slave_req_o,
Expand Down Expand Up @@ -665,6 +668,8 @@ module core_v_mini_mcu
.dma_addr_resp_i(dma_addr_resp),
.dma_done_intr_o(dma_done_intr),
.dma_window_intr_o(dma_window_intr),
.hw_fifo_req_o,
.hw_fifo_resp_i,
.spi_flash_intr_event_o(spi_flash_intr),
.pad_req_o,
.pad_resp_i,
Expand Down
5 changes: 5 additions & 0 deletions hw/core-v-mini-mcu/core_v_mini_mcu.sv.tpl
Expand Up @@ -58,6 +58,9 @@ ${pad.core_v_mini_mcu_interface}
output obi_req_t [core_v_mini_mcu_pkg::DMA_NUM_MASTER_PORTS-1:0] ext_dma_addr_req_o,
input obi_resp_t [core_v_mini_mcu_pkg::DMA_NUM_MASTER_PORTS-1:0] ext_dma_addr_resp_i,

output hw_fifo_pkg::hw_fifo_req_t [core_v_mini_mcu_pkg::DMA_CH_NUM-1:0] hw_fifo_req_o,
input hw_fifo_pkg::hw_fifo_resp_t [core_v_mini_mcu_pkg::DMA_CH_NUM-1:0] hw_fifo_resp_i,

input logic [core_v_mini_mcu_pkg::DMA_CH_NUM-1:0] ext_dma_stop_i,

output reg_req_t ext_peripheral_slave_req_o,
Expand Down Expand Up @@ -413,6 +416,8 @@ ${pad.core_v_mini_mcu_interface}
.dma_addr_resp_i(dma_addr_resp),
.dma_done_intr_o(dma_done_intr),
.dma_window_intr_o(dma_window_intr),
.hw_fifo_req_o,
.hw_fifo_resp_i,
.spi_flash_intr_event_o(spi_flash_intr),
.pad_req_o,
.pad_resp_i,
Expand Down
1 change: 1 addition & 0 deletions hw/core-v-mini-mcu/core_v_mini_mcu.vlt
Expand Up @@ -30,3 +30,4 @@ lint_off -rule WIDTH -file "*/ao_peripheral_subsystem.sv" -match "Input port con
lint_off -rule UNDRIVEN -file "*ip/power_manager/rtl/power_manager.sv" -match "Signal is not driven: 'external_ram_banks_set_retentive*'"
lint_off -rule UNDRIVEN -file "*ip/power_manager/rtl/power_manager.sv" -match "Signal is not driven: 'external_subsystem_clkgate_en*'"
lint_off -rule UNUSED -file "*vendor/pulp_platform_register_interface/src/reg_mux.sv" -match "*"
lint_off -rule WIDTH -file "*/system_xbar.sv" -match "Operator ADD expects*"
25 changes: 25 additions & 0 deletions hw/core-v-mini-mcu/include/hw_fifo_pkg.sv
@@ -0,0 +1,25 @@
// Copyright 2025 EPFL and Politecnico di Torino.
// Solderpad Hardware License, Version 2.1, see LICENSE.md for details.
// SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
//
// File: hw_fifo_pkg.sv
// Author: Alessio Naclerio
// Date: 17/02/2025
// Description: Package for HW FIFO MODE dma interface.

package hw_fifo_pkg;

typedef struct packed {
logic pop;
logic push;
logic [31:0] data;
} hw_fifo_req_t;

typedef struct packed {
logic empty;
logic full;
logic push;
logic [31:0] data;
} hw_fifo_resp_t;

endpackage
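For testbench or co-simulation code that needs to build these structs as plain integers, the bit layout can be sketched in Python. In a SystemVerilog packed struct the first member occupies the most significant bits, so `hw_fifo_req_t` is 34 bits wide and `hw_fifo_resp_t` is 35 bits wide. The helper names below are hypothetical.

```python
DATA_MASK = 0xFFFFFFFF


def pack_req(pop, push, data):
    """Pack hw_fifo_req_t as {pop, push, data[31:0]} (34 bits)."""
    return ((pop & 1) << 33) | ((push & 1) << 32) | (data & DATA_MASK)


def pack_resp(empty, full, push, data):
    """Pack hw_fifo_resp_t as {empty, full, push, data[31:0]} (35 bits)."""
    return (((empty & 1) << 34) | ((full & 1) << 33)
            | ((push & 1) << 32) | (data & DATA_MASK))


def unpack_resp(bits):
    """Split a packed hw_fifo_resp_t back into its fields."""
    return {
        "empty": (bits >> 34) & 1,
        "full": (bits >> 33) & 1,
        "push": (bits >> 32) & 1,
        "data": bits & DATA_MASK,
    }


print(hex(pack_req(pop=1, push=0, data=0xDEADBEEF)))
print(unpack_resp(pack_resp(empty=0, full=1, push=1, data=0x1234)))
```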
1 change: 1 addition & 0 deletions hw/core-v-mini-mcu/include/x-heep_packages.core
Expand Up @@ -14,6 +14,7 @@ filesets:
- obi_pkg.sv
- reg_pkg.sv
- power_manager_pkg.sv
- hw_fifo_pkg.sv
- core_v_mini_mcu_pkg.sv
file_type: systemVerilogSource

Expand Down
4 changes: 1 addition & 3 deletions hw/core-v-mini-mcu/system_xbar.sv.tpl
Expand Up @@ -88,16 +88,14 @@ module system_xbar
end
% if xheep.has_il_ram():

localparam ZERO = 32'h0;

for (genvar j = 0; j < XBAR_NMASTER; j++) begin : gen_addr_napot
always_comb begin
port_sel[j] = pre_port_sel[j];
post_master_req_addr[j] = master_req_i[j].addr;
% for i, group in enumerate(xheep.iter_il_groups()):

if (pre_port_sel[j] == RAM_IL${i}_IDX[LOG_XBAR_NSLAVE-1:0]) begin
port_sel[j] = RAM_IL${i}_IDX[LOG_XBAR_NSLAVE-1:0] + {ZERO[LOG_XBAR_NSLAVE-${1+group.n.bit_length()}:0],master_req_i[j].addr[${group.n.bit_length()-1 +1}:2]};
port_sel[j] = RAM_IL${i}_IDX[LOG_XBAR_NSLAVE-1:0] + $unsigned(master_req_i[j].addr[${group.n.bit_length()-1 +1}:2]);
post_master_req_addr[j] = {master_req_i[j].addr[31:${2+group.n.bit_length()-1}], ${2+group.n.bit_length()-1}'h0};
end
% endfor
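The address arithmetic in this hunk can be checked with a small Python model. It is a sketch under assumptions: the number of interleaved banks is a power of two (the Makefile lists 2, 4 and 8), the truncation to `LOG_XBAR_NSLAVE` bits is ignored, and the base index used in the example is hypothetical.

```python
def il_port_select(addr, il_base_idx, n_banks):
    """Model port_sel and post_master_req_addr for one interleaved group."""
    if n_banks < 2 or n_banks & (n_banks - 1):
        raise ValueError("n_banks must be a power of two >= 2")
    bank_bits = n_banks.bit_length() - 1       # log2(n_banks)
    # The bank is selected by the address bits just above the byte offset.
    bank = (addr >> 2) & (n_banks - 1)
    port_sel = il_base_idx + bank
    # The template clears the byte-offset and bank-select bits of the address.
    post_addr = addr & ~((1 << (2 + bank_bits)) - 1) & 0xFFFFFFFF
    return port_sel, post_addr


# Hypothetical group of 4 banks whose first slave index is 3.
for addr in (0x1000, 0x1004, 0x1008, 0x100C):
    port, post = il_port_select(addr, il_base_idx=3, n_banks=4)
    print(hex(addr), "-> port", port, "addr", hex(post))
```

Consecutive words land on consecutive ports, which is the behaviour the `$unsigned(...)` addition is meant to preserve.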
Expand Down
25 changes: 22 additions & 3 deletions hw/ip/dma/data/dma.hjson
Expand Up @@ -186,12 +186,14 @@
hwaccess: "hro",
resval: 0,
fields: [
{ bits: "1:0", name: "MODE",
{ bits: "2:0", name: "MODE",
desc: "DMA operation mode",
enum: [
{ value: "0", name: "LINEAR_MODE", desc: "Transfers data linearly"},
{ value: "1", name: "CIRCULAR_MODE", desc: "Transfers data in circular mode"},
{ value: "2", name: "ADDRESS_MODE" , desc: "Transfers data using as destination address the data from ADD_PTR"},
{ value: "2", name: "ADDRESS_MODE", desc: "Transfers data using as destination address the data from ADD_PTR"},
{ value: "3", name: "SUBADDRESS_MODE", desc: "Implements transferring of data when SRC_PTR is fixed and related to a peripheral"},
{ value: "4", name: "HW_FIFO_MODE", desc: "Mode for exploiting external stream accelerators"}
]
}
]
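The widening of the `MODE` field from `1:0` to `2:0` follows from the new enum values, as the sketch below shows. The `DmaMode` class mirrors the names in this hunk; `field_width` is a hypothetical helper, not part of the register tooling.

```python
from enum import IntEnum


class DmaMode(IntEnum):
    LINEAR_MODE = 0
    CIRCULAR_MODE = 1
    ADDRESS_MODE = 2
    SUBADDRESS_MODE = 3
    HW_FIFO_MODE = 4


def field_width(enum_cls):
    """Smallest number of bits able to hold every value of the enum."""
    return max(max(member.value for member in enum_cls).bit_length(), 1)


print(field_width(DmaMode))
```

`SUBADDRESS_MODE` (3) would still fit in two bits; it is `HW_FIFO_MODE` (4) that requires the third bit.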
Expand Down Expand Up @@ -302,6 +304,23 @@
fields: [
{ bits: "0", name: "FLAG", desc: "Set for window done interrupt" }
]
}
},
{ name: "HW_FIFO_MODE_SIGN_EXT",
desc: '''In HW_FIFO_MODE, is the input data to be sign extended before sending it to the hw read fifo?
(The input of the hw read fifo is on 32 bits, which could be wider than the src data type)''',
swaccess: "rw",
hwaccess: "hro",
resval: 0,

fields: [
{ bits: "0", name: "HW_FIFO_SIGNED",
desc: "Extend the sign to 32 bits",
enum: [
{ value: "0", name: "NO_EXTEND", desc: "Does not extend the sign"},
{ value: "1", name: "EXTEND", desc: "Extends the sign"},
]
}
]
},
]
}
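The effect of the `HW_FIFO_SIGNED` bit can be sketched as follows. This is a behavioural model based only on the register description (a narrower source value widened to the 32-bit input of the hardware read fifo); the function name is hypothetical and the RTL may differ in detail.

```python
def to_hw_fifo_word(value, src_bits, sign_extend):
    """Widen a src_bits-wide value to the 32-bit hardware read fifo input."""
    if src_bits not in (8, 16, 32):
        raise ValueError("src_bits must be 8, 16 or 32")
    mask = (1 << src_bits) - 1
    value &= mask
    if sign_extend and (value >> (src_bits - 1)) & 1:
        # EXTEND: replicate the sign bit into the upper bits.
        value |= 0xFFFFFFFF & ~mask
    return value


print(hex(to_hw_fifo_word(0x80, 8, sign_extend=True)))
print(hex(to_hw_fifo_word(0x80, 8, sign_extend=False)))
```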
2 changes: 2 additions & 0 deletions hw/ip/dma/dma.core
Expand Up @@ -18,6 +18,8 @@ filesets:
- rtl/dma_obiread_fsm.sv
- rtl/dma_obiread_addr_fsm.sv
- rtl/dma_obiwrite_fsm.sv
- rtl/hw_r_fifo_ctrl.sv
- rtl/hw_w_fifo_ctrl.sv
- rtl/dma.sv
file_type: systemVerilogSource

Expand Down