Prune the compiled code to its minimum expression #639

JuanSapriza · 2025-02-05T20:56:06Z

Objective

The objective of this PR is to minimize the code as much as possible.
Some new improvements will not affect compilation at all, such as adding --gc-sections, which simply removes unused sections and can free several kB.

Other changes might be more drastical and I will set them as optional and heavily disadviced configurations.

Background

Just leaving here as documentation. I am compiling the following code

int main() return 0;

The initial size was 24 kB. By removing the unused sections it goes down to 12 kB.

The remaining 12 kB are mostly due to:

Interrupt handlers
Standard libraries

Tasks

🟩 Remove unused sections
🟧 ~~Not compile unused interrupt handlers ⚠️~~
🟩 Prune functions in syscalls to make the lighter
🟩 Remove unnecessary calls to standard libraries
🔲 ~~Optionally make data come immediately after text~~
🔲 ~~Reduce default data usage~~
🟩 Allow to have only 1 memory bank
🔲 ~~Fix the use of COMPILER_FLAGS=-Os~~
❓ Move the power manager from 0x3000 (or any rounded address)
🔲 Remove handlers 32-63 from here
🔲 Protect against frequencies not multiple of 100 in the calculation of the NCO

Important ⚠️

This PR builds on top of #636. It is blocked by the merging of that one first (or i will have to cherry pick changes)

JuanSapriza · 2025-02-06T13:49:30Z

@davidmallasen @davideschiavone

With the latest modifications i see that the whole code + data of a small application could very well fit inside 1 memory bank. Yet there is the limitation of needing to have 2 at least.

Why is there such limitation?

davideschiavone · 2025-02-06T16:59:02Z

@davidmallasen @davideschiavone

With the latest modifications i see that the whole code + data of a small application could very well fit inside 1 memory bank. Yet there is the limitation of needing to have 2 at least.

Why is there such limitation?

In principle there is not, it's legacy, if you want we can remove it - still, having 2 banks would be more efficient than one bank (Harvard architecture) so maybe two 16KB memories is better than one single 32kB - however, two 16kB of memories together are bigger than one bank of 32kB - so it's a trade-off - but only one bank would be very inefficient (2 cycles to fetch an instruction every time it is done at the same time of any other load or dma transaction, and 3/4 cycles for a lw/sw from the cve2 CPU)

JuanSapriza · 2025-02-06T20:32:53Z

I agree, but for HEEPidermis, for example, there is a chance that accesses to memory from peripherals are sparse.
I will try to remove this restriction so we can simulate both options (two smaller or one bigger) once we have the full application.

davideschiavone · 2025-02-06T20:41:41Z

I agree, but for HEEPidermis, for example, there is a chance that accesses to memory from peripherals are sparse. I will try to remove this restriction so we can simulate both options (two smaller or one bigger) once we have the full application.

having two small ones should be already possible (AFAIK) - maybe having only 1 is more tricky as the mcu-gen has probably been tailored around having at least 2, which is not just for the HW but for the generation of the linker scripts - nothing too hard to fix, but not just a parameter

davideschiavone · 2025-02-06T21:00:22Z

btw - question @JuanSapriza , why do we have the print functions if we are not using it in your example? shouldn't that be pruned?

JuanSapriza · 2025-02-06T21:27:20Z

Tomorrow i will test having two smaller ones. I think i tried it a while ago but dont remember the result. WIll also see how to have only one.

Regarding the printfs, it's sadly not as simple. The stdlibs are included in some of the drivers, so we might need to do some heavy prunning.

Today i managed to remove several things, but realized that the cost of making the compilation flow so much more complex and sensitive vs having 4kB of printf definitions was probably not worth it now.

JuanSapriza · 2025-02-07T08:06:26Z

With 2x16 kB banks everything fine. A Hello World:

Total space: 32.0 kB = Continuous: [16, 16] kB + Interleaved: [0] kB
Region 	 Start 	End	Sz(kB)	Usd(kB)	Req(kB)	Utilz(%) 
Code:  	  0.0	 16.0	 16.0	9.6	 12.3	76.8
Data:  	 16.0	 32.0	 16.0	5.5	  5.5	34.7

Cont 0 CCCCCCCCC------- 	56.2%
Cont 1 ddddd----------- 	31.2%

JuanSapriza · 2025-02-07T08:08:03Z

@davideschiavone is suggesting to try this out. https://github.com/Velko/FsLibc/tree/master
I think it will help a lot bringing both code and data space down

JuanSapriza · 2025-02-07T08:16:51Z

having two small ones should be already possible (AFAIK) - maybe having only 1 is more tricky as the mcu-gen has probably been tailored around having at least 2, which is not just for the HW but for the generation of the linker scripts - nothing too hard to fix, but not just a parameter

Just tried with 1 mem bank. Only needed to change a check in system.py where it was verifying that mem banks >=2 .... beyond that everything else worked out of the box jaahahah

JuanSapriza · 2025-02-07T10:59:36Z

I went down to the syscalls, handlers and asserts and removed the use of <stdio.h>. This effectively frees 4 kB. Now i am at 8 kB of code + 5.3 kB of data that needs prunning.

Things that i need to check:

🔲 Changing this exception printings might break everything - we need tests over exceptions!
🔲 Not including unused files for compilation. The interrupt handlers are being overriden by their non-weak implementations, even if your application doesnt use that peripheral. This comes from the problem of including all files. e.g. if you comment the contents of dma.h you will get errors in the compilation from dma_sdk.h (??!?!?)
🔲 We have several circular dependencies on the drivers

JuanSapriza · 2025-02-07T14:44:25Z

Removed 1.7kB of program memory by adding a macro that resolved the UART NCo in compile time instead of before every printf.

This does not stop you from changing the baudrate in runtime, but you need to recompute the NCO.

* Don't disable global interrupts with DMA interrupt * Add vsim option * Add DMA functionality for SPI subaddressing * Add setup functions to bsp * Fix DMA subaddressing mode. * Add tests for DMA subaddressing mode. * Fix verible * Debug in CI * Delete commented parts in dma.sv and uncomment test sections in example_dma_subaddressing. Restore original version of example_power_manager. * Add hardware fifo mode to DMA This DMA mode allows data to be forwarded to an external accelerator with I/O fifos tightly coupled with the DMA * Fix verible * Revert "Debug in CI" This reverts commit 560df9e. * Fix CI * Test CI * Modify tb.vlt Add testharness signals related to hw fifo mode to tb.vlt, as it may happen that they are actually undrive if no accelerator is coupled with the DMA * Fix example_power_manager dma.c change in DMA_CSR_REG_MIE_MASK in this PR causes the external interrupts to be disabled before the GPIO-related code portion in the example_power_manager. The previous version relied on the dma HAL to enable the external interrupts. The example has been modified to enable the external interrupts in the main. * Revert "Fix CI" This reverts commit ecdd737. * Revert "Test CI" This reverts commit a12dbea. * Blacklist dma subaddress mode in CI * Add documentation and image for new DMA modes * Fix hw_fifo package imports and add copyright * Fix hw_fifo_ctrl files * Fix verible --------- Co-authored-by: alessionaclerio22 <[email protected]>

* Added script to visualize memory usage after compilation * Improved clarity * Corrected the script that was not taking the values correctly in the vcase of no interleaved bus * removed the need for numpy * Corrected parsing of max memory size of code * Improved way of computing start and end of text and data * Added comments and simplified some functions * Removed the parsing of the map file. Now parsing the output of readelf * Fixed issues with representation * Catch problems with flash_exec * added readelf to the CI * added apt insall binutils * Simply quit the script if readelf not available * removed all version * corrected flash load * Fixed display of interleaved space

JuanSapriza · 2025-02-21T09:59:38Z

From the remaining 6.8 kB I see that the distribution is more or less as follows:

3 kB of intrrupt handlers
3 kB of libraries (basic syscalls)

I trust we can reduce the size of those interrupts based on the following:

Size of each fast interrupt handler

Each interrupt handler is dominated by the push/pop of all the registers.

Removing them by not declaring the function as an interrupt will reduce code size by 1kB.
But... breaks everything, naturally hahaha
Making the functions volatile will reduce in almost 1kB by only pushing the SP - but will end up crashing anyways.
I believe something can be done in the middle

Other option I can think of is trying not to define these functions if their peripheral is not used. This requires a better handling of source/header inclusions, which currently is a ducking mess 🦆

JuanSapriza · 2025-02-21T12:29:32Z

In order to print an error message in the case where an interrupt that does not have an associated handler we were using the puts function. This function in turn used XXX and YYY function which resulted in the inclusion of 3 kB of code.

Sacrificing those messages the code can be reduced by 3kB when you do not use printf.
Removing puts required forcing the inclusion of some of the functions we had in syscalls.c

JuanSapriza · 2025-02-21T12:56:55Z

Added a function _writestr in syscalls that allows us to simply write a string. Is a low-cost substitue to printf. Adds an overhead of ~100 Bytes, which feels acceptable vs 4 kB of using printf. Also, by using this in several places (like assert and handler), i reduced another 400 Bytes... so pays off hehe)

8 kB of code

int main(){
    printf("hi world");
    return 0;
}

3.7 kB of code

int main(){
    _writestr("hi world");
    return 0;
}

JuanSapriza · 2025-02-21T20:51:45Z

⚠️ Dangerous

I did something a little dangerous. Explanation below:

The handler functions of the fast interrupt controller (FIC) are defined added in the vectors.S and defined/declared in drivers/fast_intr_cntr/fast_intr_ctrl.c. There you have several handler_irq_fast_xxx for each fast interrupt connected to the core. Inside that function two things were done: 1) clear the interrupt flag, 2) call another handler: fic_irq_xxx.

void handler_irq_fast_dma(void)
{
    // The interrupt is cleared.
    clear_fast_interrupt(kDma_fic_e);
    // call the weak fic handler
    fic_irq_dma();
}

These handler_irq_fast_xxx was declared as __attribute__((interrupt)) - which made it back up all the registers upon entry and pop them back upon returning. This is safe and a good practice -- but... this means that we will ALWAYS have 14×2×17 instructions for pushing and popping even if the peripheral is not even used and the handler will be an empty handler! So we are always using ~2kB just for useless push/pop.

An opportunity to remove these arises from the second function call, the one to fic_irq_dma.
This is an empty weak implementation to be replaced by the application or driver: i.e. if you dont use a peripheral, this function remains empty, using very little space in memory.

__attribute__((weak, optimize("O0"))) void fic_irq_dma(void)
{
    /* Users should implement their non-weak version */
}

In the drivers you can find strong implementations like this one:

void fic_irq_dma(void)
{
    for (int i = 0; i < DMA_CH_NUM; i++)
    {
        if (dma_subsys_per[i].peri->TRANSACTION_IFR == 1)
        {
            dma_subsys_per[i].intrFlag = 1;
            dma_intr_handler_trans_done(i);
 . . .

If we managed to transfer the register back-up safety to the strong implementation, then all those push/pop would only appear if you actually use the interrupt, and not by default.

My way of achieving this was pushing and popping only the registers necessary to clear the interrupt flag by making the handler "volatile", and then make the strong implementation "interrupt" to ensure that during the actual interrupt handling routine nothing is overridden.

I used these two macros to push/pop after entering the handler and before exiting. Also made the handler __attribute((naked)).

#define SAVE_REGISTERS()         \
    __asm volatile(              \
        "addi sp, sp, -16\n\t"   \
        "sw   a0, 12(sp)\n\t"   \
        "sw   ra, 8(sp)\n\t"     \
        "sw   a4, 4(sp)\n\t"     \
        "sw   a5, 0(sp)\n\t"     \
    )

#define RESTORE_REGISTERS()      \
    __asm volatile(              \
        "lw   a5, 0(sp)\n\t"     \
        "lw   a4, 4(sp)\n\t"     \
        "lw   ra, 8(sp)\n\t"     \
        "lw   a0, 12(sp)\n\t"   \
        "addi sp, sp, 16\n\t"    \
        "mret\n\t"              \
    )

well... this does not work.

My inital guess was that the stack is only 16 words deep, but I dont know where i could confirm or deny this. @davideschiavone @davidmallasen ideas? If that were the case, after doing some pushes i shouldnt be able to push the whole 16 registers again. I would get bullshit when popping and thus why i get llegal instruction (hart 0) at PC 0x00010460: 0xee909810 (PC much higher than the end of my code).

However, if i pop before calling the second handler (in which case the stack should be empty again), the program gets stuck forever.

ToDo find out where

So.. now... i removed the __attribute__((interrupt)) from the strong implementations and everything works fine. My guess: pure luck 🍀 This is a dangerous situation, right? When entering the strong implementation of the handler they only push/pop a few registers.

The advantage of this is that the code goes down from 3.8 kB -> 3.1 kB (not down 2 kB as some push/pop persist.

davideschiavone · 2025-02-21T20:53:04Z

sw/device/lib/drivers/fast_intr_ctrl/fast_intr_ctrl.c

+#define INTERRUPT_HANDLER_ABI __attribute__((aligned(4), naked))
+
+
+/**


I don't think this is dangerous, i think this is wrong. When you jump to an async event (and return from it) you need to push to the stack all the registers you are gonna modify within the ISR and then restore them. How do you know that you are gonna use only those 5 registers ?

Especially if they are week, so could be redefined by users?

In the handler the registers used are well known, the problem is indeed in the weak implementation.

If I managed to make all the registers be backed-up upon entering the weak implementation, that would be perfect. However, I could not manage to make it work. Can you think of anything that would prevent that?

davideschiavone · 2025-02-21T20:58:15Z

Added a function _writestr in syscalls that allows us to simply write a string. Is a low-cost substitue to printf. Adds an overhead of ~100 Bytes, which feels acceptable vs 4 kB of using printf. Also, by using this in several places (like assert and handler), i reduced another 400 Bytes... so pays off hehe)

8 kB of code
int main(){
    printf("hi world");
    return 0;
}
3.7 kB of code
int main(){
    _writestr("hi world");
    return 0;
}

I would not add the _writesrtr function among the system calls. It's not standard. Regarding the interrupt puts, that's legacy from the core testbench, i would simply remove it or call exit or whatever dealing with the testbench. We don't need to print anything here

JuanSapriza · 2025-02-21T21:00:21Z

I would not add the _writesrtr function among the system calls. It's not standard. Regarding the interrupt puts, that's legacy from the core testbench, i would simply remove it or call exit or whatever dealing with the testbench. We don't need to print anything here

Where would you add it? Needs to be somewhere low as handlers and assert use it.

puts was indeed removed in this branch.

davideschiavone · 2025-02-21T21:29:38Z

I would not add the _writesrtr function among the system calls. It's not standard. Regarding the interrupt puts, that's legacy from the core testbench, i would simply remove it or call exit or whatever dealing with the testbench. We don't need to print anything here

Where would you add it? Needs to be somewhere low as handlers and assert use it.

puts was indeed removed in this branch.

Handlers should not print in general, but indeed assert could. For now leave it there

…educe 24->16kB code size

…mentations

…bring some troubles

… computed every time before sending a printf.

…ze drastically

…ced during the inclusions needed for puts

…ead of printf

…functions to only use _writestr

JuanSapriza · 2025-02-24T19:58:00Z

I tested this on FPGA and also works with hello world and example_dma.

davideschiavone · 2025-02-25T10:17:23Z

can you @JuanSapriza please remove the ISRs 32 to 63? https://github.com/esl-epfl/x-heep/blob/main/sw/device/lib/crt/vectors.S#L84 basically deleting from line 84 to 117 included

davideschiavone · 2025-02-25T10:18:37Z

sw/device/lib/crt/vectors.S

+	/*
+	REMOVING THIS CALL REDUCES CODE SIZE CONSIDERABLY
+	This will not print an error message in case of 
+	jumping to an unknown error handler.
 	la a0, unknown_msg
 	jal ra, puts


why don't we remove these puts? either replacing them with _write or simply handling them with the testbench? can be done here on in another Pr in the future, in this case pls open an issue

I would leave the comment to leave a record that they can be added if someone ever needs them, if you uncomment it it should still work. I can add this additional info if you think thats ok

no I just mean removing the puts, not the ISRs - maybe replacing them with _write?

davideschiavone · 2025-02-25T10:19:29Z

sw/device/lib/crt/vectors.S

+/*
+THESE STRINGS ARE NOT LONGER NEEDED
+Only the last two were used, and their 
+call was removed to reduce base code size
 .section .rodata


if they are not needed why don't we delete them?

davideschiavone · 2025-02-25T10:21:12Z

sw/device/lib/runtime/handler.c

 }

 __attribute__((weak)) void handler_ecall(void) {
-  printf("Environment call encountered\n");
+  //printf("Environment call encountered\n");


why commenting it?

nah, this should be changed to a _writestr as in the others

davideschiavone · 2025-02-25T10:21:43Z

sw/device/lib/drivers/uart/uart.c


  // Requested baudrate is too high for the given clock frequency.
-  if (nco != nco_masked) {
+  if (uart->nco != nco_masked) {


where is nco set and where the division is performed?

should i add a comment on this?

davideschiavone · 2025-02-25T10:22:17Z

sw/device/lib/runtime/syscalls.c

@@ -49,20 +50,19 @@ pid_t   _getpid (void);
 int     _isatty (int __fildes);
 int     _link (const char *__path1, const char *__path2);
 _off_t  _lseek (int __fildes, _off_t __offset, int __whence);
-ssize_t _read (int __fd, void *__buf, size_t __nbyte);
+int     _read (int __fd, void *__buf, int __nbyte);


did you check this also with the OpenHW Group compiler used in HEEPatia? and with LLVM?

with clang they compile (in the CI). I can check with the OHW compiler on HEEPatia. But i would say if this does not work with that compiler we open an issue later and i fix it quickly.

unfortunately I do not agree as as soon as X-HEEP gets revendorized in HEEPatia, everything goes broken - I am sure these changes may break it indeed

I am checking now, will let you know in a while

it works with the OHW compiler.
However, few things to fix:

The app size scripts goes crazy when code is on the IL banks -- will do a PR later

OHW compiler requires you to explicitly include "syscalls.h" -- will try running all the apps later

I compiled all the apps from HEEPatia with make test TEST_FLAGS=--compile-only and 33 apps do not compile, most of them did already not compile on HEEPatia, and the others do not seem to be related to these changes.

This is crazy, i thought @LuigiGiuffrida98 and @davidmallasen fixed them...
Let's discuss this tomorrow

davideschiavone · 2025-02-25T10:24:10Z

sw/device/target/nexys-a7-100t/x-heep.h

+// We assume that the lowest baudrate will be 9600, and the largest 256000. 
+// Given this range, by dividing by 100 it always remains integer and below 32-bits.
+// This saves us the need of performing 64-bit divisions to compute NCO. 
+#define UART_NCO ((uint32_t)(((uint32_t)((uint32_t)(UART_BAUDRATE/100))<<20)/((uint32_t)(REFERENCE_CLOCK_Hz/100))))


what if we are using the FLL and we change the reference clock? did you try this in HEEPatia for example? I suggest you re-vendor this fork of X-HEEP in HEEPatia to try things out (tagging @davidmallasen for information)

this is an interesting question. In theory, as long as the frequency is multiple of 100 it should work just fine. this could be not true if e.g. we use 32'768 Hz.

What i can do is add a macro that chooses the way of computing this: precise/light. Then, would this be chosen at which level? Not sure

yes, a macro could be fine, like "TARGET_USE_FLL" or "USING_FIXED_CLK" etc. Keep in mind that this will increment the maintenance effort.... how big is the div64 bit function?

almost 2kB!!! it would be 30% of the code of the minimum application

So I added an #ifdef in syscalls to prevent using a NCO that might not be defined.
That way if NCO is defined (you are using one of the FPGA targets, then we use the hardcoded NCO (because also the system clock will be hardcoded --i understand--). If that is not the case (e.g. you are targetting asic) then you compute the NCO with the 64-bit division.

uart.base_addr = mmio_region_from_addr((uintptr_t)UART_START_ADDRESS); uart.baudrate = UART_BAUDRATE; uart.clk_freq_hz = soc_ctrl_get_frequency(&soc_ctrl); #ifdef UART_NCO uart.nco = UART_NCO; #else uart.nco = ((uint64_t)uart.baudrate << (NCO_WIDTH + 4)) / uart.clk_freq_hz; #endif

Thanks @davidmallasen !!

I tested this by removing the define for TARGET=pynq-z2 and compiles 👍

also works fine on FPGA

the only thing I do not like is that for chips (or any other thing) that will have a fixed clock frequency, we need to remember to define UART_NCO, which is little descriptive about it... we should have something more generic like "USE_FIXED_CLK_FREQ" or any other thing... we can discuss about it -

as I am on this topic, back at HEEPocrates time (with my old crappy Make flow for SW), the target folder was also exposed at HEEPocrates level (i.e. different than the X-HEEP one) while I think now it is lost, which is wrong... we should have a target for each CHIP (see indeed the fixed FLL thing) - for HEEPnosis the only thing I had to change was the USE_TARGET_FLL which we solved it by having the COMPILIER_FLAGS passed command line (thx @JuanSapriza ) - but in principle, it does not scale - we should have a TARGET for the chip itself, and another one (maybe called BOARD or whatever else) for the BOARD, being the testbench in case of X-HEEP_SIM, being something else in the X-HEEP_FPGA_PYNQ, etc... now they are a bit mixed, but if you see HEEPocrates it has many targets depending on the configuration of the testing board, the simulation, etc... we should re-think this - this thing I am writing here does not belong to this PR - it's just a not that I will now transfer to a Github issue - just posting it here to open the discussion on this PR tomorrow (pls @JuanSapriza add this PR in the heepidermis slides)

I agree. Yesterday with @davidmallasen we reached a similar conclusion (or at least that crossed my mind).

JuanSapriza · 2025-02-25T10:28:19Z

can you @JuanSapriza please remove the ISRs 32 to 63? https://github.com/esl-epfl/x-heep/blob/main/sw/device/lib/crt/vectors.S#L84 basically deleting from line 84 to 117 included

Not connected on ibex stands for we will never jump here unless its an error? then yes, i'll gladly do it

davideschiavone · 2025-02-25T10:49:26Z

can you @JuanSapriza please remove the ISRs 32 to 63? https://github.com/esl-epfl/x-heep/blob/main/sw/device/lib/crt/vectors.S#L84 basically deleting from line 84 to 117 included

Not connected on ibex stands for we will never jump here unless its an error? then yes, i'll gladly do it

the CPU has at most 32 ISRs (or let's say, the ID of the last one is 31, but physically they are 19) so it is physically impossible to jump to ISR 32 to 63. Don't know why the code is there, must be a copy and paste from another testbench

…not hardcoded

davideschiavone · 2025-02-25T16:17:04Z

sw/device/lib/crt/vectors.S

@@ -83,38 +83,6 @@ vector_table:
 	j __no_irq_handler
 	// 64-32 : not connected on Ibex
 	j verification_irq_handler


is this the ISR number 32? i.e. the 33th? if so we need to delete this one as well :)

but do you know why would it be different? (it did not jump to _no_irq_handler

fix interleaved bus (esl-epfl#620)

66c7153

davidmallasen added the software Software and application label Feb 6, 2025

LuigiGiuffrida98 and others added 2 commits February 19, 2025 14:25

davideschiavone reviewed Feb 21, 2025

View reviewed changes

davidmallasen changed the base branch from main to develop February 24, 2025 08:44

JuanSapriza force-pushed the pr_min_sw branch from 7c1c330 to 94f2e17 Compare February 24, 2025 16:32

JuanSapriza added 5 commits February 24, 2025 18:56

Added minimal sw app. Also added --gc-sections to compiler flags to r…

9b7a170

…educe 24->16kB code size

TEMPORARY SOLUTION: comment interrupt routines calling the weak imple…

8fd461b

…mentations

Removed the parsing of the map file. Now parsing the output of readelf

0ca09c0

Catch problems with flash_exec

d048f61

added readelf to the CI

ca61762

JuanSapriza added 13 commits February 24, 2025 18:56

exposed _write to be used in handler. Also changed types. This might …

d71b814

…bring some troubles

removed inclsuion of string.h

9a94377

Removed the need for 64-bit division in the UART + now the NCO is not…

a3fb02c

… computed every time before sending a printf.

restored -lgcc as it is not needed to remove it

ef2e3b3

Fixed differences in sw

98ac4b1

Restored the plic handlers that were crashing some apps

53fe69c

Added notes on how to minimize code size

ad4fb70

Removed use of puts function in vector.S which was icnreasing code si…

0abcfa9

…ze drastically

Forced the inclusion of functions from syscalls that were being repla…

d2099da

…ced during the inclusions needed for puts

Sacrificed 0.1 kB to have a writestr function in syscalls to use inst…

e71924c

…ead of printf

Fixed the _writestr definition

ee522f6

Replaced assert and handler functions that were using other syscalls …

a34e7ba

…functions to only use _writestr

added inclusions and declarations for the sake of cpp

0e13225

JuanSapriza force-pushed the pr_min_sw branch from 94f2e17 to 0e13225 Compare February 24, 2025 17:57

Corrected rebase

abedd5c

JuanSapriza marked this pull request as ready for review February 24, 2025 19:58

JuanSapriza requested a review from davideschiavone February 24, 2025 19:58

davideschiavone requested changes Feb 25, 2025

View reviewed changes

JuanSapriza marked this pull request as draft February 25, 2025 10:28

JuanSapriza added 5 commits February 25, 2025 13:44

included syscalls.h in main.c for compatibility with OHW compiler

ac5626d

Removed unnecessary jumps in vector.S

e753eed

Added message on ecall. Adds ~100 B

53b3f91

one less file to diff

82b15ce

Added ifdef to prevent hardcoding NCO and use the 64-bit division if …

e3da38c

…not hardcoded

davideschiavone reviewed Feb 25, 2025

View reviewed changes

davideschiavone changed the base branch from develop to dev_branch February 27, 2025 17:28

		#define INTERRUPT_HANDLER_ABI __attribute__((aligned(4), naked))


		/**

Prune the compiled code to its minimum expression #639

Are you sure you want to change the base?

Prune the compiled code to its minimum expression #639

Conversation

JuanSapriza commented Feb 5, 2025 • edited Loading

Objective

Background

Tasks

Important ⚠️

JuanSapriza commented Feb 6, 2025

davideschiavone commented Feb 6, 2025 • edited Loading

JuanSapriza commented Feb 6, 2025

davideschiavone commented Feb 6, 2025

davideschiavone commented Feb 6, 2025

JuanSapriza commented Feb 6, 2025

JuanSapriza commented Feb 7, 2025 • edited Loading

JuanSapriza commented Feb 7, 2025

JuanSapriza commented Feb 7, 2025

JuanSapriza commented Feb 7, 2025

JuanSapriza commented Feb 7, 2025 • edited Loading

JuanSapriza commented Feb 21, 2025

Size of each fast interrupt handler

JuanSapriza commented Feb 21, 2025

JuanSapriza commented Feb 21, 2025

JuanSapriza commented Feb 21, 2025 • edited Loading

⚠️ Dangerous

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davideschiavone commented Feb 21, 2025

JuanSapriza commented Feb 21, 2025

davideschiavone commented Feb 21, 2025 • edited Loading

JuanSapriza commented Feb 24, 2025

davideschiavone commented Feb 25, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davideschiavone Feb 25, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JuanSapriza Feb 25, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JuanSapriza commented Feb 25, 2025

davideschiavone commented Feb 25, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JuanSapriza commented Feb 5, 2025 •

edited

Loading

davideschiavone commented Feb 6, 2025 •

edited

Loading

JuanSapriza commented Feb 7, 2025 •

edited

Loading

JuanSapriza commented Feb 7, 2025 •

edited

Loading

JuanSapriza commented Feb 21, 2025 •

edited

Loading

davideschiavone commented Feb 21, 2025 •

edited

Loading

davideschiavone Feb 25, 2025 •

edited

Loading

JuanSapriza Feb 25, 2025 •

edited

Loading