Implement CSE and peephole optimization #103

vacantron · 2024-01-05T08:24:02Z

Peephole optimization

A MOV that immediately follows an arithmetic or LOAD instruction and only copies its result is redundant: the preceding instruction can write its result directly to the MOV's destination register.

For example, the ARMv7 instructions generated for the statement t = 1 + 2 + 4 + 8 change as follows after the optimization:

  mov	r0, 1
  mov	r1, 2
  add	r2, r0, r1
  mov	r0, 4
  add	r1, r2, r0
  mov	r0, 8
- add	r2, r1, r0
- mov	r0, r2
+ add	r0, r1, r0

The final ADD now writes directly to r0, which eliminates the redundant MOV instruction.
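The folding rule above can be sketched in C as a single pass over a linear instruction list. This is a minimal illustration, not the code in this PR: `insn_t`, `opcode_t`, `read_later`, and `peephole_fold_mov` are hypothetical names, and the sketch assumes every register is dead at the end of the sequence. It also adds a check that the intermediate register is not read later, since the fold is only safe in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction form; not shecc's actual IR. */
typedef enum { OP_MOV_IMM, OP_MOV_REG, OP_ADD, OP_LOAD } opcode_t;

typedef struct {
    opcode_t op;
    int rd;       /* destination register */
    int rs1, rs2; /* source registers, -1 when unused */
    int imm;      /* immediate operand for OP_MOV_IMM */
} insn_t;

/* True if the opcode computes a new value into rd (arithmetic or load). */
static bool produces_value(opcode_t op)
{
    return op == OP_ADD || op == OP_LOAD;
}

/* True if reg is read before being overwritten in insns[from..n).
 * Assumption: all registers are dead at the end of the sequence. */
static bool read_later(const insn_t *insns, size_t n, size_t from, int reg)
{
    for (size_t i = from; i < n; i++) {
        if (insns[i].rs1 == reg || insns[i].rs2 == reg)
            return true;
        if (insns[i].rd == reg)
            return false;
    }
    return false;
}

/* Fold "op rX, ...; mov rY, rX" into "op rY, ..." in place.
 * Returns the new number of instructions. */
size_t peephole_fold_mov(insn_t *insns, size_t n)
{
    assert(insns != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n && produces_value(insns[i].op) &&
            insns[i + 1].op == OP_MOV_REG &&
            insns[i + 1].rs1 == insns[i].rd &&
            !read_later(insns, n, i + 2, insns[i].rd)) {
            insn_t folded = insns[i];
            folded.rd = insns[i + 1].rd; /* write straight to MOV's target */
            insns[out++] = folded;
            i++; /* skip the now-redundant MOV */
        } else {
            insns[out++] = insns[i];
        }
    }
    return out;
}
```

Applied to the last three instructions of the example (`mov r0, 8`, `add r2, r1, r0`, `mov r0, r2`), the sketch leaves `mov r0, 8` and `add r0, r1, r0`.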

CSE (common subexpression elimination)

Reuse previously loaded data when a later read targets the same memory location. For example, the following code reads the same array element twice:

int main()
{
    char arr[1];
    int i, t, pos = 0;
    for (i = 0; i < 10000000; i++)
        t = arr[pos] + arr[pos];
    return 0;
}

After the optimization, the second read of the common subexpression arr[pos] is replaced by a copy of the first read's result.

  ldr	r0, [sp, 4]
  ldr	r1, [sp, 9]
  add	r2, r0, r1
  ldrb	r3, [r2]
  add	r2, r0, r1
- ldrb	r4, [r2]
- add	r3, r3, r4
+ mov	r2, r3
+ add	r2, r3, r2

This reduces run time by approximately 5% on a Raspberry Pi 3B.

CSE runs before liveness analysis. DCE (dead code elimination), however, depends on the traversal performed by liveness analysis; its implementation needs further work and is postponed.
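The load reuse described in this section can be sketched as a local pass over one basic block. This is a hedged illustration rather than the implementation in this PR: `ir_t`, `ir_op_t`, `MAX_VREGS`, and `local_cse` are hypothetical names, and the sketch assumes an SSA-like block in which each virtual register is defined once, before its uses. Unlike the output shown above, it also replaces the repeated address computation with a copy, and it conservatively treats any store as invalidating earlier loads.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VREGS 64 /* hypothetical limit on virtual registers */

/* Hypothetical, simplified IR; not shecc's actual data structures. */
typedef enum { IR_ADD, IR_LOAD, IR_STORE, IR_COPY } ir_op_t;

typedef struct {
    ir_op_t op;
    int dest; /* defined register, -1 for IR_STORE */
    int src0; /* IR_LOAD/IR_STORE: address; IR_COPY: source; IR_ADD: lhs */
    int src1; /* IR_ADD: rhs; IR_STORE: stored value; otherwise -1 */
} ir_t;

/* Replace repeated ADDs and LOADs inside one basic block with copies. */
void local_cse(ir_t *block, size_t n)
{
    int canon[MAX_VREGS]; /* representative register holding each value */
    for (int r = 0; r < MAX_VREGS; r++)
        canon[r] = r;

    for (size_t i = 0; i < n; i++) {
        ir_t *cur = &block[i];
        assert(cur->dest < MAX_VREGS && cur->src0 < MAX_VREGS &&
               cur->src1 < MAX_VREGS);

        /* Rewrite operands so equal values use the same register name. */
        if (cur->src0 >= 0)
            cur->src0 = canon[cur->src0];
        if (cur->src1 >= 0)
            cur->src1 = canon[cur->src1];

        if (cur->op == IR_COPY) {
            canon[cur->dest] = cur->src0;
            continue;
        }
        if (cur->op != IR_ADD && cur->op != IR_LOAD)
            continue;

        /* Scan backwards for an identical earlier computation. */
        for (size_t j = i; j-- > 0;) {
            const ir_t *prev = &block[j];
            if (cur->op == IR_LOAD && prev->op == IR_STORE)
                break; /* memory may have changed; keep the load */
            if (prev->op == cur->op && prev->src0 == cur->src0 &&
                prev->src1 == cur->src1) {
                canon[cur->dest] = prev->dest;
                cur->op = IR_COPY; /* reuse the earlier result */
                cur->src0 = prev->dest;
                cur->src1 = -1;
                break;
            }
        }
    }
}
```

For `t = arr[pos] + arr[pos]`, the second address computation and the second load both become copies, and the final addition reads the first load's register for both operands. Removing the leftover copies is the job of a later pass such as DCE.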

Related issues:

Implement basic optimizations #88

src/arm-codegen.c

jserv · 2024-01-05T08:32:04Z

You should state the benefit and include measurements.

Simplify the generated machine code according to the specification of ARMv7-A and RV32 ISA.

A `MOV` that immediately follows an arithmetic or `LOAD` instruction and only copies its result is redundant: the preceding instruction can write its result directly to the destination register of the `MOV` instruction. For example, the ARMv7 instructions generated for the statement `t = 1 + 2 + 4 + 8` change as follows after the optimization:

```
  mov r0, 1
  mov r1, 2
  add r2, r0, r1
  mov r0, 4
  add r1, r2, r0
  mov r0, 8
- add r2, r1, r0
- mov r0, r2
+ add r0, r1, r0
```

The final `ADD` now writes directly to `r0`, which eliminates the redundant `MOV` instruction.

Reuse previously loaded data when a later read targets the same memory location. For example, the following code reads the same array element twice:

```
int main()
{
    char arr[1];
    int i, t, pos = 0;
    for (i = 0; i < 10000000; i++)
        t = arr[pos] + arr[pos];
    return 0;
}
```

After the optimization, the second read of the common subexpression `arr[pos]` is replaced by a copy of the first read's result.

```
  ldr r0, [sp, 4]
  ldr r1, [sp, 9]
  add r2, r0, r1
  ldrb r3, [r2]
  add r2, r0, r1
- ldrb r4, [r2]
- add r3, r3, r4
+ mov r2, r3
+ add r2, r3, r2
```

This reduces run time by approximately 5% on a Raspberry Pi 3B.

jserv reviewed Jan 5, 2024

jserv changed the title ~~Implement CSE and peephole optimization for optimizing compiler~~ Implement CSE and peephole optimization Jan 5, 2024

jserv reviewed Jan 5, 2024

vacantron force-pushed the opt branch from bd10552 to 911e7dc on January 7, 2024 07:29

vacantron added 3 commits January 7, 2024 15:33

Fix isolated branch in SSA

3b7995e

vacantron force-pushed the opt branch from 911e7dc to 3b7995e on January 7, 2024 07:33

jserv merged commit 83609c0 into sysprog21:master Jan 7, 2024
4 checks passed
