Implement CSE and peephole optimization #103

Merged
merged 3 commits into from
Jan 7, 2024


vacantron
Collaborator

@vacantron vacantron commented Jan 5, 2024

Peephole optimization

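A `MOV` that directly follows an arithmetic or `LOAD` instruction and only copies its result is redundant. The preceding instruction can instead write its result straight into the destination register of the `MOV`, provided the intermediate register is not read afterward.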

For example, after optimization, the ARMv7 instructions generated for the statement `t = 1 + 2 + 4 + 8` are as follows (`-` marks removed lines, `+` added lines):

```
  mov	r0, 1
  mov	r1, 2
  add	r2, r0, r1
  mov	r0, 4
  add	r1, r2, r0
  mov	r0, 8
- add	r2, r1, r0
- mov	r0, r2
+ add	r0, r1, r0
```

Writing the result of the last `add` directly into `r0` reuses that register and eliminates the redundant `mov`.

CSE (common subexpression elimination)

Reuse previously loaded data when a later read comes from the same memory location and nothing has written to that location in between. For example, the following code reads the same array element twice:

```
int main()
{
    char arr[1];
    int i, t, pos = 0;
    for (i = 0; i < 10000000; i++)
        t = arr[pos] + arr[pos];
    return 0;
}
```

After the optimization, the second read of `arr[pos]` is replaced by a copy of the first read's result:

```
  ldr	r0, [sp, 4]
  ldr	r1, [sp, 9]
  add	r2, r0, r1
  ldrb	r3, [r2]
  add	r2, r0, r1
- ldrb	r4, [r2]
- add	r3, r3, r4
+ mov	r2, r3
+ add	r2, r3, r2
```

On a Raspberry Pi 3B, this reduces run time by approximately 5%.

CSE is performed before liveness analysis. Dead code elimination (DCE), however, depends on the traversal performed during liveness analysis; its implementation needs further work and is postponed.

Related issues:

@jserv jserv changed the title Implement CSE and peephole optimization for optimizing compiler Implement CSE and peephole optimization Jan 5, 2024
@jserv
Collaborator

jserv commented Jan 5, 2024

You should state the benefit and include measurements.

Simplify the generated machine code according to the specifications
of the ARMv7-A and RV32 ISAs.

The `MOV` after an arithmetic or `LOAD` instruction is redundant.
Simply assign the result to the destination register of the `MOV`
instruction.

For example, the generated ARMv7 instructions for the statement
`t = 1 + 2 + 4 + 8` after optimization are:

```
  mov	r0, 1
  mov	r1, 2
  add	r2, r0, r1
  mov	r0, 4
  add	r1, r2, r0
  mov	r0, 8
- add	r2, r1, r0
- mov	r0, r2
+ add	r0, r1, r0
```

The `r0` register in the last two instructions is reused, which
eliminates the redundant `MOV` instruction.

Reuse previously read data when it comes from the same memory location.
For example, the following code reads the same array element twice:

```
int main()
{
    char arr[1];
    int i, t, pos = 0;
    for (i = 0; i < 10000000; i++)
        t = arr[pos] + arr[pos];
    return 0;
}
```

After the optimization, the repeated `arr[pos]` is replaced by copying
the previously read result.

```
  ldr	r0, [sp, 4]
  ldr	r1, [sp, 9]
  add	r2, r0, r1
  ldrb	r3, [r2]
  add	r2, r0, r1
- ldrb	r4, [r2]
- add	r3, r3, r4
+ mov	r2, r3
+ add	r2, r3, r2
```

This reduces run time by approximately 5% on a Raspberry Pi 3B.
@jserv jserv merged commit 83609c0 into sysprog21:master Jan 7, 2024
4 checks passed