Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certain operations lack consideration for data types #166

Open
DrXiao opened this issue Nov 12, 2024 · 5 comments
Open

Certain operations lack consideration for data types #166

DrXiao opened this issue Nov 12, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@DrXiao
Copy link
Collaborator

DrXiao commented Nov 12, 2024

[ Edit: 2025/01/21 19:15 ]
Recently, I used the following tricky code to test shecc:

/* test.c */
int main() {
    char a = 0x11, b = 0x22, c = 0x33, d = 0x44;
    char *aa = &a, *bb = &b, *cc = &c, *dd = &d;
    int *aaa = &a;
    char arr[4];
    arr[0] = 0x11;
    arr[1] = 0x22;
    arr[2] = 0x33;
    arr[3] = 0x44;

    /* 1 */
    printf("== 1 ==\n");
    printf("%d %d %d %d\n", a, b, c, d);
    printf("%x %x %x %x\n", aa, bb, cc, dd);
    printf("%d %d %d\n", bb - aa, cc - bb, dd - cc);
    printf("%x %x\n\n", *aaa, aaa[0]);

    /* 2 */
    aaa = arr;

    printf("== 2 ==\n");
    for (int i = 0; i < 4; i++)
	    printf("arr[%d] = %x\n", i, arr[i]);

    printf("aaa = %x, arr = %x\n", aaa, arr);
    printf("*aaa = %x, aaa[0] = %x\n\n", *aaa aaa[0]);
    
    /* 3 */
    printf("== 3 ==\n");
    a += 6000;
    b += 400;
    c -= 400;
    d -= 6000;
    aaa = &a;
    printf("a = %d, b = %d, c = %d, d = %d\n", a, b, c, d);

    return 0;
}

When the code is compiled using GCC with no optimization, all outputs match the expected results:

$ gcc -o test test.c
$ ./test
== 1 ==
17 34 51 68
1759c720 1759c721 1759c722 1759c723
1 1 1
44332211 44332211

== 2 ==
arr[0] = 11
arr[1] = 22
arr[2] = 33
arr[3] = 44
aaa = 1759c754, arr = 1759c754
*aaa = 44332211, aaa[0] = 44332211

== 3 ==
a = -127, b = -78, c = -93, d = -44

However, if using shecc to compile the code, some outputs seem to be wrong:

$ qemu-arm out/shecc-stage2.elf -o test test.c
$ qemu-arm test
== 1 ==
17 34 51 68
407ffd8c 407ffd90 407ffd94 407ffd98
4 4 4
11 11

== 2 ==
arr[0] = 11
arr[1] = 22
arr[2] = 33
arr[3] = 44
aaa = 407ffda8, arr = 407ffda8
*aaa = 44332211, aaa[0] = 44332211

== 3 ==
a = 6017, b = 434, c = -349, d = -5932

By the code and the comment in test.c, it is obvious that the code is divided to 3 snippets. Next, I use these snippets to explain their behaviors and the bug what I found.


snippet 1

After introducing the pull request (#171), every stack allocation increment had been ensured to be properly aligned. Due to stack alignment requirements, each single character variable occupies 4 bytes, despite a character's actual size being 1 byte.

Thus, the following results are correct.

  1. the pointer arithmetic results like bb - aa, cc - bb and dd - cc are 4.
  2. Because aaa points to the address of a, the result of the dereference through aaa is equaivalent to the value of a.

snippet 2

- printf("*aaa = %x, aaa[0] = %x\n\n", *arr, aaa[0]);    /* 2024/12/15 22:56 */
+ printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);    /* 2025/01/21 19:15 */

(Previously, I made a typo so that I misunderstood pointer dereferencing also had an error. After fixing it, the result is correct.)


snippet 3 - (bug)

Because the variables (a ~ d) are signed characters, their results should be overflowed after executing additions and subtractions.

The correct results should be a = -127, b = -78, c = -93 and d = -44, but the results of the executable compiled by shecc produces error outputs.

Conclusion

According to the result of the snippet 3, the current backends may lack consideration of the data type when generating load/store instructions.

For example, the Arm backend may generates a ldr instruction for loading a character (1 byte), regardless of the data type. But, For character variables, it must generate ldrsb instructions instead.

Since these problems also occur in the RISC-V backend, both the Arm and RISC-V backends require improvements to generate data-type-specific instructions, such as ldrsb for signed characters.

@DrXiao DrXiao added the bug Something isn't working label Nov 12, 2024
@DrXiao
Copy link
Collaborator Author

DrXiao commented Dec 15, 2024

Due to the introduction of some recent pull requests, I fetched the latest changes, refactored the test code, and re-described this issue.

(I will gradually improve the description about this issue in the future.)

@DrXiao DrXiao changed the title Error behaviors for the load/store operations on a character variable Error behaviors of load/store operations and pointer dereferencing Dec 15, 2024
@ChAoSUnItY
Copy link
Collaborator

I'm currently testing snippet 2, but is the last printing statement supposed to be printing *arr? Because I found that your formatting string is actually asserting *aaa instead.

@DrXiao
Copy link
Collaborator Author

DrXiao commented Jan 20, 2025

printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);

Sorry, it should test *aaa in the snippet 2.

Due to personal reason, I will modify the code and description tomorrow.

@ChAoSUnItY
Copy link
Collaborator

After testing snippet 3, I found that it's quite hard to handle data type conversion more conventionally, due to lack of data size information, but also because of peephole optimization, we'll need to also resize it in arithmetic operations instead of just in OP_load and OP_store (I'm not sure if OP_assign is needed to be carefully handled too since it's meant to move value between registers). Maybe there's a concise way to handle this? Correct me if I'm wrong.

@DrXiao
Copy link
Collaborator Author

DrXiao commented Jan 21, 2025

printf("*aaa = %x, aaa[0] = %x\n\n", *aaa, aaa[0]);

Sorry, it should test *aaa in the snippet 2.

Due to personal reason, I will modify the code and description tomorrow.

After fixing the typo and testing the code again, the result shows that the behavior of pointer dereferencing is correct.

The code and the description have been updated. I think the main title of this issue should be Certain operations lack consideration for data types.

@DrXiao DrXiao changed the title Error behaviors of load/store operations and pointer dereferencing Certain operations lack consideration for data types Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants