Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up malloc/free routines #140

Merged
merged 1 commit into from
Jul 17, 2024
Merged

Speed up malloc/free routines #140

merged 1 commit into from
Jul 17, 2024

Conversation

DrXiao
Copy link
Collaborator

@DrXiao DrXiao commented Jul 15, 2024

This commit improves malloc() and free() in lib/c.c to eliminate a useless member from a structure and reduce execution time.

Refine the definition of chunk_t and improve the two functions using techniques similar to glibc.

For example, in glibc, it can obtain the starting address of a chunk by a macro called mem2chunk(), and the macro moves the given pointer backward directly to get the address of a chunk. Thus, the proposed changes use a similar approach to speed up free().

Close #139

@jserv jserv requested a review from vacantron July 15, 2024 17:01
Copy link
Collaborator

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using backticks in git commit messages and comments to ensure compatibility with terminal emulators that may not render the backtick character properly.

Instead, use the pair ' or " (quotation marks).

lib/c.c Outdated Show resolved Hide resolved
lib/c.c Outdated Show resolved Hide resolved
@jserv jserv changed the title Improve malloc() and free() in lib/c.c Speed up malloc/free routines Jul 16, 2024
lib/c.c Outdated
#define CHUNK_SIZE_SZ_MASK 0xFFFFFFFE
#define CHUNK_GET_SIZE(size) (size & CHUNK_SIZE_SZ_MASK)
#define IS_CHUNK_GET_FREED(size) (size & CHUNK_SIZE_FREED_MASK)
#define CHUNK_SET_FREED(size) (size | CHUNK_SIZE_FREED_MASK)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about rewriting CHUNK_SET_FREED as following?

#define CHUNK_SET_FREED(size) \
    size |= CHUNK_SIZE_FREED_MASK

Copy link
Collaborator Author

@DrXiao DrXiao Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encounter some trouble when modifying the macro.

Then, I use the following code to test a function-like macro:

/* test.c */
#define MACRO(a) a += 10

int main()
{
    int x = 10;
    printf("%d\n", x);
    MACRO(x);
    printf("%d\n", x);

    return 0;
}
$ out/shecc -o test test.c
Error Unrecognized statement token at source location 16802
#define MACRO(a) a += 10
                   ^ Error occurs here
Abnormal program termination

This implies that shecc cannot parse the macro containing compound operator(s) correctly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current parser lacks of the ability to deal with compound assignment operators. So, you have to change the statements.

Copy link
Collaborator Author

@DrXiao DrXiao Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the current parser also lacks of the ability to handle assignment operators. Therefore, I think I will use two functions called chunk_set_freed() and chunk_clear_freed() to perform the operations instead of using the original macros.

void chunk_set_freed(chunk_t *chunk) 
{
        chunk->size |= CHUNK_SIZE_FREED_MASK;
}

void chunk_clear_freed(chunk_t *chunk) 
{
        chunk->size &= CHUNK_SIZE_SZ_MASK;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will use two functions called chunk_set_freed() and chunk_clear_freed() to perform the operations instead of using the original macros.

It looks good to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current parser also lacks of the ability to handle assignment operator.

@ChAoSUnItY, The compound assignment operators consist of a binary operator and the simple assignment operator. At present, there is no support for compound assignment operators. Can you evaluate the feasibility to consolidate the parser for compound assignment operators?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the examples that shecc cannot deal with.

#define MACRO1(variable, val) \
               variable = variable + val + 10
#define MACRO2(variable, val) \
               variable += val + 10

These macros contain an assignment operator and a compound assignment operator, and shecc cannot parse them correctly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the examples that shecc cannot deal with.

Create an issue for tracking.

Copy link
Collaborator

@ChAoSUnItY ChAoSUnItY Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compound assignment operators consist of a binary operator and the simple assignment operator. At present, there is no support for compound assignment operators.

The current situation is that read_body_assignment is failed to realize the macro variable and thus returns false since it's possibly not a local variable nor a global variable, which ultimately results in the parser failed to parse current variable and terminated at the operator.
 

Can you evaluate the feasibility to consolidate the parser for compound assignment operators?

Currently, it seems rather tricky to implement due to the technical debt of macro parsing design (we're not able to determine which syntax parsing we should use to parse macro parameter). Moreover, if we predict it's a lvalue (which in this case it should be) and we use read_lvalue function to parse the parameter, the function requires a non-null var_t, which is impossible because we haven't even parsed macro parameter's actual expression; if we predict it's an expression, we'll lose the target address of the lvalue.

This probably could be resolved after we implement tangle back to shecc since macro expansion would only happens in lexer and parser doesn't have to predict what parsing strategy should be used for the expanded macro parameter.

@jserv
Copy link
Collaborator

jserv commented Jul 16, 2024

Improve the ASCII art as following:

                     |<--  usable memory  -->|
+------+------+------+-------- ... ----------+
| next | prev | size |          :            |
+------+------+------+-------- ... ----------+
^                    ^
|                    |
chunk              (chunk + 1)
                    the starting address returned by malloc().

Copy link
Collaborator

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provide some timing measurements for the proposed changes.

This commit improves malloc() and free() in lib/c.c to
eliminate an unnecessary member from a structure and reduce
execution time.

First, the member 'ptr' is removed from 'chunk_t'. Originally,
it was used to return the starting address of a usable memory
region when calling malloc(). However, consider the following
code and illustration.

typedef struct chunk {
	struct chunk *next;
	struct chunk *prev;
	int size;
} chunk_t;

chunk_t *chunk = ...

                     |<--- usable memory --->|
+------+------+------+-------- ... ----------+
| next | prev | size |          :            |
+------+------+------+-------- ... ----------+
^                    ^
|                    |
chunk              (chunk + 1)
                    the starting address returned by malloc().

After removing 'ptr' from 'chunk_t', the starting address of
the usable memory region can be obtained by adding 1 to a
pointer to 'chunk_t'. Thus, 'ptr' becomes unnecessary, so
this member is removed from the structure and malloc()
returns the starting address by the mentioned approach.

Second, for the free() function, after removing 'ptr' from
'chunk_t', the least significant bit of the member 'size' is
used to record a flag and prevent traversing the entire
allocation list.

The LSB of 'size' is used to check whether a chunk is freed
or not, and it is also used to speed up the free() function.

If a chunk is in the freelist, the LSB will be set to 1 to
indicate that the chunk is freed. Otherwise, the chunk is
still in the allocation list and usable when the LSB is 0,
and the memory region can be used by the caller who calls
malloc().

Next, when calling free(), the address of a chunk can be
obtained by moving a pointer which is the parameter of
free() backward by sizeof(chunk_t) bytes directly.

                       <--- usable memory --->
+------+------+------+------- ... -------------+
| next | prev | size |         :               |
+------+------+------+------- ... -------------+
^                    ^
|                    |
chunk                ptr
|<------------------>|
 sizeof(chunk_t) bytes

Eventually, it is unnecessary to traverse the allocation list.
The address of a chunk can be obtained by moving the given
pointer backward, and the LSB of 'size' can be used to
validate the chunk, which can then be moved to the freelist.

Close sysprog21#139
@DrXiao
Copy link
Collaborator Author

DrXiao commented Jul 16, 2024

I will continue to improve my commit messages and changes later, but I provide my measurements here first.

Here is the information on my CPU.

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i5-7300HQ CPU @ 2.50GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            9
    CPU max MHz:         3500.0000
    CPU min MHz:         800.0000
...

My measurements use time command to build shecc for Arm and RISC-V outputs and obtain their execution time.

$ make distclean && time make
$ make distclean && time make ARCH=riscv

On the master branch, it took about 26 seconds and 39 seconds to build the executables.

$ make distclean && time make
...
  SHECC out/shecc-stage2.elf

real    0m26.886s
user    0m24.716s
sys     0m2.096s

$ make distclean && time make ARCH=riscv
...
  SHECC out/shecc-stage2.elf

real    0m38.896s
user    0m36.528s
sys     0m2.332s

Then, after introducing the proposed changes, it was reduced to about 6.6 seconds and 18 seconds, respectively.

$ make distclean && time make
...
  SHECC out/shecc-stage2.elf

real    0m6.618s
user    0m4.283s
sys     0m2.178s

$ make distclean && time make ARCH=riscv
...
  SHECC out/shecc-stage2.elf

real    0m17.923s
user    0m15.464s
sys     0m2.254s
master proposed changes Percentage of reduced time
shecc (arm) 26.886 s 6.618 s 75.38%
shecc (riscv) 38.896 s 17.923 s 53.92%

By the proposed changes, the bootstrap time will be significantly reduced.

@jserv jserv merged commit cb34939 into sysprog21:master Jul 17, 2024
4 checks passed
@jserv
Copy link
Collaborator

jserv commented Jul 17, 2024

Thank @DrXiao for contributing!

@DrXiao
Copy link
Collaborator Author

DrXiao commented Jul 17, 2024

By the way, because I use chunk_set_freed() and chunk_clear_freed(), which replace CHUNK_SET_FREED() and CHUNK_CLEAR_FREED(), to set or clear the bit, the execution time may increase a little due to the overhead of function calls.

Thus, here is the latest result of measurements:

master proposed changes Percentage of reduced time
shecc (arm) 26.845 s 6.435 s 76.02%
shecc (riscv) 38.819 s 18.198s 53.12%

However, the bootstrap time is still significantly reduced.

@DrXiao DrXiao deleted the fix-build branch July 17, 2024 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Significantly increased bootstrap time in recent build
3 participants