
src: Fix incorrect use of cpu_set_t #122

Open

douglas-raillard-arm wants to merge 4 commits into master from fix_cpu_set
Conversation

douglas-raillard-arm

cpu_set_t allocated dynamically with CPU_ALLOC needs to be manipulated
with the CPU_*_S() family of macros, otherwise the size is assumed to be
equal to sizeof(cpu_set_t), leading to undefined behaviour.
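(For illustration, a minimal sketch of the dynamic-size API as CPU_ALLOC(3) describes it; the variable names are generic, not rt-app code:)

		#define _GNU_SOURCE
		#include <sched.h>
		#include <stdio.h>

		int main(void)
		{
			int ncpus = 256;                        /* upper bound on CPU ids we care about */
			cpu_set_t *set = CPU_ALLOC(ncpus);
			size_t setsize = CPU_ALLOC_SIZE(ncpus); /* can be smaller than sizeof(cpu_set_t) */

			if (!set)
				return 1;
			CPU_ZERO_S(setsize, set);               /* always pass setsize, never assume sizeof(cpu_set_t) */
			CPU_SET_S(3, setsize, set);
			printf("cpu 3 set: %d\n", CPU_ISSET_S(3, setsize, set));
			CPU_FREE(set);
			return 0;
		}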

douglas-raillard-arm force-pushed the fix_cpu_set branch 2 times, most recently from 36fc52d to 6deeec4 on March 31, 2021 12:56
@vingu-linaro
Member

As we have:

		data->cpusetsize = sizeof(cpu_set_t);
		data->cpuset = malloc(data->cpusetsize);

Isn't the assumption correct?

@douglas-raillard-arm
Author

douglas-raillard-arm commented Mar 31, 2021

That's not the only place where a cpuset is created:
https://github.com/scheduler-tools/rt-app/blob/master/src/rt-app.c#L844

Currently, rt-app built with musl libc crashes with a given input when the JSON file name (as passed on the CLI) is longer than 92 chars. The crash happens inside CPU_EQUAL_S, where it can be seen that the size the libc assumes differs from what is in the cpusetsize member.

In detail, musl libc defines:

		typedef struct cpu_set_t { unsigned long __bits[128/sizeof(long)]; } cpu_set_t;

So a cpu_set_t is 128 bytes long. When we use the dynamic allocation API, we end up with a much smaller object. The fixed-size API ends up calling memcmp with size=128 whereas the allocated objects are smaller, so it ends up reading garbage (and an unmapped area when the stars align).
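(As a rough illustration of the mismatch, a sketch assuming a 64-bit musl system; the numbers follow from the definitions above, not from rt-app output:)

		#define _GNU_SOURCE
		#include <sched.h>
		#include <stdio.h>

		int main(void)
		{
			/* Dynamic allocation sized for 6 CPUs: 8 bytes on a 64-bit libc. */
			size_t dyn_size = CPU_ALLOC_SIZE(6);

			/* The fixed-size macros (e.g. CPU_EQUAL) compare sizeof(cpu_set_t)
			 * bytes (128 with musl), i.e. far past the 8-byte allocation. */
			printf("CPU_ALLOC_SIZE(6) = %zu, sizeof(cpu_set_t) = %zu\n",
			       dyn_size, sizeof(cpu_set_t));
			return 0;
		}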

@douglas-raillard-arm
Author

Actually that's not the end of the trouble: the patch fixes the crash, but now I end up with things like:

[rt-app] <debug> [0] setting cpu affinity to CPU(s) [ 0, 1, 2, 3, 4, 5 ]
pthread_setaffinity_np: Invalid argument

I spotted the following remaining problems:

  • A wild memcpy is used to copy from one cpuset to another (see the sketch after this list):
		memcpy(data->def_cpu_data.cpuset, &cpuset,
						data->def_cpu_data.cpusetsize);

Even though the man page specifies:

The cpu_set_t data type is implemented as a bit mask. However, the data structure should be treated as opaque: all manipulation of CPU sets should be done via the macros described in this page.
https://linux.die.net/man/3/cpu_alloc_size

On top of that, there is no guarantee that sizeof(cpu_set_t) is greater than or equal to data->def_cpu_data.cpusetsize, so we could end up with undefined behaviour.

  • There is no straightforward way to get the highest CPU ID set in a cpu_set_t. The closest thing I've found is CPU_COUNT_S, but it just gives the number of CPUs that are set, so we would have to iterate through the set, keep track of how many set CPUs we have found, and stop when we reach the last one.
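(To make the two points above concrete, a minimal sketch of size-aware helpers; the function names are hypothetical, not rt-app code, and both assume the sets were allocated for at most max_cpus CPUs:)

		#define _GNU_SOURCE
		#include <sched.h>

		/* Copy src into dst via the _S macros instead of a raw memcpy(). */
		static void cpuset_copy(size_t dst_size, cpu_set_t *dst,
					size_t src_size, cpu_set_t *src,
					unsigned int max_cpus)
		{
			unsigned int cpu;

			CPU_ZERO_S(dst_size, dst);
			for (cpu = 0; cpu < max_cpus; cpu++)
				if (CPU_ISSET_S(cpu, src_size, src))
					CPU_SET_S(cpu, dst_size, dst);
		}

		/* Highest CPU id set in the set, or -1 if it is empty: there is no
		 * dedicated macro, so scan downwards from the allocation limit. */
		static int cpuset_max(size_t setsize, cpu_set_t *set, unsigned int max_cpus)
		{
			int cpu;

			for (cpu = (int)max_cpus - 1; cpu >= 0; cpu--)
				if (CPU_ISSET_S(cpu, setsize, set))
					return cpu;
			return -1;
		}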

Given that the code to retrieve the affinity is:

		ret = pthread_getaffinity_np(pthread_self(),
						    sizeof(cpu_set_t), &cpuset);

there is no way we will ever get a CPU ID higher than what fits in a fixed-sized cpu_set_t, so I'm wondering:

  1. Why is the dynamic API used at all?
  2. Is it actually worth the trouble? Especially given that other places in the code use the fixed-size API, as you pointed out.

@douglas-raillard-arm
Author

Updated with a corrected call to CPU_SET_S (arguments were swapped) and a commit to use the fixed-size API instead of the variably-sized API.
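(For reference, a minimal sketch of the corrected argument order per CPU_SET(3); mark_cpu is a made-up name, not rt-app code:)

		#define _GNU_SOURCE
		#include <sched.h>

		/* In the _S variant, the size argument sits between the CPU id and
		 * the set pointer. */
		static void mark_cpu(cpu_set_t *setp, size_t setsize, int cpu)
		{
			CPU_SET_S(cpu, setsize, setp);	/* not CPU_SET_S(setsize, cpu, setp) */
		}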

@credp requested review from credp, jlelli and vingu-linaro on July 13, 2022 19:06
Contributor

@credp left a comment


lgtm

@credp
Contributor

credp commented Jul 13, 2022

@vingu-linaro I'm considering if it is appropriate for me to merge these changes - they look good to me, we need them because our clib triggers the issue, and we've been using them for a while now. Would you have any objection to me hitting the big green button?

Thanks!

@vingu-linaro
Member

@credp Let me try to remember what the open question was for this PR.

@vingu-linaro
Member

Patch 1 says: "cpu_set_t allocated dynamically with CPU_ALLOC needs to be manipulated with the CPU_*_S() family of macros", but patch 2 removes the only call to CPU_ALLOC and uses malloc(sizeof(cpu_set_t)) everywhere instead. Also, cpusetsize is now always set to sizeof(cpu_set_t).

So should we remove patch 1 and only keep patch 2?

@douglas-raillard-arm
Author

douglas-raillard-arm commented Jul 18, 2022

I think the reason I removed CPU_ALLOC is that the data it's initialized with comes from a cpu_set_t anyway:

		cpu_set_t cpuset;
		unsigned int cpu_count;

		ret = pthread_getaffinity_np(pthread_self(),
						    sizeof(cpu_set_t), &cpuset);

...
		cpu_count = CPU_COUNT(&cpuset);
		data->def_cpu_data.cpusetsize = CPU_ALLOC_SIZE(cpu_count);
		data->def_cpu_data.cpuset = CPU_ALLOC(cpu_count);

So there is no point in using the dynamic size API in the current state of the code, since by construction we will never get a CPU_COUNT() that cannot be represented in cpu_set_t.

With both patches, you end up with client code that does not care about what specific type of cpu set is being used. If you drop the first patch, you also drop the ability to easily switch back to a dynamically allocated cpu set (I suppose an API other than pthread allows getting the CPU count no matter how large it is). And that would amount to shooting yourself in the foot:
https://git.musl-libc.org/cgit/musl/tree/include/sched.h#n127

Currently, rt-app will fail to handle more than 128 CPUs in proper static builds, so I wouldn't drop the code that will allow fixing that easily.

@vingu-linaro
Member


But all this is quite misleading:

  • Either you fix the system to use dynamically allocated cpu sets,
  • or you fix it to only use statically allocated ones.

if (!CPU_EQUAL(actual_cpu_data->cpuset, data->curr_cpu_data->cpuset))
{
if (
actual_cpu_data->cpusetsize != data->curr_cpu_data->cpusetsize ||
Member


With this PR, cpusetsize is always set to sizeof(cpu_set_t), so this test is useless.

@douglas-raillard-arm
Author

But all this is quite misleading:

  • Either you fix the system to use dynamically allocated cpu sets,
  • or you fix it to only use statically allocated ones.

I fix the current segfault and make the code agree with itself (I did not introduce cpusetsize; it was already there, but used only half of the time), while preparing the ground for the next step. Feel free to drop the PR if fixing the segfault is not enough.

@vingu-linaro
Member

vingu-linaro commented Jul 18, 2022

But all this is quite misleading:

  • Either you fix the system to use dynamically allocated cpu sets,
  • or you fix it to only use statically allocated ones.

I fix the current segfault and make the code agree with itself (I did not introduce cpusetsize; it was already there, but used only half of the time), while preparing the ground for the next step. Feel free to drop the PR if fixing the segfault is not enough.

I'm not saying the PR is irrelevant; I'm saying that patch 1 is, because patch 2 removes the code that patch 1 wants to fix, so patch 1 is useless. That's why I asked you to keep patch 2 and remove patch 1.

@douglas-raillard-arm
Author

Updated the PR with an extra couple of patches to make the max number of detected CPUs configurable. Note that these 2 patches are entirely untested.

@douglas-raillard-arm
Author

Tested the branch in its current state including the last 2 patches, they build and seem to work fine:

  • Tasks are pinned to the correct CPUs
  • calibration works well (one value for each CPU, correctly detected)

		for (i = 0; i < json_object_array_length(cpuset_obj); i++) {
			cpu = json_object_array_get_idx(cpuset_obj, i);
			cpu_idx = json_object_get_int(cpu);
			if (cpu_idx > max_cpu) {
				log_critical(PIN2 "Invalid cpu %u in cpuset %s", cpu_idx, data->cpuset_str);
				free(data->cpuset);
				CPU_FREE(data->cpuset);
Member


We have data->cpuset = malloc(data->cpusetsize); a few lines above, so why replace free() with CPU_FREE()?

We should use the same alloc function everywhere: CPU_ALLOC(MAX_CPUS) and CPU_ALLOC_SIZE(MAX_CPUS).
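(A minimal sketch of the consistent pattern being requested; the value of MAX_CPUS and the helper name are placeholders, not taken from the PR:)

		#define _GNU_SOURCE
		#include <sched.h>
		#include <stddef.h>

		#define MAX_CPUS 1024	/* hypothetical limit, not rt-app's actual value */

		/* Allocate, size and zero a cpu set with one consistent API; the
		 * result must then be released with CPU_FREE(), not free(). */
		static cpu_set_t *alloc_cpuset(size_t *sizep)
		{
			cpu_set_t *set = CPU_ALLOC(MAX_CPUS);

			if (set) {
				*sizep = CPU_ALLOC_SIZE(MAX_CPUS);
				CPU_ZERO_S(*sizep, set);
			}
			return set;
		}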

@vingu-linaro
Member

Please use everywhere:

		cpusetsize = CPU_ALLOC_SIZE(MAX_CPUS);
		cpuset = CPU_ALLOC(MAX_CPUS);

and squash the first 3 commits into a single one.

DouglasRaillard and others added 4 commits December 14, 2023 11:27
  • cpu_set_t allocated dynamically with CPU_ALLOC needs to be manipulated with the CPU_*_S() family of macros, otherwise the size is assumed to be equal to sizeof(cpu_set_t), leading to undefined behaviour.
  • Since the use of the dynamic API does not bring a greater range of allowed CPUs in the current state, use the fixed-size API in order to remove reliance on invalid assumptions about the size of the dynamic cpu_set_t value.
  • Detect up to a configurable number of CPUs, independently from what the libc defaults to.
  • Free cpuset_str in __shutdown.