
src: Fix incorrect use of cpu_set_t #122

Open

douglas-raillard-arm wants to merge 4 commits into master from fix_cpu_set
Conversation

douglas-raillard-arm

cpu_set_t allocated dynamically with CPU_ALLOC needs to be manipulated
with the CPU_*_S() family of macros, otherwise the size is assumed to be
equal to sizeof(cpu_set_t), leading to undefined behaviour.
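(For illustration, a minimal sketch of the dynamic-size API as CPU_ALLOC(3) describes it; the variable names are generic, not rt-app code:)

		#define _GNU_SOURCE
		#include <sched.h>
		#include <stdio.h>

		int main(void)
		{
			int ncpus = 256;                        /* upper bound on CPU ids we care about */
			cpu_set_t *set = CPU_ALLOC(ncpus);
			size_t setsize = CPU_ALLOC_SIZE(ncpus); /* can be smaller than sizeof(cpu_set_t) */

			if (!set)
				return 1;
			CPU_ZERO_S(setsize, set);               /* always pass setsize, never assume sizeof(cpu_set_t) */
			CPU_SET_S(3, setsize, set);
			printf("cpu 3 set: %d\n", CPU_ISSET_S(3, setsize, set));
			CPU_FREE(set);
			return 0;
		}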

douglas-raillard-arm force-pushed the fix_cpu_set branch 2 times, most recently from 36fc52d to 6deeec4 on March 31, 2021 12:56
@vingu-linaro
Member

As we have:

		data->cpusetsize = sizeof(cpu_set_t);
		data->cpuset = malloc(data->cpusetsize);

Isn't the assumption correct?

@douglas-raillard-arm
Author

douglas-raillard-arm commented Mar 31, 2021

That's not the only place where a cpuset is created:
https://github.com/scheduler-tools/rt-app/blob/master/src/rt-app.c#L844

Currently, rt-app built with musl libc crashes with a given input when the JSON file name (as passed on the CLI) is longer than 92 chars. The crash happens inside CPU_EQUAL_S, where it can be seen that the size the libc assumes differs from what is in the cpusetsize member.

In detail, musl libc defines:

		typedef struct cpu_set_t { unsigned long __bits[128/sizeof(long)]; } cpu_set_t;

So a cpu_set_t is 128 bytes long. When we use the dynamic allocation API, we end up with a much smaller object. The fixed-size API ends up calling memcmp with size=128 whereas the allocated objects are smaller, so it ends up reading garbage (and an unmapped area when the stars align).
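(As a rough illustration of the mismatch, a sketch assuming a 64-bit musl system; the numbers follow from the definitions above, not from rt-app output:)

		#define _GNU_SOURCE
		#include <sched.h>
		#include <stdio.h>

		int main(void)
		{
			/* Dynamic allocation sized for 6 CPUs: 8 bytes on a 64-bit libc. */
			size_t dyn_size = CPU_ALLOC_SIZE(6);

			/* The fixed-size macros (e.g. CPU_EQUAL) compare sizeof(cpu_set_t)
			 * bytes (128 with musl), i.e. far past the 8-byte allocation. */
			printf("CPU_ALLOC_SIZE(6) = %zu, sizeof(cpu_set_t) = %zu\n",
			       dyn_size, sizeof(cpu_set_t));
			return 0;
		}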

@douglas-raillard-arm
Author

Actually that's not the end of the trouble: the patch fixes the crash, but now I end up with things like:

[rt-app] <debug> [0] setting cpu affinity to CPU(s) [ 0, 1, 2, 3, 4, 5 ]
pthread_setaffinity_np: Invalid argument

I spotted the following remaining problems:

  • A wild memcpy is used to copy from one cpuset to another (see the sketch after this list):
		memcpy(data->def_cpu_data.cpuset, &cpuset,
						data->def_cpu_data.cpusetsize);

Even though the man page specifies:

The cpu_set_t data type is implemented as a bit mask. However, the data structure should be treated as opaque: all manipulation of CPU sets should be done via the macros described in this page.
https://linux.die.net/man/3/cpu_alloc_size

On top of that, there is no guarantee that sizeof(cpu_set_t) is greater than or equal to data->def_cpu_data.cpusetsize, so we could end up with undefined behaviour.

  • There is no straightforward way to get the highest CPU ID set in a cpu_set_t. The closest thing I've found is CPU_COUNT_S, but it just gives the number of CPUs that are set, so we would have to iterate through the set, keep track of how many set CPUs we have found, and stop when we reach the last one.
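(To make the two points above concrete, a minimal sketch of size-aware helpers; the function names are hypothetical, not rt-app code, and both assume the sets were allocated for at most max_cpus CPUs:)

		#define _GNU_SOURCE
		#include <sched.h>

		/* Copy src into dst via the _S macros instead of a raw memcpy(). */
		static void cpuset_copy(size_t dst_size, cpu_set_t *dst,
					size_t src_size, cpu_set_t *src,
					unsigned int max_cpus)
		{
			unsigned int cpu;

			CPU_ZERO_S(dst_size, dst);
			for (cpu = 0; cpu < max_cpus; cpu++)
				if (CPU_ISSET_S(cpu, src_size, src))
					CPU_SET_S(cpu, dst_size, dst);
		}

		/* Highest CPU id set in the set, or -1 if it is empty: there is no
		 * dedicated macro, so scan downwards from the allocation limit. */
		static int cpuset_max(size_t setsize, cpu_set_t *set, unsigned int max_cpus)
		{
			int cpu;

			for (cpu = (int)max_cpus - 1; cpu >= 0; cpu--)
				if (CPU_ISSET_S(cpu, setsize, set))
					return cpu;
			return -1;
		}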

Given that the code to retrieve the affinity is:

		ret = pthread_getaffinity_np(pthread_self(),
						    sizeof(cpu_set_t), &cpuset);

there is no way we will ever get a CPU ID higher than what fits in a fixed-sized cpu_set_t, so I'm wondering:

  1. Why is the dynamic API used at all?
  2. Is it actually worth the trouble? Especially given that other places in the code use the fixed-size API, as you pointed out.

@douglas-raillard-arm
Author

Updated with a corrected call to CPU_SET_S (arguments were swapped) and a commit to use the fixed-size API instead of the variably-sized API.
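(For reference, a minimal sketch of the corrected argument order per CPU_SET(3); mark_cpu is a made-up name, not rt-app code:)

		#define _GNU_SOURCE
		#include <sched.h>

		/* In the _S variant, the size argument sits between the CPU id and
		 * the set pointer. */
		static void mark_cpu(cpu_set_t *setp, size_t setsize, int cpu)
		{
			CPU_SET_S(cpu, setsize, setp);	/* not CPU_SET_S(setsize, cpu, setp) */
		}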

@credp requested review from credp, jlelli and vingu-linaro on July 13, 2022 19:06
Contributor

@credp left a comment


lgtm

@credp
Contributor

credp commented Jul 13, 2022

@vingu-linaro I'm considering if it is appropriate for me to merge these changes - they look good to me, we need them because our clib triggers the issue, and we've been using them for a while now. Would you have any objection to me hitting the big green button?

Thanks!

@vingu-linaro
Member

@credp Let me try to remember what the open question was for this PR.

@vingu-linaro
Member

Patch 1 says: "cpu_set_t allocated dynamically with CPU_ALLOC needs to be manipulated with the CPU_*_S() family of macros", but patch 2 removes the only call to CPU_ALLOC and uses malloc(sizeof(cpu_set_t)) everywhere instead. Also, cpusetsize is now always set to sizeof(cpu_set_t).

So should we remove patch 1 and only keep patch 2?

@douglas-raillard-arm
Author

douglas-raillard-arm commented Jul 18, 2022

I think the reason I removed CPU_ALLOC is that the data it's initialized with comes from a cpu_set_t anyway:

		cpu_set_t cpuset;
		unsigned int cpu_count;

		ret = pthread_getaffinity_np(pthread_self(),
						    sizeof(cpu_set_t), &cpuset);

...
		cpu_count = CPU_COUNT(&cpuset);
		data->def_cpu_data.cpusetsize = CPU_ALLOC_SIZE(cpu_count);
		data->def_cpu_data.cpuset = CPU_ALLOC(cpu_count);

So there is no point in using the dynamic size API in the current state of the code, since by construction we will never get a CPU_COUNT() that cannot be represented in cpu_set_t.

With both patches, you end up with client code that does not care about what specific type of cpu set is being used. If you drop the first patch, you also drop the ability to easily switch back to a dynamically allocated cpu set (I suppose an API other than pthread allows getting the CPU count no matter how large it is). And that would amount to shooting yourself in the foot:
https://git.musl-libc.org/cgit/musl/tree/include/sched.h#n127

Currently, rt-app will fail to handle more than 128 CPUs in proper static builds, so I wouldn't drop the code that will allow fixing that easily.

@vingu-linaro
Member


But all this is quite misleading:

  • Either you fix the system to use dynamically allocated cpu sets,
  • or you fix it to only use statically allocated ones.

if (!CPU_EQUAL(actual_cpu_data->cpuset, data->curr_cpu_data->cpuset))
{
if (
actual_cpu_data->cpusetsize != data->curr_cpu_data->cpusetsize ||
Member


With this PR, cpusetsize is always set to sizeof(cpu_set_t), so this test is useless.

@douglas-raillard-arm
Author

But all this is quite misleading:

  • Either you fix the system to use dynamically allocated cpu sets,
  • or you fix it to only use statically allocated ones.

I fix the current segfault and make the code agree with itself (I did not introduce cpusetsize; it was already there, but used only half of the time), while preparing the ground for the next step. Feel free to drop the PR if fixing the segfault is not enough.

@vingu-linaro
Member

vingu-linaro commented Jul 18, 2022

But all this is quite misleading:

  • Either you fix the system to use dynamically allocated cpu sets,
  • or you fix it to only use statically allocated ones.

I fix the current segfault and make the code agree with itself (I did not introduce cpusetsize; it was already there, but used only half of the time), while preparing the ground for the next step. Feel free to drop the PR if fixing the segfault is not enough.

I'm not saying the PR is irrelevant; I'm saying that patch 1 is, because patch 2 removes the code that patch 1 wants to fix, so patch 1 is useless. That's why I asked you to keep patch 2 and remove patch 1.

@douglas-raillard-arm
Author

Updated the PR with an extra couple of patches to make the max number of detected CPUs configurable. Note that these 2 patches are entirely untested.

@douglas-raillard-arm
Author

Tested the branch in its current state including the last 2 patches, they build and seem to work fine:

  • Tasks are pinned to the correct CPUs
  • calibration works well (one value for each CPU, correctly detected)

		for (i = 0; i < json_object_array_length(cpuset_obj); i++) {
			cpu = json_object_array_get_idx(cpuset_obj, i);
			cpu_idx = json_object_get_int(cpu);
			if (cpu_idx > max_cpu) {
				log_critical(PIN2 "Invalid cpu %u in cpuset %s", cpu_idx, data->cpuset_str);
				free(data->cpuset);
				CPU_FREE(data->cpuset);
Member


We have data->cpuset = malloc(data->cpusetsize); a few lines above, so why replace free() with CPU_FREE()?

We should use the same alloc function everywhere: CPU_ALLOC(MAX_CPUS) and CPU_ALLOC_SIZE(MAX_CPUS).
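(A minimal sketch of the consistent pattern being requested; the value of MAX_CPUS and the helper name are placeholders, not taken from the PR:)

		#define _GNU_SOURCE
		#include <sched.h>
		#include <stddef.h>

		#define MAX_CPUS 1024	/* hypothetical limit, not rt-app's actual value */

		/* Allocate, size and zero a cpu set with one consistent API; the
		 * result must then be released with CPU_FREE(), not free(). */
		static cpu_set_t *alloc_cpuset(size_t *sizep)
		{
			cpu_set_t *set = CPU_ALLOC(MAX_CPUS);

			if (set) {
				*sizep = CPU_ALLOC_SIZE(MAX_CPUS);
				CPU_ZERO_S(*sizep, set);
			}
			return set;
		}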

@vingu-linaro
Member

Please use everywhere:

		cpusetsize = CPU_ALLOC_SIZE(MAX_CPUS);
		cpuset = CPU_ALLOC(MAX_CPUS);

and squash the first 3 commits into a single one.

DouglasRaillard and others added 4 commits December 14, 2023 11:27
  • cpu_set_t allocated dynamically with CPU_ALLOC needs to be manipulated with the CPU_*_S() family of macros, otherwise the size is assumed to be equal to sizeof(cpu_set_t), leading to undefined behaviour.
  • Since the use of the dynamic API does not bring a greater range of allowed CPUs in the current state, use the fixed-size API in order to remove reliance on invalid assumptions about the size of the dynamic cpu_set_t value.
  • Detect up to a configurable number of CPUs, independently from what the libc defaults to.
  • Free cpuset_str in __shutdown.