Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched/semaphore: Move POSIX regulated parts of semaphores into libc #11257

Merged
merged 3 commits into from
Nov 27, 2023

Conversation

pussuw
Copy link
Contributor

@pussuw pussuw commented Nov 23, 2023

Summary

This moves the POSIX regulated parts of semaphores into libc, i.e. cancellation points and errno handling, meaning some of the work can now be done directly in user space without kernel involvement.

Impact

Architecturally quite significant, otherwise none (assuming no regressions)

Testing

rv-virt:nsh64 (ostest) / knsh64 / ksmp64 (what can be tested)

@pussuw pussuw changed the title Sem to libc sched/semaphore: Move POSIX regulated parts of semaphores into libc Nov 23, 2023
@pussuw
Copy link
Contributor Author

pussuw commented Nov 23, 2023

Continuation of #11165

@acassis
Copy link
Contributor

acassis commented Nov 23, 2023

@pussuw besides ostest, it is important to test using the LTP suite to confirm everything is working!

@pussuw
Copy link
Contributor Author

pussuw commented Nov 23, 2023

@acassis do we have the LTP testsuite readily integrated, or how do people typically run it ? I have ran some of the test cases manually but that is quite a bit of manual work.

I noticed that nuttx-apps has an application called ltp, and this can be run with the sim:posix_test target. A few questions though:

  • Is there a script that runs all of the tests ?
  • Are those tests run as part of CI ?

@pussuw
Copy link
Contributor Author

pussuw commented Nov 23, 2023

The _SEM_XX macros are used from libnx/ , can someone advise which version should be used there ? I assume we want to use the nxsem_ versions so the graphics APIs don't turn into cancellation points ?

@xiaoxiang781216
Copy link
Contributor

The _SEM_XX macros are used from libnx/ , can someone advise which version should be used there ? I assume we want to use the nxsem_ versions so the graphics APIs don't turn into cancellation points ?

_SEM_XX need to be kept since many libxxx can be called from both kernel and userspace.

@pussuw
Copy link
Contributor Author

pussuw commented Nov 23, 2023

The _SEM_XX macros are used from libnx/ , can someone advise which version should be used there ? I assume we want to use the nxsem_ versions so the graphics APIs don't turn into cancellation points ?

_SEM_XX need to be kept since many libxxx can be called from both kernel and userspace.

Yes but this is exactly my question, is any API from libxxx allowed to become a cancellation point via e.g. sem_wait ? What is wrong with just calling the nxsem_ version (i.e. the system call) always from libxxx ?

Now the nxsem_ APIs are all available to libc

@xiaoxiang781216
Copy link
Contributor

since the kernel caller don't want to enable cancellation or change error, but the user space caller want. @patacongo could you give more input?

@patacongo
Copy link
Contributor

patacongo commented Nov 23, 2023

Yes but this is exactly my question, is any API from libxxx allowed to become a cancellation point via e.g. sem_wait ? What is wrong with just calling the nxsem_ version (i.e. the system call) always from libxxx ?

[This comments are not really relevant, but I'll keep them here for the POSIX references].

POSIX specifies exactly which APIS must be cancellation, which APIs may optionally be cancellation points are required, and all specifies that any other cancellation point function other than those is forbidden. See 2.9.5 Thread Cancellation in https://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html

"An implementation shall not introduce cancellation points into any other functions specified in this volume of IEEE Std 1003.1-2001."

If cancellation points can be introduced in any other way, that would be a serious POSIX violation.

POSIX lists those functions which must support cancellation points. It is not an option available to the application; those functions must always support cancellation points.

Also lets not introduce any non-standard (i.e., non-POSIX or non-Linux) APIs into the application interface. NuttX is a standards based OS and must not support many non-standard interfaces.

@patacongo
Copy link
Contributor

patacongo commented Nov 23, 2023

_SEM_XX need to be kept since many libxxx can be called from both kernel and userspace.

Sorry, I was not in context and didn't respond well when @xiaoxiang781216 asked me to comment. I leave my mostly irrelevant response in place any way because it does have the important POSIX requirements.

There is an unresolved problem withe_SEM_XX in the FLAT build most. Since there is only one copy of libc in that configuration and it will use sem_wait() with the cancellation point in all contexts. That is wrong.

However, it works perfectly in the PROTECTED and KERNEL builds where there are two versions of libc, one for applications using sem_wait() and one for the kernel using nxsem_wait(). The kernel version works well in the PROTECTED and KERNEL builds.

Using sem_wait() in the libc in the FLAT build is wrong most of the time for application calls because it often creates cancellation points where they are forbidden by POSIX. Calling sem_wait() from within in the OS is always wrong since when called from within the kernel.

The fact that sem_wait() works at all in the FLAT build is because of nested cancellation point support.

Calling nxsem_wait() is usually correct in all build modes provided that the libc function is not relying on sem_wait() to provide the cancellation point (which would be bad, very coupled design).

So I think I agree with you. Calling nxsem_wait() from libc always is the cleaner solution if possible (and also fixes the existing problems in regard to using sem_wait()).

@patacongo
Copy link
Contributor

patacongo commented Nov 23, 2023

There is an unresolved probably withe_SEM_XX in the FLAT build most. Since there is only one copy of libc in that configuration and it will use sem_wait() with the cancellation point in all contexts. That is wrong.

sem_wait() also sets the errno value. nxsem_wait() does not. If all calls to _SEM_WAIT are changed to nxsem_wait() we also need to be certain that the calling libc function does not depend on sem_wait() to set the errno value only when called from the application space.

@pussuw
Copy link
Contributor Author

pussuw commented Nov 24, 2023

There is an unresolved probably withe_SEM_XX in the FLAT build most. Since there is only one copy of libc in that configuration and it will use sem_wait() with the cancellation point in all contexts. That is wrong.

sem_wait() also sets the errno value. nxsem_wait() does not. If all calls to _SEM_WAIT are changed to nxsem_wait() we also need to be certain that the calling libc function does not depend on sem_wait() to set the errno value only when called from the application space.

Thanks, the most important reason for me doing this change is to get rid of the _SEM macros, if those cannot be removed the work in this PR is futile and the whole PR should just be closed.

I already changed all _SEM_XX calls from lib_mutex to nxsem_ calls, no errno in use there. I'll change the remaining references and ensure they don't depend on errno either. I don't think this should be the case anyway.

…space

This moves all the public POSIX semaphore functions into libc and with
this most of the user-space logic is also moved; namely cancel point and
errno handling.

This also removes the need for the _SEM_XX macros used to differentiate
which API is used per user-/kernel mode. Such macros are henceforth
unnecessary.
@pussuw
Copy link
Contributor Author

pussuw commented Nov 24, 2023

It is still not clear to me how we have ran LTP tests in the past ?

  • Do people create their own environments where the tests are run ?
  • Do we have something ready made ourselves ? I found testing/ltp in nuttx-apps, what is that ?
  • If so, does CI perform the tests automatically ?

@xiaoxiang781216
Copy link
Contributor

It is still not clear to me how we have ran LTP tests in the past ?

  • Do people create their own environments where the tests are run ?

Yes, Xiaomi run LTP test internally

  • Do we have something ready made ourselves ? I found testing/ltp in nuttx-apps, what is that ?

You can try https://github.com/apache/nuttx/blob/master/boards/sim/sim/sim/configs/posix_test/defconfig
testing/ltp contain the full LTP source code

  • If so, does CI perform the tests automatically ?

You need run the test programs which is huge (1000+) manually now, we plan to enable LTP test in github action in the next couple month.

@pussuw
Copy link
Contributor Author

pussuw commented Nov 24, 2023

It is still not clear to me how we have ran LTP tests in the past ?

  • Do people create their own environments where the tests are run ?

Yes, Xiaomi run LTP test internally

  • Do we have something ready made ourselves ? I found testing/ltp in nuttx-apps, what is that ?

You can try https://github.com/apache/nuttx/blob/master/boards/sim/sim/sim/configs/posix_test/defconfig testing/ltp contain the full LTP source code

  • If so, does CI perform the tests automatically ?

You need run the test programs which is huge (1000+) manually now, we plan to enable LTP test in github action in the next couple month.

Thanks for the reply, so it is not trivial to run those tests right now since I don't have any ready made test framework for this. Manually testing all of the cases is out of the question.

Copy link
Contributor

@PetervdPerk-NXP PetervdPerk-NXP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include a bloaty output of the total size change of this PR.

@pussuw
Copy link
Contributor Author

pussuw commented Nov 24, 2023

Can you include a bloaty output of the total size change of this PR.

Yes!

@acassis
Copy link
Contributor

acassis commented Nov 24, 2023

@acassis do we have the LTP testsuite readily integrated, or how do people typically run it ? I have ran some of the test cases manually but that is quite a bit of manual work.

I noticed that nuttx-apps has an application called ltp, and this can be run with the sim:posix_test target. A few questions though:

* Is there a script that runs all of the tests ?

* Are those tests run as part of CI ?

Yes, please take a look at sim/sim/sim/configs/posix_test

@xiaoxiang781216
Copy link
Contributor

should we merge this patch or wait @pussuw 's size report?

@pussuw
Copy link
Contributor Author

pussuw commented Nov 25, 2023

I'll provide the info on Monday @xiaoxiang781216

@acassis
Copy link
Contributor

acassis commented Nov 25, 2023

@xiaoxiang781216 I think it is a good idea to wait! NuttX is increasing very fast and I'm afraid that we cannot fit on MCUs with 32KB Flash anymore!

@PetervdPerk-NXP
Copy link
Contributor

@acassis wouldn't it be great if we can automate a size impact summary in github-actions.
I've seen other projects doing this as well for example prusa3d/Prusa-Firmware#4487 (comment)

@pussuw
Copy link
Contributor Author

pussuw commented Nov 27, 2023

The size difference is almost none, tried with a few targets, pre is before this change, post is after:

rv-virt:nsh64

ville@workpc:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ riscv64-unknown-elf-size nuttx_pre
   text	   data	    bss	    dec	    hex	filename
 175879	    697	  11024	 187600	  2dcd0	nuttx_pre
ville@workpc:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ riscv64-unknown-elf-size nuttx_post
   text	   data	    bss	    dec	    hex	filename
 175811	    697	  11024	 187532	  2dc8c	nuttx_post

sabre-6quad:nsh

ville@workpc:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ arm-none-eabi-size nuttx_pre 
   text	   data	    bss	    dec	    hex	filename
 135798	    356	  24272	 160426	  272aa	nuttx_pre
ville@workpc:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ arm-none-eabi-size nuttx_post
   text	   data	    bss	    dec	    hex	filename
 135750	    356	  24272	 160378	  2727a	nuttx_post
ville@workpc:~/wpc_toolchain/workspace/nuttx-bare/nuttx$

icicle:knsh (kernel mode target with system calls)

ville@workpc:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ riscv64-unknown-elf-size nuttx_pre
   text	   data	    bss	    dec	    hex	filename
1194078	  33652	  27160	1254890	 1325ea	nuttx_pre

ville@workpc:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ riscv64-unknown-elf-size nuttx_post
   text	   data	    bss	    dec	    hex	filename
1193518	  33652	  27160	1254330	 1323ba	nuttx_post

The application image size for kernel mode targets will be a bit larger, since libc is duplicated / statically linked to each binary.

@PetervdPerk-NXP was there some specific target or some specific bloaty flags you wanted me to use. The standard bloaty ouput is basically the same as using size.

@xiaoxiang781216 xiaoxiang781216 merged commit e39ef85 into apache:master Nov 27, 2023
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants