
sched/spin_lock: rename raw_spin_* to spin_*_notrace #15705

Open
wants to merge 8 commits into base: master

Conversation

anchao (Contributor) commented Jan 27, 2025

Summary

  1. sched/spin_lock: rename raw_spin_* to spin_*_notrace

In the Linux kernel, raw_spin means that preemption is disabled before
the spin lock is taken, not that tracing is disabled. Rename raw_spin to
spin_*_notrace to make the interface semantically consistent.

  2. sched/spinlock: add a new set of interfaces to disable preemption
spin_lock                ->      spin_lock_nopreempt
spin_trylock             ->      spin_trylock_nopreempt
spin_unlock              ->      spin_unlock_nopreempt
spin_lock_irqsave        ->      spin_lock_irqsave_nopreempt
spin_trylock_irqsave     ->      spin_trylock_irqsave_nopreempt
spin_unlock_irqrestore   ->      spin_unlock_irqrestore_nopreempt
  3. sched/spinlock: replace all disable-preemption cases with spin_*_nopreempt
flags = spin_lock_irqsave(&g_can_lock);       ->   flags = spin_lock_irqsave_nopreempt(&g_can_lock);
sched_lock();                                 ->   (removed; folded into the call above)

spin_unlock_irqrestore(&g_can_lock, flags);   ->   (removed; folded into the call below)
sched_unlock();                               ->   spin_unlock_irqrestore_nopreempt(&g_can_lock, flags);
  4. sched/spinlock: remove the disable-preemption semantics from the default spin_lock

In the RTOS environment, spin_lock is simplified as much as possible to improve performance.
At the same time, we provide a set of APIs that disable preemption by default:

spin_lock                ->      spin_lock_nopreempt
spin_trylock             ->      spin_trylock_nopreempt
spin_unlock              ->      spin_unlock_nopreempt
spin_lock_irqsave        ->      spin_lock_irqsave_nopreempt
spin_trylock_irqsave     ->      spin_trylock_irqsave_nopreempt
spin_unlock_irqrestore   ->      spin_unlock_irqrestore_nopreempt

This also has the same intention as RT-Linux.

Signed-off-by: chao an [email protected]

Impact

N/A

Testing

ci-check

@github-actions github-actions bot added Area: Drivers Drivers issues Area: File System File System issues Area: OS Components OS Components issues Size: M The size of the change in this PR is medium labels Jan 27, 2025
nuttxpr commented Jan 27, 2025

[Experimental Bot, please feedback here]

Yes, this PR appears to meet the NuttX requirements, although it could be improved.

Here's a breakdown of why and suggestions for improvement:

Strengths:

  • Clear Summary: The summary explains the "why" (semantic consistency with Linux), the "what" (renaming and adding spinlock functions), and the "how" (new functions disable preemption).
  • Code Examples: Including the code changes as examples is helpful.
  • Signed-off-by: Indicates contributor agreement to the Developer Certificate of Origin.

Weaknesses & Suggestions:

  • Impact Understated: Marking impact as "N/A" is likely incorrect. Even seemingly minor changes to core functions like spinlocks can have significant impacts. Consider these:

    • Impact on user: While the user might not need to change code directly, this changes the semantics of existing functions. This could lead to subtle bugs if users were relying on the old behavior. At a minimum, the PR should explain why user impact is believed to be minimal.
    • Impact on build: Are any Kconfig options affected? Do any existing users need to change their configurations?
    • Impact on hardware: Does this change affect any specific architectures? If it's generic scheduling code, probably not, but it's worth explicitly stating that.
    • Impact on documentation: The documentation definitely needs updating to reflect the new functions and renamed functions. The PR should mention this and ideally include the documentation updates.
    • Impact on compatibility: Does this break any existing code that used raw_spin_*? If so, how will users migrate their code? If not, explicitly state backward compatibility is maintained.
  • Testing Insufficient: "ci-check" is not sufficient testing information. The requirements ask for specific build hosts, targets, and logs. Provide details on the platforms tested and example logs demonstrating the change's correct behavior. Ideally, include a test case that specifically exercises the new preemption-disabling spinlocks.

In short: While the core information is present, this PR needs more detail in the Impact and Testing sections to fully meet the NuttX requirements. Be explicit about potential impacts even if you believe them to be minimal, and provide concrete testing evidence.

@github-actions github-actions bot added Arch: arm Issues related to ARM (32-bit) architecture Arch: arm64 Issues related to ARM64 (64-bit) architecture Arch: avr Issues related to all AVR(8-bit or 32-bit) architectures Arch: mips Issues related to the MIPS architecture Arch: renesas Issues related to the Renesas chips Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Arch: simulator Issues related to the SIMulator Arch: x86_64 Issues related to the x86_64 architecture Arch: xtensa Issues related to the Xtensa architecture Size: XL The size of the change in this PR is very large. Consider breaking down the PR into smaller pieces. and removed Size: M The size of the change in this PR is medium labels Jan 27, 2025
raw_spin in the Linux kernel means disable the preemption before
holding the spin lock, not disable trace. Rename raw_spin to
spin_*_notrace to make the interface more semantically consistent.

Signed-off-by: chao an <[email protected]>
spin_lock                ->      spin_lock_nopreempt
spin_trylock             ->      spin_trylock_nopreempt
spin_unlock              ->      spin_unlock_nopreempt
spin_lock_irqsave        ->      spin_lock_irqsave_nopreempt
spin_trylock_irqsave     ->      spin_trylock_irqsave_nopreempt
spin_unlock_irqrestore   ->      spin_unlock_irqrestore_nopreempt

Signed-off-by: chao an <[email protected]>
…qsave, since it will be called in func spin_lock_irqsave."

This reverts commit 8f3a2a6.
…is no need to call it explicitly."

This reverts commit b0af946.
…empt

flags = spin_lock_irqsave(&g_can_lock);       ->   flags = spin_lock_irqsave_nopreempt(&g_can_lock);
sched_lock();                                 ->   (removed; folded into the call above)

spin_unlock_irqrestore(&g_can_lock, flags);   ->   (removed; folded into the call below)
sched_unlock();                               ->   spin_unlock_irqrestore_nopreempt(&g_can_lock, flags);

Signed-off-by: chao an <[email protected]>
…ult spin_lock

In the RTOS environment, spin_lock is simplified as much as possible to improve performance.
At the same time, we provide a set of APIs that disable preemption by default:

spin_lock                ->      spin_lock_nopreempt
spin_trylock             ->      spin_trylock_nopreempt
spin_unlock              ->      spin_unlock_nopreempt
spin_lock_irqsave        ->      spin_lock_irqsave_nopreempt
spin_trylock_irqsave     ->      spin_trylock_irqsave_nopreempt
spin_unlock_irqrestore   ->      spin_unlock_irqrestore_nopreempt

This also has the same intention as RT-Linux.

Signed-off-by: chao an <[email protected]>
@github-actions github-actions bot removed the Arch: x86_64 Issues related to the x86_64 architecture label Jan 28, 2025
acassis (Contributor) commented Jan 28, 2025

@anchao nice work! Please update our Documentation/ ( https://nuttx.apache.org/docs/latest/implementation/critical_sections.html ) to include these spinlock details!

xiaoxiang781216 (Contributor):

  1. sched/spinlock: removed the semantics of disable preemption from default spin_lock

In the RTOS environment, spin_lock is simplified as much as possible to improve performance. At the same time, we provide a set of APIs that disable preemption by default:

spin_lock                ->      spin_lock_nopreempt
spin_trylock             ->      spin_trylock_nopreempt
spin_unlock              ->      spin_unlock_nopreempt
spin_lock_irqsave        ->      spin_lock_irqsave_nopreempt
spin_trylock_irqsave     ->      spin_trylock_irqsave_nopreempt
spin_unlock_irqrestore   ->      spin_unlock_irqrestore_nopreempt

A spinlock needs to hold the sched lock in the non-RT case.

This also has the same intention as RT-Linux.

No, RT-Linux maps spinlock to mutex.

@@ -72,9 +72,9 @@ static FAR struct file *files_fget_by_index(FAR struct filelist *list,
FAR struct file *filep;
irqstate_t flags;

-flags = raw_spin_lock_irqsave(&list->fl_lock);
+flags = spin_lock_irqsave(&list->fl_lock);
Contributor:

Why revert, rather than replacing raw_spin_lock_irqsave with spin_lock_irqsave_notrace?

anchao (author):

Tracing is needed here, thank you.

@@ -278,8 +278,6 @@ spin_lock_nopreempt(FAR volatile spinlock_t *lock)
#ifdef CONFIG_SPINLOCK
static inline_function void spin_lock(FAR volatile spinlock_t *lock)
{
sched_lock();

anchao (author):

Please use spin_lock_nopreempt(); this PR just restores the code behavior to the same as before.

xiaoxiang781216 (Contributor):

https://docs.kernel.org/locking/locktypes.html contains the detailed information; here is the key point: [two screenshots from the Linux locking documentation, omitted]

anchao (author) commented Jan 29, 2025

  1. sched/spinlock: removed the semantics of disable preemption from default spin_lock

In the RTOS environment, spin_lock is simplified as much as possible to improve performance. At the same time, we provide a set of APIs that disable preemption by default:

spin_lock                ->      spin_lock_nopreempt
spin_trylock             ->      spin_trylock_nopreempt
spin_unlock              ->      spin_unlock_nopreempt
spin_lock_irqsave        ->      spin_lock_irqsave_nopreempt
spin_trylock_irqsave     ->      spin_trylock_irqsave_nopreempt
spin_unlock_irqrestore   ->      spin_unlock_irqrestore_nopreempt

A spinlock needs to hold the sched lock in the non-RT case.

Why? If no scheduling occurs in the spin_lock-protected code, why do we need to hold the sched_lock?
In addition, I provide the spin_lock_nopreempt() version; you should use that one, not spin_lock().

This also has the same intention as RT-Linux.

No, RT-Linux maps spinlock to mutex.

After merging this patch, you can change spin_lock to mutex. This PR just removes sched_lock to improve performance.

anchao (author) commented Jan 29, 2025

https://docs.kernel.org/locking/locktypes.html contains the detailed information; here is the key point: [screenshots omitted]

  1. So did you notice that sched_lock() was removed in PREEMPT_RT? This is the starting point of RT. NuttX did not involve sched_lock() in spin_lock() before, so why does your change make the entire system bear the overhead of sched_lock()?
  2. Replacing mutex with spin_lock is the second stage of optimization, which does not conflict with the current PR.
  3. You can look at the spin_lock implementation in other RTOSes; none of them disables preemption (e.g. Zephyr):
    https://github.com/zephyrproject-rtos/zephyr/blob/c66a0bb2a9269dc483079d5cc24ba77b8c332a9a/include/zephyr/spinlock.h#L182-L214

Now we have 4 APIs available to developers:

  1. spin_lock
  2. spin_lock_nopreempt
  3. spin_lock_irqsave
  4. spin_lock_irqsave_nopreempt

If you think the code needs to hold sched_lock, please use
spin_lock_nopreempt() instead of spin_lock().

xiaoxiang781216 (Contributor) commented Jan 29, 2025

https://docs.kernel.org/locking/locktypes.html contains the detailed information; here is the key point: [screenshots omitted]

  1. so did you notice that sched_lock() was removed from PREEMPT_RT? This is the beginning of RT, because nuttx did not have sched_lock() in spin_lock() involved before, why does your change make the entire system bear the overhead of sched_lock()?

Yes, but you first need to change the execution of ALL IRQs from interrupt context to thread context. Before that happens, the sched lock must be taken at the same time we hold a spinlock.

  1. Replacing mutex with spin_lock is the second stage of optimization, which does not conflict with the current PR

Let's emphasize the key points:

  1. A busy-loop lock (e.g. spinlock) must hold the sched lock too
  2. A sleep lock (e.g. mutex) shouldn't hold the sched lock

So you can't do the change in separate steps without breaking the code:

  1. Remove the sched lock first
  2. Change the spinlock to a mutex later

Let's consider a VERY COMMON case in SMP:

  1. thread0 on CPU0 holds a spinlock without holding the sched lock
  2. thread1 on CPU1 wakes up thread2
  3. The OS decides to suspend thread0 and resume thread2 on CPU0
  4. thread1 on CPU1 wants to take the spinlock held by thread0

But it is very hard for thread1 to know when it can finish the busy-wait loop, since thread0 is already suspended.
So it's VERY dangerous to schedule out a thread which holds a spinlock.

  1. You can look at the spin_lock implementation in other RTOS, no one will disable preemption(zephyr)
    https://github.com/zephyrproject-rtos/zephyr/blob/c66a0bb2a9269dc483079d5cc24ba77b8c332a9a/include/zephyr/spinlock.h#L182-L214

Zephyr's k_spin_lock disables IRQs too, but NuttX's spin_lock doesn't disable IRQs.

Now we have 4 APIs available to developers

1. spin_lock 2. spin_lock_nopreempt 3. spin_lock_irqsave 4. spin_lock_irqsave_nopreempt

If you think the code needs to hold sched_lock, please use spin_lock_nopreempt() instead of spin_lock()

No, the correct semantics (same as Linux) are:

  1. spin_lock
  2. spin_lock_preempt
  3. spin_lock_irqsave
  4. spin_lock_irqsave_preempt

Otherwise, most code will suffer random deadlocks or long busy loops like the example I mentioned previously. Only specially designed code can use spin_lock_preempt/spin_lock_irqsave_preempt as expected.
The bottom line is that the most commonly used API should be SMP SAFE.

anchao (author) commented Jan 30, 2025

Yes, but you first need to change the execution of ALL IRQs from interrupt context to thread context. Before that happens, the sched lock must be taken at the same time we hold a spinlock.

Let's emphasize the key points:

  1. A busy-loop lock (e.g. spinlock) must hold the sched lock too
  2. A sleep lock (e.g. mutex) shouldn't hold the sched lock

So you can't do the change in separate steps without breaking the code:

  1. Remove the sched lock first
  2. Change the spinlock to a mutex later

Let's consider a VERY COMMON case in SMP:

  1. thread0 on CPU0 holds a spinlock without holding the sched lock
  2. thread1 on CPU1 wakes up thread2
  3. The OS decides to suspend thread0 and resume thread2 on CPU0
  4. thread1 on CPU1 wants to take the spinlock held by thread0

But it is very hard for thread1 to know when it can finish the busy-wait loop, since thread0 is already suspended. So it's VERY dangerous to schedule out a thread which holds a spinlock.

Let’s first unify our understanding:

  1. There was no deadlock problem before in NuttX's API semantics, and all code worked very well, regardless of SMP or AMP

Right?

  1. The deadlock problem was caused by Xiaomi's changes (from you @xiaoxiang781216 and @hujun260 )
  2. You first replaced enter_critical_section() with spin_lock_irqsave(),
  3. After finding the deadlock, you replaced spin_lock_irqsave() with spin_lock_irqsave() + sched_lock()
  4. In order to simplify the code, you folded sched_lock() into spin_lock_irqsave(). Now you find that such a change will break all ESP devices and the performance of the entire system will be degraded
  5. Look at what you are doing now:
    Prepare to replace all kernel code from spin_lock_irqsave() to raw_spin_lock_irqsave()
    replace spin_lock_irqrestore with raw_spin_lock_irqrestore #15695
    In order to solve the problem of the startup crash, replace all spin_lock_irqsave() with raw_spin_lock_irqsave()
    arch: use raw_spin_[un]lock to replace spin_[un]lock, fix regression b69111d16a2a330fa272af8175c832e08881844b of https://github.com/apache/nuttx/pull/14578 #15691
    esp32: replace spin_lock_irqsave with raw_spin_lock_irqsave. espressif/esp-hal-3rdparty#8

I can't imagine how many board-level and chip codes will need similar adjustments in the future.
How can other developers continue to trust NuttX when the kernel API is so unstable?

Most kernel code does not need to hold sched_lock(); forcing it is unfair to code that does not cause scheduling. New API semantics should be expressed with new APIs instead of by modifying the behavior of the original API.

  1. You can look at the spin_lock implementation in other RTOS, no one will disable preemption(zephyr)
    https://github.com/zephyrproject-rtos/zephyr/blob/c66a0bb2a9269dc483079d5cc24ba77b8c332a9a/include/zephyr/spinlock.h#L182-L214

zephyr's k_spin_lock disable irq too, but nuttx's spin_lock doesn't disable irq.

Then let's remove spin_lock_irqsave() and move the irqsave into spin_lock(), OK?

Now we have 4 APIs available to developers
1. spin_lock 2. spin_lock_nopreempt 3. spin_lock_irqsave 4. spin_lock_irqsave_nopreempt
If you think the code needs to hold sched_lock, please use spin_lock_nopreempt() instead of spin_lock()

No, the correct semantics (same as Linux) are:

  1. spin_lock
  2. spin_lock_preempt
  3. spin_lock_irqsave
  4. spin_lock_irqsave_preempt

Otherwise, most code will suffer random deadlocks or long busy loops like the example I mentioned previously. Only specially designed code can use spin_lock_preempt/spin_lock_irqsave_preempt as expected. The bottom line is that the most commonly used API should be SMP SAFE.

No, keep the current API semantics unchanged, and use new APIs to represent the new features.

xiaoxiang781216 (Contributor):

Yes, but you first need to change the execution of ALL IRQs from interrupt context to thread context. Before that happens, the sched lock must be taken at the same time we hold a spinlock.

Let's emphasize the key points:

  1. A busy-loop lock (e.g. spinlock) must hold the sched lock too
  2. A sleep lock (e.g. mutex) shouldn't hold the sched lock

So you can't do the change in separate steps without breaking the code:

  1. Remove the sched lock first
  2. Change the spinlock to a mutex later

Let's consider a VERY COMMON case in SMP:

  1. thread0 on CPU0 holds a spinlock without holding the sched lock
  2. thread1 on CPU1 wakes up thread2
  3. The OS decides to suspend thread0 and resume thread2 on CPU0
  4. thread1 on CPU1 wants to take the spinlock held by thread0

But it is very hard for thread1 to know when it can finish the busy-wait loop, since thread0 is already suspended. So it's VERY dangerous to schedule out a thread which holds a spinlock.

Let’s first unify our understanding:

  1. There was no deadlock problem before in nuttx's API semantics, and all codes works very well, regardless of SMP or AMP

Right?

No, that isn't true. The critical section is a big lock, which is a well-known SMP issue, and it should be replaced by fine-grained locks step by step:
https://en.wikipedia.org/wiki/Giant_lock
More and more IoT devices have SMP capability, so it's important to improve NuttX's SMP performance.

  1. The deadlock problem was caused by xiaomi's changes(from you @xiaoxiang781216 and @hujun260 )

All known issues were identified and fixed promptly (ALL within 24 hours); do you have any new issue?

  1. You first replaced enter_critical_section() with spin_lock_irqsave(),

Let's emphasize that a critical section isn't equal to a spin lock, since a thread will leave the critical section if it is suspended by the scheduler, but a spinlock doesn't have this capability. That's why many places replace enter_critical_section with spin_lock_irqsave + sched_lock.

  1. After finding the deadlock, you guys replace spin_lock_irqsave() with spin_lock_irqsave()+sched_lock()

Where did you find that we patched sched_lock separately? We changed to spin_lock_irqsave + sched_lock at the same time.

  1. In order to simplify the code, you inside sched_lock() into spin_lock_irqsave().

No, #14578 was created three months ago to fix #9531 and the problem I mentioned before.
But since it took a long time to review this important change, we had to use spin_lock_irqsave + sched_lock before it got merged.

Now you guys find that such a change will break all ESP devices and the performance of the entire system will be degraded

That's why spin_lock and raw_spin_lock exist:

  1. In the normal case, a driver should use spin_lock, which disables scheduling at the same time
  2. In special cases, you can call raw_spin_lock to bypass locking the scheduler

But the default version should be the safe one.

  1. Look at what you are doing now,
    Prepare to replace all kernel codes from spin_lock_irqsave() to raw_spin_lock_irqsave()
    replace spin_lock_irqrestore with raw_spin_lock_irqrestore #15695

Yes, since this place was carefully identified as one where locking the scheduler isn't really needed.

In order to solve the problem of startup crash, replace all spin_lock_irqsave() with raw_spin_lock_irqsave()

This is wrong usage of a spinlock: holding the lock in one thread and unlocking it in another.

arch: use raw_spin_[un]lock to replace spin_[un]lock, fix regression b69111d16a2a330fa272af8175c832e08881844b of https://github.com/apache/nuttx/pull/14578 #15691
esp32: replace spin_lock_irqsave with raw_spin_lock_irqsave. espressif/esp-hal-3rdparty#8

I can't imagine how many board-level and chip codes will need similar adjustments in the future. How can other developers continue to trust NuttX when the kernel API is so unstable?

@hujun260 could we use a new approach to avoid touching esp32 code?

Most kernel code does not need to hold sched_lock(), which is unfair to some code that does not cause scheduling. New API semantics should be expressed with new APIs instead of modifying the behavior of the original API.

Yes, that's why we provide raw_spin_lock or spin_lock_preempt, but the default one (spin_lock) needs to hold sched_lock for safety.

  1. You can look at the spin_lock implementation in other RTOS, no one will disable preemption(zephyr)
    https://github.com/zephyrproject-rtos/zephyr/blob/c66a0bb2a9269dc483079d5cc24ba77b8c332a9a/include/zephyr/spinlock.h#L182-L214

zephyr's k_spin_lock disable irq too, but nuttx's spin_lock doesn't disable irq.

let remove spin_lock_irqsave() and move the irqsave to spin_lock() ok?

Please look at the discussion: #9531

anchao (author) commented Jan 30, 2025

No, that isn't true. The critical section is a big lock, which is a well-known SMP issue, and it should be replaced by fine-grained locks step by step:
https://en.wikipedia.org/wiki/Giant_lock
More and more IoT devices have SMP capability, so it's important to improve NuttX's SMP performance.

I agree with the changes to the critical section, but the current API changes are beyond my understanding and we are heading down the wrong path.

All known issues were identified and fixed promptly (ALL within 24 hours); do you have any new issue?

You have made too many mistakes in this change cycle. This is just one point of my review; there are many more, so I won't share too much.

#15232 (comment)

Let's emphasize that a critical section isn't equal to a spin lock, since a thread will leave the critical section if it is suspended by the scheduler, but a spinlock doesn't have this capability. That's why many places replace enter_critical_section with spin_lock_irqsave + sched_lock.
Where did you find that we patched sched_lock separately? We changed to spin_lock_irqsave + sched_lock at the same time.
No, #14578 was created three months ago to fix #9531 and the problem I mentioned before.
But since it took a long time to review this important change, we had to use spin_lock_irqsave + sched_lock before it got merged.

Let's refrain from unfounded remarks. You intended to replace wo_trace with the raw_ version to avoid the implementation of trace. I pointed out the issues in this approach within PR14943. Nevertheless, you were rather determined to merge the raw_ implementation.

#14943 (comment)

Yes, since this place is identified carefully that the lock scheduler sin't really needed.

No, let us keep the semantics of spin_lock_irqsave: do not extend it with sched_lock, and do not follow Linux, which is an old implementation.

This is the wrong usage of spinlock which hold the lock in one thread and unlock in another thread.

Who is wrong? It is very irresponsible of you to say this. There is nothing wrong with ESP using spin_lock_irqsave; it is your changes that caused the regression. Why do you say it is someone else's fault? It is unreasonable.

@hujun260 could we use a new approach to avoid touching esp32 code?

Stop introducing new APIs; just keep the original API semantics.

Please look at the discussion: #9531

The naming of the API shows what it implements, which is a perfect naming method.

spin_lock_irqsave contains two functions:

1. spin_lock

2. irqsave

Why do you have to make spin_lock_irqsave contain three functions?

1. spin_lock

2. irqsave

3. sched_lock <--- why?

anchao (author) commented Jan 30, 2025

The current API definition is perfect, and each function name identifies the implementation details of that function.

spin_lock: spin lock
spin_lock_nopreempt: spin_lock + sched_lock
spin_lock_irqsave: spin lock + irq save
spin_lock_irqsave_nopreempt: spin_lock + irq save + sched_lock

And your expected API

spin_lock: spin lock + sched_lock <--- why?
spin_lock_preempt: spin lock <--- why?

NuttX is not Linux; please do not introduce Linux's old implementation into NuttX.

xiaoxiang781216 (Contributor) commented Jan 30, 2025

The current API definition is perfect, and the function name can identify the implementation details of each function.

spin_lock: spin lock
spin_lock_nopreempt: spin_lock + sched_lock
spin_lock_irqsave: spin lock + irq save
spin_lock_irqsave_nopreempt: spin_lock + irq save + sched_lock

And your expected API

spin_lock: spin lock + sched_lock <--- why?

So you didn't read my reply carefully; let me copy the example again:

  1. thread0 on CPU0 holds a spinlock without holding the sched lock
  2. thread1 on CPU1 wakes up thread2
  3. The OS decides to suspend thread0 and resume thread2 on CPU0
  4. thread1 on CPU1 wants to take the spinlock held by thread0

But it is very hard for thread1 to know when it can finish the busy-wait loop, since thread0 is already suspended.
So it's VERY dangerous to schedule out a thread which holds a spinlock.

And the issue @patacongo mentioned before: #9531 (comment)
A spin lock can't be used standalone without the sched lock.

spin_lock_preempt: spin lock <--- why?

Same; it is unsafe and should be used with caution.

nuttx is not Linux, please do not introduce the old implementation in Linux into Nuttx

This behavior isn't Linux-specific; ALL OSes that support spin_lock must disable scheduling too, otherwise deadlocks will happen naturally. Please reread the above case again.

anchao (author) commented Jan 30, 2025

So you didn't read my reply carefully; let me copy the example again:

  1. thread0 on CPU0 holds a spinlock without holding the sched lock
  2. thread1 on CPU1 wakes up thread2
  3. The OS decides to suspend thread0 and resume thread2 on CPU0
  4. thread1 on CPU1 wants to take the spinlock held by thread0

But it is very hard for thread1 to know when it can finish the busy-wait loop, since thread0 is already suspended. So it's VERY dangerous to schedule out a thread which holds a spinlock.
And the issue @patacongo mentioned before: #9531 (comment): a spin lock can't be used standalone without the sched lock.

You should use spin_lock_nopreempt() instead of spin_lock()
I have answered this question many times, I will not answer this question again!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
You should use spin_lock_nopreempt() instead of spin_lock()
I have answered this question many times, I will not answer this question again!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

spin_lock_preempt: spin lock <--- why?

same it is unsafe, should be used with the caution.

nuttx is not Linux, please do not introduce the old implementation in Linux into Nuttx

This behavior isn't Linux specific, ALL OS support spin_lock must disable schedule too, otherwise the deadlock will happen naturally. Please reread the above case again.

anchao (author) commented Jan 30, 2025

If your code does not enable the interrupt controller and scheduler at boot time, why does spin_lock require sched_lock and irq_save?

anchao (author) commented Jan 30, 2025

I hope you read my reply carefully. Which of the following two options do you pick?

spin_lock: spin lock
spin_lock_nopreempt: spin_lock + sched_lock
spin_lock_irqsave: spin lock + irq save
spin_lock_irqsave_nopreempt: spin_lock + irq save + sched_lock

or

spin_lock: spin lock + sched_lock
spin_lock_preempt: spin_lock
spin_lock_irqsave: spin lock + irq save + sched_lock
spin_lock_irqsave_preempt: spin_lock + irq save

xiaoxiang781216 (Contributor):

So you didn't read my reply carefully; let me copy the example again:

  1. thread0 on CPU0 holds a spinlock without holding the sched lock
  2. thread1 on CPU1 wakes up thread2
  3. The OS decides to suspend thread0 and resume thread2 on CPU0
  4. thread1 on CPU1 wants to take the spinlock held by thread0

But it is very hard for thread1 to know when it can finish the busy-wait loop, since thread0 is already suspended. So it's VERY dangerous to schedule out a thread which holds a spinlock.
And the issue @patacongo mentioned before: #9531 (comment): a spin lock can't be used standalone without the sched lock.

You should use spin_lock_nopreempt() instead of spin_lock() I have answered this question many times, I will not answer this question again!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! You should use spin_lock_nopreempt() instead of spin_lock() I have answered this question many times, I will not answer this question again!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

OK, could you give me an example of when you can use the spin_lock API? An interrupt can happen outside your control, which means spin_lock CAN'T BE CALLED in any thread context.

anchao (author) commented Jan 30, 2025

The advantage of the first option is that no code needs to be changed, including the ESP third-party library, and the historical API behavior stays the same as before.

For the second option, 90% of the places in the kernel that use spin_lock_irqsave would need to be replaced with spin_lock_irqsave_preempt, and all implementations outside the nuttx repo would also need to be modified again.

anchao (author) commented Jan 30, 2025

So you didn't read my reply carefully; let me copy the example again:

  1. thread0 on CPU0 holds a spinlock without holding the sched lock
  2. thread1 on CPU1 wakes up thread2
  3. The OS decides to suspend thread0 and resume thread2 on CPU0
  4. thread1 on CPU1 wants to take the spinlock held by thread0

But it is very hard for thread1 to know when it can finish the busy-wait loop, since thread0 is already suspended. So it's VERY dangerous to schedule out a thread which holds a spinlock.
And the issue @patacongo mentioned before: #9531 (comment): a spin lock can't be used standalone without the sched lock.

You should use spin_lock_nopreempt() instead of spin_lock() I have answered this question many times, I will not answer this question again!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! You should use spin_lock_nopreempt() instead of spin_lock() I have answered this question many times, I will not answer this question again!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

OK, could you give me an example of when you can use the spin_lock API? An interrupt can happen outside your control, which means spin_lock CAN'T BE CALLED in any thread context.

If your code does not enable the interrupt controller and scheduler at boot time, why does spin_lock require sched_lock and irq_save?

During the bring-up phase, in ISRs, and when the scheduler is not enabled.

anchao (author) commented Jan 30, 2025

OK, could you give me an example of when you can use the spin_lock API? An interrupt can happen outside your control, which means spin_lock CAN'T BE CALLED in any thread context.

There are also some places where more fine-grained control can be used, for example:

irq_save();
...
if (condition)
  spin_lock();
...
if (spin_is_locked())
  spin_unlock();
...
irq_restore();
