Consistent state across forks #630

davidchisnall · 2023-08-30T16:26:33Z

I am increasingly unconvinced that snmalloc is safe to use in programs that call fork. I had originally thought that the mostly lockless design meant that we're safe, but I think there are three problems:

If a thread holds any of the range locks, we can deadlock on fork.
Any allocators owned by threads other than the one calling fork will leak.
Any allocators owned by threads other than the one calling fork will be in an inconsistent state.

This is all bad.

The first problem can be solved by acquiring the lock(s?) in the prepare callback for pthread_atfork and then releasing it in both the parent and child callbacks.

The second can be solved by putting all allocated allocators in a linked list and walking that list in the child to return them all to the pool (and probably coalescing them at that point).

I'm not sure about the third. Do we atomically pop things off free lists on the fast path? On anything that's doing read-modify-write operations on an allocator, we'd need an asymmetric barrier.

I'm not especially worried about this because any program that calls malloc from a multithreaded program that calls fork and not execve is going to leak memory, but ideally malloc between fork and execve should work (which requires fixing the first one).

The text was updated successfully, but these errors were encountered:

mjp41 · 2023-08-30T18:06:55Z

Three other places I can think of in addition

the allocator pool has a lock as well. So needs fixing in the same way.
the singleton pattern uses a lock too.
message queue is non-linearisable. This means you can begin an enqueue, another thread forks, and then the message queue will be in a state where it will never become reconnected to the back. The child post fork handler should process the message queue and reconnect the back if required.

mjp41 · 2025-01-23T17:21:45Z

So I finally got around to thinking about this. I had an idea for a C++ class to encapsulate the logic for preventing deadlocks, but not leaks:

// This is a simple implementation of a class that can be
// used to prevent a process from forking. Holding a lock
// in the allocator while forking can lead to deadlocks.
// This causes the fork to wait out any other threads inside
// the allocators locks.
//
// The use is
// ```
//  {
//     PreventFork pf;
//     // Code that should not be running during a fork.
//  }
// ```
class PreventFork
{
  // Global atomic counter of twice the number of threads currently preventing the system from forking.
  // The bottom bit is used to signal that a thread is wanting to fork.
  static std::atomic<size_t> threads_preventing_fork{0}

  // The depth of the current thread's prevention of forking.
  // This is used to enable reentrant prevention of forking.
  static thread_local size_t depth_of_prevention{0};

  // The function that notifies new threads not to enter PreventFork regions
  // It waits until all threads are no longer in a PreventFork region before returning.
  static void prefork()
  {
    while (true)
    {
      auto current = threads_preventing_fork.load();
      if ((current % 2 == 0) && (threads_preventing_fork.compare_exchange_weak(current, current + 1)))
      {
        break;
      }
      Aal::pause();
    };

    while(threads_preventing_fork.load() != 1)
    {
      Aal::pause();
    }
  }

  // Unsets the flag that allows threads to enter PreventFork regions
  // and for another thread to request a fork.
  static void postfork()
  {
    threads_preventing_fork = 0;
  }

public:
  PreventFork()
  {
    if (depth_of_prevention++ == 0)
    {
      while (true)
      {
        auto current = threads_preventing_fork.load();

        if ((current % 2 == 0) && threads_preventing_fork.compare_exchange_weak(current, current + 2))
        {
          break;
        }
        Aal::pause();
      };
    }
  }

  ~PreventFork()
  {
    if (--depth_of_prevention == 0)
    {
      threads_preventing_fork -= 2;
    }
  }

  static void init()
  {
    pthread_atfork(prefork, postfork, postfork);
  }
};

@davidchisnall @nwf @SchrodingerZhu: I'll probably work on this in the next week or two, but if you have any comments about this it'd be great to hear them.

Also thoughts on testing this, which seems extremely hard to get coverage.

P.S. I haven't even compiled this yet, so might have obvious bugs.

mjp41 mentioned this issue Jan 24, 2025

Protection against fork #735

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consistent state across forks #630

Consistent state across forks #630

davidchisnall commented Aug 30, 2023

mjp41 commented Aug 30, 2023 •

edited

Loading

mjp41 commented Jan 23, 2025 •

edited

Loading

Consistent state across forks #630

Consistent state across forks #630

Comments

davidchisnall commented Aug 30, 2023

mjp41 commented Aug 30, 2023 • edited Loading

mjp41 commented Jan 23, 2025 • edited Loading

mjp41 commented Aug 30, 2023 •

edited

Loading

mjp41 commented Jan 23, 2025 •

edited

Loading