
Various errors for simultaneous fork/thread creation #155

Open
tilsche opened this issue Jun 10, 2020 · 1 comment
tilsche commented Jun 10, 2020

This innocent cute little bunny program brutally murders lo2s even with the new thread safety in place.

constexpr int children = 4;
constexpr int generations = 4;
constexpr int threads = 6;

#include <chrono>
#include <thread>
#include <vector>

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>

using myclock = std::chrono::high_resolution_clock;

myclock::time_point begin;

void work()
{
    while (myclock::now() < begin + std::chrono::seconds(5));
    std::this_thread::sleep_until(begin + std::chrono::seconds(10));
    while (myclock::now() < begin + std::chrono::seconds(15));
}

int main() {
    int generation = 0;

    begin = myclock::now();

    for (int i = 0; i < children; i++)
    {
        pid_t pid = fork();
        if (pid == 0) // child
        {
            if (generation < generations)
            {
                generation++;
                i = -1;
                continue; // more forking
            }
            break;
        }
        // parent
    }

    if (threads > 0)
    {
        std::vector<std::thread> tv;
        for (int t = 0; t < threads; t++)
        {
            tv.emplace_back(work);
        }
        for (auto& t : tv)
        {
            t.join();
        }
    }
    else {
        work();
    }
}

Run with ulimit -n 524288

First hit (many instances of):

[14641106768213][pid: 64987][tid: 64987][ WARN]: Could not find system tree node for pid 69797

Second hit (also many instances):

[14641047009139][pid: 64987][tid: 64987][ERROR]: Failed to get process containing monitored thread 69740

Third hit (some of those later, repeatedly with the same pid):

[14647409082867][pid: 64987][tid: 64987][ WARN]: Thread 71270 is about to exit, but was never seen before.

And KO (not sure if all of those are related, there is a temporal gap):

[14647411589098][pid: 64987][tid: 64987][ERROR]: perf_event_open for sampling failed
[14647411596838][pid: 64987][tid: 64987][ERROR]: maybe the specified clock is unavailable?
[14647411630168][pid: 64987][tid: 64987][ERROR]: Failure while adding new thread cloned from 71260: No such process
[14650603407932][pid: 64987][tid: 64987][FATAL]: Aborting: No such process
bmario added the bug label Jun 12, 2020
cvonelm commented Jul 16, 2020

These all seem to be caused by us missing the PTRACE_EVENT_* notification (FORK, VFORK, or CLONE) for the various kinds of offspring creation, presumably for load reasons (my machine reached a load of 500-something during testing).

[14641106768213][pid: 64987][tid: 64987][ WARN]: Could not find system tree node for pid 69797

We've missed the PTRACE_EVENT_FORK for the parent, so there is no system tree node for the parent. This is harmless and is already recovered from by just using system_tree_root_node as the parent instead.

Failed to get process containing monitored thread 69740

We've missed the PTRACE event for the parent, so there is no entry for the parent in the tid_to_pid mapping table. This is currently not recovered from at all, and it prevents the thread whose parent we've missed from being monitored. Ideally we would just make its parent NO_PARENT_PROCESS_PID or something similar. But this might be hairy, because I don't know how much information we need from the parent process.

Thread 71270 is about to exit, but was never seen before

Same deal: we just never got the corresponding PTRACE event for this thread.

Aborting: No such process

perf_event_open returned ESRCH, which means the thread we wanted to sample died before we could start sampling it. This is obviously not recovered from at all currently, but recovery should be entirely possible.
