CORE-8618 crash_tracker: record uncaught startup exceptions #24854

pgellert · 2025-01-17T15:47:55Z

This PR adds functionality to record crashes caused by an exception during startup as crash reports to disk. This is the first step to implement the core crash recording project which aims to persist information about redpanda broker crashes to disk to help debugging.

The PR implements the core functionality of preparing state to allow writing crash reports in an async-signal-safe way, so that this code can be reused in signal handlers in follow-up PRs.

The crash reports are going to be stored under the data directory. The filenames are constructed as: ${datadir}/crash_reports/${broker_start_time_ms}_${random}.crash.

Reports are serialized using serde and iobuf (with pre-reserved) memory, and the implementation relies on their allocation-free behaviour to ensure that writing crash files in signal handlers is reliable even in OOM scenarios. (This will become important in a follow-up PR when the signal handlers are actually hooked up.)

To write to the crash reports, we need to use the low-level open() function here instead of the seastar API or higher-level C++ primitives because we need to be able to manipulate the file using async-signal-safe, allocation-free functions inside signal handlers. Ref: https://man7.org/linux/man-pages/man7/signal-safety.7.html

Fixes https://redpandadata.atlassian.net/browse/CORE-8618

Backports Required

Release Notes

none

vbotbuildovich · 2025-01-17T19:55:08Z

CI test results

test results on build#60910

test_id	test_kind	job_url	test_status	passed
rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli	ducktape	https://buildkite.com/redpanda/redpanda/builds/60910#01947552-5d4b-4150-bcdb-b7b1bb0fd156	FLAKY	1/2

test results on build#60955

test_id	test_kind	job_url	test_status	passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade	ducktape	https://buildkite.com/redpanda/redpanda/builds/60955#01948420-a364-4301-adf9-f2249f270189	FLAKY	1/2
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3	ducktape	https://buildkite.com/redpanda/redpanda/builds/60955#01948420-a364-4464-a871-097ffe11e6c2	FLAKY	1/2

test results on build#61003

test_id	test_kind	job_url	test_status	passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade	ducktape	https://buildkite.com/redpanda/redpanda/builds/61003#01948a54-d3a4-4eaf-b3e8-9ebafd176099	FLAKY	1/2

pgellert · 2025-01-20T13:17:25Z

force-push: rebase to pgellert:crashlog/define-api to update the first 4 commits of this stacked PR with the latest changes

dotnwat

this is looking pretty cool!

dotnwat · 2025-01-20T17:36:48Z

src/v/config/node_config.h

@@ -122,6 +122,10 @@ struct node_config final : public config_store {
        return data_directory().path / "startup_log";
    }

+    std::filesystem::path crash_report_dir_path() const {
+        return data_directory().path / "crash_reports";


how big are the crash reports? if they are too big, we may want to have space management track them.

~10KB, and we're going to limit to storing up to 50 of them, so their total size should be <1MB. We decided not to bother including them in space management or log listing queries because their space usage will be insignificant.

src/v/crash_tracker/prepared_writer.cc

dotnwat · 2025-01-20T17:40:14Z

src/v/crash_tracker/prepared_writer.cc

+    _prepared_cd.crash_message.resize(4096, '\0');
+    _prepared_cd.stacktrace.resize(4096, '\0');
+    _prepared_cd.addition_info.resize(4094, '\0');


\0 is the default value to fill with using resize(size)

dotnwat · 2025-01-20T17:45:01Z

src/v/crash_tracker/prepared_writer.cc

+    // Create the crash recorder file
+    auto f = co_await ss::open_file_dma(
+      _crash_report_file_name.c_str(),
+      ss::open_flags::create | ss::open_flags::rw);


truncate
exclusive

dotnwat · 2025-01-20T17:46:19Z

src/v/crash_tracker/prepared_writer.cc

+      _crash_report_file_name.c_str(),
+      ss::open_flags::create | ss::open_flags::rw);
+    co_await f.close();
+    co_await ss::sync_directory(_crash_report_file_name.parent_path().string());


are you protecting against a certain scenario by syncing the directory here? a comment would probably be helpful since it's likely important?

I've added this comment:

Sync the parent dir to ensure that the new file is observable to the ::open() call below and to later restarts of the process

Let me know if that explains it well enough.

dotnwat · 2025-01-20T17:52:35Z

src/v/crash_tracker/prepared_writer.cc

+    for (const auto& frag : _serde_output) {
+        auto written = ::write(_fd, frag.get(), frag.size());
+        if (written == -1) {
+            return false;


I wouldn't expect us to be able to continue to write to the file descriptor if an earlier ::write() call failed. Am I missing something here? But I also don't see any problems with continuing to try to write (the man page doesn't say that this would be a problem), so I've changed the code to continue to try to write here.

dotnwat · 2025-01-20T17:53:29Z

src/v/crash_tracker/prepared_writer.cc

+        co_await ss::sync_directory(
+          _crash_report_file_name.parent_path().string());


sync is probably unnecessary, but maybe you could provide a little context on what you're trying to protect against?

This is the comment I've added:

Sync the parent dir to ensure that later restarts of the process cannot observe this file

Let me know if it explains it well enough.

dotnwat · 2025-01-20T17:56:10Z

src/v/crash_tracker/recorder.cc

+                          "{}_{}{}", time_now, random_int, crash_report_suffix);
+        if (co_await ss::file_exists(try_name.string())) {
+            // Try again in the rare case of a collision
+            co_await ss::sleep(1ms);


what's the sleep for? you're not going to try again with the same filename, right?

Yeah this is not needed. I've removed it now

dotnwat · 2025-01-20T18:04:26Z

src/v/crash_tracker/recorder.cc

+    std::lock_guard<ss::util::spinlock> g(_writer_lock);
+    co_await _writer.release();


can you comment on the use of a spinlock here compared to a blocking mutex? generally i would think that holding a spinlock for anything other than the tiniest of cpu-bound critical sections (and release here is performing I/O) would not perform as well as a blocking mutex where the kernel can put a blocking thread to sleep and understand the chain of dependencies.

I've added this comment now:

The writer has shared state, so accessing it from multiple threads
is guarded by this lock. We are using the async-signal safe
ss::util::spinlock instead of more efficient locking mechanisms like the
seastar primitives or std::mutex to ensure that we can safely use the
lock in a singal handler context.

This lock will be used in signal handlers in a follow-up PR so it needs to be async-signal safe. The pthread mutex isn't: https://man7.org/linux/man-pages/man3/pthread_mutex_lock.3.html.
The standard seastar concurrency primitives are also not available in the signal handler because the signal handler runs outside of the reactor. ss::util::spinlock is specifically an async-signal safe spinlock that seastar uses in its signal handlers: https://github.com/scylladb/seastar/blob/c3f33c116ecde6f37cea6f656e1dfed2a4cd26a2/include/seastar/util/spinlock.hh#L85-L88

I am not worried about the performance overhead of the spinlock because all of the code paths where the lock is used are outside of performance-critical sections (startup, shutdown, mid-crash), and I don't expect the lock to be contended except for some highly unlikely scenarios (eg. segfault while already recording a failure).

dotnwat · 2025-01-20T18:06:12Z

src/v/crash_tracker/recorder.cc

+    // If the recorder is already locked, give up to prevent possible deadlocks.
+    // For example, an assertion failure while recording a segfault may lead to
+    // a deadlock if we unconditionally waited for the lock here.
+    std::unique_lock<ss::util::spinlock> g(_writer_lock, std::try_to_lock);


same here. why a spinlock?

Answered here: #24854 (comment)

src/v/crash_tracker/prepared_writer.cc

michael-redpanda · 2025-01-21T13:46:33Z

src/v/crash_tracker/prepared_writer.cc

+    _prepared_cd.addition_info.resize(4094, '\0');
+
+    // Overestimate the size of a serialized _prepared_cd
+    const size_t serde_reserve = 1024 + 3 * 4096;


nit: magic 1024 number

I've now added a constant to give this a name

src/v/crash_tracker/prepared_writer.cc

michael-redpanda · 2025-01-21T13:54:48Z

src/v/crash_tracker/prepared_writer.cc

+bool prepared_writer::try_write_crash() {
+    serde::write(_serde_output, std::move(_prepared_cd));
+
+    auto res = ::ftruncate(_fd, 0);


FYI ftruncate doesn't modify file offset. I don't think it's likely (or possible) for the offset to not be at 0, but just something to keep in mind.

I've removed the ftruncate call now and instead perform the truncate on open.

michael-redpanda · 2025-01-21T13:56:21Z

src/v/crash_tracker/prepared_writer.cc

+    }
+
+    for (const auto& frag : _serde_output) {
+        auto written = ::write(_fd, frag.get(), frag.size());


Discussion: Unlikely, but not impossible, for write to not write all of frag.size(). Probably worthwhile to stick in a while loop.

src/v/crash_tracker/prepared_writer.cc

pgellert · 2025-01-21T16:11:13Z

force-push: address code review feedback

Defines it under `${datadir}/crash_reports` to store crash reports. It needs to be special-cased in the `compute_storage.py` script to ensure it isn't mistaken for a log file directory.

Define a custom exception for `crash_loop_limit_reached` to allow pattern matching for it in a follow up commit. We specifically do not want to record `crash_loop_limit_reached` errors as crashes because they are generally not informative crashes. Recording them as real crashes would build up garbage on disk, which would lead to real crash logs expiring on disk earlier.

Introduces the core stateful writer which is used to write out `crash_description` serde objects to crash report files. It is able to write out the crash reports in an async-signal-safe way by: * using only async signal safe syscalls * pre-allocating all the memory necessary to construct, serialize and write our `crash_description` objects to disk * using only non-allocating methods The `prepared_writer` is hooked up to the `recorder` in a later commit of this PR. Ref: https://man7.org/linux/man-pages/man7/signal-safety.7.html

Implement the logic to choose a unique crash log file name, and hook up the code for preparing the crash recording code and the crash file cleanup code on clean shutdown.

Implement the recording of startup exceptions into crash files.

Add some simple ducktape tests to assert that crash reports are being generater (or not generated) as expected. The contents of the crash files are going to be validated as a follow up once the contents are actually printed to logs.

pgellert · 2025-01-21T18:12:45Z

force-push: rebased to dev now that the base PR (#24828) merged

pgellert requested a review from a team January 17, 2025 15:47

pgellert self-assigned this Jan 17, 2025

pgellert requested review from michael-redpanda and a team and removed request for a team and michael-redpanda January 17, 2025 15:47

pgellert requested a review from a team as a code owner January 17, 2025 15:47

pgellert requested review from BenPope, michael-redpanda and IoannisRP and removed request for a team January 17, 2025 15:47

github-actions bot added area/build area/redpanda labels Jan 17, 2025

pgellert marked this pull request as draft January 17, 2025 15:48

pgellert marked this pull request as ready for review January 17, 2025 15:48

pgellert force-pushed the crashlog/record-startup-exc branch from 1dc3a86 to 09ccff8 Compare January 20, 2025 13:15

dotnwat reviewed Jan 20, 2025

View reviewed changes

michael-redpanda reviewed Jan 21, 2025

View reviewed changes

pgellert force-pushed the crashlog/record-startup-exc branch from 09ccff8 to c7e2ba2 Compare January 21, 2025 16:10

pgellert requested review from michael-redpanda and dotnwat January 21, 2025 16:11

pgellert added 6 commits January 21, 2025 18:10

config: define crash_reports directory

1fae3d0

Defines it under `${datadir}/crash_reports` to store crash reports. It needs to be special-cased in the `compute_storage.py` script to ensure it isn't mistaken for a log file directory.

crash_tracker: prepare and release crash log file

8a5ef27

Implement the logic to choose a unique crash log file name, and hook up the code for preparing the crash recording code and the crash file cleanup code on clean shutdown.

crash_tracker: record startup exception crashes

452e456

Implement the recording of startup exceptions into crash files.

dt: test crash report file creation and cleanup

cfae5ae

Add some simple ducktape tests to assert that crash reports are being generater (or not generated) as expected. The contents of the crash files are going to be validated as a follow up once the contents are actually printed to logs.

pgellert force-pushed the crashlog/record-startup-exc branch from c7e2ba2 to cfae5ae Compare January 21, 2025 18:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CORE-8618 crash_tracker: record uncaught startup exceptions #24854

CORE-8618 crash_tracker: record uncaught startup exceptions #24854

pgellert commented Jan 17, 2025 •

edited

Loading

vbotbuildovich commented Jan 17, 2025 •

edited

Loading

pgellert commented Jan 20, 2025

dotnwat left a comment

dotnwat Jan 20, 2025

pgellert Jan 21, 2025

dotnwat Jan 20, 2025

dotnwat Jan 20, 2025

dotnwat Jan 20, 2025

pgellert Jan 21, 2025

dotnwat Jan 20, 2025

pgellert Jan 21, 2025 •

edited

Loading

dotnwat Jan 20, 2025

pgellert Jan 21, 2025

dotnwat Jan 20, 2025

pgellert Jan 21, 2025

dotnwat Jan 20, 2025

pgellert Jan 21, 2025

dotnwat Jan 20, 2025

pgellert Jan 21, 2025

michael-redpanda Jan 21, 2025

pgellert Jan 21, 2025

michael-redpanda Jan 21, 2025

pgellert Jan 21, 2025

michael-redpanda Jan 21, 2025

pgellert Jan 21, 2025

pgellert commented Jan 21, 2025

pgellert commented Jan 21, 2025

		co_await ss::sync_directory(
		_crash_report_file_name.parent_path().string());

		std::lock_guard<ss::util::spinlock> g(_writer_lock);
		co_await _writer.release();

CORE-8618 crash_tracker: record uncaught startup exceptions #24854

Are you sure you want to change the base?

CORE-8618 crash_tracker: record uncaught startup exceptions #24854

Conversation

pgellert commented Jan 17, 2025 • edited Loading

Backports Required

Release Notes

vbotbuildovich commented Jan 17, 2025 • edited Loading

CI test results

pgellert commented Jan 20, 2025

dotnwat left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgellert Jan 21, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgellert commented Jan 21, 2025

pgellert commented Jan 21, 2025

pgellert commented Jan 17, 2025 •

edited

Loading

vbotbuildovich commented Jan 17, 2025 •

edited

Loading

pgellert Jan 21, 2025 •

edited

Loading