MTL-2454 kdump 1.9 fixes #66

Merged on Oct 8, 2024 (1 commit)

@rustydb rustydb commented Oct 7, 2024

Summary and Scope

Issue Type

  • RFE Pull Request

My testing used the following KDUMP_SAVEDIR:

ncn-w003:~ # grep KDUMP_SAVEDIR /etc/sysconfig/kdump
KDUMP_SAVEDIR="file:///run/initramfs/overlayfs/var/crash"
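
For anyone repeating this test, the local path behind a `file://` save directory can be recovered with a few lines of shell. This is a minimal sketch and not code from this PR: the function name `savedir_to_path` is hypothetical, and it assumes the setting appears as a single double-quoted `KDUMP_SAVEDIR="..."` line, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the local path for a file:// KDUMP_SAVEDIR.
# Argument: path to a sysconfig file (defaults to /etc/sysconfig/kdump).
savedir_to_path() {
    conf="${1:-/etc/sysconfig/kdump}"
    # Take the last KDUMP_SAVEDIR assignment and strip the key and quotes.
    url=$(sed -n 's/^KDUMP_SAVEDIR="\(.*\)"$/\1/p' "$conf" | tail -n 1)
    case "$url" in
        file://*)
            # Drop the scheme, leaving the absolute path.
            printf '%s\n' "${url#file://}"
            ;;
        *)
            echo "not a file:// target: $url" >&2
            return 1
            ;;
    esac
}
```

Calling `savedir_to_path` with no argument reads `/etc/sysconfig/kdump`; any other save target (for example a remote one) is reported as an error rather than guessed at.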

A booted system with the correct links:

ncn-w003:~ # ls -l /run/initramfs/overlayfs/
total 4
drwxr-xr-x 4 root root  82 Oct  7 22:01 1.6.0-alpha.67
-rw-r--r-- 1 root root 297 Oct  7 22:01 README.txt
lrwxrwxrwx 1 root root  75 Oct  7 22:01 boot -> ./1.6.0-alpha.67/overlay-SQFSRAID-ca2d1e34-91ad-4ac5-ac45-323302769957/boot
drwxr-xr-x 2 root root  19 Oct  7 22:01 var
ncn-w003:~ # ls -l /run/initramfs/overlayfs/var/crash
lrwxrwxrwx 1 root root 81 Oct  7 22:01 /run/initramfs/overlayfs/var/crash -> ../1.6.0-alpha.67/overlay-SQFSRAID-ca2d1e34-91ad-4ac5-ac45-323302769957/var/crash
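
The link check above can also be scripted. The following is a rough sketch under the assumption that a "correct" link simply means a symlink that resolves to an existing directory; `check_crash_link` is a hypothetical name, and `readlink -f` is the GNU coreutils form.

```shell
#!/bin/sh
# Hypothetical check: succeed only if $1 is a symlink that resolves to an
# existing directory, and print the fully resolved target.
check_crash_link() {
    link="$1"
    if [ ! -L "$link" ]; then
        echo "$link is not a symlink" >&2
        return 1
    fi
    # -d follows the symlink, so a dangling link fails here.
    if [ ! -d "$link" ]; then
        echo "$link does not resolve to a directory" >&2
        return 1
    fi
    readlink -f "$link"
}
```

For example, `check_crash_link /run/initramfs/overlayfs/var/crash` would print the resolved overlay path on a system laid out like the one above.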

Triggered a crash. The run hits MTL-2494 but continues because KDUMP_CONTINUE_ON_ERROR="true" is set:

ncn-w003:~ # echo c >/proc/sysrq-trigger
[  492.519254][T19050] sysrq: Trigger a crash
[  492.523388][T19050] Kernel panic - not syncing: sysrq triggered crash
[  492.529839][T19050] CPU: 36 PID: 19050 Comm: bash Kdump: loaded Tainted: G           OE        6.4.0-150600.23.17-default #1 SLE15-SP6 5eff937aa9559314e63d74eec60fba157a5dbfd6
[  492.545480][T19050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0016.C0001.032120230338 03/21/2023
[  492.557125][T19050] Call Trace:
[  492.560277][T19050]  <TASK>
[  492.563079][T19050]  dump_stack_lvl+0x57/0x80
[  492.567452][T19050]  panic+0x109/0x2c0
[  492.571211][T19050]  ? _printk+0x52/0x80
[  492.575138][T19050]  sysrq_handle_crash+0x16/0x20
[  492.579852][T19050]  __handle_sysrq+0x9c/0x170
[  492.584299][T19050]  write_sysrq_trigger+0x2b/0x40
[  492.589089][T19050]  proc_reg_write+0x53/0x80
[  492.593459][T19050]  vfs_write+0xc2/0x380
[  492.597479][T19050]  ? __handle_mm_fault+0xa98/0xba0
[  492.602456][T19050]  ksys_write+0xa5/0xe0
[  492.606474][T19050]  do_syscall_64+0x58/0x80
[  492.610765][T19050]  ? handle_mm_fault+0x196/0x2f0
[  492.615557][T19050]  ? do_user_addr_fault+0x267/0x890
[  492.620619][T19050]  ? exc_page_fault+0x69/0x150
[  492.625244][T19050]  entry_SYSCALL_64_after_hwframe+0x7c/0xe6
[  492.631001][T19050] RIP: 0033:0x7feb08d0c770
[  492.635314][T19050] Code: 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 80 3d b9 c2 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
[  492.654755][T19050] RSP: 002b:00007ffe9c8f38e8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  492.663020][T19050] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007feb08d0c770
[  492.670846][T19050] RDX: 0000000000000002 RSI: 000056033f4addd0 RDI: 0000000000000001
[  492.678669][T19050] RBP: 000056033f4addd0 R08: 0000000000000000 R09: 0000000000000000
[  492.686494][T19050] R10: 00007feb08bfbf08 R11: 0000000000000202 R12: 0000000000000002
[  492.694320][T19050] R13: 00007feb08deb5c0 R14: 00007feb08de8f60 R15: 000056033f667420
[  492.702149][T19050]  </TASK>
[    0.137853][    T1] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is b0)
[    5.468765] dracut-cmdline[198]: Warning: Network interface 'mgmt0' does not exist
[    5.484071] dracut-cmdline[198]: Warning: Network interface 'mgmt1' does not exist
[    6.876643][  T502] Module qed is blacklisted
[    6.897257][  T498] Module xhci_hcd is blacklisted
[    7.211103][  T496] Module qed is blacklisted
[    7.212386][  T501] Out of memory: Killed process 501 ((udev-worker)) total-vm:33180kB, anon-rss:5276kB, file-rss:4480kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
Saved temporary README.txt
Saved dmesg
Copying data                                      : [100.0 %] -           eta: 0s

The dumpfile is saved to STDOUT.

makedumpfile Completed.
Saved vmcore
Saved final README.txt
umount: /kdump/mnt/run/initramfs/overlayfs: not mounted.
umount: /sys/fs/cgroup: target is busy.
umount: /run: target is busy.
umount: /dev: target is busy.
umount: /: not mounted.
Rebooting.
[   64.255613][  T761] reboot: Restarting system

The dump was available at the expected location:

ncn-w003:/var/crash # ll
total 0
drwxr-xr-x 2 root root 51 Oct  7 22:09 2024-10-07-22-09
ncn-w003:/var/crash # cd 2024-10-07-22-09/
ncn-w003:/var/crash/2024-10-07-22-09 # ll
total 972220
-rw-r--r-- 1 root root       313 Oct  7 22:10 README.txt
-rw-r--r-- 1 root root    133722 Oct  7 22:09 dmesg
-rw-r--r-- 1 root root 995413584 Oct  7 22:10 vmcore
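
The listing above shows the three files this test produced. A small sketch of the same check in script form follows; `check_dump_dir` is a hypothetical name, and the file list is taken from this one test run rather than from any kdump specification.

```shell
#!/bin/sh
# Hypothetical check: every file seen in the test run exists and is
# non-empty inside the dump directory given as $1.
check_dump_dir() {
    dir="$1"
    for f in README.txt dmesg vmcore; do
        # -s is true only for files that exist with a size greater than zero.
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "dump complete: $dir"
}
```

Running `check_dump_dir /var/crash/2024-10-07-22-09` against the dump above should report it as complete.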

Prerequisites

  • I have included documentation in my PR (or it is not required)
  • I tested this on an internal system (if yes, please include results or a description of the test)
  • I tested this on a vshasta system (if yes, please include results or a description of the test)

Idempotency

Risks and Mitigations

Backwards compatible with kdump<1.9.
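
One way a script could branch on the kdump version is a plain version comparison. The sketch below is an assumption for illustration only and does not describe how this PR detects the version; `version_ge` is a hypothetical helper, and `sort -V` is the version-sort option found in GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
version_ge() {
    # sort -V orders version strings; if $2 comes out first (or the two
    # are equal), then $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A caller might write `if version_ge "$installed" 1.9; then ...; fi`, where obtaining `$installed` is left to the packaging tooling in use.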

@rustydb rustydb requested a review from a team as a code owner October 7, 2024 22:23
Add support for kdump>=1.9 for both LIVE+OverlayFS and "conventional" boots. This change is backwards compatible with kdump<1.9, where other files such as `System.map` are still expected.
@rustydb rustydb merged commit ed6b208 into main Oct 8, 2024
5 checks passed
@rustydb rustydb deleted the MTL-2454-kdump branch October 8, 2024 16:49