-
Notifications
You must be signed in to change notification settings - Fork 0
NMI Handler for user-defined codes
License
Saruspete/nmimgr
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
================== NMI Manager ================== This module allows you to Panic or Ignore specific NMI events. Author: Adrien Mahieux <[email protected]> Source: https://github.com/saruspete/nmimgr Tools: https://github.com/saruspete/kdumptools https://fr.slideshare.net/Saruspete/kernel-crashdump-53496836 ------------ What is this ------------ Manage NMI events in a more fine-grained manner than "unknown_nmi_panic" sysctl. When a production host is unresponsive, we'd like to take a Kernel Dump for offline issue analysis. If kdump is correctly setup, we need to crash/panic the system for it to start This is usually done by sending an NMI to the system (as no userland process is responding anymore) through the BMC (which name and implementation is vendor specific). But if no handler registers the vendor-specific NMI event to trigger a crash, the kernel logs a "Dazed and confused, but trying to continue" message and server is still unresponsive. ---------------------------------------- Why not just using "nmi_panic" sysctls ? ---------------------------------------- There is 3 sysctls that allows administrators to generate a panic: - panic_on_io_nmi - panic_on_unrecovered_nmi - unknown_nmi_panic These sysctl are overkill as multiple NMI can be generated for non-critical events like: - Software debugging, like perf on Pentium processors - External cards like FPGA to communicate - Motherboard alerts of a dying Power-Supply If you are fine with the current unknown_nmi_panic settings, this module can also be used to ignore other NMIs during the dump process, even those who have a kernel module for handling. This avoid the interruption of the dump process, thus having a non-usable coredump. ------------- How to use it ------------- Build it -------- Built it for the current kernel: # make Or specify custom/multiple versions if you have a build env # make 2.6.32-642.15.1.el6.x86_64 3.10.0-327.36.1.el7.x86_64 4.8.13-100.fc23.x86_64 Load the module (temporarily) ----------------------------- Usage as a module (temp, insmod): insmod nmimgr.ko events_panic=0,1,2,5-12,13,255 events_ignore=99 Check the logs -------------- As NMI should be a serious indicator, the module will generate some logs at startup and when handling a new NMI. When trying new hardware, you may just load the module, generate an NMI from the BMC and check dmesg for lines containing "nmimgr:", specifically the log "Handling new NMI". The code you are interested in (the event) is the decimal value between ( ) If you see this log: "Handling new NMI type:1 event:0x10 (16)" Then the event code generated is 16. To make the system panic: # insmod nmimgr.ko events_panic=16 To ignore it and disable messages: # insmod nmimgr.ko events_ignore=16 Add it permanently ------------------ Once checked it works with your kernel, you can make it more permanent: 1) copy the module (file nmimgr.ko) in: # cp nmimgr.kmod.$(uname -r)/nmimgr.ko /lib/modules/$(uname -r)/extra/ 2) Regen the module database: # depmod -a 3) Set the parameters for modprobe: # echo "options nmimgr events_panic=16" > /etc/modprobe.d/nmimgr.conf 4) Load it with modprobe # modprobe nmimgr 5) Check the parameters are correctly set (you should see your value) # cat /sys/module/nmimgr/parameters/events_panic Parameters ---------- - events_panic=LIST Events to make the kernel Panic - events_ignore=LIST Events to drop, so no other handler can process them LIST is standard kernel lists, can be composed of - simple lists: 0,13,16,44,10 - ranges: 10-100 - Mix of both: 0,1,2-8,10 Should you embed it with your kernel, you can configure it with boot cmd: nmimgr.events_panic=0,1,2,5-12,13,255 nmimgr.events_ignore=99 -------------- How to test it -------------- Generate an NMI --------------- - ipmitool chassis power diag - vboxmanage debugvm "VMName" injectnmi - virsh inject-nmi "VMName" Usual generated NMI events (in decimal, to be used as module parameters): - HP Ilo : 32,48 - Dell IDRAC: 32,33,48,49 - IBM : 44,60 - VirtualBox: 0,16,32,48 ----------------------- Kernel Revision history ----------------------- 2.6.32: Using notifier_block structs 3.2 : Moved NMI descriptions to an enum: LOCAL, UNKNOWN, MAX https://lwn.net/Articles/461215/ https://lkml.org/lkml/2012/3/8/386 3.5 : Moved "register_nmi_handler" to a macro + static struct nmiaction fn##_na This broke the loop logic used between 3.2 and 3.5
About
NMI Handler for user-defined codes
Topics
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published