
Catched SIGSEGV from scm_send_msg (/root/src/event-driven-servers/mavis/spawnd_accepted.c:422), pid: 378104: Connection refused #142

Open
zel-ah opened this issue Feb 5, 2025 · 16 comments


@zel-ah

zel-ah commented Feb 5, 2025

Hello,
sometimes I get errors like :
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: scm_send_msg: sendmsg: Connection refused
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: scm_send_msg (/root/src/event-driven-servers/mavis/spawnd_accepted.c:422), pid: 378104: Connection refused
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: Catched SIGSEGV
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: Apologies ... this shouldn't have happened. Please verify that
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: you are running the most current version from
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: https://github.com/MarcJHuber/event-driven-servers/
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: The version you're currently running is 144c6de
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: If this issue persists even with the most recent version:
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: Reconfigure with --debug, recompile and reinstall. Then send
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: a bug report via GitHub issues at
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: https://github.com/MarcJHuber/event-driven-servers/issues
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: and include the backtraces.
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: Please do NOT mail bug reports to the private mail address of
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: the author, unless you have a prior permit for that.
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: Thank you.
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: backtrace start
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 10 [0x62198f040335]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 9 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x72234de2a28b]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 8 /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x72234de2a1ca]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 7 [0x62198f057e64]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 6 /usr/local/lib/libmavis.so(scm_main+0x103) [0x72234e883bd3]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 5 /usr/local/lib/libmavis.so(spawnd_main+0x976) [0x72234e883806]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 4 /usr/local/lib/libmavis.so(io_main+0x2c) [0x72234e875fdc]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 3 /usr/local/lib/libmavis.so(io_poll+0xf1) [0x72234e8752e1]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 2 /usr/local/lib/libmavis.so(spawnd_accepted+0x967) [0x72234e880c97]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 1 /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x72234de45320]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: 0 /usr/local/lib/libmavis.so(catchsegv+0x105) [0x72234e884825]
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: EXECINFO: backtrace end
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: GDB: running: "(printf 'bt#012q#012';sleep 3)|gdb -n -q -p $CRASHPID 2>/dev/null"
Feb 5 06:15:53 ams-aaa1 tac_plus-ng[378100]: GDB: backtrace start
Feb 5 06:15:56 ams-aaa1 tac_plus-ng[378100]: GDB: backtrace end
Feb 5 06:15:56 ams-aaa1 systemd[1]: tac_plus-ng.service: Main process exited, code=exited, status=69/UNAVAILABLE
Feb 5 06:15:56 ams-aaa1 systemd[1]: tac_plus-ng.service: Failed with result 'exit-code'.
Feb 5 06:15:56 ams-aaa1 systemd[1]: tac_plus-ng.service: Unit process 345581 (tac_plus-ng) remains running after unit stopped.
Feb 5 06:15:56 ams-aaa1 systemd[1]: tac_plus-ng.service: Unit process 378102 (tac_plus-ng) remains running after unit stopped.
Feb 5 06:15:56 ams-aaa1 systemd[1]: tac_plus-ng.service: Consumed 6.869s CPU time, 236.2M memory peak, 0B memory swap peak.
Feb 5 06:15:56 ams-aaa1 systemd[1]: tac_plus-ng.service: Scheduled restart job, restart counter is at 7.

@MarcJHuber
Owner

Hi,

thanks for reporting. This apparently happens somewhere in the error path after spawnd_accepted.c:422. I really can't read much more from the syslog output; compiling with --debug or --ggdb and capturing the SEGV with a debugger would really be helpful.

For starters, could you please share some details about your setup? E.g. is tracking configured, or might this be the result of a high-usage situation?

Thanks,

Marc

@MarcJHuber
Owner

Hi,

I've just committed 443023f; no idea whether it fixes the issue, this is just based on guessing.

Thanks,

Marc

@zel-ah
Author

zel-ah commented Feb 10, 2025

Hi, thanks for the update, I'll keep an eye on it. A possible cause could be heavy load: I have more than 1500 devices on my network. There may also be a memory leak, since after two days of operation the consumption increased from 4 MB to 300+ MB (Memory: 332.5M (peak: 340.2M)).

@MarcJHuber
Owner

Hi,

that memory leak is clearly unexpected. Could you please share some details about your configuration (mavis modules used, e.g.)?

Thanks,

Marc

@zel-ah
Author

zel-ah commented Feb 11, 2025

Hi,
I use only tac_plus-ng without external modules, with this config:

```
#!/usr/local/sbin/tac_plus-ng
id = spawnd {
    listen { port = 49 }
    spawn = {
        instances min = 1
        instances max = 30
    }
    background = yes
}

id = tac_plus-ng {
    log authzlog { destination = /var/log/tac_plus/%Y/authorization.%Y-%m.log }
    log authclog { destination = /var/log/tac_plus/%Y/authentication.%Y-%m.log }
    log acctlog { destination = /var/log/tac_plus/%Y/accounting.%Y-%m.log }
    accounting log = acctlog
    authentication log = authclog
    authorization log = authzlog
    host IPv4only {
        address = 0.0.0.0/0
        key = "XXXXX"
    }

    profile noc_admin {
        script {
            if (service == shell) {
                if (cmd == "") set priv-lvl = 15
                permit
            }
        }
    }
    profile ro_user {
        script {
            if (service == shell) {
                if (cmd == "") {
                    set priv-lvl = 15
                    permit
                }
                if (cmd =~ /^show/) {
                    set priv-lvl = 15
                    permit
                }
                if (cmd =~ /^export/) {
                    set priv-lvl = 15
                    permit
                }
                if (cmd =~ /^exit/) {
                    set priv-lvl = 15
                    permit
                }
                if (cmd =~ /^terminal/) {
                    set priv-lvl = 15
                    permit
                }
            }
        }
    }
}
```

@MarcJHuber
Owner

Hi,

thanks, that should make it easier to find what's causing this. Might take some time, nevertheless.

Thanks,

Marc

@MarcJHuber
Owner

Hi,

do logs actually get written to /var/log/tac_plus/...? That location being non-writable would be a fairly good excuse for memory filling up.

Just guessing.

Thanks,

Marc

@zel-ah
Author

zel-ah commented Feb 12, 2025

Hello,
This location is writable and logs are written. The strange thing is that the logs often contain messages like: "Possibly no single-connection support. Context feature may or may not work." but on the equipment side, single-connection is enabled in the settings.

@MarcJHuber
Owner

Hi,

thanks, so I'll look further into finding that memory leak :-(

The "single-connection" warning should go away after defining "single-connection = yes" at host or realm level.
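A minimal sketch of where that could go, assuming the host block from the config shared above (placement at realm level is also possible):

```
host IPv4only {
    address = 0.0.0.0/0
    key = "XXXXX"
    single-connection = yes
}
```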

Thanks,

Marc

@zel-ah
Author

zel-ah commented Feb 21, 2025

Hi,
"single-connection = yes" didn't help.

But I noticed that after a restart the memory is not freed:

```
root@aaa1:~/src/event-driven-servers# service tac_plus-ng status
● tac_plus-ng.service - TACACS+ NG Service
     Loaded: loaded (/etc/systemd/system/tac_plus-ng.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-02-21 06:45:14 UTC; 29s ago
   Main PID: 3176288 (tac_plus-ng)
      Tasks: 3 (limit: 9443)
     Memory: 421.5M (peak: 422.2M)
        CPU: 170ms
     CGroup: /system.slice/tac_plus-ng.service
             ├─1474064 "tac_plus-ng: 4 connections left, dying when idle"
             ├─3176288 "tac_plus-ng: 3 connections, ac" /etc/tacacs-ng/tac_plus-ng.conf -d 16
             └─3176289 "tac_plus-ng: 3 connections"

root@aaa1:~/src/event-driven-servers# service tac_plus-ng status
● tac_plus-ng.service - TACACS+ NG Service
     Loaded: loaded (/etc/systemd/system/tac_plus-ng.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-02-21 06:45:14 UTC; 1min 31s ago
   Main PID: 3176288 (tac_plus-ng)
      Tasks: 2 (limit: 9443)
     Memory: 419.0M (peak: 422.2M)
        CPU: 470ms
     CGroup: /system.slice/tac_plus-ng.service
             ├─3176288 "tac_plus-ng: 8 connections, ac" /etc/tacacs-ng/tac_plus-ng.conf -d 16
             └─3176289 "tac_plus-ng: 8 connections"
```

@MarcJHuber
Owner

Hi,

please forcefully "kill -9 3176288 3176289" (or whatever the process IDs are now); 1474064 and 3176288/3176289 don't look related to me. After that, please restart tac_plus-ng (and make certain to be at the current GIT level). Does the issue still occur after that? If yes, how long does it take until memory usage becomes non-sane? Do you see any relation to the number of requests served?

Thanks,

Marc

@zel-ah
Author

zel-ah commented Feb 21, 2025

Hi,
I'm using the latest version.
If I restart while there are active sessions, the memory is not freed. But if I stop the service, then kill -9 the process holding the remaining sessions ("tac_plus-ng: connections left, dying when idle") and start it again, it starts with the memory freed.

@zel-ah
Author

zel-ah commented Feb 21, 2025

Hi,
I also noticed strange behavior of the switches:
C9200L-24P-4G 17.12.04 CAT9K_LITE_IOSXE
They always maintain 2-3 active connections to the server; in fact, when restarting the process, it waits for their connections to complete. But if I configure single-connection on them, they stop working with TACACS.

@MarcJHuber
Owner

Hi,

I simply can't test that, due to lack of suitable hardware. This may or may not be an issue with IOS-XE, or with tac_plus-ng. In general, if restarting tac_plus-ng causes an issue with devices, it looks more like a problem with the device implementation.

Cheers,

Marc

@zel-ah
Author

zel-ah commented Feb 21, 2025

Hi,
Regarding the IOS-XE device, perhaps you didn't quite understand me: when single-connection is configured on the switch, the switch stops working with the TACACS server, while the server itself works fine.
Thanks for the updates, I've stopped seeing SIGSEGV lately.
Now the main problem is the memory leak, and it also appears on other tacacs server instances that don't have IOS-XE devices in service, though the leak doesn't happen as quickly there.

@MarcJHuber
Owner

Hi,

I see no issues with single connection on a C8000v platform running 17.12.04. I'd guess that that's the same TACACS+ client implementation. You could perhaps isolate the issue by enabling TACACS debugging on your switch.

I'm still looking into the memory leak issue. For now, mitigating it with "retire limit = <max. number of connections to be served>" and "connection timeout = " should work.
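A hedged sketch of that mitigation, assuming it goes into the tac_plus-ng section of the config shared earlier in this thread; the values here are placeholders to tune, not recommendations:

```
id = tac_plus-ng {
    retire limit = 10000        # worker exits (and frees its memory) after serving this many connections
    connection timeout = 600    # placeholder value in seconds; drops idle connections so workers can retire
    ...
}
```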

Cheers,

Marc
