You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
On a Teleport node with PAM support and PAM auth enabled, users should be presented with output from configured PAM directives, and should be able to provide auth interactive input (like an PAM module that prompts for an OTP code)
Current behavior:
When attempting to open an SSH session, the session terminates after around 20 seconds. The Teleport node logs "Child process never became ready," and "trace.aggregate timed out waiting for continue signal." The user is presented with a generic "EOF" message.
Bug details:
Teleport version: 16.4.7
Interestingly, Teleport 7.3.26 does not have this bug.
Note: My 16.4.7 lab is a typical node instance that is connected to a cluster.
My Teleport 7.3.26 instance was an auth+proxy+node all in one teleport.yaml file. The ssh_service pam configuration was the same in both clusters. The teleport 7 instance used an alternate data_dir and bind address so both versions could run on the same host at the same time sudo /path/to/7.3.26/bin/teleport start -d -c /etc/teleport7.yaml
All tsh commands were run with a tsh version that matches the cluster version.
Expected behavior (as seen in Teleport 7.3.26):
% tsh ssh jeff@teleport7
******** Welcome jeff
Please input YES to continue:
YES
Linux myhost 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64
...
jeff@myhost:~$
Current behavior (as seen in Teleport 16.4.7):
% tsh ssh --no-resume jeff@teleport16
ERROR: EOF
The tsh command hangs for 20 seconds before the EOF message appears.
tsh debug messages
jeff@macbook ~ % date; time tsh ssh -d --no-resume jeff@teleport16; echo; date
Thu Nov 14 16:44:47 EST 2024
2024-11-14T16:44:48-05:00 INFO [CLIENT] ALPN connection upgrade required for "teleport.example.com:443": false. client/api.go:841
2024-11-14T16:44:48-05:00 INFO [CLIENT] [KEY AGENT] Connected to the system agent: "/private/tmp/com.apple.launchd.2ZpKoTfyMF/Listeners" client/api.go:4619
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Reading certificates from path "/Users/jeff/.tsh/keys/teleport.example.com/jeff-ssh/teleport.example.com-cert.pub". client/keystore.go:357
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 INFO [KEYAGENT] Loading SSH key for user "jeff" and cluster "teleport.example.com". client/keyagent.go:198
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [CLIENT] Using provided host myhost client/api.go:1460
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Reading certificates from path "/Users/jeff/.tsh/keys/teleport.example.com/jeff-ssh/teleport.example.com-cert.pub". client/keystore.go:357
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Reading certificates from path "/Users/jeff/.tsh/keys/teleport.example.com/jeff-kube/teleport.example.com". client/keystore.go:357
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [CLIENT] Attempting to issue a single-use user certificate with an MFA check. client/cluster_client.go:302
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE] Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYAGENT] "Checking key: [email protected] AAAA...\n." client/keyagent.go:370
2024-11-14T16:44:48-05:00 DEBU [KEYAGENT] Validated host myhost:0@[email protected]. client/keyagent.go:376
2024-11-14T16:44:48-05:00 DEBU [CLIENT] No Key Agent selected. client/session.go:284
2024-11-14T16:45:08-05:00 DEBU [CLIENT] not using loopback pool for remote proxy addr: teleport.example.com:443 client/api.go:4574
2024-11-14T16:45:08-05:00 DEBU Attempting request to Proxy web api method:GET host:teleport.example.com:443 path:/webapi/ping trace_id:1fddeb9c703dd9b6067c048421200fc1 span_id:07bca1541fd0cbbe webclient/webclient.go:131
2024-11-14T16:45:08-05:00 DEBU ALPN connection upgrade test complete address:teleport.example.com:443 upgrade_required:false trace_id:1fddeb9c703dd9b6067c048421200fc1 span_id:07bca1541fd0cbbe client/alpn_conn_upgrade.go:96
ERROR REPORT:
Original Error: *errors.errorString EOF
Stack Trace:
github.com/gravitational/teleport/[email protected]/observability/tracing/ssh/session.go:269 github.com/gravitational/teleport/api/observability/tracing/ssh.(*Session).Shell
github.com/gravitational/teleport/lib/client/session.go:533 github.com/gravitational/teleport/lib/client.(*NodeClient).RunInteractiveShell.(*NodeSession).runShell.func2
github.com/gravitational/teleport/lib/client/session.go:318 github.com/gravitational/teleport/lib/client.(*NodeSession).interactiveSession
github.com/gravitational/teleport/lib/client/session.go:522 github.com/gravitational/teleport/lib/client.(*NodeSession).runShell
github.com/gravitational/teleport/lib/client/client.go:403 github.com/gravitational/teleport/lib/client.(*NodeClient).RunInteractiveShell
github.com/gravitational/teleport/lib/client/api.go:2019 github.com/gravitational/teleport/lib/client.(*TeleportClient).runShellOrCommandOnSingleNode
github.com/gravitational/teleport/lib/client/api.go:1750 github.com/gravitational/teleport/lib/client.(*TeleportClient).SSH
github.com/gravitational/teleport/tool/tsh/common/tsh.go:3615 github.com/gravitational/teleport/tool/tsh/common.onSSH.func1.1
github.com/gravitational/teleport/lib/client/api.go:607 github.com/gravitational/teleport/lib/client.RetryWithRelogin
github.com/gravitational/teleport/tool/tsh/common/tsh.go:3620 github.com/gravitational/teleport/tool/tsh/common.onSSH.func1
github.com/gravitational/teleport/tool/tsh/common/tsh.go:3421 github.com/gravitational/teleport/tool/tsh/common.retryWithAccessRequest
github.com/gravitational/teleport/tool/tsh/common/tsh.go:3608 github.com/gravitational/teleport/tool/tsh/common.onSSH
github.com/gravitational/teleport/tool/tsh/common/tsh.go:1363 github.com/gravitational/teleport/tool/tsh/common.Run
github.com/gravitational/teleport/tool/tsh/common/tsh.go:614 github.com/gravitational/teleport/tool/tsh/common.Main
github.com/gravitational/teleport/tool/tsh/main.go:26 main.main
runtime/proc.go:271 runtime.main
runtime/asm_amd64.s:1695 runtime.goexit
User Message: EOF
tsh ssh -d --no-resume jeff@teleport16 0.14s user 0.23s system 1% cpu 21.263 total
Thu Nov 14 16:45:08 EST 2024
jeff@macbook ~ %
Here are the debug logs on the 16.4.7 ssh node:
Nov 14 14:44:48 myhost teleport[82227]: 2024-11-14T14:44:48-07:00 INFO [SESSION:N] New party ServerContext(203.0.113.183:64776->192.168.168.191:57822, user=jeff, id=3) party(id=ca8859c3-b1c2-5b73-5d96-d9ba4aeb8f8a) joined the session with participant mode: peer. session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:1986
Nov 14 14:44:48 myhost teleport[82227]: 2024-11-14T14:44:48-07:00 DEBU [SESSION:N] Launching session session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:1240
Nov 14 14:44:57 myhost teleport[82227]: 2024-11-14T14:44:57-07:00 DEBU [PROXY:AGE] Received discovery request: [1a80270a-9b74-633c-5c8d-42dda5cac488, 7ecef897-2136-6f33-bab0-1f1a32f60d6b, c28edef9-64dc-8c79-e2a5-1ba6dfa57747] leaseID:1 target:teleport.example.com:443 localCluster: targetCluster:teleport.example.com reversetunnel/agent.go:652
Nov 14 14:44:58 myhost teleport[82227]: 2024-11-14T14:44:58-07:00 DEBU [PROXY:AGE] Received discovery request: [1a80270a-9b74-633c-5c8d-42dda5cac488, c28edef9-64dc-8c79-e2a5-1ba6dfa57747, 7ecef897-2136-6f33-bab0-1f1a32f60d6b] leaseID:2 target:teleport.example.com:443 localCluster: targetCluster:teleport.example.com reversetunnel/agent.go:652
Nov 14 14:45:08 myhost teleport[82227]: 2024-11-14T14:45:08-07:00 ERRO [SESSION:N] Child process never became ready error:[
Nov 14 14:45:08 myhost teleport[82227]: ERROR REPORT:
Nov 14 14:45:08 myhost teleport[82227]: Original Error: trace.aggregate timed out waiting for continue signal
Nov 14 14:45:08 myhost teleport[82227]: Stack Trace:
Nov 14 14:45:08 myhost teleport[82227]: github.com/gravitational/teleport/lib/srv/term.go:268 github.com/gravitational/teleport/lib/srv.(*terminal).WaitForChild
Nov 14 14:45:08 myhost teleport[82227]: github.com/gravitational/teleport/lib/srv/sess.go:1347 github.com/gravitational/teleport/lib/srv.(*session).startInteractive
Nov 14 14:45:08 myhost teleport[82227]: github.com/gravitational/teleport/lib/srv/sess.go:389 github.com/gravitational/teleport/lib/srv.(*SessionRegistry).OpenSession
Nov 14 14:45:08 myhost teleport[82227]: github.com/gravitational/teleport/lib/srv/termhandlers.go:126 github.com/gravitational/teleport/lib/srv.(*TermHandlers).HandleShell
Nov 14 14:45:08 myhost teleport[82227]: github.com/gravitational/teleport/lib/srv/regular/sshserver.go:1921 github.com/gravitational/teleport/lib/srv/regular.(*Server).dispatch
Nov 14 14:45:08 myhost teleport[82227]: github.com/gravitational/teleport/lib/srv/regular/sshserver.go:1804 github.com/gravitational/teleport/lib/srv/regular.(*Server).handleSessionRequests
Nov 14 14:45:08 myhost teleport[82227]: github.com/gravitational/teleport/lib/srv/regular/sshserver.go:1571 github.com/gravitational/teleport/lib/srv/regular.(*Server).HandleNewChan.func1
Nov 14 14:45:08 myhost teleport[82227]: runtime/asm_amd64.s:1695 runtime.goexit
Nov 14 14:45:08 myhost teleport[82227]: User Message: timed out waiting for continue signal] session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:1348
Nov 14 14:45:08 myhost teleport[82227]: 2024-11-14T14:45:08-07:00 INFO [SESSION:N] Closing session session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:957
Nov 14 14:45:08 myhost teleport[82227]: 2024-11-14T14:45:08-07:00 INFO [SESSION:N] Closing party ca8859c3-b1c2-5b73-5d96-d9ba4aeb8f8a srv/sess.go:2155
Nov 14 14:45:08 myhost teleport[82227]: 2024-11-14T14:45:08-07:00 INFO [SESSION:N] Removing party ServerContext(203.0.113.183:64776->192.168.168.191:57822, user=jeff, id=3) party(id=ca8859c3-b1c2-5b73-5d96-d9ba4aeb8f8a) from session session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:1579
Strace analysis
When reviewing the behavior of both versions of Teleport via strace, there were a few subtle differences in which calls were made. The pid that actually launches the otp_banner.sh is cloned in both versions.
The new version has that mmap call that seems to be absent in the old version. The clone options are also slightly different.
I think there may be a change in how go and/or cgo handles go threads and loading libraries, which leads to the callback never being processed for this type of PAM message.
The text was updated successfully, but these errors were encountered:
Expected behavior:
On a Teleport node with PAM support and PAM auth enabled, users should be presented with output from configured PAM directives, and should be able to provide auth interactive input (like an PAM module that prompts for an OTP code)
Current behavior:
When attempting to open an SSH session, the session terminates after around 20 seconds. The Teleport node logs "Child process never became ready," and "trace.aggregate timed out waiting for continue signal." The user is presented with a generic "EOF" message.
Bug details:
Recreation steps
To reproduce this error, and demonstrate it was working with 7.x, I did the following.
/etc/pam.d/teleport
/etc/pam.d/otp_banner.sh
/etc/pam.d/otp_check.sh
/etc/teleport.yaml
Note: My 16.4.7 lab is a typical
node
instance that is connected to a cluster.My Teleport 7.3.26 instance was an auth+proxy+node all in one
teleport.yaml
file. Thessh_service
pam configuration was the same in both clusters. The teleport 7 instance used an alternate data_dir and bind address so both versions could run on the same host at the same timesudo /path/to/7.3.26/bin/teleport start -d -c /etc/teleport7.yaml
All
tsh
commands were run with a tsh version that matches the cluster version.Expected behavior (as seen in Teleport 7.3.26):
Current behavior (as seen in Teleport 16.4.7):
The
tsh
command hangs for 20 seconds before the EOF message appears.tsh debug messages
Here are the debug logs on the 16.4.7 ssh node:
Strace analysis
When reviewing the behavior of both versions of Teleport via
strace
, there were a few subtle differences in which calls were made. The pid that actually launches theotp_banner.sh
is cloned in both versions.click to expand
This is an strace snippet from the non-working version. It is nearly identical to the working version:
The new version has that
mmap
call that seems to be absent in the old version. The clone options are also slightly different.I think there may be a change in how go and/or cgo handles go threads and loading libraries, which leads to the callback never being processed for this type of PAM message.
The text was updated successfully, but these errors were encountered: