Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some PAM messages are not handled by Teleport #49028

Open
programmerq opened this issue Nov 14, 2024 · 0 comments
Open

Some PAM messages are not handled by Teleport #49028

programmerq opened this issue Nov 14, 2024 · 0 comments
Labels
bug c-co Internal Customer Reference PAM Label related to Pluggable Authentication Module (PAM) Submethod. ssh

Comments

@programmerq
Copy link
Contributor

Expected behavior:

On a Teleport node with PAM support and PAM auth enabled, users should be presented with output from configured PAM directives, and should be able to provide auth interactive input (like an PAM module that prompts for an OTP code)

Current behavior:

When attempting to open an SSH session, the session terminates after around 20 seconds. The Teleport node logs "Child process never became ready," and "trace.aggregate timed out waiting for continue signal." The user is presented with a generic "EOF" message.

Bug details:

Recreation steps

To reproduce this error, and demonstrate it was working with 7.x, I did the following.

/etc/pam.d/teleport

#%PAM-1.0
account    required   pam_exec.so stdout /etc/pam.d/otp_banner.sh
auth       required   pam_exec.so expose_authtok /etc/pam.d/otp_check.sh
session    required   pam_motd.so
session    required   pam_permit.so

/etc/pam.d/otp_banner.sh

#!/bin/bash

# Print welcome banner
username=${TELEPORT_USERNAME%@*}
echo ' ********    Welcome '$username'  '

# example/dummy OTP of "YES" for testing
echo -n ' Please input YES to continue: '
exit 0

/etc/pam.d/otp_check.sh

#!/bin/sh

read token

if [ "$token" = "YES" ]; then
 exit 0
fi

exit 1

/etc/teleport.yaml

ssh_service:
  enabled: yes
  pam:
    enabled: yes
    service_name: "teleport"
    use_pam_auth: true

Note: My 16.4.7 lab is a typical node instance that is connected to a cluster.

My Teleport 7.3.26 instance was an auth+proxy+node all in one teleport.yaml file. The ssh_service pam configuration was the same in both clusters. The teleport 7 instance used an alternate data_dir and bind address so both versions could run on the same host at the same time sudo /path/to/7.3.26/bin/teleport start -d -c /etc/teleport7.yaml

All tsh commands were run with a tsh version that matches the cluster version.

Expected behavior (as seen in Teleport 7.3.26):

% tsh ssh jeff@teleport7
 ********    Welcome jeff
 Please input YES to continue:
YES
Linux myhost 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64
...
jeff@myhost:~$ 

Current behavior (as seen in Teleport 16.4.7):

% tsh ssh --no-resume jeff@teleport16
ERROR: EOF

The tsh command hangs for 20 seconds before the EOF message appears.

tsh debug messages
jeff@macbook ~ % date; time tsh ssh -d --no-resume jeff@teleport16; echo; date
Thu Nov 14 16:44:47 EST 2024
2024-11-14T16:44:48-05:00 INFO [CLIENT]    ALPN connection upgrade required for "teleport.example.com:443": false. client/api.go:841
2024-11-14T16:44:48-05:00 INFO [CLIENT]    [KEY AGENT] Connected to the system agent: "/private/tmp/com.apple.launchd.2ZpKoTfyMF/Listeners" client/api.go:4619
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Reading certificates from path "/Users/jeff/.tsh/keys/teleport.example.com/jeff-ssh/teleport.example.com-cert.pub". client/keystore.go:357
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 INFO [KEYAGENT]  Loading SSH key for user "jeff" and cluster "teleport.example.com". client/keyagent.go:198
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [CLIENT]    Using provided host myhost client/api.go:1460
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Reading certificates from path "/Users/jeff/.tsh/keys/teleport.example.com/jeff-ssh/teleport.example.com-cert.pub". client/keystore.go:357
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Reading certificates from path "/Users/jeff/.tsh/keys/teleport.example.com/jeff-kube/teleport.example.com". client/keystore.go:357
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [CLIENT]    Attempting to issue a single-use user certificate with an MFA check. client/cluster_client.go:302
2024-11-14T16:44:48-05:00 DEBU [KEYSTORE]  Teleport TLS certificate valid until "2024-11-15 04:55:57 +0000 UTC". client/client_store.go:118
2024-11-14T16:44:48-05:00 DEBU [KEYAGENT]  "Checking key: [email protected] AAAA...\n." client/keyagent.go:370
2024-11-14T16:44:48-05:00 DEBU [KEYAGENT]  Validated host myhost:0@[email protected]. client/keyagent.go:376
2024-11-14T16:44:48-05:00 DEBU [CLIENT]    No Key Agent selected. client/session.go:284
2024-11-14T16:45:08-05:00 DEBU [CLIENT]    not using loopback pool for remote proxy addr: teleport.example.com:443 client/api.go:4574
2024-11-14T16:45:08-05:00 DEBU  Attempting request to Proxy web api method:GET host:teleport.example.com:443 path:/webapi/ping trace_id:1fddeb9c703dd9b6067c048421200fc1 span_id:07bca1541fd0cbbe webclient/webclient.go:131
2024-11-14T16:45:08-05:00 DEBU  ALPN connection upgrade test complete address:teleport.example.com:443 upgrade_required:false trace_id:1fddeb9c703dd9b6067c048421200fc1 span_id:07bca1541fd0cbbe client/alpn_conn_upgrade.go:96

ERROR REPORT:
Original Error: *errors.errorString EOF
Stack Trace:
	github.com/gravitational/teleport/[email protected]/observability/tracing/ssh/session.go:269 github.com/gravitational/teleport/api/observability/tracing/ssh.(*Session).Shell
	github.com/gravitational/teleport/lib/client/session.go:533 github.com/gravitational/teleport/lib/client.(*NodeClient).RunInteractiveShell.(*NodeSession).runShell.func2
	github.com/gravitational/teleport/lib/client/session.go:318 github.com/gravitational/teleport/lib/client.(*NodeSession).interactiveSession
	github.com/gravitational/teleport/lib/client/session.go:522 github.com/gravitational/teleport/lib/client.(*NodeSession).runShell
	github.com/gravitational/teleport/lib/client/client.go:403 github.com/gravitational/teleport/lib/client.(*NodeClient).RunInteractiveShell
	github.com/gravitational/teleport/lib/client/api.go:2019 github.com/gravitational/teleport/lib/client.(*TeleportClient).runShellOrCommandOnSingleNode
	github.com/gravitational/teleport/lib/client/api.go:1750 github.com/gravitational/teleport/lib/client.(*TeleportClient).SSH
	github.com/gravitational/teleport/tool/tsh/common/tsh.go:3615 github.com/gravitational/teleport/tool/tsh/common.onSSH.func1.1
	github.com/gravitational/teleport/lib/client/api.go:607 github.com/gravitational/teleport/lib/client.RetryWithRelogin
	github.com/gravitational/teleport/tool/tsh/common/tsh.go:3620 github.com/gravitational/teleport/tool/tsh/common.onSSH.func1
	github.com/gravitational/teleport/tool/tsh/common/tsh.go:3421 github.com/gravitational/teleport/tool/tsh/common.retryWithAccessRequest
	github.com/gravitational/teleport/tool/tsh/common/tsh.go:3608 github.com/gravitational/teleport/tool/tsh/common.onSSH
	github.com/gravitational/teleport/tool/tsh/common/tsh.go:1363 github.com/gravitational/teleport/tool/tsh/common.Run
	github.com/gravitational/teleport/tool/tsh/common/tsh.go:614 github.com/gravitational/teleport/tool/tsh/common.Main
	github.com/gravitational/teleport/tool/tsh/main.go:26 main.main
	runtime/proc.go:271 runtime.main
	runtime/asm_amd64.s:1695 runtime.goexit
User Message: EOF
tsh ssh -d --no-resume jeff@teleport16  0.14s user 0.23s system 1% cpu 21.263 total

Thu Nov 14 16:45:08 EST 2024
jeff@macbook ~ %

Here are the debug logs on the 16.4.7 ssh node:

Nov 14 14:44:48 myhost teleport[82227]: 2024-11-14T14:44:48-07:00 INFO [SESSION:N] New party ServerContext(203.0.113.183:64776->192.168.168.191:57822, user=jeff, id=3) party(id=ca8859c3-b1c2-5b73-5d96-d9ba4aeb8f8a) joined the session with participant mode: peer. session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:1986
Nov 14 14:44:48 myhost teleport[82227]: 2024-11-14T14:44:48-07:00 DEBU [SESSION:N] Launching session session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:1240
Nov 14 14:44:57 myhost teleport[82227]: 2024-11-14T14:44:57-07:00 DEBU [PROXY:AGE] Received discovery request: [1a80270a-9b74-633c-5c8d-42dda5cac488, 7ecef897-2136-6f33-bab0-1f1a32f60d6b, c28edef9-64dc-8c79-e2a5-1ba6dfa57747] leaseID:1 target:teleport.example.com:443 localCluster: targetCluster:teleport.example.com reversetunnel/agent.go:652
Nov 14 14:44:58 myhost teleport[82227]: 2024-11-14T14:44:58-07:00 DEBU [PROXY:AGE] Received discovery request: [1a80270a-9b74-633c-5c8d-42dda5cac488, c28edef9-64dc-8c79-e2a5-1ba6dfa57747, 7ecef897-2136-6f33-bab0-1f1a32f60d6b] leaseID:2 target:teleport.example.com:443 localCluster: targetCluster:teleport.example.com reversetunnel/agent.go:652
Nov 14 14:45:08 myhost teleport[82227]: 2024-11-14T14:45:08-07:00 ERRO [SESSION:N] Child process never became ready error:[
Nov 14 14:45:08 myhost teleport[82227]: ERROR REPORT:
Nov 14 14:45:08 myhost teleport[82227]: Original Error: trace.aggregate timed out waiting for continue signal
Nov 14 14:45:08 myhost teleport[82227]: Stack Trace:
Nov 14 14:45:08 myhost teleport[82227]:         github.com/gravitational/teleport/lib/srv/term.go:268 github.com/gravitational/teleport/lib/srv.(*terminal).WaitForChild
Nov 14 14:45:08 myhost teleport[82227]:         github.com/gravitational/teleport/lib/srv/sess.go:1347 github.com/gravitational/teleport/lib/srv.(*session).startInteractive
Nov 14 14:45:08 myhost teleport[82227]:         github.com/gravitational/teleport/lib/srv/sess.go:389 github.com/gravitational/teleport/lib/srv.(*SessionRegistry).OpenSession
Nov 14 14:45:08 myhost teleport[82227]:         github.com/gravitational/teleport/lib/srv/termhandlers.go:126 github.com/gravitational/teleport/lib/srv.(*TermHandlers).HandleShell
Nov 14 14:45:08 myhost teleport[82227]:         github.com/gravitational/teleport/lib/srv/regular/sshserver.go:1921 github.com/gravitational/teleport/lib/srv/regular.(*Server).dispatch
Nov 14 14:45:08 myhost teleport[82227]:         github.com/gravitational/teleport/lib/srv/regular/sshserver.go:1804 github.com/gravitational/teleport/lib/srv/regular.(*Server).handleSessionRequests
Nov 14 14:45:08 myhost teleport[82227]:         github.com/gravitational/teleport/lib/srv/regular/sshserver.go:1571 github.com/gravitational/teleport/lib/srv/regular.(*Server).HandleNewChan.func1
Nov 14 14:45:08 myhost teleport[82227]:         runtime/asm_amd64.s:1695 runtime.goexit
Nov 14 14:45:08 myhost teleport[82227]: User Message: timed out waiting for continue signal] session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:1348
Nov 14 14:45:08 myhost teleport[82227]: 2024-11-14T14:45:08-07:00 INFO [SESSION:N] Closing session session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:957
Nov 14 14:45:08 myhost teleport[82227]: 2024-11-14T14:45:08-07:00 INFO [SESSION:N] Closing party ca8859c3-b1c2-5b73-5d96-d9ba4aeb8f8a srv/sess.go:2155
Nov 14 14:45:08 myhost teleport[82227]: 2024-11-14T14:45:08-07:00 INFO [SESSION:N] Removing party ServerContext(203.0.113.183:64776->192.168.168.191:57822, user=jeff, id=3) party(id=ca8859c3-b1c2-5b73-5d96-d9ba4aeb8f8a) from session session_id:1027580d-efd5-2796-e821-85433b475b94 srv/sess.go:1579

Strace analysis

When reviewing the behavior of both versions of Teleport via strace, there were a few subtle differences in which calls were made. The pid that actually launches the otp_banner.sh is cloned in both versions.

click to expand
mmap(NULL, 242344, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 21, 0) = 0x7fc978625000
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fca01a3fbd0) = 78625
set_robust_list(0x7fca01a3fbe0, 24) = 0
close(16 <unfinished ...>
<... close resumed>) = 0
dup2(20, 1 <unfinished ...>
<... dup2 resumed>) = 1
dup2(1, 2)        = 2
pipe2([16, 21], 0) = 0
close(21)         = 0
dup2(16, 0)       = 0
close(16)         = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=8*1024, rlim_max=8*1024}) = 0

This is an strace snippet from the non-working version. It is nearly identical to the working version:

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f8aedb42a10) = 77955
set_robust_list(0x7f8aedb42a20, 24) = 0
close(11 <unfinished ...>
<... close resumed>) = 0
dup2(12, 1)       = 1
dup2(1, 2)        = 2
pipe2([11, 13], 0) = 0
close(13)         = 0
dup2(11, 0)       = 0
close(11)         = 0
prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0

The new version has that mmap call that seems to be absent in the old version. The clone options are also slightly different.

I think there may be a change in how go and/or cgo handles go threads and loading libraries, which leads to the callback never being processed for this type of PAM message.

@programmerq programmerq added bug c-co Internal Customer Reference PAM Label related to Pluggable Authentication Module (PAM) Submethod. ssh labels Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug c-co Internal Customer Reference PAM Label related to Pluggable Authentication Module (PAM) Submethod. ssh
Projects
None yet
Development

No branches or pull requests

1 participant