Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Supervisor (Kallichore) failed to start in smoke test #5337

Open
jonvanausdeln opened this issue Nov 12, 2024 · 7 comments
Open

New Supervisor (Kallichore) failed to start in smoke test #5337

jonvanausdeln opened this issue Nov 12, 2024 · 7 comments
Assignees
Milestone

Comments

@jonvanausdeln
Copy link
Contributor

jonvanausdeln commented Nov 12, 2024

System details:

Positron and OS details:

Positron: main after kallichore set as default supervisor
Operationg System: Ubuntu 22

Interpreter details:

Describe the issue:

In CI smoke test, the test-explorer test failed due to Kallichore not starting.

Image

See CI Run
And another CI Smoke Run

This is the first time we've seen this error. So not sure how prevalent.

Expected or desired behavior:

Kallichore to start up

Were there any error messages in the UI, Output panel, or Developer Tools console?

@midleman
Copy link
Contributor

I also ran into this error today on Ubuntu, so it appears to not be isolated to Windows.

@jmcphers
Copy link
Collaborator

@midleman Did you see this error interactively, or in a smoke test run? If the latter, could you link it?

(so far I've been unable to reproduce on Windows)

@midleman
Copy link
Contributor

In a smoke test run, but it's on my branch, let me grab the logs ...

@jonvanausdeln
Copy link
Contributor Author

Sorry.. I got confused looking at failures.. this one was on Ubuntu 22 (as the CI link correctly points to) ...ignore the windows part

@jmcphers
Copy link
Collaborator

This scenario passed in my smoke test run with Kallichore enabled so perhaps this is timing related?

Some notes from an initial look:

2024-11-12 14:49:25.336 [info] Positron Kernel Supervisor activated
Starting Kallichore server /home/runner/work/positron/positron/extensions/positron-supervisor/resources/kallichore/kcserver on port 55683
Kallichore 0.1.18 server online with 0 sessions
Kallichore server started in 273ms
...
14:49:32 [INFO] [client r-97ae99a4-0] Websocket closed by client

...

2024-11-12 14:49:38.638 [info] Positron Kernel Supervisor activated
Starting Kallichore server /home/runner/work/positron/positron/extensions/positron-supervisor/resources/kallichore/kcserver on port 51542
Waiting for Kallichore server to start (attempt 1, 589ms)
Waiting for Kallichore server to start (attempt 6, 853ms)
Waiting for Kallichore server to start (attempt 11, 1130ms)
Waiting for Kallichore server to start (attempt 16, 1391ms)
Kallichore server did not start after 1599ms

So what's interesting here is that Kallichore starts successfully the first time, but after a project switch the IDE restarts and Kallichore doesn't.

@jmcphers jmcphers self-assigned this Nov 12, 2024
@jmcphers jmcphers added this to the 2024.12.0 Pre-Release milestone Nov 12, 2024
@Liu-yuansheng

This comment has been minimized.

jmcphers added a commit that referenced this issue Dec 12, 2024
This change improves diagnostics and logging for (hopefully rare) cases
in which the kernel supervisor _itself_ cannot start. This is distinct
from the cases where R or Python can't start. So far we've seen just one
of these in the wild, as a result of running on an unsupported OS.

The approach is to borrow the wrapper script technique from the Jupyter
Adapter (formerly used to invoke the kernels themselves). The wrapper
script acts as a sort of supervisor for the supervisor; it eats the
output of the supervisor process and writes it to a file. If the
supervisor exits unexpectedly at startup, the output file is written to
the log channel, and the user is directed there to view errors.

As an additional benefit, this runs the supervisor under a `bash`
process on Unix-alikes, so any environment variables or configuration
set up in `.bashrc` (etc) will now be available to the supervisor.

Addresses #5611 . 

May help us figure out #5337.

### QA Notes

An easy way to test this is to replace your `kcserver` binary with a
shell script that emits some nonsense and then exits immediately with a
nonzero status code. If you're feeling ambitious, you could also test
this on the OS named in #5611.

Also, did you know that the [longest worm in the
world](https://en.wikipedia.org/wiki/Lineus_longissimus) can reach up to
55 meters? Crazy.
@juliasilge
Copy link
Contributor

A real user has run into this here: #5910

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants