Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R: Hang possible when building Variables pane #1419

Closed
jmcphers opened this issue Sep 26, 2023 · 3 comments
Closed

R: Hang possible when building Variables pane #1419

jmcphers opened this issue Sep 26, 2023 · 3 comments
Assignees
Labels
area: kernels Issues related to Jupyter kernels and LSP servers bug Something isn't working lang: r

Comments

@jmcphers
Copy link
Collaborator

jmcphers commented Sep 26, 2023

Positron Version

2023.09.0-3750

Steps to reproduce the issue:

Create an R file with the following contents (via @t-kalinowski):

reticulate::virtualenv_create("r-tensorflow", "3.8", packages = c("tensorflow", "keras_core"))

Sys.setenv(RETICULATE_PYTHON_ENV = "r-tensorflow")

k <- reticulate::import("keras_core")

Run the first line to set up the environment, then restart R and run the second two.

The R interpreter hangs; now you can't execute code or do anything else with R.

This happens because the Environment pane is waiting on Python when it calls format on a vector.

  * frame #0: 0x00000001821ab750 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001821e8574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x000000013c4e34f0 libpython3.8.dylib`take_gil + 140
    frame #3: 0x000000013c4e3a6c libpython3.8.dylib`PyEval_RestoreThread + 56
    frame #4: 0x000000013c52407c libpython3.8.dylib`PyGILState_Ensure + 112
    frame #5: 0x0000000124d1202c reticulate.so`_reticulate_py_is_null_xptr + 84
    frame #6: 0x0000000106ad8104 libR.dylib`R_doDotCall + 1548
    frame #7: 0x0000000106b15834 libR.dylib`bcEval + 96840
... (many frames omitted) ...
    frame #77: 0x0000000106afda70 libR.dylib`Rf_eval + 1312
    frame #78: 0x0000000103b74de8 ark`harp::exec::RFunction::call::h6e179eee5c3eaa0e(self=0x00000001700d4030) at exec.rs:163:34
    frame #79: 0x0000000103b8c174 ark`harp::vector::formatted_vector::FormattedVector::new::heaa13d4c93d603f1(vector=0x000000013f918630) at formatted_vector.rs:73:37

We don't know for sure why this hangs in libpython, but the suspicion is that it is because reticulate doesn't know that we are running R on a thread other than the main thread (see #1420).

What did you expect to happen?

It shouldn't be possible for a badly behaved formatter to freeze the entire interpreter. We could do any/all of:

See also #507; it'd really have been helpful when investigating this to disable the Environment pane.

@lionel-
Copy link
Contributor

lionel- commented Nov 14, 2023

Since posit-dev/ark#136 we now run all tasks with a timeout of 5 seconds. After this delay, the kernel logs backtraces of the main and calling threads and panics. This timeout is meant for unexpected delays. The panicking behaviour is similar to how we also panic in case of unexpected error/longjump.

To avoid these panics, Ark routines need to be defensive by protecting potentially failing code and handling these failures. Similarly we need some mechanism to orderly interrupt operations that can potentially take too much time. I think this could work as follows:

  • Maintain a global atomic variable to signal our interrupt handler that we want to time out a task.

  • If a timeout is detected, flip the atomic from the calling thread and raise a SIGINT.

  • Our SIGINT handler will have to handle three cases:

    1. Normal user interrupt.
    2. Timeout interrupt.
    3. Normal user interrupt raised while handling a timeout interrupt.

    When a task is running, interrupts are suspended through R_interrupts_suspended and handled after the task or a frontend method has finished running. In order to interrupt a task we'll need to unsuspend interrupts, set R_interrupts_pending to true, and wait for R to convert this flag to an interrupt longjump.

    In scenario 3 the handler might be called back by an incoming user interrupt while the timeout interrupt is being processed. It will need to keep track of that to reproduce the interrupt suspension behaviour once we have successfully interrupted our task.

  • We catch R's interrupt longjump at the timeout site and restore state (R_interrupts_suspended to true, and R_interrupts_pending according to whether a user interrupt was signaled in the meantime).

One big caveat with this approach is that we are using a longjump to interrupt the current computation and it would be UB to interrupt a Rust stack this way. In the best case we'll leak memory, in the worst case we'll cause instability in Ark. This means that any computation that might be expected to take too much time (e.g. because it needs to run foreign code) will have to be done in C rather than in Rust. Again we are facing the question of the adequate boundaries between Rust and C code within Ark, with a trade-off between ease of development and safety.

@jmcphers jmcphers added this to the Release Candidate milestone Feb 20, 2024
@wesm wesm added lang: r bug Something isn't working labels Feb 29, 2024
@lionel- lionel- added the area: kernels Issues related to Jupyter kernels and LSP servers label May 16, 2024
@lionel- lionel- changed the title R: Hang possible when building Environment pane R: Hang possible when building Variables pane May 21, 2024
@dfalbel
Copy link
Contributor

dfalbel commented Nov 15, 2024

I can no longer reproduce this. Even though we didn't target this specifically, we have made sevaral changes to the variables pane and how reticulate interacts with Positron.

@lionel-
Copy link
Contributor

lionel- commented Nov 15, 2024

@dfalbel I think this issue was more about making it impossible for the variable pane to hang via execution of R code (i.e. adding a timeout to r_task() that interrupts the R stack).

We can use #2223 as tracking issue.

@lionel- lionel- closed this as completed Nov 15, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area: kernels Issues related to Jupyter kernels and LSP servers bug Something isn't working lang: r
Projects
None yet
Development

No branches or pull requests

4 participants