add doc to explain multithreading #1154

Open
wants to merge 15 commits into
base: main
179 changes: 179 additions & 0 deletions content/develop/concepts/app-design/multithreading.md
@@ -0,0 +1,179 @@
---
title: Threading in Streamlit
slug: /develop/concepts/design/multithreading
---

# Multithreading in Streamlit

Multithreading is a common technique to improve the efficiency of computer programs. It's a way for processors to multitask. Streamlit uses threads within its architecture, which can make it difficult for app developers to include their own multithreaded processes. Streamlit does not officially support multithreading in app code, but this guide provides information on how it can be accomplished.

## Prerequisites

- You should have a basic understanding of Streamlit's [architecture](/develop/concepts/architecture/architecture).

## Threads created by Streamlit

Streamlit creates two types of threads in Python:

- The **server thread** runs the Tornado web (HTTP + WebSocket) server.
- A **script thread** runs page code — one thread for each script run in a session.

When a user connects to your app, this creates a new session and runs a script thread to initialize the app for that user. As the script thread runs, it renders elements in the user's browser tab and reports state back to the server. When the user interacts with the app, another script thread runs, re-rendering the elements in the browser tab and updating state on the server.

This is a simplified illustration to show how Streamlit works:

![Each user session uses script threads to communicate between the user's front end and the Streamlit server.](/images/concepts/Streamlit-threading.svg)

## `streamlit.errors.NoSessionContext`

Many Streamlit commands, including `st.session_state`, expect to be called from a script thread. When Streamlit is running as expected, such commands use the `ScriptRunContext` attached to the script thread to ensure they work within the intended session and update the correct user's view. When those Streamlit commands can't find any `ScriptRunContext`, they raise a `streamlit.errors.NoSessionContext` exception. Depending on your logger settings, you may also see a console message identifying a thread by name and warning, "missing ScriptRunContext!"

Review comment (Contributor):

Should this in fact be "All Streamlit commands"? @lukasmasuch

Review comment (Collaborator, @lukasmasuch, Jan 2, 2025):

Hmm, not 100% sure. I believe a lot of commands will not raise a NoSessionContext if there is no ScriptRunContext. This is to support bare execution (execute a Streamlit app script with pure python). But calling these commands without the context won't do anything. But I believe all Streamlit commands require a ScriptRunContext to be fully functional.
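
The idea of a context attached to a thread can be illustrated without Streamlit. The following is a minimal sketch that assumes a context is stored as an attribute on the thread object. The names `CTX_ATTR`, `attach_ctx`, `get_ctx`, and `command` are hypothetical and exist only for this illustration. They are not Streamlit APIs and this is not Streamlit's implementation.

```python
import threading

CTX_ATTR = "demo_ctx"  # hypothetical attribute name, for illustration only


def attach_ctx(thread, ctx):
    """Attach a context object to a thread (stand-in, not a Streamlit API)."""
    setattr(thread, CTX_ATTR, ctx)


def get_ctx():
    """Return the context attached to the current thread, or None."""
    return getattr(threading.current_thread(), CTX_ATTR, None)


def command():
    """A command that needs a context to know which session to update."""
    ctx = get_ctx()
    if ctx is None:
        raise RuntimeError("missing context")
    return f"updating view for {ctx}"


# The main thread plays the role of a script thread that has a context.
attach_ctx(threading.current_thread(), "session-1")
main_result = command()

outcomes = []


def worker():
    # Record either the command's result or the error it raised.
    try:
        outcomes.append(command())
    except RuntimeError as exc:
        outcomes.append(f"error: {exc}")


# A plain custom thread has no context attached, so the command fails.
plain = threading.Thread(target=worker)
plain.start()
plain.join()

# A custom thread that is given the caller's context can run the command.
with_ctx = threading.Thread(target=worker)
attach_ctx(with_ctx, get_ctx())
with_ctx.start()
with_ctx.join()

print(main_result)
print(outcomes)
```

The first worker fails because nothing was attached to its thread. The second succeeds because the context was copied over before the thread started.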


## Creating custom threads

When you work with IO-heavy operations like remote queries or data loading, you may need to mitigate delays. A general programming strategy is to create threads and let them work concurrently. However, if you do this in a Streamlit app, these custom threads may have difficulty interacting with your Streamlit server.

This section introduces two patterns for creating custom threads in your Streamlit app. They are starting points rather than complete solutions.

### Option 1: Do not use Streamlit commands within a custom thread

If you don't call Streamlit commands from a custom thread, you can avoid the problem entirely. Luckily, Python's `threading` module provides ways to start a thread and collect its result from another thread.

In the following example, five custom threads are created from the script thread. After the threads are finished running, their results are displayed in the app.

```python
import streamlit as st
import time
from threading import Thread


class WorkerThread(Thread):
    def __init__(self, delay):
        super().__init__()
        self.delay = delay
        self.return_value = None

    def run(self):
        start_time = time.time()
        time.sleep(self.delay)
        end_time = time.time()
        self.return_value = f"start: {start_time}, end: {end_time}"


delays = [5, 4, 3, 2, 1]
threads = [WorkerThread(delay) for delay in delays]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
for i, thread in enumerate(threads):
    st.header(f"Thread {i}")
    st.write(thread.return_value)

st.button("Rerun")
```

<Cloud name="doc-multithreading-no-st-commands-batched" height="700px" />
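
The same batched pattern can also be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor` instead of a `Thread` subclass. This is an alternative illustration rather than the approach used in the example above. The `work` function and the shortened delays are assumptions made to keep the sketch quick to run, and Streamlit is left out so the threading logic stands on its own.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def work(delay):
    """Simulate an IO-bound task; it calls no Streamlit commands."""
    start_time = time.time()
    time.sleep(delay)
    end_time = time.time()
    return f"start: {start_time}, end: {end_time}"


delays = [0.3, 0.2, 0.1]  # shortened delays, for illustration only

# Leaving the `with` block waits for all worker threads to finish.
with ThreadPoolExecutor(max_workers=len(delays)) as executor:
    # `map` returns results in the order of `delays`, not completion order.
    results = list(executor.map(work, delays))

for i, result in enumerate(results):
    print(f"Thread {i}: {result}")
```

In a Streamlit script, the final loop would run on the script thread, so it could call `st.header` and `st.write` in place of `print`.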

If you want to display results in your app as various custom threads finish running, use containers. In the following example, five custom threads are created similarly to the previous example. However, five containers are initialized before running the custom threads and a `while` loop is used to display results as they become available. Since the Streamlit `write` command is called outside of the custom threads, this does not raise an exception.

```python
import streamlit as st
import time
from threading import Thread


class WorkerThread(Thread):
    def __init__(self, delay):
        super().__init__()
        self.delay = delay
        self.return_value = None

    def run(self):
        start_time = time.time()
        time.sleep(self.delay)
        end_time = time.time()
        self.return_value = f"start: {start_time}, end: {end_time}"


delays = [5, 4, 3, 2, 1]
result_containers = []
for i, delay in enumerate(delays):
    st.header(f"Thread {i}")
    result_containers.append(st.container())

threads = [WorkerThread(delay) for delay in delays]
for thread in threads:
    thread.start()
thread_lives = [True] * len(threads)

while any(thread_lives):
    for i, thread in enumerate(threads):
        if thread_lives[i] and not thread.is_alive():
            result_containers[i].write(thread.return_value)
            thread_lives[i] = False
    time.sleep(0.5)

for thread in threads:
    thread.join()

st.button("Rerun")
```

<Cloud name="doc-multithreading-no-st-commands-iterative" height="700px" />
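
As an alternative to polling `is_alive()` in a `while` loop, the standard library's `concurrent.futures.as_completed` yields each task as soon as it finishes. The following is a minimal sketch of that idea without Streamlit. The `work` function, the shortened delays, and the `finished_order` list are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def work(delay):
    """Simulate an IO-bound task; it calls no Streamlit commands."""
    time.sleep(delay)
    return f"slept {delay}s"


delays = [0.3, 0.2, 0.1]  # shortened delays, for illustration only
finished_order = []

with ThreadPoolExecutor(max_workers=len(delays)) as executor:
    # Remember which slot each future belongs to.
    future_to_index = {
        executor.submit(work, delay): i for i, delay in enumerate(delays)
    }
    # Handle each result on the calling thread as soon as it is available.
    for future in as_completed(future_to_index):
        i = future_to_index[future]
        finished_order.append((i, future.result()))

print(finished_order)
```

In a Streamlit script, the body of the `as_completed` loop runs on the script thread, so it could write each result into a pre-created container for slot `i`.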

### Option 2: Expose `ScriptRunContext` to the thread

If you want to call Streamlit commands from within your custom threads, you must attach the correct `ScriptRunContext` to the thread.

<Warning>

- This is not officially supported and may change in a future version of Streamlit.
- This may not work with all Streamlit commands.
- Ensure custom threads do not outlive the script thread owning the `ScriptRunContext`. Leaking of `ScriptRunContext` may cause security vulnerabilities, fatal errors, or unexpected behavior.

</Warning>

Review comment (Contributor) on "This may not work with all Streamlit commands.":

Do we know exhaustively which commands these might be, or should this just be a generic warning to cover the possibilities?

In the following example, a custom thread with `ScriptRunContext` attached can call `st.write` without a warning.

```python
import streamlit as st
# Internal API: may change or break with any Streamlit version update.
from streamlit.runtime.scriptrunner import add_script_run_ctx, get_script_run_ctx
import time
from threading import Thread


class WorkerThread(Thread):
    def __init__(self, delay, target):
        super().__init__()
        self.delay = delay
        self.target = target

    def run(self):
        # runs in custom thread, but can call Streamlit APIs
        start_time = time.time()
        time.sleep(self.delay)
        end_time = time.time()
        self.target.write(f"start: {start_time}, end: {end_time}")


delays = [5, 4, 3, 2, 1]
result_containers = []
for i, delay in enumerate(delays):
    st.header(f"Thread {i}")
    result_containers.append(st.container())

threads = [
    WorkerThread(delay, container)
    for delay, container in zip(delays, result_containers)
]
for thread in threads:
    # Attach the script thread's context before starting the custom thread.
    add_script_run_ctx(thread, get_script_run_ctx())
    thread.start()

for thread in threads:
    thread.join()

st.button("Rerun")
```

<Cloud name="doc-multithreading-expose-context" height="700px" />

Review thread on the `from streamlit.runtime.scriptrunner import add_script_run_ctx, get_script_run_ctx` line:

Collaborator:

nit: Before we release this, we might want to double-check if it might be better to expose get_script_run_ctx and add_script_run_ctx in a slightly less internal namespace.

Contributor:

If there's any hope of doing so relatively quickly/easily, that'd be great!

Collaborator:

I don't know if it's relatively quickly/easily. Probably needs some discussion with the product (cc @jrieke). I think in the best case, we can semi-officially expose it to a stable namespace (e.g., streamlit.multithreading) and also put some metrics tracking on these methods. But that probably takes a bit of decision time. It is probably fine in the meantime to add a warning that these methods are purely internal and can change/break with any version update.

Contributor:

I do have a warning included in the section: https://deploy-preview-1154--streamlit-docs.netlify.app/develop/concepts/design/multithreading#option-2-expose-scriptruncontext-to-the-thread

I'll bring this up in office hours to see if there are any other concerns before publishing. My biggest question is if I should include a little more careful handling of the exposed thread context. The warning states that custom threads should not outlive the script thread from whence they came, but the example doesn't actually do enough to prevent that since it does nothing to handle an interrupted script run.

Contributor:

Yeah definitely needs a bit of discussion and thought about how we do that API. I know Joshua wanted to do it but I never really deeply looked into multithreading so far, so it doesn't really make sense to do something ad-hoc right now. I think it's fine mentioning the internal API in a guide but we should definitely put in a disclaimer that it's an internal, unstable API and will change in the future.

Comment on lines +159 to +174 (Contributor):

(Commenting for engineering review later in the week when people are back from the holidays):

Should we be storing the threads in Session State and running a check for threads at the top of the script? And/or disabling widgets when threading? Adding try-except to the Streamlit commands in the threads? Although the script works like this, if a user reruns the app before the page finishes loading, that'd be an issue. Hence this is very fragile, right?
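
The warning about custom threads outliving their script thread suggests giving workers a way to stop early. The following is a generic Python sketch of cooperative cancellation with `threading.Event`, written without Streamlit. The `worker` function, its arguments, and the timeouts are assumptions for illustration. How Streamlit interrupts a script run is not covered here, so whether this cleanup would run in that situation would need to be verified separately.

```python
import threading


def worker(stop_event, results, delay):
    """Wait up to `delay` seconds, returning early if asked to stop."""
    if stop_event.wait(timeout=delay):
        results.append("cancelled")
        return
    results.append("done")


stop_event = threading.Event()
results = []
threads = [
    threading.Thread(target=worker, args=(stop_event, results, delay))
    for delay in (0.05, 5)
]
for thread in threads:
    thread.start()

try:
    # Give each worker a bounded amount of time to finish.
    for thread in threads:
        thread.join(timeout=0.5)
finally:
    # Ask any remaining workers to stop, then wait for them to exit so
    # that no custom thread is left running after this code returns.
    stop_event.set()
    for thread in threads:
        thread.join()

print(results)
```

The short worker finishes on its own. The long one is still waiting when the bounded `join` times out, so it is told to stop and exits before the main thread continues.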
2 changes: 2 additions & 0 deletions content/menu.md
@@ -75,6 +75,8 @@ site_menu:
    url: /develop/concepts/design/buttons
  - category: Develop / Concepts / App design / Dataframes
    url: /develop/concepts/design/dataframes
  - category: Develop / Concepts / App design / Multithreading
    url: /develop/concepts/design/multithreading
  - category: Develop / Concepts / App design / Using custom classes
    url: /develop/concepts/design/custom-classes
  - category: Develop / Concepts / App design / Working with timezones