Litellm dev 2024 12 19 p3 (#7322)
* fix(utils.py): remove unsupported optional params (if drop_params=True) before passing into map openai params

Fixes #7242

* test: new test for langfuse prompt management hook

Addresses #3893 (comment)

* feat(main.py): add 'get_chat_completion_prompt' customlogger hook

allows for langfuse prompt management

Addresses #3893 (comment)

* feat(langfuse_prompt_management.py): working e2e langfuse prompt management

works with `langfuse/` route

* feat(main.py): initial tracing for dynamic langfuse params

allows admin to specify langfuse keys by model in model_list

* feat(main.py): support passing langfuse credentials dynamically

* fix(langfuse_prompt_management.py): create langfuse client based on dynamic callback params

allows dynamic langfuse params to work

* fix: fix linting errors

* docs(prompt_management.md): refactor docs for sdk + proxy prompt management tutorial

* docs(prompt_management.md): cleanup doc

* docs: cleanup topnav

* docs(prompt_management.md): update docs to be easier to use

* fix: remove unused imports

* docs(prompt_management.md): add architectural overview doc

* fix(litellm_logging.py): fix dynamic param passing

* fix(langfuse_prompt_management.py): fix linting errors

* fix: fix linting errors

* fix: use typing_extensions for typealias to ensure python3.8 compatibility

* test: use stream_options in test to account for tiktoken diff

* fix: improve import error message, and check run test earlier
krrishdholakia authored Dec 20, 2024
1 parent 2c36f25 commit 27a4d08
Showing 17 changed files with 631 additions and 243 deletions.
195 changes: 162 additions & 33 deletions docs/my-website/docs/proxy/prompt_management.md
@@ -1,83 +1,212 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Prompt Management

LiteLLM supports using [Langfuse](https://langfuse.com/docs/prompts/get-started) for prompt management on the proxy.
Run experiments or change the specific model (e.g. from gpt-4o to a fine-tuned gpt-4o-mini) from your prompt management tool (e.g. Langfuse) instead of making changes in the application.

Supported Integrations:
- [Langfuse](https://langfuse.com/docs/prompts/get-started)

## Quick Start

1. Add Langfuse as a 'callback' in your config.yaml

<Tabs>

<TabItem value="sdk" label="SDK">

```python
import os
import litellm

os.environ["LANGFUSE_PUBLIC_KEY"] = "public_key" # [OPTIONAL] set here or in `.completion`
os.environ["LANGFUSE_SECRET_KEY"] = "secret_key" # [OPTIONAL] set here or in `.completion`

litellm.set_verbose = True # see raw request to provider

resp = litellm.completion(
    model="langfuse/gpt-3.5-turbo",
    prompt_id="test-chat-prompt",
    prompt_variables={"user_message": "this is used"},  # [OPTIONAL]
    messages=[{"role": "user", "content": "<IGNORED>"}],
)
```



</TabItem>
<TabItem value="proxy" label="PROXY">

1. Setup config.yaml

```yaml
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: azure/chatgpt-v-2
api_key: os.environ/AZURE_API_KEY
api_base: os.environ/AZURE_API_BASE

litellm_settings:
callbacks: ["langfuse"] # 👈 KEY CHANGE
model: langfuse/gpt-3.5-turbo
prompt_id: "<langfuse_prompt_id>"
api_key: os.environ/OPENAI_API_KEY
```
2. Start the proxy
```bash
litellm-proxy --config config.yaml
litellm --config config.yaml --detailed_debug
```

3. Test it!

<Tabs>
<TabItem value="curl" label="CURL">

```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
"model": "gpt-4",
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "THIS WILL BE IGNORED"
}
],
"metadata": {
"langfuse_prompt_id": "value",
"langfuse_prompt_variables": { # [OPTIONAL]
"key": "value"
}
"prompt_variables": {
"key": "this is used"
}
}'
```
</TabItem>
<TabItem value="OpenAI Python SDK" label="OpenAI Python SDK">

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    extra_body={
        "prompt_variables": {  # [OPTIONAL]
            "key": "this is used"
        }
    }
)

print(response)
```

</TabItem>
</Tabs>

</TabItem>
</Tabs>


**Expected Logs:**

```
POST Request Sent from LiteLLM:
curl -X POST \
https://api.openai.com/v1/ \
-d '{'model': 'gpt-3.5-turbo', 'messages': <YOUR LANGFUSE PROMPT TEMPLATE>}'
```

## How to set model

### Set the model on LiteLLM

## What is 'langfuse_prompt_id'?
Prefix the LiteLLM model name with `langfuse/`, i.e. `langfuse/<litellm_model_name>`.

- `langfuse_prompt_id`: The ID of the prompt that will be used for the request.
<Tabs>
<TabItem value="sdk" label="SDK">

```python
litellm.completion(
    model="langfuse/gpt-3.5-turbo",  # or `langfuse/anthropic/claude-3-5-sonnet`
    ...
)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: langfuse/gpt-3.5-turbo # OR langfuse/anthropic/claude-3-5-sonnet
      prompt_id: <langfuse_prompt_id>
      api_key: os.environ/OPENAI_API_KEY
```
</TabItem>
</Tabs>
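
The `langfuse/` route prefix can be pictured as a simple string split. This is a hedged sketch only: `split_langfuse_route` and `LANGFUSE_PREFIX` are hypothetical names chosen for illustration and are not LiteLLM's internal implementation.

```python
from typing import Tuple

# Assumption: the route is marked by a literal "langfuse/" prefix, as in the docs above.
LANGFUSE_PREFIX = "langfuse/"


def split_langfuse_route(model: str) -> Tuple[bool, str]:
    """Return (uses_langfuse, underlying_model_name) for a model string."""
    if model.startswith(LANGFUSE_PREFIX):
        # Everything after the prefix is the regular LiteLLM model name,
        # which may itself contain a provider prefix such as "anthropic/".
        return True, model[len(LANGFUSE_PREFIX):]
    return False, model


print(split_langfuse_route("langfuse/gpt-3.5-turbo"))
print(split_langfuse_route("langfuse/anthropic/claude-3-5-sonnet"))
print(split_langfuse_route("gpt-3.5-turbo"))
```

Only the first `langfuse/` segment is stripped, so a provider-qualified name such as `anthropic/claude-3-5-sonnet` survives intact.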
### Set the model in Langfuse
If the model is specified in the Langfuse config, it will be used.
<Image img={require('../../img/langfuse_prompt_management_model_config.png')} />
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/chatgpt-v-2
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
```
## What is 'prompt_variables'?
- `prompt_variables`: A dictionary of variables that will be used to replace parts of the prompt.



## What is 'prompt_id'?

- `prompt_id`: The ID of the prompt that will be used for the request.

<Image img={require('../../img/langfuse_prompt_id.png')} />

## What will the formatted prompt look like?

### `/chat/completions` messages

The message will be added to the start of the prompt.
The `messages` field sent in by the client is ignored.

- if the Langfuse prompt is a list, it will be added to the start of the messages list (assuming it's an OpenAI compatible message).
The Langfuse prompt will replace the `messages` field.

- if the Langfuse prompt is a string, it will be added as a system message.
To replace parts of the prompt, use the `prompt_variables` field. [See how prompt variables are used](https://github.com/BerriAI/litellm/blob/017f83d038f85f93202a083cf334de3544a3af01/litellm/integrations/langfuse/langfuse_prompt_management.py#L127)

```python
if isinstance(compiled_prompt, list):
    data["messages"] = compiled_prompt + data["messages"]
else:
    data["messages"] = [
        {"role": "system", "content": compiled_prompt}
    ] + data["messages"]
```
If the Langfuse prompt is a string, it will be sent as a user message (not all providers support system messages).

### `/completions` messages
If the Langfuse prompt is a list, it will be sent as is (Langfuse chat prompts are OpenAI compatible).
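
The replacement behaviour described here (client `messages` ignored, string prompts sent as a user message, list prompts sent as is) can be sketched as follows. This is an illustration under assumptions, not LiteLLM's code: `compile_template` and `build_messages` are hypothetical helpers, and the `{{variable}}` placeholder syntax is assumed to match the Langfuse template format.

```python
import re
from typing import Dict, List, Optional, Union

Message = Dict[str, str]


def compile_template(template: str, variables: Optional[Dict[str, str]]) -> str:
    """Replace {{name}} placeholders; unknown placeholders are left untouched."""
    variables = variables or {}

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        return str(variables.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def build_messages(
    langfuse_prompt: Union[str, List[Message]],
    client_messages: List[Message],
    prompt_variables: Optional[Dict[str, str]] = None,
) -> List[Message]:
    """Return the messages sent to the provider; client_messages is ignored."""
    if isinstance(langfuse_prompt, list):
        # Chat prompt: already OpenAI-compatible, sent as is after substitution.
        return [
            {**m, "content": compile_template(m["content"], prompt_variables)}
            for m in langfuse_prompt
        ]
    # Text prompt: sent as a single user message.
    content = compile_template(langfuse_prompt, prompt_variables)
    return [{"role": "user", "content": content}]


result = build_messages(
    "Answer briefly: {{user_message}}",
    client_messages=[{"role": "user", "content": "<IGNORED>"}],
    prompt_variables={"user_message": "this is used"},
)
print(result)
```

Leaving unknown placeholders untouched is a choice made for this sketch; a real integration might prefer to raise an error instead.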

The message will be added to the start of the prompt.
## Architectural Overview

```python
data["prompt"] = compiled_prompt + "\n" + data["prompt"]
```
<Image img={require('../../img/prompt_management_architecture_doc.png')} />

## API Reference

These are the params you can pass to the `litellm.completion` function in the SDK and to `litellm_params` in config.yaml.

```
prompt_id: str # required
prompt_variables: Optional[dict] # optional
langfuse_public_key: Optional[str] # optional
langfuse_secret: Optional[str] # optional
langfuse_secret_key: Optional[str] # optional
langfuse_host: Optional[str] # optional
```
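
As a rough illustration of how these params fit together, the sketch below collects them in a dataclass and emits only the ones that are set. `PromptManagementParams` and `to_kwargs` are hypothetical names used for this example; they are not part of LiteLLM, and the host URL is a placeholder value.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PromptManagementParams:
    """Mirrors the field list above; only prompt_id is required."""

    prompt_id: str
    prompt_variables: Optional[dict] = None
    langfuse_public_key: Optional[str] = None
    langfuse_secret: Optional[str] = None
    langfuse_secret_key: Optional[str] = None
    langfuse_host: Optional[str] = None

    def to_kwargs(self) -> dict:
        """Drop unset fields so that only explicit values are passed on."""
        return {k: v for k, v in asdict(self).items() if v is not None}


params = PromptManagementParams(
    prompt_id="test-chat-prompt",
    prompt_variables={"user_message": "this is used"},
    langfuse_host="https://cloud.langfuse.com",  # placeholder host
)
print(params.to_kwargs())
```

The resulting dict could then be spread into a call such as `litellm.completion(model=..., messages=..., **params.to_kwargs())`, assuming the keys match the list above.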
10 changes: 1 addition & 9 deletions docs/my-website/docusaurus.config.js
@@ -130,15 +130,7 @@ const config = {
href: 'https://discord.com/invite/wuPM9dRgDw',
label: 'Discord',
position: 'right',
},
{
type: 'html',
position: 'right',
value:
`<a href=# class=navbar__link data-fr-widget>
I'm Confused
</a>`
},
}
],
},
footer: {
41 changes: 23 additions & 18 deletions docs/my-website/sidebars.js
@@ -135,7 +135,6 @@ const sidebars = {
"oidc"
]
},
"proxy/prompt_management",
"proxy/caching",
"proxy/call_hooks",
"proxy/rules",
@@ -228,6 +227,7 @@ const sidebars = {
"completion/batching",
"completion/mock_requests",
"completion/reliable_completions",
'tutorials/litellm_proxy_aporia',

]
},
@@ -309,8 +309,29 @@ const sidebars = {
label: "LangChain, LlamaIndex, Instructor Integration",
items: ["langchain/langchain", "tutorials/instructor"],
},
{
type: "category",
label: "Tutorials",
items: [

'tutorials/azure_openai',
'tutorials/instructor',
"tutorials/gradio_integration",
"tutorials/huggingface_codellama",
"tutorials/huggingface_tutorial",
"tutorials/TogetherAI_liteLLM",
"tutorials/finetuned_chat_gpt",
"tutorials/text_completion",
"tutorials/first_playground",
"tutorials/model_fallbacks",
],
},
],
},
{
type: "doc",
id: "proxy/prompt_management"
},
{
type: "category",
label: "Load Testing",
@@ -362,23 +383,7 @@ const sidebars = {
"observability/opik_integration",
],
},
{
type: "category",
label: "Tutorials",
items: [
'tutorials/litellm_proxy_aporia',
'tutorials/azure_openai',
'tutorials/instructor',
"tutorials/gradio_integration",
"tutorials/huggingface_codellama",
"tutorials/huggingface_tutorial",
"tutorials/TogetherAI_liteLLM",
"tutorials/finetuned_chat_gpt",
"tutorials/text_completion",
"tutorials/first_playground",
"tutorials/model_fallbacks",
],
},

{
type: "category",
label: "Extras",
21 changes: 21 additions & 0 deletions litellm/integrations/custom_logger.py
@@ -14,6 +14,7 @@
    EmbeddingResponse,
    ImageResponse,
    ModelResponse,
    StandardCallbackDynamicParams,
    StandardLoggingPayload,
)

@@ -60,6 +61,26 @@ async def async_log_success_event(self, kwargs, response_obj, start_time, end_ti
    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        pass

    #### PROMPT MANAGEMENT HOOKS ####

    def get_chat_completion_prompt(
        self,
        model: str,
        messages: List[AllMessageValues],
        non_default_params: dict,
        headers: dict,
        prompt_id: str,
        prompt_variables: Optional[dict],
        dynamic_callback_params: StandardCallbackDynamicParams,
    ) -> Tuple[str, List[AllMessageValues], dict]:
        """
        Returns:
        - model: str - the model to use (can be pulled from prompt management tool)
        - messages: List[AllMessageValues] - the messages to use (can be pulled from prompt management tool)
        - non_default_params: dict - update with any optional params (e.g. temperature, max_tokens, etc.) to use (can be pulled from prompt management tool)
        """
        return model, messages, non_default_params

    #### PRE-CALL CHECKS - router/proxy only ####
    """
    Allows usage-based-routing-v2 to run pre-call rpm checks within the picked deployment's semaphore (concurrency-safe tpm/rpm checks).
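
To show how a subclass might use the `get_chat_completion_prompt` hook from the hunk above, here is a hedged, self-contained sketch. `CustomLoggerStub` is a stand-in that copies only the hook's signature so the example runs without litellm installed; `InMemoryPromptManager` and its `PROMPTS` table are hypothetical, and the `{text}` placeholder uses Python's `str.format` rather than any prompt tool's template syntax.

```python
from typing import Any, Dict, List, Optional, Tuple

Message = Dict[str, Any]


class CustomLoggerStub:
    """Stand-in for the base class; the default hook returns its inputs unchanged."""

    def get_chat_completion_prompt(
        self,
        model: str,
        messages: List[Message],
        non_default_params: dict,
        headers: dict,
        prompt_id: str,
        prompt_variables: Optional[dict],
        dynamic_callback_params: dict,
    ) -> Tuple[str, List[Message], dict]:
        return model, messages, non_default_params


class InMemoryPromptManager(CustomLoggerStub):
    """Hypothetical prompt store keyed by prompt_id."""

    PROMPTS = {
        "test-chat-prompt": {
            "model": "gpt-4o-mini",
            "template": "Summarize: {text}",
            "params": {"temperature": 0.2},
        }
    }

    def get_chat_completion_prompt(
        self, model, messages, non_default_params, headers,
        prompt_id, prompt_variables, dynamic_callback_params,
    ):
        stored = self.PROMPTS.get(prompt_id)
        if stored is None:
            # Unknown prompt: fall back to the caller's request unchanged.
            return model, messages, non_default_params
        content = stored["template"].format(**(prompt_variables or {}))
        # Stored params act as defaults; the caller's explicit params win.
        merged = {**stored["params"], **non_default_params}
        return stored["model"], [{"role": "user", "content": content}], merged


model, messages, params = InMemoryPromptManager().get_chat_completion_prompt(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "<IGNORED>"}],
    non_default_params={"max_tokens": 50},
    headers={},
    prompt_id="test-chat-prompt",
    prompt_variables={"text": "hello"},
    dynamic_callback_params={},
)
print(model, messages, params)
```

Letting the caller's params override the stored ones is a design choice of this sketch; the opposite precedence would be equally plausible.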