Token / LLM call monitoring #12

Open
pieterjanvc opened this issue Apr 17, 2024 · 1 comment
Labels: enhancement (New feature or request)

@pieterjanvc (Owner) commented Apr 17, 2024

LlamaIndex has options to count tokens, which might be interesting to implement in parts of the app to monitor how much data is sent to and received from the LLM, and to help estimate cost and usage.
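
A minimal sketch of how that could hook in, assuming LlamaIndex's TokenCountingHandler callback (the imports and attribute names below follow the llama_index.core docs, but may differ across LlamaIndex versions):

import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# count tokens for every LLM / embedding call made through LlamaIndex
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# ...run queries as usual, then read the running totals:
print(token_counter.prompt_llm_token_count)       # tokens sent to the LLM
print(token_counter.completion_llm_token_count)   # tokens received back
print(token_counter.total_embedding_token_count)  # tokens embedded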

@lee-t (Collaborator) commented Jun 27, 2024

The OpenAI API uses the tiktoken library to tokenize strings as part of its API calls. The token_count package has a set of functions that use the same library to count tokens in arbitrary strings in Python.

For example, a function that streams a chat completion and counts the tokens as they arrive:

import time

from openai import AsyncOpenAI
from token_count import TokenCount

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def run_prompt(placeholder, meta, prompt, model):
    # placeholder and meta are assumed to be Streamlit elements (e.g.
    # st.empty()) showing the streamed text and the token/timing stats
    tc = TokenCount(model_name="gpt-3.5-turbo")
    start = time.time()
    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": "You are a helpful assistant."},
                  {"role": "user", "content": prompt}],
        stream=True,
    )
    streamed_text = ""
    async for chunk in stream:
        chunk_content = chunk.choices[0].delta.content
        if chunk_content is not None:
            streamed_text = streamed_text + chunk_content
            placeholder.write(streamed_text)
            # re-count the full accumulated text so the stats update live
            end = time.time()
            time_taken = end - start
            tokens = tc.num_tokens_from_string(streamed_text)
            meta.info(f"""**Duration: :green[{time_taken:.2f} secs]**
            **Eval count: :green[{tokens} tokens]**
            **Eval rate: :green[{tokens / time_taken:.2f} tokens/s]**
            """)
