Token / LLM call monitoring #12

Open
pieterjanvc opened this issue Apr 17, 2024 · 1 comment
Labels: enhancement (New feature or request)

@pieterjanvc (Owner) commented Apr 17, 2024

LlamaIndex has options to count tokens, which might be interesting to implement in parts of the app to monitor how much data is sent to and received from the LLM, and to help estimate cost and usage.
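
A minimal sketch of how that could hook in, assuming LlamaIndex's TokenCountingHandler callback (the imports and attribute names below follow the llama_index.core docs, but may differ across LlamaIndex versions):

import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# count tokens for every LLM / embedding call made through LlamaIndex
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# ...run queries as usual, then read the running totals:
print(token_counter.prompt_llm_token_count)       # tokens sent to the LLM
print(token_counter.completion_llm_token_count)   # tokens received back
print(token_counter.total_embedding_token_count)  # tokens embedded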

@lee-t (Collaborator) commented Jun 27, 2024

The OpenAI API uses the tiktoken library to tokenize strings as part of its API calls. The token_count package has a set of functions that use the same library to count tokens in arbitrary strings in Python.

For example, a function that streams a chat completion and counts the tokens as they arrive:

import time

from openai import AsyncOpenAI
from token_count import TokenCount

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def run_prompt(placeholder, meta, prompt, model):
    # placeholder and meta are assumed to be Streamlit elements (e.g.
    # st.empty()) showing the streamed text and the token/timing stats
    tc = TokenCount(model_name="gpt-3.5-turbo")
    start = time.time()
    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": "You are a helpful assistant."},
                  {"role": "user", "content": prompt}],
        stream=True,
    )
    streamed_text = ""
    async for chunk in stream:
        chunk_content = chunk.choices[0].delta.content
        if chunk_content is not None:
            streamed_text = streamed_text + chunk_content
            placeholder.write(streamed_text)
            # re-count the full accumulated text so the stats update live
            end = time.time()
            time_taken = end - start
            tokens = tc.num_tokens_from_string(streamed_text)
            meta.info(f"""**Duration: :green[{time_taken:.2f} secs]**
            **Eval count: :green[{tokens} tokens]**
            **Eval rate: :green[{tokens / time_taken:.2f} tokens/s]**
            """)
