
WIP: Adding Logging #33

Closed
wants to merge 78 commits into from
Conversation

mvanniasingheTT
Contributor

This WIP PR adds logging of raw stats. It is currently tested with offline inference. The logger class lives in vllm-tt-metal-llama3-70b/src/logging_utils.py, but I also have a copy under tests right now so you can see how it is used. Any feedback would be great, thank you!
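
For reference, a minimal sketch of a raw-stats logger along these lines (assumed interface: vLLM's do_log_stats() calls logger.log(stats) on each engine step, as the traceback below shows; the Stats attribute names and the output path are assumptions and may differ from the PR's actual logging_utils.py):

import json
import time

class RawStatsFileLogger:
    """Append one JSON record of raw per-step stats to a .jsonl file."""

    def __init__(self, path="raw_stats.jsonl"):
        self.path = path

    def log(self, stats):
        record = {
            "timestamp": time.time(),
            # getattr() keeps this tolerant of Stats fields that may be absent.
            "time_to_first_tokens_iter": getattr(stats, "time_to_first_tokens_iter", []),
            "time_per_output_tokens_iter": getattr(stats, "time_per_output_tokens_iter", []),
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")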

mvanniasingheTT and others added 30 commits October 22, 2024 17:52
@tstescoTT
Contributor

tstescoTT commented Nov 13, 2024

If there is no decode in the generation, the logging will error out:

Traceback (most recent call last):
  File "benchmark_vllm_offline_inference.py", line 207, in <module>
    run_inference(
  File "benchmark_vllm_offline_inference.py", line 116, in run_inference
    run_inference_perf(llm, prompt_token_ids, sampling_params)
  File "benchmark_vllm_offline_inference.py", line 139, in run_inference_perf
    generate_tokens(
  File "benchmark_vllm_offline_inference.py", line 168, in generate_tokens
    outputs = llm.generate(prompts, sampling_params, prompt_token_ids)
  File "/home/user/vllm/vllm/utils.py", line 1073, in inner
    return fn(*args, **kwargs)
  File "/home/user/vllm/vllm/entrypoints/llm.py", line 353, in generate
    outputs = self._run_engine(use_tqdm=use_tqdm)
  File "/home/user/vllm/vllm/entrypoints/llm.py", line 879, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/home/user/vllm/vllm/engine/llm_engine.py", line 1454, in step
    self.do_log_stats(scheduler_outputs, outputs)
  File "/home/user/vllm/vllm/engine/llm_engine.py", line 1562, in do_log_stats
    logger.log(stats)
  File "/home/user/tests/mock_vllm_model.py", line 428, in log
    self._write_to_json(stats)
  File "/home/user/tests/mock_vllm_model.py", line 458, in _write_to_json
    data["time to first token"][
KeyError: 'Inference num:1'

Can you make the logging handle this case?

To reproduce:

python examples/offline_inference_tt.py --measure_perf --max_seqs_in_batch 32 --perf_prompt_len 128 --max_tokens 1 --greedy_sampling
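
One way _write_to_json could guard against this case is sketched below. It is a hypothetical standalone example, not the PR's actual implementation: the key names ("time to first token", "Inference num:N") are taken from the traceback, while the function signature and file handling are assumptions.

import json
from pathlib import Path

def write_ttft(json_path: Path, inference_num: int, ttft_values: list) -> None:
    """Record time-to-first-token values, creating missing entries instead of raising KeyError."""
    data = json.loads(json_path.read_text()) if json_path.exists() else {}
    # setdefault() creates the bucket when a run (e.g. --max_tokens 1) has no decode step yet.
    ttft = data.setdefault("time to first token", {})
    key = f"Inference num:{inference_num}"
    ttft.setdefault(key, []).extend(ttft_values)
    json_path.write_text(json.dumps(data, indent=2))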

@tstescoTT
Contributor

tstescoTT commented Nov 14, 2024

Could you remove test/data and add the .jsonl output to .gitignore?
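For example, the .gitignore addition could look like the following (the exact directory and pattern are assumptions; adjust to wherever the logger writes its output):

test/data/
*.jsonl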

mvanniasingheTT deleted the mvanniasinghe/logging branch November 19, 2024 21:30