
WIP: Adding Logging #33

Closed
wants to merge 78 commits into from
Conversation

mvanniasingheTT
Contributor

This WIP PR adds logging of raw stats. It is currently tested with offline inference. The logger class lives in vllm-tt-metal-llama3-70b/src/logging_utils.py, but I also have a copy under tests right now so you can see how it is used. Any feedback would be great, thank you!
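
For reference, a minimal sketch of a raw-stats logger along these lines (assumed interface: vLLM's do_log_stats() calls logger.log(stats) on each engine step, as the traceback below shows; the Stats attribute names and the output path are assumptions and may differ from the PR's actual logging_utils.py):

import json
import time

class RawStatsFileLogger:
    """Append one JSON record of raw per-step stats to a .jsonl file."""

    def __init__(self, path="raw_stats.jsonl"):
        self.path = path

    def log(self, stats):
        record = {
            "timestamp": time.time(),
            # getattr() keeps this tolerant of Stats fields that may be absent.
            "time_to_first_tokens_iter": getattr(stats, "time_to_first_tokens_iter", []),
            "time_per_output_tokens_iter": getattr(stats, "time_per_output_tokens_iter", []),
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")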

mvanniasingheTT and others added 30 commits October 22, 2024 17:52
@tstescoTT
Contributor

tstescoTT commented Nov 13, 2024

If there is no decode in the generation, the logging will error out:

Traceback (most recent call last):
  File "benchmark_vllm_offline_inference.py", line 207, in <module>
    run_inference(
  File "benchmark_vllm_offline_inference.py", line 116, in run_inference
    run_inference_perf(llm, prompt_token_ids, sampling_params)
  File "benchmark_vllm_offline_inference.py", line 139, in run_inference_perf
    generate_tokens(
  File "benchmark_vllm_offline_inference.py", line 168, in generate_tokens
    outputs = llm.generate(prompts, sampling_params, prompt_token_ids)
  File "/home/user/vllm/vllm/utils.py", line 1073, in inner
    return fn(*args, **kwargs)
  File "/home/user/vllm/vllm/entrypoints/llm.py", line 353, in generate
    outputs = self._run_engine(use_tqdm=use_tqdm)
  File "/home/user/vllm/vllm/entrypoints/llm.py", line 879, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/home/user/vllm/vllm/engine/llm_engine.py", line 1454, in step
    self.do_log_stats(scheduler_outputs, outputs)
  File "/home/user/vllm/vllm/engine/llm_engine.py", line 1562, in do_log_stats
    logger.log(stats)
  File "/home/user/tests/mock_vllm_model.py", line 428, in log
    self._write_to_json(stats)
  File "/home/user/tests/mock_vllm_model.py", line 458, in _write_to_json
    data["time to first token"][
KeyError: 'Inference num:1'

Can you make the logging handle this case?

To reproduce:

python examples/offline_inference_tt.py --measure_perf --max_seqs_in_batch 32 --perf_prompt_len 128 --max_tokens 1 --greedy_sampling
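
One way _write_to_json could guard against this case is sketched below. It is a hypothetical standalone example, not the PR's actual implementation: the key names ("time to first token", "Inference num:N") are taken from the traceback, while the function signature and file handling are assumptions.

import json
from pathlib import Path

def write_ttft(json_path: Path, inference_num: int, ttft_values: list) -> None:
    """Record time-to-first-token values, creating missing entries instead of raising KeyError."""
    data = json.loads(json_path.read_text()) if json_path.exists() else {}
    # setdefault() creates the bucket when a run (e.g. --max_tokens 1) has no decode step yet.
    ttft = data.setdefault("time to first token", {})
    key = f"Inference num:{inference_num}"
    ttft.setdefault(key, []).extend(ttft_values)
    json_path.write_text(json.dumps(data, indent=2))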

@tstescoTT
Contributor

tstescoTT commented Nov 14, 2024

Could you remove test/data and add the .jsonl output to .gitignore?
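For example, the .gitignore addition could look like the following (the exact directory and pattern are assumptions; adjust to wherever the logger writes its output):

test/data/
*.jsonl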

mvanniasingheTT deleted the mvanniasinghe/logging branch November 19, 2024 21:30