Llama 3.x multimodal support for evaluations and benchmarking #79
Conversation
## Change Log

- add Llama 3.2 Vision image input support in utils prompt generation and benchmarking script
- add MMMU with support from https://github.com/tstescoTT/lm-evaluation-harness/tree/tstesco/add-local-multimodal
- address #73 in run_vllm_api_server.py::ensure_mesh_device MESH_DEVICE handling
- fix #62 with run_vllm_api_server.py::register_vllm_models
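For context on the image input path, here is a minimal sketch of an image request against a vLLM OpenAI-compatible chat endpoint. The base URL, model name, and image URL are placeholders rather than values from this PR, and the message layout follows the standard OpenAI chat completions schema, not necessarily this repo's prompt generation utils.

```python
# Sketch only: image-input chat request to an OpenAI-compatible vLLM server.
# The URL, model name, and image are illustrative placeholders.
import requests

payload = {
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},
            ],
        }
    ],
    "max_tokens": 128,
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```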
{"input_len": 128, "output_len": 1024, "batch_size": 32, "num_prompts": 32 * 16}, | ||
{"input_len": 128, "output_len": 2048, "batch_size": 32, "num_prompts": 32 * 8}, | ||
# ttft batch-1 | ||
{"input_len": 128, "output_len": 128, "batch_size": 1, "num_prompts": 1}, |
In this script `batch_size` is practically unused; should we just remove it from the combinations?
It turns out that newer vLLM upstream code supports `max_concurrent` (see https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py#L993), which implements what we have in benchmarking/prompt_client_online_benchmark.py. I think we should keep support for it, but I should rename `batch_size` -> `max_concurrent`.
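For reference, `max_concurrent` in the upstream benchmark caps the number of in-flight requests; it does not set the model's batch size. Below is a minimal sketch of that idea using an asyncio semaphore (the function names are illustrative, not the upstream or in-repo benchmark code):

```python
# Sketch: bound client-side concurrency with a semaphore.
# send_request() is a stand-in for the real HTTP call to the server.
import asyncio

async def send_request(prompt: str) -> str:
    await asyncio.sleep(0.1)  # placeholder for the actual API round trip
    return f"response to: {prompt}"

async def run_benchmark(prompts: list[str], max_concurrent: int) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)

    async def limited(prompt: str) -> str:
        async with sem:  # at most max_concurrent requests are outstanding
            return await send_request(prompt)

    return await asyncio.gather(*(limited(p) for p in prompts))

# Example: 512 prompts, never more than 32 concurrent requests.
results = asyncio.run(run_benchmark([f"prompt {i}" for i in range(512)], max_concurrent=32))
```

This also matches the intent of the rename: the setting bounds client-side concurrency, while the server decides the actual batching.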
added in 0f1c87b
@@ -68,5 +68,5 @@ Running the `run_evals.sh` script will:

```bash
cd ~/app/evals
-./run_evals
+. run_evals.sh
```
I guess unintentional change?
I found I needed to source the script to run it, otherwise the shell environment is different.
evals/run_evals.sh
```bash
  --tasks meta_ifeval \
  --batch_size auto \
  --output_path /home/user/cache_root/eval_output \
  --include_path ./work_dir \
  --seed 42 \
  --log_samples

cd $original_dir
```
Missing empty line at the end
added in ecd2d4d
"max_tokens": max_tokens, | ||
"stream": stream, | ||
} | ||
completions_url = f"{self._get_api_base_url()}/chat/completions" |
Would be nice to have both `self.completions_url` and `self.chat_completions_url`.
Both `/v1/completions` and `/v1/chat/completions` API endpoints are already supported; they are controlled by the `use_chat_api` bool.
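A hedged sketch of how both endpoint attributes suggested above could coexist with the existing `use_chat_api` switch; the class and attribute names here are illustrative, not the repository's actual client code:

```python
# Illustrative only; not the repo's actual prompt client implementation.
class APIClient:
    def __init__(self, base_url: str, use_chat_api: bool = True):
        self.base_url = base_url.rstrip("/")
        self.use_chat_api = use_chat_api
        # Expose both endpoints explicitly, per the review suggestion.
        self.completions_url = f"{self.base_url}/v1/completions"
        self.chat_completions_url = f"{self.base_url}/v1/chat/completions"

    def endpoint_url(self) -> str:
        # use_chat_api selects between the two OpenAI-compatible endpoints.
        return self.chat_completions_url if self.use_chat_api else self.completions_url


client = APIClient("http://localhost:8000", use_chat_api=False)
print(client.endpoint_url())  # -> http://localhost:8000/v1/completions
```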
…e that they only set maximum concurrent requests and not the actual model batch size