Llama 3.x multimodal support for evaluations and benchmarking #79
Conversation
## Change Log

- add Llama 3.2 Vision image input support in utils prompt generation and benchmarking script
- add MMMU with support from https://github.com/tstescoTT/lm-evaluation-harness/tree/tstesco/add-local-multimodal
- address #73 in run_vllm_api_server.py::ensure_mesh_device MESH_DEVICE handling
- fix #62 with run_vllm_api_server.py::register_vllm_models
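For context on the image input path, here is a minimal sketch of an image request against a vLLM OpenAI-compatible chat endpoint. The base URL, model name, and image URL are placeholders rather than values from this PR, and the message layout follows the standard OpenAI chat completions schema, not necessarily this repo's prompt generation utils.

```python
# Sketch only: image-input chat request to an OpenAI-compatible vLLM server.
# The URL, model name, and image are illustrative placeholders.
import requests

payload = {
    "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},
            ],
        }
    ],
    "max_tokens": 128,
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```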
{"input_len": 128, "output_len": 1024, "batch_size": 32, "num_prompts": 32 * 16}, | ||
{"input_len": 128, "output_len": 2048, "batch_size": 32, "num_prompts": 32 * 8}, | ||
# ttft batch-1 | ||
{"input_len": 128, "output_len": 128, "batch_size": 1, "num_prompts": 1}, |
In this script `batch_size` is practically unused; should we just remove it from the combinations?
It turns out that newer vLLM upstream code supports `max_concurrent` (see https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py#L993), which implements what we have in benchmarking/prompt_client_online_benchmark.py. I think we should keep support for it, but I should rename `batch_size` -> `max_concurrent`.
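For reference, `max_concurrent` in the upstream benchmark caps the number of in-flight requests; it does not set the model's batch size. Below is a minimal sketch of that idea using an asyncio semaphore (the function names are illustrative, not the upstream or in-repo benchmark code):

```python
# Sketch: bound client-side concurrency with a semaphore.
# send_request() is a stand-in for the real HTTP call to the server.
import asyncio

async def send_request(prompt: str) -> str:
    await asyncio.sleep(0.1)  # placeholder for the actual API round trip
    return f"response to: {prompt}"

async def run_benchmark(prompts: list[str], max_concurrent: int) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)

    async def limited(prompt: str) -> str:
        async with sem:  # at most max_concurrent requests are outstanding
            return await send_request(prompt)

    return await asyncio.gather(*(limited(p) for p in prompts))

# Example: 512 prompts, never more than 32 concurrent requests.
results = asyncio.run(run_benchmark([f"prompt {i}" for i in range(512)], max_concurrent=32))
```

This also matches the intent of the rename: the setting bounds client-side concurrency, while the server decides the actual batching.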
added in 0f1c87b
@@ -68,5 +68,5 @@ Running the `run_evals.sh` script will:

```bash
cd ~/app/evals
-./run_evals
+. run_evals.sh
```
I guess unintentional change?
I found I needed to source the script to run it, otherwise the shell environment is different.
evals/run_evals.sh
```bash
  --tasks meta_ifeval \
  --batch_size auto \
  --output_path /home/user/cache_root/eval_output \
  --include_path ./work_dir \
  --seed 42 \
  --log_samples

cd $original_dir
```
Missing empty line at the end
added in ecd2d4d
"max_tokens": max_tokens, | ||
"stream": stream, | ||
} | ||
completions_url = f"{self._get_api_base_url()}/chat/completions" |
Would be nice to have both `self.completions_url` and `self.chat_completions_url`.
Both `/v1/completions` and `/v1/chat/completions` API endpoints are already supported; they are controlled by the `use_chat_api` bool.
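A hedged sketch of how both endpoint attributes suggested above could coexist with the existing `use_chat_api` switch; the class and attribute names here are illustrative, not the repository's actual client code:

```python
# Illustrative only; not the repo's actual prompt client implementation.
class APIClient:
    def __init__(self, base_url: str, use_chat_api: bool = True):
        self.base_url = base_url.rstrip("/")
        self.use_chat_api = use_chat_api
        # Expose both endpoints explicitly, per the review suggestion.
        self.completions_url = f"{self.base_url}/v1/completions"
        self.chat_completions_url = f"{self.base_url}/v1/chat/completions"

    def endpoint_url(self) -> str:
        # use_chat_api selects between the two OpenAI-compatible endpoints.
        return self.chat_completions_url if self.use_chat_api else self.completions_url


client = APIClient("http://localhost:8000", use_chat_api=False)
print(client.endpoint_url())  # -> http://localhost:8000/v1/completions
```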
…e that they only set maximum concurrent requests and not the actual model batch size