Llama 3.x multimodal support for evaluations and benchmarking #79

Merged: 3 commits into main on Jan 28, 2025

Conversation

tstescoTT (Contributor):

## Change Log
- add Llama 3.2 Vision image input support in utils prompt generation and benchmarking script (a request sketch follows this list)
- add MMMU evaluation with support from https://github.com/tstescoTT/lm-evaluation-harness/tree/tstesco/add-local-multimodal
- address #73 in run_vllm_api_server.py::ensure_mesh_device MESH_DEVICE handling
- fix #62 with run_vllm_api_server.py::register_vllm_models
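
As a rough sketch of what the new image input path exercises, the request below sends an image to a vLLM OpenAI-compatible chat completions endpoint using the standard `image_url` content part. The URL, port, model name, and helper function are illustrative assumptions, not code from this PR:

```python
import base64
import requests

# Illustrative values only; the actual server address and model name depend on
# how run_vllm_api_server.py is launched.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "meta-llama/Llama-3.2-11B-Vision-Instruct"

def encode_image(path: str) -> str:
    # Base64-encode a local image so it can be embedded as a data URI.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": MODEL,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encode_image('example.jpg')}"
                    },
                },
            ],
        }
    ],
    "max_tokens": 128,
}

response = requests.post(API_URL, json=payload, timeout=120)
print(response.json()["choices"][0]["message"]["content"])
```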
{"input_len": 128, "output_len": 1024, "batch_size": 32, "num_prompts": 32 * 16},
{"input_len": 128, "output_len": 2048, "batch_size": 32, "num_prompts": 32 * 8},
# ttft batch-1
{"input_len": 128, "output_len": 128, "batch_size": 1, "num_prompts": 1},

Collaborator:

In this script, `batch_size` is practically unused; should we just remove it from the combinations?

tstescoTT (Contributor Author):

It turns out that newer upstream vLLM code supports `max_concurrent` (see https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_serving.py#L993), which implements what we have in benchmarking/prompt_client_online_benchmark.py. I think we should keep support for it, but I should rename `batch_size` -> `max_concurrent`.
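
For context, here is a minimal sketch of how a client-side `max_concurrent` cap can be enforced with a semaphore, similar in spirit to the upstream `benchmark_serving.py` behavior referenced above. The endpoint, model name, and function names are placeholders, not the actual code in benchmarking/prompt_client_online_benchmark.py:

```python
import asyncio
import aiohttp

async def send_one(session: aiohttp.ClientSession, prompt: str, sem: asyncio.Semaphore) -> str:
    # The semaphore caps in-flight requests at max_concurrent; it says nothing
    # about the model's actual batch size on the server side.
    async with sem:
        async with session.post(
            "http://localhost:8000/v1/completions",  # placeholder endpoint
            json={"model": "placeholder-model", "prompt": prompt, "max_tokens": 128},
        ) as resp:
            data = await resp.json()
            return data["choices"][0]["text"]

async def run_benchmark(prompts: list[str], max_concurrent: int) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(send_one(session, p, sem) for p in prompts))

# Example: 512 prompts with at most 32 concurrent requests in flight.
# asyncio.run(run_benchmark(["Hello"] * 512, max_concurrent=32))
```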

tstescoTT (Contributor Author):

added in 0f1c87b

```diff
@@ -68,5 +68,5 @@ Running the `run_evals.sh` script will:
 cd ~/app/evals
-./run_evals
+. run_evals.sh
```

Collaborator:

I guess this is an unintentional change?

tstescoTT (Contributor Author):

I found I needed to source the script to run it; otherwise the shell environment is different.

--tasks meta_ifeval \
--batch_size auto \
--output_path /home/user/cache_root/eval_output \
--include_path ./work_dir \
--seed 42 \
--log_samples

cd $original_dir

Collaborator:

Missing empty line at the end

tstescoTT (Contributor Author):

added in ecd2d4d

"max_tokens": max_tokens,
"stream": stream,
}
completions_url = f"{self._get_api_base_url()}/chat/completions"

Collaborator:

It would be nice to have both:

- `self.completions_url`
- `self.chat_completions_url`

tstescoTT (Contributor Author):

Both the /v1/completions and /v1/chat/completions API endpoints are already supported; they are controlled by the `use_chat_api` bool.
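
For illustration, a hedged sketch of how both endpoints could be exposed as attributes and selected via the `use_chat_api` flag; the class and method names here are assumptions for the example, not the actual prompt client implementation:

```python
class PromptClientSketch:
    """Hypothetical client sketch: keep both endpoint URLs and pick one per request."""

    def __init__(self, api_base_url: str, use_chat_api: bool = False):
        # e.g. api_base_url = "http://localhost:8000/v1" (illustrative)
        self.completions_url = f"{api_base_url}/completions"
        self.chat_completions_url = f"{api_base_url}/chat/completions"
        self.use_chat_api = use_chat_api

    def endpoint(self) -> str:
        # Route to the chat API or the plain completions API based on the flag.
        return self.chat_completions_url if self.use_chat_api else self.completions_url

# Example: PromptClientSketch("http://localhost:8000/v1", use_chat_api=True).endpoint()
# -> "http://localhost:8000/v1/chat/completions"
```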

Commit: "…e that they only set maximum concurrent requests and not the actual model batch size"
tstescoTT merged commit a31ac60 into main on Jan 28, 2025. 1 check passed.
Successfully merging this pull request may close these issues:

- vLLM+LLama3.1-70B docker container built from scratch caused an exception in load_model