Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding TRT-LLM config key #825

Merged
merged 8 commits into from
Feb 20, 2024
Merged

Adding TRT-LLM config key #825

merged 8 commits into from
Feb 20, 2024

Conversation

aspctu
Copy link
Collaborator

@aspctu aspctu commented Feb 13, 2024

🚀 What

This PR adds a new key to the config.yaml: trtllm. We use this for models that need engines built or to serve engines. The spec is:

trtllm:
  engine_repository: "<path_to_trt_engine_repository>"
  tokenizer_repository: "<path_to_tokenizer_repository>"
  tensor_parallel_count: int
  pipeline_parallel_count: int
  build_configuration: # Used to build engines
    huggingface_ckpt_repository: "<path_to_huggingface_checkpoint_repository>"
    base_model_architecture: "llama"  # Can be "llama" or "mistral"
    max_input_len: int
    max_output_len: int
    max_batch_size: int
    quantization_type: "no_quant"  # Can be "no_quant", "weights_only", "weights_kv_int8", "smooth_quant"
    gather_all_token_logits: bool
    multi_block_mode: bool
    calibrate_kv_cache: bool

This PR also updates pydantic to a newer version.

💻 How

🔬 Testing

Integration tests passed for pydantic change. Also tested the new key with various configs locally via manual context builds.

@aspctu aspctu marked this pull request as ready for review February 13, 2024 20:54
@aspctu aspctu requested review from joostinyi and bolasim February 13, 2024 20:54
Copy link
Collaborator

@bolasim bolasim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall but would love an explanation of why we have to write the serialization manually. I thought we could rely on pydantic for that

truss/truss_config.py Outdated Show resolved Hide resolved
truss/truss_config.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@bolasim bolasim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I think this PR is missing changes to bump the pydantic version on the server. I thikn that's probably okay assuming all the integration tests pass.

Copy link
Collaborator

@bolasim bolasim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two suggestions that let you drop all the to/from dict functions. lemme know if it works.

truss/truss_config.py Outdated Show resolved Hide resolved
truss/truss_config.py Outdated Show resolved Hide resolved
@aspctu aspctu requested a review from bolasim February 13, 2024 21:41
truss/trt_llm.py Outdated Show resolved Hide resolved
@aspctu aspctu merged commit a05459c into main Feb 20, 2024
10 checks passed
@aspctu aspctu deleted the abuqader/adding-trtllm-truss-config branch February 20, 2024 17:04
@joostinyi
Copy link
Collaborator

@aspctu can you update the example from the description for the most recent interface?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants