
worker-config.json's QUANTIZATION does not have a 'bitsandbytes' option #145

Open
mohamednaji7 opened this issue Jan 21, 2025 · 0 comments

mohamednaji7 commented Jan 21, 2025

Here is the error: BitsAndBytes load format and QLoRA adapter only support 'bitsandbytes' quantization

engine.py           :26   2025-01-21 11:18:49,619 Engine args: AsyncEngineArgs(model='unsloth/tinyllama-bnb-4bit', served_model_name=None, tokenizer='unsloth/tinyllama-bnb-4bit', task='auto', skip_tokenizer_init=False, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path='', download_dir=None, load_format='bitsandbytes', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='bfloat16', kv_cache_dtype='auto', quantization_param_path=None, seed=0, max_model_len=512, worker_use_ray=False, distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager='true', swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, revision=None, code_revision=None, rope_scaling=None, rope_theta=None, hf_overrides=None, tokenizer_revision=None, quantization=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, fully_sharded_loras=False, lora_extra_vocab_size=256, long_lora_scaling_factors=None, lora_dtype='auto', max_cpu_loras=None, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, ray_workers_use_nsight=False, num_gpu_blocks_override=None, num_lookahead_slots=0, model_loader_extra_config=None, ignore_patterns=None, preemption_mode=None, scheduler_delay_factor=0.0, enable_chunked_prefill=None, guided_decoding_backend='outlines', logits_processor_pattern=None, speculative_model=None, speculative_model_quantization=None, speculative_draft_tensor_parallel_size=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, qlora_adapter_name_or_path=None, disable_logprobs_during_spec_decoding=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, worker_cls='auto', kv_transfer_config=None, generation_config=None, disable_log_requests=False)
engine.py           :115  2025-01-21 11:18:49,916 Error initializing vLLM engine: BitsAndBytes load format and QLoRA adapter only support 'bitsandbytes' quantization, but got None
tokenizer_name_or_path: unsloth/tinyllama-bnb-4bit, tokenizer_revision: None, trust_remote_code: True
{"requestId": null, "message": "Uncaught exception | <class 'ValueError'>; BitsAndBytes load format and QLoRA adapter only support 'bitsandbytes' quantization, but got None; <traceback object at 0x70250d0b5300>;", "level": "ERROR"}

In worker-config.json:

  "QUANTIZATION": {
    "env_var_name": "QUANTIZATION",
    "value": "",
    "title": "Quantization",
    "description": "Method used to quantize the weights.",
    "required": false,
    "type": "select",
    "options": [
      { "value": "None", "label": "None" },
      { "value": "awq", "label": "AWQ" },
      { "value": "squeezellm", "label": "SqueezeLLM" },
      { "value": "gptq", "label": "GPTQ" }
    ]
  },

we need { "value": "bnb", "label": "bitsandbytes" }, o force it to be "bitsandbytes" since it is the only supported one by BitsAndBytes load format
