
Commit

DOC: fix docs (#2793)
qinxuye authored Feb 6, 2025
1 parent e6b5449 commit 8d47380
Showing 5 changed files with 63 additions and 12 deletions.
62 changes: 55 additions & 7 deletions doc/source/models/builtin/llm/qwen2-vl-instruct.rst
@@ -78,7 +78,23 @@ chosen quantization method from the options listed above::
 
    xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 --model-format awq --quantization ${quantization}
 
 
-Model Spec 5 (pytorch, 7 Billion)
+Model Spec 5 (mlx, 2 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** mlx
+- **Model Size (in billions):** 2
+- **Quantizations:** 4bit, 8bit
+- **Engines**: MLX
+- **Model ID:** mlx-community/Qwen2-VL-2B-Instruct-{quantization}
+- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2-VL-2B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/mlx-community/Qwen2-VL-2B-Instruct-{quantization}>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 2 --model-format mlx --quantization ${quantization}
+
+
+Model Spec 6 (pytorch, 7 Billion)
 ++++++++++++++++++++++++++++++++++++++++
 
 - **Model Format:** pytorch
@@ -94,7 +110,7 @@ chosen quantization method from the options listed above::
    xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format pytorch --quantization ${quantization}
 
 
-Model Spec 6 (gptq, 7 Billion)
+Model Spec 7 (gptq, 7 Billion)
 ++++++++++++++++++++++++++++++++++++++++
 
 - **Model Format:** gptq
@@ -110,7 +126,7 @@ chosen quantization method from the options listed above::
    xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization}
 
 
-Model Spec 7 (gptq, 7 Billion)
+Model Spec 8 (gptq, 7 Billion)
 ++++++++++++++++++++++++++++++++++++++++
 
 - **Model Format:** gptq
@@ -126,7 +142,7 @@ chosen quantization method from the options listed above::
    xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format gptq --quantization ${quantization}
 
 
-Model Spec 8 (awq, 7 Billion)
+Model Spec 9 (awq, 7 Billion)
 ++++++++++++++++++++++++++++++++++++++++
 
 - **Model Format:** awq
@@ -142,7 +158,23 @@ chosen quantization method from the options listed above::
    xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format awq --quantization ${quantization}
 
 
-Model Spec 9 (pytorch, 72 Billion)
+Model Spec 10 (mlx, 7 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** mlx
+- **Model Size (in billions):** 7
+- **Quantizations:** 4bit, 8bit
+- **Engines**: MLX
+- **Model ID:** mlx-community/Qwen2-VL-7B-Instruct-{quantization}
+- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2-VL-7B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/okwinds/Qwen2-VL-7B-Instruct-MLX-8bit>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 7 --model-format mlx --quantization ${quantization}
+
+
+Model Spec 11 (pytorch, 72 Billion)
 ++++++++++++++++++++++++++++++++++++++++
 
 - **Model Format:** pytorch
@@ -158,7 +190,7 @@ chosen quantization method from the options listed above::
    xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format pytorch --quantization ${quantization}
 
 
-Model Spec 10 (awq, 72 Billion)
+Model Spec 12 (awq, 72 Billion)
 ++++++++++++++++++++++++++++++++++++++++
 
 - **Model Format:** awq
@@ -174,7 +206,7 @@ chosen quantization method from the options listed above::
    xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format awq --quantization ${quantization}
 
 
-Model Spec 11 (gptq, 72 Billion)
+Model Spec 13 (gptq, 72 Billion)
 ++++++++++++++++++++++++++++++++++++++++
 
 - **Model Format:** gptq
@@ -189,3 +221,19 @@ chosen quantization method from the options listed above::
 
    xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format gptq --quantization ${quantization}
 
+
+Model Spec 14 (mlx, 72 Billion)
+++++++++++++++++++++++++++++++++++++++++
+
+- **Model Format:** mlx
+- **Model Size (in billions):** 72
+- **Quantizations:** 4bit, 8bit
+- **Engines**: MLX
+- **Model ID:** mlx-community/Qwen2-VL-72B-Instruct-{quantization}
+- **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2-VL-72B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/okwinds/Qwen2-VL-72B-Instruct-MLX-{quantization}>`__
+
+Execute the following command to launch the model, remember to replace ``${quantization}`` with your
+chosen quantization method from the options listed above::
+
+   xinference launch --model-engine ${engine} --model-name qwen2-vl-instruct --size-in-billions 72 --model-format mlx --quantization ${quantization}
+
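For readers who drive Xinference from Python rather than the CLI, the same launch can be expressed through the client API. A minimal sketch, assuming the ``xinference.client.Client`` interface whose keyword arguments mirror the CLI flags shown in the specs above (endpoint URL and quantization choice are placeholders)::

    from xinference.client import Client

    # Connect to a running Xinference endpoint (default local port shown).
    client = Client("http://localhost:9997")

    # Keyword names mirror the CLI flags above; treat them as assumptions.
    model_uid = client.launch_model(
        model_name="qwen2-vl-instruct",
        model_engine="MLX",
        model_format="mlx",
        size_in_billions=2,
        quantization="4bit",
    )
    model = client.get_model(model_uid)
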
6 changes: 3 additions & 3 deletions doc/source/models/builtin/llm/qwen2.5-vl-instruct.rst
@@ -68,7 +68,7 @@ Model Spec 4 (mlx, 3 Billion)
 - **Model Format:** mlx
 - **Model Size (in billions):** 3
 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16
-- **Engines**: Transformers, MLX
+- **Engines**: MLX
 - **Model ID:** mlx-community/Qwen2.5-VL-3B-Instruct-{quantization}
 - **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2.5-VL-3B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/mlx-community/Qwen2.5-VL-3B-Instruct-{quantization}>`__
 
@@ -84,7 +84,7 @@ Model Spec 5 (mlx, 7 Billion)
 - **Model Format:** mlx
 - **Model Size (in billions):** 7
 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16
-- **Engines**: Transformers, MLX
+- **Engines**: MLX
 - **Model ID:** mlx-community/Qwen2.5-VL-7B-Instruct-{quantization}
 - **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2.5-VL-7B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/mlx-community/Qwen2.5-VL-7B-Instruct-{quantization}>`__
 
@@ -100,7 +100,7 @@ Model Spec 6 (mlx, 72 Billion)
 - **Model Format:** mlx
 - **Model Size (in billions):** 72
 - **Quantizations:** 3bit, 4bit, 6bit, 8bit, bf16
-- **Engines**: Transformers, MLX
+- **Engines**: MLX
 - **Model ID:** mlx-community/Qwen2.5-VL-72B-Instruct-{quantization}
 - **Model Hubs**: `Hugging Face <https://huggingface.co/mlx-community/Qwen2.5-VL-72B-Instruct-{quantization}>`__, `ModelScope <https://modelscope.cn/models/mlx-community/Qwen2.5-VL-72B-Instruct-{quantization}>`__
 
2 changes: 1 addition & 1 deletion setup.cfg
@@ -159,7 +159,7 @@ transformers =
     accelerate>=0.28.0
     sentencepiece
     transformers_stream_generator
-    bitsandbytes
+    bitsandbytes ; sys_platform=='linux'
     protobuf
     einops
     tiktoken
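The environment marker appended to ``bitsandbytes`` makes pip install that dependency only on Linux, the platform where it is reliably supported. To check how such a PEP 508 marker evaluates on the current interpreter, a small sketch using the ``packaging`` library (the same machinery pip applies at install time)::

    from packaging.markers import Marker

    # Evaluate the marker from setup.cfg against this interpreter.
    marker = Marker("sys_platform == 'linux'")
    print(marker.evaluate())  # True on Linux; False on macOS and Windows
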
3 changes: 2 additions & 1 deletion xinference/deploy/local.py
@@ -41,7 +41,8 @@ async def _start_local_cluster(
 ):
     from .utils import create_worker_actor_pool
 
-    logging.config.dictConfig(logging_conf)  # type: ignore
+    if logging_conf:
+        logging.config.dictConfig(logging_conf)  # type: ignore
 
     pool = None
     try:
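The guard is needed because ``logging.config.dictConfig`` does not treat ``None`` as "no configuration": it immediately tries to read the mapping (which must carry a ``version`` key), so a missing config raises ``TypeError``. A minimal reproduction of the failure the guard avoids::

    import logging.config

    try:
        logging.config.dictConfig(None)  # type: ignore[arg-type]
    except TypeError:
        # dictConfig requires a dict, hence the `if logging_conf:` guard.
        print("dictConfig rejects None")
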
2 changes: 2 additions & 0 deletions xinference/model/llm/transformers/qwen2_vl.py
@@ -45,6 +45,8 @@ def __init__(self, *args, **kwargs):
     def match(
         cls, model_family: "LLMFamilyV1", model_spec: "LLMSpecV1", quantization: str
     ) -> bool:
+        if model_spec.model_format not in ["pytorch", "gptq", "awq"]:
+            return False
         llm_family = model_family.model_family or model_family.model_name
         if "qwen2-vl-instruct".lower() in llm_family.lower():
             return True
