
[Feature] Instructions for running Sglang on AMD RX 7900 XTX (gfx1100) ROCm 6.2.4 #3243

shahizat opened this issue Jan 31, 2025 · 3 comments

Motivation

Hello,

If anyone is interested, here is how I run SGLang on the AMD RX 7900 XTX (gfx1100) with ROCm 6.2.4. Currently, the attention backend is based on Triton; FlashInfer support appears to be under development. Hope it helps.

Create a Dockerfile based on the vLLM ROCm Dockerfile:

ARG REMOTE_VLLM="0"
ARG USE_CYTHON="0"
ARG BUILD_RPD="1"
ARG COMMON_WORKDIR=/app
# Default base image
ARG BASE_IMAGE=rocm/vllm-dev:base

FROM ${BASE_IMAGE} AS base

ARG ARG_PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH=${ARG_PYTORCH_ROCM_ARCH:-${PYTORCH_ROCM_ARCH}}

# Install some basic utilities
RUN apt-get update -q -y && apt-get install -q -y \
    sqlite3 libsqlite3-dev libfmt-dev libmsgpack-dev libsuitesparse-dev
RUN python3 -m pip install --upgrade pip && pip install setuptools_scm
# Remove sccache
RUN apt-get purge -y sccache; python3 -m pip uninstall -y sccache; rm -f "$(which sccache)"
ARG COMMON_WORKDIR
WORKDIR ${COMMON_WORKDIR}

# -----------------------
# vLLM fetch stages
FROM base AS fetch_vllm_0
ONBUILD COPY ./ vllm/
FROM base AS fetch_vllm_1
ARG VLLM_REPO="https://github.com/vllm-project/vllm.git"
ARG VLLM_BRANCH="main"
ONBUILD RUN git clone ${VLLM_REPO} \
        && cd vllm \
        && git checkout ${VLLM_BRANCH}
FROM fetch_vllm_${REMOTE_VLLM} AS fetch_vllm

# -----------------------
# vLLM build stages
FROM fetch_vllm AS build_vllm
ARG USE_CYTHON
# Build vLLM
RUN cd vllm \
    && python3 -m pip install -r requirements-rocm.txt \
    && python3 setup.py clean --all  \
    && if [ ${USE_CYTHON} -eq "1" ]; then python3 setup_cython.py build_ext --inplace; fi \
    && python3 setup.py bdist_wheel --dist-dir=dist
FROM scratch AS export_vllm
ARG COMMON_WORKDIR
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/dist/*.whl /
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/requirements*.txt /
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/benchmarks /benchmarks
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/tests /tests
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/examples /examples
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/.buildkite /.buildkite

# -----------------------
# Test vLLM image
FROM base AS test

RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*

# Install vLLM
RUN --mount=type=bind,from=export_vllm,src=/,target=/install \
    cd /install \
    && pip install -U -r requirements-rocm.txt \
    && pip uninstall -y vllm \
    && pip install *.whl

WORKDIR /vllm-workspace
ARG COMMON_WORKDIR
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm /vllm-workspace

# install development dependencies (for testing)
RUN cd /vllm-workspace \
    && rm -rf vllm \
    && python3 -m pip install -e tests/vllm_test_utils \
    && python3 -m pip install lm-eval[api]==0.4.4 \
    && python3 -m pip install pytest-shard

# -----------------------
# Final vLLM image
FROM base AS final

RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*
# Error related to odd state for numpy 1.20.3 where there is no METADATA etc, but an extra LICENSES_bundled.txt.
# Manually remove it so that later steps of numpy upgrade can continue
RUN case "$(which python3)" in \
        *"/opt/conda/envs/py_3.9"*) \
            rm -rf /opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy-1.20.3.dist-info/;; \
        *) ;; esac

RUN python3 -m pip install --upgrade huggingface-hub[cli]
ARG BUILD_RPD
RUN if [ ${BUILD_RPD} -eq "1" ]; then \
    git clone -b nvtx_enabled https://github.com/ROCm/rocmProfileData.git \
    && cd rocmProfileData/rpd_tracer \
    && pip install -r requirements.txt && cd ../ \
    && make && make install \
    && cd hipMarker && python3 setup.py install ; fi

# Install vLLM
RUN --mount=type=bind,from=export_vllm,src=/,target=/install \
    cd /install \
    && pip install -U -r requirements-rocm.txt \
    && pip uninstall -y vllm \
    && pip install *.whl

ARG COMMON_WORKDIR

# Copy over the benchmark scripts as well
COPY --from=export_vllm /benchmarks ${COMMON_WORKDIR}/vllm/benchmarks
COPY --from=export_vllm /examples ${COMMON_WORKDIR}/vllm/examples

# Install SGLang; drop the pinned vllm/flashinfer dependencies, since vLLM
# comes from the ROCm wheel built above and FlashInfer is not used with the
# Triton attention backend here
RUN git clone https://github.com/sgl-project/sglang.git /app/sglang \
    && sed -i '/vllm==0.6.4.post1/d; /flashinfer==0.1.6/d' /app/sglang/python/pyproject.toml \
    && cd /app/sglang \
    && python3 -m pip --no-cache-dir install -e "python[all]"

ENV RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1
ENV TOKENIZERS_PARALLELISM=false

# Performance environment variable.
ENV HIP_FORCE_DEV_KERNARG=1

CMD ["/bin/bash"]

Build using:

DOCKER_BUILDKIT=1 docker build --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" -f Dockerfile.rocm_new -t sglang-rocm .
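
If the base image does not already target gfx1100, the GPU architecture can be pinned through the ARG_PYTORCH_ROCM_ARCH build argument the Dockerfile exposes (a sketch; the navi_base image may already set this for you):

DOCKER_BUILDKIT=1 docker build \
    --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" \
    --build-arg ARG_PYTORCH_ROCM_ARCH="gfx1100" \
    -f Dockerfile.rocm_new -t sglang-rocm .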

Run the container:

docker run -it \
    --network=host \
    --group-add=video \
    --ipc=host \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v ./models/:/root/.cache/huggingface \
    sglang-rocm:latest \
    bash
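
Before launching the server, it is worth confirming the GPU is visible inside the container (a quick sanity check; rocm-smi ships with the ROCm base image, and ROCm builds of PyTorch expose HIP devices through the torch.cuda API):

# Verify the RX 7900 XTX is visible to ROCm and to PyTorch
rocm-smi
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"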

Inside the container, launch the server:

python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --attention-backend triton
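
Once the server is up (it listens on port 30000 by default), a quick curl against the OpenAI-compatible endpoint verifies it end to end (a minimal sketch mirroring the Python example below):

curl http://127.0.0.1:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": 32
      }'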

Send a request using the Python code below:

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="None")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce yourself"},
    ],
    temperature=0,
    max_tokens=500,
    stream=True  # Enable streaming
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Related resources

No response

@zhaochenyang20 (Collaborator)
Thanks so much! I will send this to the AMD team and add docs to docs.sglang.ai ASAP.

@shahizat (Author) commented Feb 1, 2025

Hello @zhaochenyang20, thanks! I uploaded the image to Docker Hub: https://hub.docker.com/repository/docker/shahizat005/sglang-rocm/tags

Below are the metrics:

[2025-02-01 10:12:48 TP0] Decode batch. #running-req: 1, #token: 51, token usage: 0.00, gen throughput (token/s): 3.50, #queue-req: 0
[2025-02-01 10:12:49 TP0] Decode batch. #running-req: 1, #token: 91, token usage: 0.00, gen throughput (token/s): 28.11, #queue-req: 0
[2025-02-01 10:12:51 TP0] Decode batch. #running-req: 1, #token: 131, token usage: 0.00, gen throughput (token/s): 28.12, #queue-req: 0
[2025-02-01 10:12:52 TP0] Decode batch. #running-req: 1, #token: 171, token usage: 0.00, gen throughput (token/s): 28.13, #queue-req: 0
[2025-02-01 10:12:53 TP0] Decode batch. #running-req: 1, #token: 211, token usage: 0.00, gen throughput (token/s): 28.12, #queue-req: 0
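
Throughput numbers like these can also be reproduced with SGLang's bundled serving benchmark (assuming sglang.bench_serving is available in this build; flag names may differ across versions):

# Benchmark the running server (illustrative; check --help for your version)
python3 -m sglang.bench_serving --backend sglang --num-prompts 10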

@zhaochenyang20 (Collaborator)

Really nice! Thanks!
