
[Feature] Instructions for running Sglang on AMD RX 7900 XTX (gfx1100) ROCm 6.2.4 #3243

shahizat opened this issue Jan 31, 2025 · 3 comments

Motivation

Hello,

If anyone is interested, here is how I run SGLang on the AMD RX 7900 XTX (gfx1100) with ROCm 6.2.4. Currently, the attention backend is based on Triton; FlashInfer support appears to be under development. Hope it helps.

Create a Dockerfile based on the vLLM ROCm Dockerfile:

ARG REMOTE_VLLM="0"
ARG USE_CYTHON="0"
ARG BUILD_RPD="1"
ARG COMMON_WORKDIR=/app
# Default base image
ARG BASE_IMAGE=rocm/vllm-dev:base

FROM ${BASE_IMAGE} AS base

ARG ARG_PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH=${ARG_PYTORCH_ROCM_ARCH:-${PYTORCH_ROCM_ARCH}}

# Install some basic utilities
RUN apt-get update -q -y && apt-get install -q -y \
    sqlite3 libsqlite3-dev libfmt-dev libmsgpack-dev libsuitesparse-dev
RUN python3 -m pip install --upgrade pip && pip install setuptools_scm
# Remove sccache
RUN apt-get purge -y sccache; python3 -m pip uninstall -y sccache; rm -f "$(which sccache)"
ARG COMMON_WORKDIR
WORKDIR ${COMMON_WORKDIR}

# -----------------------
# vLLM fetch stages
FROM base AS fetch_vllm_0
ONBUILD COPY ./ vllm/
FROM base AS fetch_vllm_1
ARG VLLM_REPO="https://github.com/vllm-project/vllm.git"
ARG VLLM_BRANCH="main"
ONBUILD RUN git clone ${VLLM_REPO} \
        && cd vllm \
        && git checkout ${VLLM_BRANCH}
FROM fetch_vllm_${REMOTE_VLLM} AS fetch_vllm

# -----------------------
# vLLM build stages
FROM fetch_vllm AS build_vllm
ARG USE_CYTHON
# Build vLLM
RUN cd vllm \
    && python3 -m pip install -r requirements-rocm.txt \
    && python3 setup.py clean --all  \
    && if [ ${USE_CYTHON} -eq "1" ]; then python3 setup_cython.py build_ext --inplace; fi \
    && python3 setup.py bdist_wheel --dist-dir=dist
FROM scratch AS export_vllm
ARG COMMON_WORKDIR
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/dist/*.whl /
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/requirements*.txt /
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/benchmarks /benchmarks
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/tests /tests
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/examples /examples
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm/.buildkite /.buildkite

# -----------------------
# Test vLLM image
FROM base AS test

RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*

# Install vLLM
RUN --mount=type=bind,from=export_vllm,src=/,target=/install \
    cd /install \
    && pip install -U -r requirements-rocm.txt \
    && pip uninstall -y vllm \
    && pip install *.whl

WORKDIR /vllm-workspace
ARG COMMON_WORKDIR
COPY --from=build_vllm ${COMMON_WORKDIR}/vllm /vllm-workspace

# install development dependencies (for testing)
RUN cd /vllm-workspace \
    && rm -rf vllm \
    && python3 -m pip install -e tests/vllm_test_utils \
    && python3 -m pip install lm-eval[api]==0.4.4 \
    && python3 -m pip install pytest-shard

# -----------------------
# Final vLLM image
FROM base AS final

RUN python3 -m pip install --upgrade pip && rm -rf /var/lib/apt/lists/*
# Error related to odd state for numpy 1.20.3 where there is no METADATA etc, but an extra LICENSES_bundled.txt.
# Manually remove it so that later steps of numpy upgrade can continue
RUN case "$(which python3)" in \
        *"/opt/conda/envs/py_3.9"*) \
            rm -rf /opt/conda/envs/py_3.9/lib/python3.9/site-packages/numpy-1.20.3.dist-info/;; \
        *) ;; esac

RUN python3 -m pip install --upgrade huggingface-hub[cli]
ARG BUILD_RPD
RUN if [ ${BUILD_RPD} -eq "1" ]; then \
    git clone -b nvtx_enabled https://github.com/ROCm/rocmProfileData.git \
    && cd rocmProfileData/rpd_tracer \
    && pip install -r requirements.txt && cd ../ \
    && make && make install \
    && cd hipMarker && python3 setup.py install ; fi

# Install vLLM
RUN --mount=type=bind,from=export_vllm,src=/,target=/install \
    cd /install \
    && pip install -U -r requirements-rocm.txt \
    && pip uninstall -y vllm \
    && pip install *.whl

ARG COMMON_WORKDIR

# Copy over the benchmark scripts as well
COPY --from=export_vllm /benchmarks ${COMMON_WORKDIR}/vllm/benchmarks
COPY --from=export_vllm /examples ${COMMON_WORKDIR}/vllm/examples

# Install SGLang; drop the pinned vllm/flashinfer dependencies, since vLLM
# comes from the ROCm wheel built above and FlashInfer is not used with the
# Triton attention backend here
RUN git clone https://github.com/sgl-project/sglang.git /app/sglang \
    && sed -i '/vllm==0.6.4.post1/d; /flashinfer==0.1.6/d' /app/sglang/python/pyproject.toml \
    && cd /app/sglang \
    && python3 -m pip --no-cache-dir install -e "python[all]"

ENV RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1
ENV TOKENIZERS_PARALLELISM=false

# Performance environment variable.
ENV HIP_FORCE_DEV_KERNARG=1

CMD ["/bin/bash"]

Build using:

DOCKER_BUILDKIT=1 docker build --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" -f Dockerfile.rocm_new -t sglang-rocm .
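
If the base image does not already target gfx1100, the GPU architecture can be pinned through the ARG_PYTORCH_ROCM_ARCH build argument the Dockerfile exposes (a sketch; the navi_base image may already set this for you):

DOCKER_BUILDKIT=1 docker build \
    --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" \
    --build-arg ARG_PYTORCH_ROCM_ARCH="gfx1100" \
    -f Dockerfile.rocm_new -t sglang-rocm .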

Run the container:

docker run -it \
    --network=host \
    --group-add=video \
    --ipc=host \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    -v ./models/:/root/.cache/huggingface \
    sglang-rocm:latest \
    bash
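
Before launching the server, it is worth confirming the GPU is visible inside the container (a quick sanity check; rocm-smi ships with the ROCm base image, and ROCm builds of PyTorch expose HIP devices through the torch.cuda API):

# Verify the RX 7900 XTX is visible to ROCm and to PyTorch
rocm-smi
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"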

Inside the container, launch the server:

python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --attention-backend triton
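
Once the server is up (it listens on port 30000 by default), a quick curl against the OpenAI-compatible endpoint verifies it end to end (a minimal sketch mirroring the Python example below):

curl http://127.0.0.1:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": 32
      }'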

Send a request using the Python code below:

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="None")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce yourself"},
    ],
    temperature=0,
    max_tokens=500,
    stream=True  # Enable streaming
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Related resources

No response

@zhaochenyang20 (Collaborator)
Thanks so much! I will send this to the AMD team and add docs to docs.sglang.ai ASAP.

@shahizat (Author) commented Feb 1, 2025

Hello @zhaochenyang20, thanks! I uploaded the image to Docker Hub: https://hub.docker.com/repository/docker/shahizat005/sglang-rocm/tags

Below are the metrics:

[2025-02-01 10:12:48 TP0] Decode batch. #running-req: 1, #token: 51, token usage: 0.00, gen throughput (token/s): 3.50, #queue-req: 0
[2025-02-01 10:12:49 TP0] Decode batch. #running-req: 1, #token: 91, token usage: 0.00, gen throughput (token/s): 28.11, #queue-req: 0
[2025-02-01 10:12:51 TP0] Decode batch. #running-req: 1, #token: 131, token usage: 0.00, gen throughput (token/s): 28.12, #queue-req: 0
[2025-02-01 10:12:52 TP0] Decode batch. #running-req: 1, #token: 171, token usage: 0.00, gen throughput (token/s): 28.13, #queue-req: 0
[2025-02-01 10:12:53 TP0] Decode batch. #running-req: 1, #token: 211, token usage: 0.00, gen throughput (token/s): 28.12, #queue-req: 0
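
Throughput numbers like these can also be reproduced with SGLang's bundled serving benchmark (assuming sglang.bench_serving is available in this build; flag names may differ across versions):

# Benchmark the running server (illustrative; check --help for your version)
python3 -m sglang.bench_serving --backend sglang --num-prompts 10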

@zhaochenyang20 (Collaborator)

Really nice! Thanks!
