
RHOAIENG-12250 - Rework model serving introduction #621

Open
wants to merge 8 commits into main
Conversation

@eturner24 (Contributor) commented Jan 30, 2025

Description

Add more details about the model serving platforms and how to choose one of them. Modified the TOC to present the preferred model serving platform and renamed the platforms to be consistent with our terminology.

[Screenshot: 2025-01-30 at 09:58:40]

TOC edits:

[Screenshot: 2025-01-30 at 10:46:59]

How Has This Been Tested?

Created a local build and confirmed the changes appeared as expected

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.


{productname-short} provides the following model serving platforms:
First, you upload the model to an S3-compatible storage container, persistent volume claim, or Open Container Initiative (OCI) image. Then, you serve trained models on your {openshift-platform} cluster. Serving or deploying models makes the model available as a service, or model runtime server, that you can access using an API.
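For illustration, a minimal sketch of the upload step for the object-storage option, assuming boto3 against an S3-compatible endpoint; the endpoint, credentials, bucket, and object key are placeholders:

[source,python]
----
# Sketch only: upload a serialized model artifact to S3-compatible object storage.
# The endpoint, credentials, bucket, and key are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.com",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# The bucket and key form the storage path that the model deployment later references.
s3.upload_file("model.onnx", "my-models", "example-model/1/model.onnx")
----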
@syaseen-rh (Contributor) commented Jan 30, 2025:
Suggestions:

  • For consistency, replace "S3-compatible storage container" with "S3-compatible object storage"


* Single-model serving platform
* Multi-model serving platform
* NVIDIA NIM-model serving platform
Review comment (Contributor):

NVIDIA NIM model serving platform (no hyphen)


* If you want to deploy each model on its own runtime server, or want to use a serverless deployment, select the *single-model serving platform*. The single-model serving platform is recommended for production use.
* If you want to deploy multiple models with only one runtime server, select the *multi-model serving platform*. This option is best if you are deploying more than 1,000 small and medium models and want to reduce resource consumption.
* If you are using the NVIDIA serving runtime and NVIDIA NIMs, select the *NVIDIA NIM-model serving platform*.
Review comment (Contributor):

If you want to use NVIDIA Inference Microservices (NIMs) to deploy a model, select the NVIDIA NIM model serving platform


== NVIDIA NIM model serving platform

You can deploy models using NVIDIA NIM inference services on the NVIDIA NIM model serving platform.
Review comment (Contributor):

You can deploy models using NVIDIA Inference Microservices (NIM) ...


Single-model serving platform::
For deploying large models such as large language models (LLMs), {productname-short} includes a _single-model serving platform_ that is based on the link:https://github.com/kserve/kserve[KServe^] component. Because each model is deployed from its own model server, the single-model serving platform helps you to deploy, monitor, scale, and maintain large models that require increased resources.
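As a rough sketch of what a KServe-based deployment can look like outside the dashboard, the following example creates a minimal InferenceService custom resource with the Kubernetes Python client; the namespace, model name, model format, and storage URI are placeholders, and storage credentials are omitted:

[source,python]
----
# Sketch only: create a minimal KServe InferenceService.
# Namespace, name, model format, and storageUri are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in the cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "example-model"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "onnx"},
                "storageUri": "s3://my-models/example-model",
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="my-project",
    plural="inferenceservices",
    body=inference_service,
)
----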
After you serve a model, you can access inference endpoints for the deployed model from the dashboard. You can see predictions based on data inputs that you provide through API calls. Querying the model through the API is also called model inferencing.
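A sketch of such an API call, assuming the deployed runtime exposes a KServe v2 (Open Inference Protocol) REST endpoint; the URL, token, and payload shape are placeholders and depend on the model and runtime:

[source,python]
----
# Sketch only: query a deployed model's inference endpoint over REST.
# URL, token, input name, shape, and datatype are placeholders.
import requests

url = "https://example-model-my-project.apps.example.com/v2/models/example-model/infer"
headers = {"Authorization": "Bearer <token>"}  # only needed if token authentication is enabled

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [5.1, 3.5, 1.4, 0.2],
        }
    ]
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
print(response.json())  # predictions returned by the model server
----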
@syaseen-rh (Contributor) commented Jan 30, 2025:
Slight rewrite for lines 9-11, putting it all together:
You can upload a model to an S3-compatible object storage, persistent volume claim, or Open Container Initiative (OCI) image. You can then access and train the model from your project workbench. After training the model, you can serve or deploy the model using a model-serving platform.
Serving or deploying the model makes the model available as a service, or model runtime server, that you can access using an API. You can then access the inference endpoints for the deployed model from the dashboard and see predictions based on data inputs that you provide through API calls. Querying the model through the API is also called model inferencing.
