
RHOAIENG-12250 - Rework model serving introduction #621

Open
wants to merge 8 commits into main
Conversation

@eturner24 (Contributor) commented Jan 30, 2025

Description

Add more details about the model serving platforms and how to choose one of them. Modified the TOC to present the preferred model serving platform and renamed the platforms to be consistent with our terminology.

[Screenshot: 2025-01-30 at 09:58:40]

TOC edits:

[Screenshot: 2025-01-30 at 10:46:59]

How Has This Been Tested?

Created a local build and confirmed the changes appeared as expected

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.


{productname-short} provides the following model serving platforms:
First, you upload the model to an S3-compatible storage container, persistent volume claim, or Open Container Initiative (OCI) image. Then, you serve trained models on your {openshift-platform} cluster. Serving or deploying models makes the model available as a service, or model runtime server, that you can access using an API.
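For illustration, a minimal sketch of the upload step for the object-storage option, assuming boto3 against an S3-compatible endpoint; the endpoint, credentials, bucket, and object key are placeholders:

[source,python]
----
# Sketch only: upload a serialized model artifact to S3-compatible object storage.
# The endpoint, credentials, bucket, and key are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.com",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# The bucket and key form the storage path that the model deployment later references.
s3.upload_file("model.onnx", "my-models", "example-model/1/model.onnx")
----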
@syaseen-rh (Contributor) commented Jan 30, 2025:
Suggestions:

  • For consistency, replace "S3-compatible storage container" with "S3-compatible object storage"


* Single-model serving platform
* Multi-model serving platform
* NVIDIA NIM-model serving platform
Review comment (Contributor):

NVIDIA NIM model serving platform (no hyphen)


* If you want to deploy each model on its own runtime server, or want to use a serverless deployment, select the *single-model serving platform*. The single-model serving platform is recommended for production use.
* If you want to deploy multiple models with only one runtime server, select the *multi-model serving platform*. This option is best if you are deploying more than 1,000 small and medium models and want to reduce resource consumption.
* If you are using the NVIDIA serving runtime and NVIDIA NIMs, select the *NVIDIA NIM-model serving platform*.
Review comment (Contributor):

If you want to use NVIDIA Inference Microservices (NIMs) to deploy a model, select the NVIDIA NIM model serving platform


== NVIDIA NIM model serving platform

You can deploy models using NVIDIA NIM inference services on the NVIDIA NIM model serving platform.
Review comment (Contributor):

You can deploy models using NVIDIA Inference Microservices (NIM) ...


Single-model serving platform::
For deploying large models such as large language models (LLMs), {productname-short} includes a _single-model serving platform_ that is based on the link:https://github.com/kserve/kserve[KServe^] component. Because each model is deployed from its own model server, the single-model serving platform helps you to deploy, monitor, scale, and maintain large models that require increased resources.
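As a rough sketch of what a KServe-based deployment can look like outside the dashboard, the following example creates a minimal InferenceService custom resource with the Kubernetes Python client; the namespace, model name, model format, and storage URI are placeholders, and storage credentials are omitted:

[source,python]
----
# Sketch only: create a minimal KServe InferenceService.
# Namespace, name, model format, and storageUri are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in the cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "example-model"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "onnx"},
                "storageUri": "s3://my-models/example-model",
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="my-project",
    plural="inferenceservices",
    body=inference_service,
)
----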
After you serve a model, you can access inference endpoints for the deployed model from the dashboard. You can see predictions based on data inputs that you provide through API calls. Querying the model through the API is also called model inferencing.
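A sketch of such an API call, assuming the deployed runtime exposes a KServe v2 (Open Inference Protocol) REST endpoint; the URL, token, and payload shape are placeholders and depend on the model and runtime:

[source,python]
----
# Sketch only: query a deployed model's inference endpoint over REST.
# URL, token, input name, shape, and datatype are placeholders.
import requests

url = "https://example-model-my-project.apps.example.com/v2/models/example-model/infer"
headers = {"Authorization": "Bearer <token>"}  # only needed if token authentication is enabled

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [5.1, 3.5, 1.4, 0.2],
        }
    ]
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
print(response.json())  # predictions returned by the model server
----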
@syaseen-rh (Contributor) commented Jan 30, 2025:
Slight rewrite for lines 9-11, putting it all together:
You can upload a model to an S3-compatible object storage, persistent volume claim, or Open Container Initiative (OCI) image. You can then access and train the model from your project workbench. After training the model, you can serve or deploy the model using a model-serving platform.
Serving or deploying the model makes the model available as a service, or model runtime server, that you can access using an API. You can then access the inference endpoints for the deployed model from the dashboard and see predictions based on data inputs that you provide through API calls. Querying the model through the API is also called model inferencing.
