RHOAIENG-12250 - Rework model serving introduction #621
base: main
Conversation
modules/about-model-serving.adoc
Outdated
{productname-short} provides the following model serving platforms:

First, you upload the model to an S3-compatible storage container, persistent volume claim, or Open Container Initiative (OCI) image. Then, you serve trained models on your {openshift-platform} cluster. Serving or deploying models makes the model available as a service, or model runtime server, that you can access using an API.
Suggestions:
- For consistency, replace "S3-compatible storage container" with "S3-compatible object storage".
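For context on the upload step being discussed here, below is a minimal sketch of pushing a trained model to S3-compatible object storage so a serving platform can reference it later. This is not taken from the documentation under review; the endpoint URL, bucket name, object key, and credentials are hypothetical placeholders.

```python
# Minimal sketch: upload a trained model file to S3-compatible object storage.
# All names below are hypothetical placeholders (endpoint, bucket, key, credentials).
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.com",       # assumed S3-compatible endpoint
    aws_access_key_id="MY_ACCESS_KEY",           # placeholder credentials
    aws_secret_access_key="MY_SECRET_KEY",
)

# Upload the serialized model; a serving platform could then reference it
# through a storage URI such as s3://models/churn/model.onnx (assumption).
s3.upload_file("model.onnx", "models", "churn/model.onnx")
```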
modules/about-model-serving.adoc
Outdated
* Single-model serving platform
* Multi-model serving platform
* NVIDIA NIM-model serving platform
NVIDIA NIM model serving platform (no hyphen)
modules/about-model-serving.adoc
Outdated
* If you want to deploy each model on its own runtime server, or want to use a serverless deployment, select the *single-model serving platform*. The single-model serving platform is recommended for production use.
* If you want to deploy multiple models with only one runtime server, select the *multi-model serving platform*. This option is best if you are deploying more than 1,000 small and medium models and want to reduce resource consumption.
* If you are using the NVIDIA serving runtime and NVIDIA NIMs, select the *NVIDIA NIM-model serving platform*.
If you want to use NVIDIA Inference Microservices (NIMs) to deploy a model, select the NVIDIA NIM model serving platform
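As a rough illustration of what the single-model option means in practice, the sketch below creates a KServe InferenceService (KServe backs the single-model serving platform) with the Kubernetes Python client, giving the model its own runtime server. The namespace, resource name, model format, and storage URI are assumptions for the example, not values from this documentation; the product dashboard workflow is the documented path.

```python
# Hedged sketch: deploy one model on its own runtime server by creating a
# KServe InferenceService custom resource. Names and storageUri are placeholders.
from kubernetes import client, config

config.load_kube_config()  # assumes a configured kubeconfig with cluster access

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "churn-model", "namespace": "my-project"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "onnx"},      # assumed model format
                "storageUri": "s3://models/churn/",   # assumed storage location
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="my-project",
    plural="inferenceservices",
    body=inference_service,
)
```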
modules/about-model-serving.adoc
Outdated
== NVIDIA NIM model serving platform

You can deploy models using NVIDIA NIM inference services on the NVIDIA NIM model serving platform.
You can deploy models using NVIDIA Inference Microservices (NIM) ...
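To make the NIM case concrete, here is a small sketch of querying a deployed NIM inference service. It assumes the service exposes an OpenAI-compatible chat completions endpoint, which is common for NIM microservices; the route URL, model name, and token are hypothetical placeholders.

```python
# Hedged sketch: query a deployed NIM inference service over an assumed
# OpenAI-compatible endpoint. URL, model name, and token are placeholders.
import requests

response = requests.post(
    "https://nim-route.example.com/v1/chat/completions",
    headers={"Authorization": "Bearer MY_TOKEN"},
    json={
        "model": "meta/llama-3.1-8b-instruct",   # assumed NIM model name
        "messages": [{"role": "user", "content": "Summarize model serving."}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```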
modules/about-model-serving.adoc
Outdated
Single-model serving platform::
For deploying large models such as large language models (LLMs), {productname-short} includes a _single-model serving platform_ that is based on the link:https://github.com/kserve/kserve[KServe^] component. Because each model is deployed from its own model server, the single-model serving platform helps you to deploy, monitor, scale, and maintain large models that require increased resources.

After you serve a model, you can access inference endpoints for the deployed model from the dashboard. You can see predictions based on data inputs that you provide through API calls. Querying the model through the API is also called model inferencing.
Slight rewrite for lines 9-11, putting it all together:
You can upload a model to S3-compatible object storage, a persistent volume claim, or an Open Container Initiative (OCI) image. You can then access and train the model from your project workbench. After training the model, you can serve or deploy the model using a model-serving platform.
Serving or deploying the model makes the model available as a service, or model runtime server, that you can access using an API. You can then access the inference endpoints for the deployed model from the dashboard and see predictions based on data inputs that you provide through API calls. Querying the model through the API is also called model inferencing.
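To make "querying the model through the API" concrete, below is a small sketch of an inference call against a deployed model's endpoint. It assumes the runtime speaks the Open Inference Protocol (KServe v2 REST API); the endpoint URL, model name, input tensor shape, and token are placeholders, not values from this documentation.

```python
# Hedged sketch of model inferencing: send input data to a deployed model's
# inference endpoint and read back predictions. Assumes the Open Inference
# Protocol (KServe v2); URL, model name, shapes, and token are placeholders.
import requests

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [5.1, 3.5, 1.4, 0.2],
        }
    ]
}

resp = requests.post(
    "https://churn-model.example.com/v2/models/churn-model/infer",
    headers={"Authorization": "Bearer MY_TOKEN"},  # only if the endpoint requires auth
    json=payload,
    timeout=30,
)
print(resp.json()["outputs"])
```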
Description
Added more details about the model serving platforms and how to choose one of them. Modified the TOC to present the preferred model serving platform and renamed the platforms to be consistent with our terminology.
TOC edits:
How Has This Been Tested?
Created a local build and confirmed the changes appeared as expected
Merge criteria: