feat: add nvidia embedding implementation for new signature, task_type, output_dimention, text_truncation #1213
base: main
Conversation
this is blocked by meta-llama/llama-stack-client-python#162
    if text_truncation is not None:
        text_truncation_options = {
            TextTruncation.none: "NONE",
            TextTruncation.end: "END",
            TextTruncation.start: "START",
        }
        if text_truncation not in text_truncation_options:
            raise ValueError(f"Invalid text_truncation: {text_truncation}")
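For context, a minimal runnable sketch of how a mapping like this could feed the provider's request payload. The local enum and the `truncation_param` helper are illustrative stand-ins, not the PR's actual code, and the `truncate` field name is an assumption about the NVIDIA API:

```python
from enum import Enum

# Local stand-in for llama_stack's TextTruncation, for illustration only.
class TextTruncation(Enum):
    none = "none"
    start = "start"
    end = "end"

def truncation_param(text_truncation):
    """Map the Llama Stack enum to the string the NVIDIA API expects.

    Returns a dict that can be merged into the request payload;
    an empty dict means the parameter is simply omitted.
    """
    if text_truncation is None:
        return {}
    options = {
        TextTruncation.none: "NONE",
        TextTruncation.end: "END",
        TextTruncation.start: "START",
    }
    if text_truncation not in options:
        raise ValueError(f"Invalid text_truncation: {text_truncation}")
    return {"truncate": options[text_truncation]}
```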
We can move this validation here [1], so it applies to all providers.
[1]
    async def embeddings(
Turns out FastAPI will type-validate before invoking the routers, so I can remove this defensive code.
    if task_type is not None:
        task_type_options = {
            EmbeddingTaskType.document: "passage",
            EmbeddingTaskType.query: "query",
        }
        if task_type not in task_type_options:
            raise ValueError(f"Invalid task_type: {task_type}")
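A similar runnable sketch for the task-type mapping. The local enum and `task_type_param` helper are illustrative stand-ins; `input_type` as the request field name is an assumption about the NVIDIA embedding API, not something stated in this diff:

```python
from enum import Enum

# Local stand-in for llama_stack's EmbeddingTaskType, for illustration only.
class EmbeddingTaskType(Enum):
    query = "query"
    document = "document"

def task_type_param(task_type):
    """Map the Llama Stack task type to the NVIDIA API's input_type value."""
    if task_type is None:
        return {}
    options = {
        EmbeddingTaskType.document: "passage",
        EmbeddingTaskType.query: "query",
    }
    if task_type not in options:
        raise ValueError(f"Invalid task_type: {task_type}")
    return {"input_type": options[task_type]}
```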
same here
Also removing this one; will rely on FastAPI request validation.
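As a rough illustration of why the defensive checks are redundant: FastAPI (via pydantic) coerces incoming string parameters through the Enum constructor before the route handler runs, so an invalid value is rejected (as a 422 response) and never reaches the provider. The enum below is a local stand-in for the real type:

```python
from enum import Enum

# Local stand-in for llama_stack's TextTruncation enum.
class TextTruncation(Enum):
    none = "none"
    start = "start"
    end = "end"

# A valid value coerces cleanly...
member = TextTruncation("end")
print(member)

# ...while an invalid one raises ValueError, which is what
# pydantic turns into a validation error before the handler runs.
try:
    TextTruncation("middle")
except ValueError as err:
    print(err)
```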
    @pytest.mark.xfail(reason="Only valid for model supporting dimension reduction")
is there a model we can use for testing this?
Oops, yes: nvidia/llama-3.2-nv-embedqa-1b-v2, see https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/support-matrix.html
Edit: you can use baai/bge-m3. I've also updated the test instructions.
What does this PR do?
Updates the nvidia inference provider's embedding implementation to use the new signature.
Adds support for the task_type, output_dimensions, and text_truncation parameters.
Test Plan
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_embedding.py --embedding-model baai/bge-m3