Update python client example with ingestor interface #380

Merged
3 commits merged into NVIDIA:main on Feb 1, 2025

Conversation

ChrisJar
Collaborator

Description

This updates the Python client example notebook to use the Ingestor interface.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Closes #378

copy-pr-bot bot commented Jan 27, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ChrisJar ChrisJar requested a review from jdye64 January 27, 2025 21:37
" }\n",
" )\n",
" .embed()\n",
" .vdb_upload(\n",
Collaborator

Given the changes we made, these are no longer valid arguments for this task. We need to change these.

Collaborator

Also, should we be showing how to perform a search?

Collaborator Author

Yeah, I think adding a search could be helpful. How should I be using the VDBUpload task?

Collaborator

Well, if you want to use the defaults, it doesn't require any parameters: .vdb_upload(). Otherwise, we would add parameters similar to those in this script:

# Assumes Ingestor is imported and that nv_ingest_service_host, milvus_hostname,
# and sparse are defined earlier in the script.
ingestor = (
    Ingestor(message_client_hostname=nv_ingest_service_host)
    .files("/raid/workspace/data/multimodalPDFBluePrintData226/PMC3100084_PM2011-517687.pdf")
    .extract(
        extract_text=True,
        extract_tables=True,
        extract_charts=True,
        extract_images=False,
        text_depth="page",
    )
    .split(
        split_by="word",
        split_length=300,
        split_overlap=10,
        max_character_length=5000,
        sentence_window_size=0,
    )
    .embed(text=True, tables=True)
    # Uncomment to upload with explicit parameters instead of the defaults:
    # .vdb_upload(collection_name="text", milvus_uri=f"http://{milvus_hostname}:19530", sparse=sparse, minio_endpoint="minio:9000")
)
results = ingestor.ingest()
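
For readers who want to see the "defaults versus explicit parameters" point in isolation, here is a minimal sketch that runs without any service. It assumes nothing about the real client: `SketchIngestor` and `HYPOTHETICAL_VDB_DEFAULTS` are made-up stand-ins, and the default values are placeholders, not the actual nv-ingest defaults. Only the parameter names (`collection_name`, `milvus_uri`, `sparse`, `minio_endpoint`) are taken from the script above.

```python
from dataclasses import dataclass, field
from typing import Any

# Placeholder values for illustration only; NOT the real nv-ingest defaults.
HYPOTHETICAL_VDB_DEFAULTS: dict[str, Any] = {
    "collection_name": "example_collection",
    "milvus_uri": "http://localhost:19530",
    "sparse": False,
    "minio_endpoint": "localhost:9000",
}


@dataclass
class SketchIngestor:
    """Toy stand-in for a chained task builder; it never contacts a service."""

    tasks: list[tuple[str, dict[str, Any]]] = field(default_factory=list)

    def _add(self, name: str, params: dict[str, Any]) -> "SketchIngestor":
        self.tasks.append((name, params))
        return self  # returning self is what allows method chaining

    def embed(self, **params: Any) -> "SketchIngestor":
        return self._add("embed", params)

    def vdb_upload(self, **overrides: Any) -> "SketchIngestor":
        # Reject names that are not known parameters, then layer the
        # caller's overrides on top of the defaults.
        unknown = set(overrides) - set(HYPOTHETICAL_VDB_DEFAULTS)
        if unknown:
            raise TypeError(f"unexpected vdb_upload arguments: {sorted(unknown)}")
        return self._add("vdb_upload", {**HYPOTHETICAL_VDB_DEFAULTS, **overrides})


# No arguments: every parameter falls back to its default.
default_run = SketchIngestor().embed().vdb_upload()

# Explicit arguments: only the named parameters are overridden.
custom_run = SketchIngestor().embed().vdb_upload(collection_name="text", sparse=True)
```

In this sketch, `default_run.tasks[-1]` holds the untouched defaults, while `custom_run.tasks[-1]` keeps the default `milvus_uri` and `minio_endpoint` but carries the overridden `collection_name` and `sparse`. Whether the real task validates arguments the same way is not something this thread confirms.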

@ChrisJar ChrisJar requested a review from jperez999 January 30, 2025 23:18
@randerzander randerzander merged commit 981e06d into NVIDIA:main Feb 1, 2025
3 checks passed
Development

Successfully merging this pull request may close these issues.

[BUG]: python_client_usage.ipynb is using legacy JobSpec code
5 participants