Update python client example with ingestor interface #380
Conversation
" }\n",
" )\n",
" .embed()\n",
" .vdb_upload(\n",
Given the changes we made, these are no longer valid arguments for this task. We need to change these.
Also, should we show how to perform a search?
Yeah, I think adding a search could be helpful. How should I be using the VDBUpload task?
Well, if you want to use the defaults, it doesn't require any parameters: .vdb_upload(). Otherwise, we would add parameters similar to those in this script:
ingestor = (
    Ingestor(message_client_hostname=nv_ingest_service_host)
    .files("/raid/workspace/data/multimodalPDFBluePrintData226/PMC3100084_PM2011-517687.pdf")
    .extract(
        extract_text=True,
        extract_tables=True,
        extract_charts=True,
        extract_images=False,
        text_depth="page",
    )
    .split(
        split_by="word",
        split_length=300,
        split_overlap=10,
        max_character_length=5000,
        sentence_window_size=0,
    )
    .embed(text=True, tables=True)
    # Upload step, left commented out as in the original script:
    # .vdb_upload(collection_name="text", milvus_uri=f"http://{mivlus_hostname}:19530", sparse=sparse, minio_endpoint="minio:9000")
)
results = ingestor.ingest()
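To make the two call styles discussed above concrete, here is a minimal, self-contained sketch of the chaining pattern. It uses a hypothetical stand-in class, `SketchIngestor`, and does not import or reproduce the real `Ingestor`; the `localhost` URI is also an assumption for illustration. It only shows that a no-argument `.vdb_upload()` and a parameterized one fit the same chain.

```python
from typing import Any, Dict, List, Tuple


class SketchIngestor:
    """Hypothetical stand-in (NOT the nv_ingest_client Ingestor).

    It only records which tasks were chained and with which keyword arguments.
    """

    def __init__(self) -> None:
        self.tasks: List[Tuple[str, Dict[str, Any]]] = []

    def _add(self, name: str, **kwargs: Any) -> "SketchIngestor":
        self.tasks.append((name, kwargs))
        return self  # returning self is what allows method chaining

    def embed(self, **kwargs: Any) -> "SketchIngestor":
        return self._add("embed", **kwargs)

    def vdb_upload(self, **kwargs: Any) -> "SketchIngestor":
        return self._add("vdb_upload", **kwargs)


# Defaults: no parameters are passed to the upload step.
default_chain = SketchIngestor().embed(text=True, tables=True).vdb_upload()

# Explicit parameters, mirroring two names from the script above.
# The URI value is an assumption made for this sketch.
explicit_chain = (
    SketchIngestor()
    .embed(text=True, tables=True)
    .vdb_upload(collection_name="text", milvus_uri="http://localhost:19530")
)

print(default_chain.tasks[-1])
print(explicit_chain.tasks[-1])
```

In the real client, the accepted keyword arguments are whatever the VDBUpload task defines, so the names here should be checked against the current task signature before they go into the notebook.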
Description
This updates the Python client example notebook to use the Ingestor interface.
Checklist
Closes #378