Describe the bug
Google Docs/Sheets/Slides do not work in the V2 SDK Google Drive source connector
To Reproduce
Ingesting from Google Drive, partitioning via the Unstructured API, embedding via OpenAI, and writing to AstraDB:

    import os

    # The connector/config class imports and get_writer() are not shown in the report.
    runner = GoogleDriveRunner(
        processor_config=ProcessorConfig(
            verbose=True,
            output_dir=os.environ["GOOGLE_DRIVE_OUTPUT"],
            num_processes=2,
        ),
        read_config=ReadConfig(),
        partition_config=PartitionConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
        ),
        connector_config=SimpleGoogleDriveConfig(
            access_config=GoogleDriveAccessConfig(
                service_account_key=os.getenv("GOOGLE_DRIVE_ACCOUNT_KEY")
            ),
            recursive=True,
            drive_id=os.getenv("GOOGLE_DRIVE_FOLDER_ID"),
        ),
        chunking_config=ChunkingConfig(chunk_elements=True),
        embedding_config=EmbeddingConfig(
            provider="langchain-openai",
            api_key=os.getenv("OPENAI_API_KEY"),
        ),
        writer=get_writer(),
        writer_kwargs={},
    )
Expected behavior
As in V1, I expect the file to be parsed.
Screenshots

    KeyError                                  Traceback (most recent call last)
    in <cell line: 1>()
         33     stager_config=WeaviateUploadStagerConfig(),
         34     uploader_config=WeaviateUploaderConfig(),
    ---> 35 ).run()

    7 frames
    /usr/local/lib/python3.10/dist-packages/unstructured/ingest/v2/processes/connectors/google_drive.py in map_file_data(f)
        131     file_id = f["id"]
        132     filename = f.pop("name")
    --> 133     url = f.pop("webContentLink")
        134     version = f.pop("version", None)
        135     permissions = f.pop("permissions", None)

    KeyError: 'webContentLink'
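For context on the traceback: the Drive API documents `webContentLink` as available only for files with binary content, so Google-native Docs/Sheets/Slides entries come back without that key, and an unconditional `f.pop("webContentLink")` raises. The following is only a minimal sketch of a tolerant lookup, not the connector's actual fix. `map_file_data_safe`, the returned dictionary layout, and the `google_native` flag are hypothetical names introduced for illustration.

```python
from typing import Any, Dict

# MIME-type prefix shared by Google-native file types (Docs, Sheets, Slides, ...).
GOOGLE_NATIVE_PREFIX = "application/vnd.google-apps."


def map_file_data_safe(f: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical, tolerant variant of the mapping shown in the traceback."""
    record = dict(f)  # work on a copy so the caller's dict is not mutated
    mime_type = record.get("mimeType", "")
    return {
        "file_id": record["id"],
        "filename": record.pop("name"),
        # Google-native files carry no webContentLink, so fall back to None
        # instead of raising KeyError.
        "url": record.pop("webContentLink", None),
        "version": record.pop("version", None),
        "permissions": record.pop("permissions", None),
        # True for any Google-native type; such files have no binary content
        # to download directly.
        "google_native": mime_type.startswith(GOOGLE_NATIVE_PREFIX),
    }


# Made-up file listings for illustration only.
google_doc = {
    "id": "abc",
    "name": "Report",
    "mimeType": "application/vnd.google-apps.document",
}
pdf_file = {
    "id": "def",
    "name": "scan.pdf",
    "mimeType": "application/pdf",
    "webContentLink": "https://example.invalid/download",
}

doc_record = map_file_data_safe(google_doc)
pdf_record = map_file_data_safe(pdf_file)
```

With this shape, a downstream step could check `url is None` (or the `google_native` flag) and take a different download path for those files, rather than failing during indexing.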
Environment Info
This is not specific to my environment; it happens for anyone who runs this snippet.
The same thing happens to me when trying to parse a GDrive Word document, about 30 pages long, with tables, images, a TOC, a header, a footer, etc.
Is anyone else getting this error with Google Drive v2 ingestion?
2024-11-19 22:12:47,003 SpawnProcess-18 ERROR C:\Users\SANTHOSH\.cache\unstructured\ingest\pipeline\index\34b4026053f1.json: [download] 'GoogleDriveDownloader' object has no attribute 'meta'
Thanks for reporting that @SantoshKumarRavi! It's a bug (see this line). We need to prepare a fix for that.