Replies: 1 comment 1 reply
-
I'm curious what system you're running this on 😃 That is a lot of chunks. Did the ingest ever finish?
-
I'm running ingest.py to ingest a txt file containing question-and-answer pairs. It is over 800MB (I know that's a lot). I am using instruct-xl as the embedding model. Does anyone have an approximate idea of how long the ingest process will take to complete?
Loads all documents from the source documents directory, ignoring specified files
Loading new documents: 100%|██████████████████████| 1/1 [00:11<00:00, 11.11s/it]
Loaded 1 new documents from //SOURCE_DOCUMENTS LOCATION
Split into 981498 chunks of text (max. 1000 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: DB
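For a rough sense of scale: embedding time is approximately the number of chunks divided by embedding throughput. The helper below is a hypothetical back-of-the-envelope sketch (the function name and the throughput figure are assumptions, not measured values from this setup); plugging in the 981498 chunks from the log above shows why a large chunk count dominates the runtime.

```python
def estimate_ingest_hours(num_chunks: int, chunks_per_second: float) -> float:
    """Return an approximate wall-clock embedding time in hours.

    chunks_per_second is an assumed throughput; measure your own by
    timing how many chunks your hardware embeds in, say, 60 seconds.
    """
    return num_chunks / chunks_per_second / 3600


# 981498 chunks from the ingest log; 20 chunks/s is a guessed GPU rate.
print(f"{estimate_ingest_hours(981_498, 20):.1f} h")  # ~13.6 h at that rate
```

At a much slower CPU-bound rate (say 2 chunks/s) the same arithmetic gives on the order of 136 hours, so measuring your actual throughput on a small sample first is worthwhile.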