PDF - Search

Requirements

In a fresh virtual environment, run:

$ pip3 install -r requirements.txt

Go to openai.com/api and create an API token, then store it in the following environment variable:

$ export OPEN_AI_KEY="<your openai api token>"
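If the variable is missing, the failure will likely only surface once the first API request is made. A minimal sketch of an early check is shown here; the helper name `get_api_key` is hypothetical and not part of this repository, and only the `OPEN_AI_KEY` variable name comes from the step above.

```python
import os


def get_api_key(env=None):
    """Return the token stored in OPEN_AI_KEY, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("OPEN_AI_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPEN_AI_KEY is not set; export it before running main.py"
        )
    return key
```

Calling `get_api_key()` once at startup would turn a missing token into an immediate, readable error instead of a failure in the middle of a query.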

How to use

In the project root folder, run the following command:

$ python3 main.py

You should receive the following prompt

$ Welcome to your source of infinite knowledge


Select an option please:
1 to read a book,
2 to query
3 to exit

Select 1 and enter the path to a .pdf file:

$ 1
$Enter the location of your book: ./books/any.pdf

When you are prompted to select an option again, select 2 and enter your query:

$ 2
$ Enter your query: <your query here>

Known limitations

There is no error handling. If an error occurs, run the program again and repeat the steps.
Your PDFs need to have a table of contents and be subdivided into chapters.
The preprocess.py script divides books into the chapters located in the PDF's xref table. If a chapter is longer than about 4096 OpenAI tokens (roughly 3000 words), you will get the following error:

$ openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens, however you requested 6711 tokens (6211 in your prompt; 500 for the completion). Please reduce your prompt; or completion length.

In that case, you have two options:

You can choose a book/PDF with shorter chapters, for example Kimball's The Data Warehouse Toolkit.
You can modify preprocess.py to split the chapters into shorter pieces of text. Issuing a PR would be cool :P
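As a starting point for the second option, here is a minimal sketch of splitting a chapter's text into smaller pieces. It is not the code in preprocess.py: the function name `split_into_chunks` and the `max_words` parameter are hypothetical, and it uses word count as a rough stand-in for tokens, based on the estimate above that about 3000 words is roughly 4096 tokens. The default of 2000 words is an assumption meant to leave room for the query and the 500 completion tokens.

```python
def split_into_chunks(text, max_words=2000):
    """Split text into consecutive chunks of at most max_words words.

    Word count is only a rough proxy for token count, so the limit
    should stay well below the model's context size.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()  # splits on any whitespace
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


# Example: a five-word "chapter" split into chunks of two words.
chunks = split_into_chunks("one two three four five", max_words=2)
print(chunks)  # ['one two', 'three four', 'five']
```

Note that this sketch collapses line breaks and repeated spaces into single spaces and can cut a sentence in half; splitting on paragraph or sentence boundaries would likely give better query results.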

raynerz/pdf-search
