This repo contains the source code for a question-answering LLM app built with the OpenAI API, Streamlit, and LangChain. A Milvus Lite database stores the RAPTOR index generated from the three robotics textbooks listed in the reference section.
You can access the app with your own OpenAI API key via this link to the Streamlit Community Cloud: Robotics_QA_Bot
In this project, a question-answering Streamlit chatbot is built from a RAPTOR index stored in a Milvus Lite vector database, covering a collection of three robotics textbooks. Content extraction and chunking of the textbooks are done in `extract_and_chunk_embed.ipynb`, and the combined textbook chunks are stored as `Content_extraction_and_chunking_embed/combined_textbook_chunk_metadata.pkl`. Using the saved file, the RAPTOR indexing is done with the code in `RAPTOR_indexing/raptor-index_final_kaggle.ipynb`.
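The exact chunking logic lives in the notebook above; the general idea can be sketched as fixed-size chunking with overlap that keeps book and page metadata attached to every chunk. The function name and parameters below are illustrative, not the notebook's actual code:

```python
def chunk_pages(pages, chunk_size=1000, overlap=200):
    """Split (book, page, text) records into overlapping chunks,
    keeping the book/page metadata alongside each chunk."""
    chunks = []
    for book, page, text in pages:
        start = 0
        while start < len(text):
            piece = text[start:start + chunk_size]
            chunks.append({"text": piece, "book": book, "page": page})
            if start + chunk_size >= len(text):
                break
            start += chunk_size - overlap  # slide forward, overlapping chunks
    return chunks

# A 2500-character page yields 3 overlapping chunks with these settings.
chunks = chunk_pages([("Modern Robotics", 12, "x" * 2500)])
```

The retained metadata is what later lets the app cite book names and pages alongside each answer.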
For the summarization step, `gpt-4o-mini` from OpenAI is used. The hierarchical tree structure is built with 5 depths (levels) and stored as `RAPTOR_indexing/rec_results_full.pkl`. This pickle file is used to build the Milvus Lite vector database with SBERT embeddings (`multi-qa-MiniLM-L6-cos-v1`), and the retrieval and question-answering bot are then implemented in `QA-Retrieval-final.ipynb`.
For the hybrid retriever, BM25 is combined with the vector-store retriever. FlashRank is used for reranking, and for query expansion, `gpt-4o-mini`-based step-back prompting is used. More details about step-back prompting can be found in the reference section. Finally, the QA model is again built with the `gpt-4o-mini` OpenAI API model. All three `.ipynb` notebooks can be run on Google Colab or Kaggle for faster execution. The Streamlit application is defined in `app.py`, and its utilities are defined in `QA_utils.py`.
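If the hybrid retriever follows LangChain's `EnsembleRetriever`, the BM25 and vector-store rankings are merged with (weighted) reciprocal rank fusion; whether the notebook uses exactly this fusion is an assumption, but the step itself can be sketched standalone:

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc ids via reciprocal rank fusion:
    each doc scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is the commonly used RRF constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]
vector_hits = ["d2", "d4", "d1"]
rrf_fuse([bm25_hits, vector_hits])  # "d2" ranks first: high in both lists
```

Documents ranked well by both retrievers float to the top, which is why the fused list is then a good input for the FlashRank reranking stage.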
For running this application on the Streamlit cloud, in a Kaggle notebook, or locally, you will need an OpenAI API key. To run it locally inside a virtual environment, you can install the packages with:
```
pip install -r requirements.txt
```
- To use the application, you will need to provide an OpenAI API key. The application uses the `gpt-4o-mini` model for the question-answering chain. After the API key is provided, you can ask the application questions related to robotics. The answer, along with the sources (book name and pages), is displayed as output. After clearing the current question in the text field, you can ask the application more questions.
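Since the app reports book names and pages next to each answer, the retrieved chunks' metadata must be carried through to the final prompt and the display step. A hypothetical helper (not the actual `QA_utils.py` code) showing how that assembly might look:

```python
def build_qa_prompt(question, chunks):
    """Format retrieved chunks into a context block for the LLM and
    collect deduplicated (book, page) source labels for display."""
    context = "\n\n".join(c["text"] for c in chunks)
    sources = sorted({f"{c['book']} (p. {c['page']})" for c in chunks})
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, sources

chunks = [{"text": "PID control combines proportional, integral, "
                   "and derivative terms.",
           "book": "Modern Robotics", "page": 301}]
prompt, sources = build_qa_prompt("What is PID control?", chunks)
```

The `prompt` string would go to `gpt-4o-mini`, while `sources` is what the Streamlit UI shows under the answer.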
Distributed under the MIT License. See `LICENSE.txt` for more information.