The virtual assistant application is powered by LLMs, using FastAPI for the backend and a simple Tailwind CSS UI.
In this project, I used LangChain to build a conversation chain with memory on top of OpenAI's GPT-3.5-turbo-1106 model, as well as a GPT-3.5-turbo-1106 model fine-tuned for a specific set of documents/domain.
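As a rough illustration, such a conversation chain with memory can be wired up in a few lines of LangChain. The snippet below is a minimal sketch under those assumptions (variable names, temperature, and prompts are illustrative, not the project's actual code):

```python
# Minimal sketch of a LangChain conversation chain with memory.
# Assumes OPENAI_API_KEY is set in the environment; names/parameters are illustrative.
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0)
chain = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(chain.predict(input="What is FastAPI?"))
print(chain.predict(input="Summarize your previous answer in one sentence."))  # memory keeps the first turn
```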
- Dataset generation and fine-tuning of OpenAI's GPT models for a specific set of documents/domain in `models`.
- Q/A chat with conversation memory using LangChain and OpenAI's GPT models.
- FastAPI WebSocket endpoint for the backend (see the sketch after this list).
- Simple Tailwind CSS for the UI.
- Dockerized application using Docker Compose.
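For reference, a FastAPI WebSocket chat endpoint follows the general shape below; the route name and the echo reply are placeholders rather than the project's real implementation:

```python
# Minimal sketch of a FastAPI WebSocket chat endpoint (route and reply are placeholders).
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            question = await websocket.receive_text()
            # In the real app, the question would be passed to the LangChain conversation chain.
            await websocket.send_text(f"echo: {question}")
    except WebSocketDisconnect:
        pass  # client closed the connection
```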
llm-playground
├───data <- models downloaded for the local LLM backends
├───models <- dataset generation and fine-tuning of GPT models (see the fine-tuning sketch below the tree).
├───src <- chat application using FastAPI with a WebSocket endpoint to interact with the GPT models.
│ ├───api <- define router endpoints.
│ ├───integrations <- integrate LLM backends such as OpenAI, llama.cpp, or intel-extension-for-transformers.
│ ├───schemas <- define schemas used in the project.
│ ├───templates <- Tailwind CSS UI.
│ └───utils <- utility scripts.
└───tests <- unit tests for the project.
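The `models` directory handles dataset generation and fine-tuning. For context, starting an OpenAI fine-tuning job on a chat-format JSONL dataset looks roughly like this sketch (the file name `train.jsonl` is illustrative, not necessarily what the `models` scripts produce; shown with the `openai>=1.0` client):

```python
# Minimal sketch of launching an OpenAI fine-tuning job (openai>=1.0 client).
# `train.jsonl` is an illustrative name; each line holds one chat-format example, e.g.
# {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo-1106",
)
print(job.id, job.status)  # poll the job until it finishes, then use the resulting model name
```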
- Python 3.10
- Docker & Docker Compose (Get Docker)
- OpenAI API key
git clone https://github.com/michaelnguyen11/llm-playground.git
cd llm-playground
cp .env.template .env
# Edit your .env file
The easiest way to get started is with Docker Compose, which builds and runs the Dockerized application.
docker-compose build
docker-compose up -d
Then, navigate to http://0.0.0.0:8080 to chat with the Q/A virtual assistant.
Note: The LlamaCpp backend is not yet supported with Docker Compose (Reference); I will find a workaround later.
It is encouraged to run the project in a virtual environment such as venv or conda. To install the dependency packages, use the command:
pip install -r requirements.txt
To launch the server, use the command:
uvicorn src.main:app --host 0.0.0.0 --port 8080 --reload
Then, navigate to http://0.0.0.0:8080 to chat with the Q/A virtual assistant.
To download models for the LlamaCpp backend, navigate to `data` and run the command:
./download_models.sh
Then, change `ENDPOINT_TYPE` in `.env` (e.g. `ENDPOINT_TYPE=llamacpp`) to select either the `openai` or `llamacpp` WebSocket endpoint backend.
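Once a model file is in `data`, the `llamacpp` backend can load it locally. The snippet below is a sketch using LangChain's LlamaCpp wrapper; the model filename and parameters are placeholders, not the project's defaults:

```python
# Minimal sketch of loading a local GGUF model with LangChain's LlamaCpp wrapper
# (requires llama-cpp-python; the filename and parameters are placeholders).
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="data/llama-2-7b-chat.Q4_K_M.gguf",  # a file fetched by the download script
    n_ctx=2048,        # context window size
    temperature=0.1,
)
print(llm("Q: What is a WebSocket? A:"))
```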
- Improve the data generation pipeline
- Explore more methods to evaluate fine-tuned LLM models.
- Implement RAG with LangChain to augment LLM knowledge with additional data (rough sketch after this list)
- Multiple LLM backends for local LLMs, optimized for specific hardware:
  - llama-cpp-python: macOS platforms
  - intel-extension-for-transformers: Intel platforms
  - TensorRT-LLM: Nvidia platforms
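The RAG roadmap item could take roughly the following shape with LangChain; everything here (file names, chunk sizes, the FAISS vector store) is a hypothetical sketch, not existing project code:

```python
# Hypothetical sketch of a LangChain RAG pipeline (requires faiss-cpu; file name is illustrative).
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

docs = TextLoader("docs/knowledge.txt").load()  # load the additional domain data
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
store = FAISS.from_documents(chunks, OpenAIEmbeddings())  # embed and index the chunks

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-1106"),
    retriever=store.as_retriever(),
)
print(qa.run("What does the document say about deployment?"))
```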