This repository documents my LLM journey: how to use LLMs, fine-tune them, and build inference services with them.
It's challenging but interesting.
- Chapter 1: Run the LLAMA2 model with Meta's official Python code
- Chapter 2: Run the LLAMA2 model with HuggingFace Transformers
- Chapter 3: Embed your PDF and send it to the LLM
- Chapter 4: Store more embedding data with the Qdrant vector database
- Chapter 5: Build an OpenAI ChatGPT-like inference service with vLLM
- Chapter 6: Fine-tune your own model with autotrain-advanced
- Reference
Prerequisites:
- Request LLAMA2 access permission and download the weights, or use another LLAMA2-compatible model, e.g. Llama2-Chinese-7b-Chat
- A GPU machine with the NVIDIA driver and CUDA already installed; I prefer an AWS EC2 g5.xlarge instance, or any other instance with more than 24 GB of GPU memory
- Some Python coding and Docker skills
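Chapters 3 and 4 both revolve around comparing embedding vectors to find the PDF chunks most relevant to a question. The core retrieval step can be sketched in plain Python; the `top_k` helper and the toy 3-dimensional "embeddings" below are illustrative assumptions, not code from any chapter (real embedding models produce vectors with hundreds of dimensions, and Qdrant does this ranking for you at scale):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_embedding, chunks, k=1):
    # Rank stored (text, embedding) pairs by similarity to the query embedding.
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy vectors standing in for real embeddings of two PDF chunks.
chunks = [
    ("a chunk about cats", [1.0, 0.0, 0.0]),
    ("a chunk about dogs", [0.0, 1.0, 0.0]),
]
print(top_k([0.9, 0.1, 0.0], chunks))  # the cat chunk ranks first
```

The chunks returned by `top_k` are what gets stuffed into the prompt that is sent to the LLM.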
This is chapter 1.
This is chapter 2.
This is the conclusion.