[Major] conversational prompting
huyiwen committed May 24, 2024
1 parent 43b0689 commit 7e6ea5c
Showing 13 changed files with 872 additions and 342 deletions.
31 changes: 18 additions & 13 deletions README.md
@@ -20,11 +20,11 @@ Training

Utilization

-- **Comprehensive Evaluation:** We support 51 commonly used datasets.
-- **In-Context Learning:** We support various ICL strategies, including `KATE`, `GlobalE`, and `APE`.
-- **Chain-of-Thought:** For some datasets, we support three types of CoT evaluation: `base`, `least-to-most`, and `pal`.
-- **Evaluation Methods:** We currently support three evaluation methods for multiple-choice or generation questions.
-- **Prefix Caching:** By caching the `past_key_value` of the prefix, we can speed up local inference by up to 6x.
+- **Comprehensive Evaluation:** 56+ commonly used datasets and benchmarks for evaluating LLMs.
+- **Evaluation Methods:** Accurately reproduce results from the original papers of OpenAI, LLaMA, Mistral, and other models.
+- **In-Context Learning:** We support various ICL strategies, including [`KATE`](https://aclanthology.org/2022.deelio-1.10/), [`GlobalE`](https://aclanthology.org/2022.acl-long.556/), and [`APE`](https://arxiv.org/abs/2211.01910).
+- **Chain-of-Thought:** For some datasets, we support three types of CoT evaluation: `base`, [`least-to-most`](https://arxiv.org/abs/2205.10625), and [`pal`](https://arxiv.org/abs/2211.10435).
+- **Prefix Caching:** By managing the KV cache of prefixes, we can speed up local inference by up to 6x (see the sketch below).
- **vLLM and Flash Attention Support:** We also support [`vLLM`](https://github.com/vllm-project/vllm) and [`Flash Attention`](https://github.com/Dao-AILab/flash-attention) for efficient inference.
- **Quantization:** BitsAndBytes and GPTQ quantization are supported.
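
A minimal sketch of what prefix caching buys you, using the HuggingFace `transformers` API (illustrative only; the model name and prompts are placeholders, and LLMBox's actual implementation differs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM works the same way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Encode the shared few-shot prefix once and keep its key/value states.
prefix_ids = tokenizer("Q: 2+2=?\nA: 4\n\n", return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values

# Each instance then only pays for encoding its own short suffix. A real
# implementation would copy the cache per instance instead of reusing it.
suffix_ids = tokenizer("Q: 7+6=?\nA:", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(suffix_ids, past_key_values=prefix_cache, use_cache=True).logits
```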

@@ -85,40 +85,43 @@ Alternatively, you can use the following preset bash scripts to train your model

### Merging Tokenizer

-If you want to pre-train your models on corpora with languages or tokens not well supported by the original language models (e.g., LLaMA), we provide a tokenizer-merging function to expand the vocabulary based on the corpora by using [sentencepiece](https://github.com/google/sentencepiece). You can check [merge_tokenizer.py](training/merge_tokenizer.py) for detailed information. Please follow the guide in [Pre-train](training/README.md##2-continual-pre-training-with-your-own-corpora).
+If you want to pre-train your models on corpora with languages or tokens not well supported by the original language models (e.g., LLaMA), we provide a tokenizer-merging function to expand the vocabulary based on the corpora by using [sentencepiece](https://github.com/google/sentencepiece). You can check [merge_tokenizer.py](training/merge_tokenizer.py) for detailed information. Please follow the guide in [Pre-train](https://github.com/RUCAIBox/LLMBox/tree/main/training#2-continual-pre-training-with-your-own-corpora).

```bash
bash bash/run_7b_pt.sh
```
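
Under the hood, vocabulary expansion with sentencepiece amounts to appending pieces from a corpus-trained model that the base tokenizer lacks. A hedged sketch (the function name and paths are hypothetical; see [merge_tokenizer.py](training/merge_tokenizer.py) for the real logic):

```python
from sentencepiece import sentencepiece_model_pb2 as sp_pb2

def merge_tokenizers(base_path: str, new_path: str, out_path: str) -> None:
    """Append pieces from `new_path` that are missing from `base_path`."""
    base, new = sp_pb2.ModelProto(), sp_pb2.ModelProto()
    with open(base_path, "rb") as f:
        base.ParseFromString(f.read())
    with open(new_path, "rb") as f:
        new.ParseFromString(f.read())

    existing = {p.piece for p in base.pieces}
    for p in new.pieces:
        if p.piece not in existing:  # append only unseen tokens
            piece = sp_pb2.ModelProto.SentencePiece()
            piece.piece, piece.score = p.piece, 0.0
            base.pieces.append(piece)

    with open(out_path, "wb") as f:
        f.write(base.SerializeToString())
```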

### Merging Datasets

-If you want to train your models with a mix of multiple datasets, you can pass a list of dataset files or names to LLMBox. LLMBox will convert each file or name into a PTDataset or SFTDataset and merge them together to construct a combined dataset. You can also set the merging ratio of each dataset by passing a list of floats to LLMBox. Please follow the guide in [Merge Dataset](training/README.md##3-merging-different-datasets-with-designated-ratios-for-training).
+If you want to train your models with a mix of multiple datasets, you can pass a list of dataset files or names to LLMBox. LLMBox will convert each file or name into a PTDataset or SFTDataset and merge them together to construct a combined dataset. You can also set the merging ratio of each dataset by passing a list of floats to LLMBox. Please follow the guide in [Merge Dataset](https://github.com/RUCAIBox/LLMBox/tree/main/training#3-merging-different-datasets-with-designated-ratios-for-training).

```bash
bash bash/run_7b_hybrid.sh
```
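
Conceptually, ratio-based merging is just weighted sampling across the component datasets. A toy sketch (the function name and signature are hypothetical, not the training module's API):

```python
import random

def merge_datasets(datasets, ratios, num_samples, seed=42):
    """Draw num_samples examples, picking dataset i with probability ratios[i]."""
    rng = random.Random(seed)
    picks = rng.choices(range(len(datasets)), weights=ratios, k=num_samples)
    return [rng.choice(datasets[i]) for i in picks]

sft_data = ["sft example 1", "sft example 2"]
pt_data = ["pretrain text 1", "pretrain text 2", "pretrain text 3"]
mixed = merge_datasets([sft_data, pt_data], ratios=[0.7, 0.3], num_samples=5)
```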

### Self-Instruct and Evol-Instruct

-Since manually creating high-quality instruction data to train the model is very time-consuming and labor-intensive, Self-Instruct and Evol-Instruct were proposed to create large amounts of instruction data with varying levels of complexity using an LLM instead of humans. LLMBox supports both Self-Instruct and Evol-Instruct to augment or enhance the input data files. Please follow the guide in [Self-Instruct and Evol-Instruct](training/README.md#8-self-instruct-and-evol-instruct-for-generation-instructions).
+Since manually creating high-quality instruction data to train the model is very time-consuming and labor-intensive, Self-Instruct and Evol-Instruct were proposed to create large amounts of instruction data with varying levels of complexity using an LLM instead of humans. LLMBox supports both Self-Instruct and Evol-Instruct to augment or enhance the input data files. Please follow the guide in [Self-Instruct and Evol-Instruct](https://github.com/RUCAIBox/LLMBox/tree/main/training#8-self-instruct-and-evol-instruct-for-generation-instructions).

```bash
python self_instruct/self_instruct.py --seed_tasks_path=seed_tasks.jsonl
```

-For more details, view the [training](./training/README.md) documentation.
+For more details, view the [training](https://github.com/RUCAIBox/LLMBox/tree/main/training) documentation.

## Utilization

-We provide broad support for Huggingface models, OpenAI, Anthropic, QWen, and more models for further utilization. Currently a total of 51 commonly used datasets are supported, including: `HellaSwag`, `MMLU`, `GSM8K`, `AGIEval`, `CEval`, and `CMMLU`. For a full list of supported models and datasets, view the [utilization](./utilization/README.md) documentation.
+We provide broad support for Huggingface models (e.g. `LLaMA-3`, `Mistral`), OpenAI, Anthropic, QWen, and other OpenAI-compatible models for further utilization.
+
+Currently a total of 56+ commonly used datasets are supported, including `HellaSwag`, `MMLU`, `GSM8K`, `GPQA`, `AGIEval`, `CEval`, and `CMMLU`. For a full list of supported models and datasets, view the [utilization](https://github.com/RUCAIBox/LLMBox/tree/main/utilization) documentation.

```bash
-CUDA_VISIBLE_DEVICES=0 python inference.py \
+python inference.py \
    -m llama-2-7b-hf \
    -d mmlu agieval:[English] \
    --model_type instruction \
    --num_shots 5 \
+    --cuda 0 \
    --ranking_type ppl_no_option
```

@@ -243,7 +246,7 @@ python inference.py -m model -d dataset --kate # --globale or --ape
python inference.py -m model -d dataset --cot least_to_most # --base or --pal
```

-For more detailed instructions on model utilization, view the [utilization](./utilization/README.md) documentation.
+For more detailed instructions on model utilization, view the [utilization](https://github.com/RUCAIBox/LLMBox/tree/main/utilization) documentation.

<!-- For a full list of evaluation results, view our paper. -->

@@ -255,12 +258,14 @@ We welcome all contributions from bug fixes to new features and extensions.

We expect all contributions to be discussed in the issue tracker and to go through PRs.

+You can follow [model customization](https://github.com/RUCAIBox/LLMBox/tree/main/utilization#customize-model) and [dataset customization](https://github.com/RUCAIBox/LLMBox/tree/main/utilization#customize-dataset) to add a new model provider or dataset.

Make sure to format your code with `yapf --style .style.cfg` and `isort` before submitting a PR.


## The Team

-LLMBox is developed and maintained by [AI Box](http://aibox.ruc.edu.cn/).
+LLMBox is developed and maintained by [AI Box](http://aibox.ruc.edu.cn/). See more details in the [change log](https://github.com/RUCAIBox/LLMBox/tree/main/utilization#change-log).

## License

32 changes: 28 additions & 4 deletions utilization/README.md
@@ -9,8 +9,10 @@
- [Evaluation Arguments](#evaluation-arguments)
- [Supported Models](#supported-models)
- [Customize Model](#customize-model)
+- [Customize Chat Template](#customize-chat-template)
- [Supported Datasets](#supported-datasets)
- [Customize Dataset](#customize-dataset)
+- [Change Log](#change-log)

## Usage

@@ -123,7 +125,7 @@ Generation arguments and quantization options:
--system_prompt SYSTEM_PROMPT, -sys SYSTEM_PROMPT
The system prompt for chat-based models
--chat_template CHAT_TEMPLATE
-The chat template for huggingface chat-based models
+The chat template for local chat-based models. Supports a model-default chat template (choose from 'base', 'llama2', 'chatml', 'zephyr', 'phi3', 'llama3', ...) or a standard HuggingFace tokenizers chat template
--bnb_config BNB_CONFIG
JSON string for BitsAndBytesConfig parameters.
--load_in_8bit [LOAD_IN_8BIT]
@@ -169,9 +171,8 @@ You can evaluate datasets sequentially in a single run when they require similar
--example_set EXAMPLE_SET
The set name for demonstration, supporting slice,
e.g., train, dev, train[:10] (default: None)
---instance_format INSTANCE_FORMAT, -fmt INSTANCE_FORMAT
-The format to format the `source` and `target` for
-each instance (default: {source}{target})
+--instruction INSTRUCTION
+The template used to format the instruction for each instance. Either f-string or jinja2 format is supported, e.g., 'Answer the following question: {question}\nAnswer:'
--num_shots NUM_SHOTS, -shots NUM_SHOTS
The few-shot number for demonstration (default: 0)
--max_example_tokens MAX_EXAMPLE_TOKENS
@@ -385,6 +386,21 @@ class NewModel(Model):

Then, register your model in the [`load`](model/load.py) file.

## Customize Chat Template

Chat templates are used to format conversational messages into text input for local chat-based models.

```bash
python inference.py -m Meta-Llama-3-8B-Instruct -d gsm8k --model_type chat --chat_template llama3 -shots 8 -sys "You are a helpful assistant."
```

You don't need to specify a chat template for hosted models.

```bash
python inference.py -m gpt-3.5-turbo -d gsm8k --model_type chat -shots 8 -sys "You are a helpful assistant."
```

You can customize the [chat template](https://github.com/RUCAIBox/LLMBox/blob/main/utilization/chat_templates.py) for local chat-based models. We provide a set of chat templates for different models, and you can specify a jinja2 chat template with the `--chat_template` argument. It works in the same way as chat templating in HuggingFace [tokenizers](https://huggingface.co/docs/transformers/main/en/chat_templating).
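
For example, a minimal ChatML-style template in that convention could look like the following (a sketch, not one of the shipped templates):

```python
from jinja2 import Template

# Sketch of a ChatML-style jinja2 chat template in the tokenizers convention.
chatml = Template(
    "{% for m in messages %}<|im_start|>{{ m['role'] }}\n"
    "{{ m['content'] }}<|im_end|>\n{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)
print(chatml.render(
    messages=[{"role": "user", "content": "Hi!"}],
    add_generation_prompt=True,
))
```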


## Supported Datasets
@@ -1053,3 +1069,11 @@ def format_instance(self, instance):
To evaluate a pre-trained model that lacks instruction-following capabilities, you can provide an instruction explicitly by assigning a completion-style instruction as follows: `instruction = "{question}"`.

See [`Dataset`](dataset/dataset.py) for more details.
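
A hypothetical sketch of such a completion-style dataset (the stand-in base class below only mirrors the attribute this section mentions; the real one lives in [`dataset/dataset.py`](dataset/dataset.py)):

```python
class Dataset:
    """Stand-in for LLMBox's actual Dataset base class (hypothetical)."""
    instruction: str = ""

class MyCompletionDataset(Dataset):
    instruction = "{question}"  # bare completion prompt, no task wrapper

    def format_instance(self, instance: dict) -> dict:
        # Map raw fields to the names referenced by `instruction`.
        return {"question": instance["question"], "target": instance["answer"]}
```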

## Change Log

- **May 24, 2024**: Chat format support, including conversational few-shot and system prompts.
- **May 10, 2024**: New instruction formatting using f-string and jinja2.
- **May 7, 2024**: Bumped openai and vllm versions.
- **Apr 16, 2024**: Full support for KV caching.
- **Mar 18, 2024**: First release of LLMBox.
88 changes: 88 additions & 0 deletions utilization/chat_templates.py
@@ -0,0 +1,88 @@
# sources: https://github.com/huggingface/chat-ui/blob/main/PROMPTS.md

DEFAULT_CHAT_TEMPLATE = (
    "{% macro add(role, msg) -%}"
    "{{ seq[role + '_start'] }}"
    "{{ msg | smart_space(auto_leading_space, seq[role + '_start']) }}"
    "{{ seq[role + '_end'] }}"
    "{%- endmacro %}"
    "{% for message in messages %}"
    "{{ add(message['role'], message['content']) }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ seq['assistant_start'] }}"
    "{% endif %}"
)

DEFAULT_CHAT_CONFIGS = {
    "base": {
        "system_start": "",
        "system_end": "\n\n",
        "user_start": "",
        "user_end": "",
        "assistant_start": "",
        "assistant_end": "\n\n",
        "auto_leading_space": True,
        "default_stops": ["\n"],
    },
    "llama2": {
        "system_start": "<s>[INST] <<SYS>>\n",
        "system_end": "\n<</SYS>>\n\n",
        "user_start": "",
        "user_end": " [/INST] ",
        "assistant_start": "",
        "assistant_end": " </s><s>[INST] ",
        "auto_leading_space": True,
        "default_stops": [""],
    },
    "chatml": {
        "system_start": "<|im_start|>system\n",
        "system_end": "<|im_end|>\n",
        "user_start": "<|im_start|>user\n",
        "user_end": "<|im_end|>\n",
        "assistant_start": "<|im_start|>assistant\n",
        "assistant_end": "<|im_end|>\n",
        "auto_leading_space": True,
        "default_stops": ["<|im_end|>"],
    },
    "zephyr": {
        "system_start": "<|system|>\n",
        "system_end": "</s>\n",
        "user_start": "<|user|>\n",
        "user_end": "</s>\n",
        "assistant_start": "<|assistant|>\n",
        "assistant_end": "</s>\n",
        "auto_leading_space": True,
        "default_stops": ["</s>"],
    },
    "phi3": {
        "system_start": "<|system|>\n",
        "system_end": "<|end|>\n",
        "user_start": "<|user|>\n",
        "user_end": "<|end|>\n",
        "assistant_start": "<|assistant|>\n",
        "assistant_end": "<|end|>\n",
        "auto_leading_space": True,
        "default_stops": ["<|end|>"],
    },
    "llama3": {
        "system_start": "<|start_header_id|>system<|end_header_id|>\n\n",
        "system_end": "<|eot_id|>",
        "user_start": "<|start_header_id|>user<|end_header_id|>\n\n",
        "user_end": "<|eot_id|>",
        "assistant_start": "<|start_header_id|>assistant<|end_header_id|>\n\n",
        "assistant_end": "<|eot_id|>",
        "auto_leading_space": True,
        "default_stops": ["<|eot_id|>"],
    },
    "alpaca": {
        "system_start": "### Input:\n",
        "system_end": "\n\n",
        "user_start": "### Instruction:\n",
        "user_end": "\n\n",
        "assistant_start": "### Response:\n",
        "assistant_end": "\n\n",
        "auto_leading_space": True,
        "default_stops": ["###"],
    },
}
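
As a rough illustration of how these pieces fit together, the sketch below renders `DEFAULT_CHAT_TEMPLATE` with jinja2 and the `chatml` config. The `smart_space` filter here is a guessed re-implementation, not LLMBox's actual filter, which is defined elsewhere in the codebase:

```python
from jinja2 import Environment

def smart_space(msg: str, auto_leading_space: bool, prefix: str) -> str:
    """Prepend a space when the preceding sequence doesn't end with whitespace."""
    if auto_leading_space and prefix and not prefix[-1].isspace():
        return " " + msg
    return msg

env = Environment()
env.filters["smart_space"] = smart_space

rendered = env.from_string(DEFAULT_CHAT_TEMPLATE).render(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 1 + 1?"},
    ],
    seq=DEFAULT_CHAT_CONFIGS["chatml"],
    auto_leading_space=True,
    add_generation_prompt=True,  # append the assistant start tag
)
print(rendered)
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# What is 1 + 1?<|im_end|>
# <|im_start|>assistant
```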
