Updates to the Project for Support of sqlcoder-7b and sqlcoder2-15b (#…
wangzaistone authored Jan 27, 2024
1 parent 55be813 commit 81d69d9
Showing 7 changed files with 1,468 additions and 1,259 deletions.
23 changes: 20 additions & 3 deletions README.md
@@ -290,14 +290,15 @@

## Contents
- [DB-GPT-Hub: Text-to-SQL parsing with LLMs](#db-gpt-hub-text-to-sql-parsing-with-llms)
- [Baseline](#baseline)
- [Contents](#contents)
- [1. What is DB-GPT-Hub](#1-what-is-db-gpt-hub)
- [2. Fine-tuning Text-to-SQL](#2-fine-tuning-text-to-sql)
- [2.1. Dataset](#21-dataset)
- [2.2. Model](#22-model)
- [3. Usage](#3-usage)
- [3.1. Environment preparation](#31-environment-preparation)
- [3.2. Quick Start](#32-quick-start)
- [3.3. Data preparation](#33-data-preparation)
- [3.4. Model fine-tuning](#34-model-fine-tuning)
- [3.5. Model Predict](#35-model-predict)
@@ -354,6 +355,9 @@
DB-GPT-Hub currently supports the following base models:
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b (mistral)
- [x] sqlcoder2-15b (starcoder)




@@ -522,6 +526,14 @@
```
deepspeed --num_gpus 2 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

If you need to specify particular GPU card IDs (rather than the default first two), use `--include`:
```
deepspeed --include localhost:0,1 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

The other omitted (…) parts can be kept the same. If you want to change the default DeepSpeed configuration, go into the `dbgpt_hub/configs` directory and edit `ds_config.json` as needed; the default is ZeRO stage 2.
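As a sketch of what a ZeRO stage-2 configuration looks like, the following Python snippet generates one (the field values here are typical DeepSpeed settings, not necessarily the exact contents of the project's `dbgpt_hub/configs/ds_config.json`):

```python
import json

# Typical ZeRO stage-2 settings; the project's actual ds_config.json may differ.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "zero_optimization": {
        "stage": 2,                  # partition optimizer states and gradients
        "overlap_comm": True,        # overlap communication with computation
        "contiguous_gradients": True,
        "allgather_bucket_size": 5e8,
        "reduce_bucket_size": 5e8,
    },
}

with open("ds_config_stage2.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Raising `stage` to 3 would additionally partition the model parameters, at the cost of more communication.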
@@ -533,17 +545,20 @@
In the script, the key parameters `lora_target` and `template` for each model during fine-tuning are shown in the following table:
| [LLaMA-2](https://huggingface.co/meta-llama) | q_proj,v_proj | llama2 |
| [CodeLlama-2](https://huggingface.co/codellama/) | q_proj,v_proj | llama2 |
| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | W_pack | baichuan2 |
| [Qwen](https://github.com/QwenLM/Qwen-7B) | c_attn | chatml |
| [sqlcoder-7b](https://huggingface.co/defog/sqlcoder-7b) | q_proj,v_proj | mistral |
| [sqlcoder2-15b](https://huggingface.co/defog/sqlcoder2) | c_attn | default |
| [InternLM](https://github.com/InternLM/InternLM) | q_proj,v_proj | intern |
| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | q_proj,v_proj | xverse |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | query_key_value | chatglm2 |
| [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B) | query_key_value | chatglm3 |
| [LLaMA](https://github.com/facebookresearch/llama) | q_proj,v_proj | - |
| [BLOOM](https://huggingface.co/bigscience/bloom) | query_key_value | - |
| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | query_key_value | - |
| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | W_pack | baichuan |
| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | query_key_value | - |



In `train_sft.sh` , other key parameters are as follows:

> quantization_bit: Indicates whether quantization is applied, with valid values being [4 or 8].
@@ -609,6 +624,8 @@
We will divide the whole process into three phases:
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b (mistral)
- [x] sqlcoder2-15b (starcoder)

* Stage 2:
- [x] Optimize model performance, and support fine-tuning more models in various ways before `20231010`
33 changes: 25 additions & 8 deletions README.zh.md
@@ -289,20 +289,21 @@

## Contents
- [DB-GPT-Hub: Text-to-SQL parsing with LLMs](#db-gpt-hub利用llms实现text-to-sql)
- [Baseline](#baseline)
- [Contents](#contents)
- [1. Introduction](#一简介)
- [2. Text-to-SQL Fine-tuning](#二text-to-sql微调)
- [2.1 Dataset](#21数据集)
- [2.2 Base Models](#22基座模型)
- [3. Usage](#三使用方法)
- [3.1 Environment Preparation](#31环境准备)
- [3.2 Data Preparation](#32数据准备)
- [3.2 Quick Start](#32-快速开始)
- [3.3 Model Fine-tuning](#33模型微调)
- [3.4 Model Prediction](#34模型预测)
- [3.5 Model Weights](#35模型权重)
- [3.5.1 Merging Model and Fine-tuned Weights](#351-模型和微调权重合并)
- [3.6 Model Evaluation](#36模型评估)
- [4. Roadmap](#四发展路线)
- [5. Contributions](#五贡献)
- [6. Acknowledgements](#六感谢)
@@ -350,6 +351,9 @@
DB-GPT-HUB currently supports the following base models:
- [x] ChatGLM3
- [x] internlm
- [x] Falcon
- [x] sqlcoder-7b (mistral)
- [x] sqlcoder2-15b (starcoder)



For the minimum hardware resources required for QLoRA fine-tuning with quantization_bit set to 4, refer to the following:
@@ -513,6 +517,14 @@
```
deepspeed --num_gpus 2 dbgpt_hub/train/sft_train.py \
--quantization_bit 4 \
...
```
If you need to specify particular GPU card IDs (e.g., cards 3 and 4) instead of the default first two, you can do the following:
```
deepspeed --include localhost:3,4 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

The other omitted (…) parts can be kept the same. If you want to change the default DeepSpeed configuration, go into the `dbgpt_hub/configs` directory and edit `ds_config.json` as needed; the default is the stage-2 strategy.

The key parameters `lora_target` and `template` for each model during fine-tuning in the script are shown in the following table:
@@ -522,8 +534,10 @@
| [LLaMA-2](https://huggingface.co/meta-llama) | q_proj,v_proj | llama2 |
| [CodeLlama-2](https://huggingface.co/codellama/) | q_proj,v_proj | llama2 |
| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | W_pack | baichuan2 |
| [Qwen](https://github.com/QwenLM/Qwen-7B) | c_attn | chatml |
| [sqlcoder-7b](https://huggingface.co/defog/sqlcoder-7b) | q_proj,v_proj | mistral |
| [sqlcoder2-15b](https://huggingface.co/defog/sqlcoder2) | c_attn | default |
| [InternLM](https://github.com/InternLM/InternLM) | q_proj,v_proj | intern |
| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | q_proj,v_proj | xverse |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | query_key_value | chatglm2 |
| [LLaMA](https://github.com/facebookresearch/llama) | q_proj,v_proj | - |
@@ -532,6 +546,7 @@
| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | W_pack | baichuan |
| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | query_key_value | - |


Other key parameters in `train_sft.sh` are as follows:
> quantization_bit: whether quantization is applied, with valid values being [4 or 8]
> model_name_or_path: path to the LLM model
@@ -593,6 +608,8 @@ poetry run python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b (mistral)
- [x] sqlcoder2-15b (starcoder)

* Stage 2:
- [x] Optimize model performance and support fine-tuning more models in various ways. As of `20231010`, we have completed refactoring the project code and support more models.
Binary file modified assets/wechat.JPG
11 changes: 11 additions & 0 deletions dbgpt_hub/data_process/data_utils.py
@@ -223,6 +223,17 @@ def register_template(
use_history=False,
)

r"""
Supports language model for mistral sqlcoder-7b
"""
register_template(
name="mistral",
prefix=["{{system}}"],
prompt=["[INST] {{query}} [/INST]"],
system="",
sep=[],
)
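The registered template wraps the user query in Mistral's `[INST] … [/INST]` instruction markers with an empty system prefix. A minimal Python sketch of how such a template renders a prompt (simplified from the project's actual template machinery):

```python
def render_mistral_prompt(query: str, system: str = "") -> str:
    """Render a query with the mistral template: the prefix is the (empty)
    system string, and the prompt wraps the query in [INST] ... [/INST]."""
    prefix = system  # template prefix is "{{system}}", empty by default
    prompt = f"[INST] {query} [/INST]"
    return prefix + prompt

print(render_mistral_prompt("Show all users older than 30"))
# [INST] Show all users older than 30 [/INST]
```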


r"""
Default template.
6 changes: 3 additions & 3 deletions dbgpt_hub/scripts/train_sft.sh
@@ -5,11 +5,11 @@ train_log="dbgpt_hub/output/logs/train_sft_test_${current_date}.log"
start_time=$(date +%s)
echo " Train Start time: $(date -d @$start_time +'%Y-%m-%d %H:%M:%S')" >>${train_log}

# default train: zero-shot
num_shot=0

# one-shot train
# num_shot=1

dataset="example_text2sql_train"
if [ "$num_shot" -eq 1 ]; then
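The `num_shot` switch above selects the training dataset; the one-shot branch presumably picks a variant of `example_text2sql_train` (the exact name is not shown in this hunk, so the suffix below is hypothetical). A Python sketch of the equivalent selection logic:

```python
def select_dataset(num_shot: int, base: str = "example_text2sql_train") -> str:
    # Hypothetical one-shot dataset name; the real suffix is set in train_sft.sh.
    return f"{base}_one_shot" if num_shot == 1 else base

print(select_dataset(0))  # example_text2sql_train
```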
