Updates to the Project for Support of sqlcoder-7b and sqlcoder2-15b (#…
wangzaistone authored Jan 27, 2024
1 parent 55be813 commit 81d69d9
Showing 7 changed files with 1,468 additions and 1,259 deletions.
23 changes: 20 additions & 3 deletions README.md
@@ -290,14 +290,15 @@

## Contents
- [DB-GPT-Hub: Text-to-SQL parsing with LLMs](#db-gpt-hub-text-to-sql-parsing-with-llms)
- [Baseline](#baseline)
- [Contents](#contents)
- [1. What is DB-GPT-Hub](#1-what-is-db-gpt-hub)
- [2. Fine-tuning Text-to-SQL](#2-fine-tuning-text-to-sql)
- [2.1. Dataset](#21-dataset)
- [2.2. Model](#22-model)
- [3. Usage](#3-usage)
- [3.1. Environment preparation](#31-environment-preparation)
- [3.2. Quick Start](#32-quick-start)
- [3.3. Data preparation](#33-data-preparation)
- [3.4. Model fine-tuning](#34-model-fine-tuning)
- [3.5. Model Predict](#35-model-predict)
@@ -354,6 +355,9 @@
DB-GPT-Hub currently supports the following base models:
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b (mistral)
- [x] sqlcoder2-15b (starcoder)




@@ -522,6 +526,14 @@
```
deepspeed --num_gpus 2 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

If you need to specify particular GPU card IDs (rather than the default first two), use `--include`:
```
deepspeed --include localhost:0,1 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

The other omitted (…) parts can be kept the same. If you want to change the default DeepSpeed configuration, go into the `dbgpt_hub/configs` directory and edit `ds_config.json` as needed; the default is ZeRO stage 2.
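As a sketch of what a ZeRO stage-2 configuration looks like, the following Python snippet generates one (the field values here are typical DeepSpeed settings, not necessarily the exact contents of the project's `dbgpt_hub/configs/ds_config.json`):

```python
import json

# Typical ZeRO stage-2 settings; the project's actual ds_config.json may differ.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "zero_optimization": {
        "stage": 2,                  # partition optimizer states and gradients
        "overlap_comm": True,        # overlap communication with computation
        "contiguous_gradients": True,
        "allgather_bucket_size": 5e8,
        "reduce_bucket_size": 5e8,
    },
}

with open("ds_config_stage2.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Raising `stage` to 3 would additionally partition the model parameters, at the cost of more communication.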
@@ -533,17 +545,20 @@
In the script, the key parameters `lora_target` and `template` for each model during fine-tuning are shown in the following table:
| [LLaMA-2](https://huggingface.co/meta-llama) | q_proj,v_proj | llama2 |
| [CodeLlama-2](https://huggingface.co/codellama/) | q_proj,v_proj | llama2 |
| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | W_pack | baichuan2 |
| [Qwen](https://github.com/QwenLM/Qwen-7B) | c_attn | chatml |
| [sqlcoder-7b](https://huggingface.co/defog/sqlcoder-7b) | q_proj,v_proj | mistral |
| [sqlcoder2-15b](https://huggingface.co/defog/sqlcoder2) | c_attn | default |
| [InternLM](https://github.com/InternLM/InternLM) | q_proj,v_proj | intern |
| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | q_proj,v_proj | xverse |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | query_key_value | chatglm2 |
| [ChatGLM3](https://github.com/THUDM/ChatGLM3-6B) | query_key_value | chatglm3 |
| [LLaMA](https://github.com/facebookresearch/llama) | q_proj,v_proj | - |
| [BLOOM](https://huggingface.co/bigscience/bloom) | query_key_value | - |
| [BLOOMZ](https://huggingface.co/bigscience/bloomz) | query_key_value | - |
| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | W_pack | baichuan |
| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | query_key_value | - |



In `train_sft.sh` , other key parameters are as follows:

> quantization_bit: Indicates whether quantization is applied, with valid values being [4 or 8].
@@ -609,6 +624,8 @@
We will divide the whole process into three phases:
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b (mistral)
- [x] sqlcoder2-15b (starcoder)

* Stage 2:
- [x] Optimize model performance, and support fine-tuning more models in various ways before `20231010`
33 changes: 25 additions & 8 deletions README.zh.md
@@ -289,20 +289,21 @@

## Contents
- [DB-GPT-Hub: Text-to-SQL parsing with LLMs](#db-gpt-hub利用llms实现text-to-sql)
- [Baseline](#baseline)
- [Contents](#contents)
- [1. Introduction](#一简介)
- [2. Text-to-SQL Fine-tuning](#二text-to-sql微调)
- [2.1 Dataset](#21数据集)
- [2.2 Base Models](#22基座模型)
- [3. Usage](#三使用方法)
- [3.1 Environment Preparation](#31环境准备)
- [3.2 Data Preparation](#32数据准备)
- [3.2 Quick Start](#32-快速开始)
- [3.3 Model Fine-tuning](#33模型微调)
- [3.4 Model Prediction](#34模型预测)
- [3.5 Model Weights](#35模型权重)
- [3.5.1 Merging Model and Fine-tuned Weights](#351-模型和微调权重合并)
- [3.6 Model Evaluation](#36模型评估)
- [4. Roadmap](#四发展路线)
- [5. Contributions](#五贡献)
- [6. Acknowledgements](#六感谢)
@@ -350,6 +351,9 @@
DB-GPT-HUB currently supports the following base models:
- [x] ChatGLM3
- [x] internlm
- [x] Falcon
- [x] sqlcoder-7b (mistral)
- [x] sqlcoder2-15b (starcoder)



For the minimum hardware resources required for QLoRA fine-tuning with quantization_bit set to 4, refer to the following:
@@ -513,6 +517,14 @@
```
deepspeed --num_gpus 2 dbgpt_hub/train/sft_train.py \
--quantization_bit 4 \
...
```
If you need to specify particular GPU card IDs (e.g., cards 3 and 4) instead of the default first two, you can do the following:
```
deepspeed --include localhost:3,4 dbgpt_hub/train/sft_train.py \
--deepspeed dbgpt_hub/configs/ds_config.json \
--quantization_bit 4 \
...
```

The other omitted (…) parts can be kept the same. If you want to change the default DeepSpeed configuration, go into the `dbgpt_hub/configs` directory and edit `ds_config.json` as needed; the default is the stage-2 strategy.

The key parameters `lora_target` and `template` for each model during fine-tuning in the script are shown in the following table:
@@ -522,8 +534,10 @@
| [LLaMA-2](https://huggingface.co/meta-llama) | q_proj,v_proj | llama2 |
| [CodeLlama-2](https://huggingface.co/codellama/) | q_proj,v_proj | llama2 |
| [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | W_pack | baichuan2 |
| [Qwen](https://github.com/QwenLM/Qwen-7B) | c_attn | chatml |
| [sqlcoder-7b](https://huggingface.co/defog/sqlcoder-7b) | q_proj,v_proj | mistral |
| [sqlcoder2-15b](https://huggingface.co/defog/sqlcoder2) | c_attn | default |
| [InternLM](https://github.com/InternLM/InternLM) | q_proj,v_proj | intern |
| [XVERSE](https://github.com/xverse-ai/XVERSE-13B) | q_proj,v_proj | xverse |
| [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | query_key_value | chatglm2 |
| [LLaMA](https://github.com/facebookresearch/llama) | q_proj,v_proj | - |
@@ -532,6 +546,7 @@
| [Baichuan](https://github.com/baichuan-inc/baichuan-13B) | W_pack | baichuan |
| [Falcon](https://huggingface.co/tiiuae/falcon-7b) | query_key_value | - |


Other key parameters in `train_sft.sh` are as follows:
> quantization_bit: whether quantization is applied, with valid values being [4 or 8]
> model_name_or_path: path to the LLM model
@@ -593,6 +608,8 @@ poetry run python dbgpt_hub/eval/evaluation.py --plug_value --input Your_model_
- [x] ChatGLM2
- [x] ChatGLM3
- [x] internlm
- [x] sqlcoder-7b (mistral)
- [x] sqlcoder2-15b (starcoder)

* Stage 2:
- [x] Optimize model performance and support fine-tuning more models in various ways. As of `20231010`, we have completed refactoring the project code and support more models.
Binary file modified assets/wechat.JPG
11 changes: 11 additions & 0 deletions dbgpt_hub/data_process/data_utils.py
@@ -223,6 +223,17 @@ def register_template(
use_history=False,
)

r"""
Supports language model for mistral sqlcoder-7b
"""
register_template(
name="mistral",
prefix=["{{system}}"],
prompt=["[INST] {{query}} [/INST]"],
system="",
sep=[],
)
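The registered template wraps the user query in Mistral's `[INST] … [/INST]` instruction markers with an empty system prefix. A minimal Python sketch of how such a template renders a prompt (simplified from the project's actual template machinery):

```python
def render_mistral_prompt(query: str, system: str = "") -> str:
    """Render a query with the mistral template: the prefix is the (empty)
    system string, and the prompt wraps the query in [INST] ... [/INST]."""
    prefix = system  # template prefix is "{{system}}", empty by default
    prompt = f"[INST] {query} [/INST]"
    return prefix + prompt

print(render_mistral_prompt("Show all users older than 30"))
# [INST] Show all users older than 30 [/INST]
```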


r"""
Default template.
6 changes: 3 additions & 3 deletions dbgpt_hub/scripts/train_sft.sh
@@ -5,11 +5,11 @@ train_log="dbgpt_hub/output/logs/train_sft_test_${current_date}.log"
start_time=$(date +%s)
echo " Train Start time: $(date -d @$start_time +'%Y-%m-%d %H:%M:%S')" >>${train_log}

# default train: zero-shot
num_shot=0

# one-shot train
# num_shot=1

dataset="example_text2sql_train"
if [ "$num_shot" -eq 1 ]; then
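The `num_shot` switch above selects the training dataset; the one-shot branch presumably picks a variant of `example_text2sql_train` (the exact name is not shown in this hunk, so the suffix below is hypothetical). A Python sketch of the equivalent selection logic:

```python
def select_dataset(num_shot: int, base: str = "example_text2sql_train") -> str:
    # Hypothetical one-shot dataset name; the real suffix is set in train_sft.sh.
    return f"{base}_one_shot" if num_shot == 1 else base

print(select_dataset(0))  # example_text2sql_train
```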
