mmlu_en
This project evaluates the performance of the relevant models on the MMLU dataset. The valid set and the test set contain 1.5K and 14.1K multiple-choice questions respectively, covering 57 subjects.
In the following, we will introduce the prediction method for the MMLU dataset.
Download the dataset from the official MMLU release and extract the archive into the data folder:
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar xf data.tar
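The extracted archive contains per-subject CSV files in which each row holds a question, four choices, and the answer letter. The parser below is a minimal sketch of that format (the sample row is hypothetical, for illustration only):

```python
import csv
import io

# Each MMLU CSV row is: question, choice A, choice B, choice C,
# choice D, answer letter. Subject files live under the dev/, val/,
# and test/ subdirectories of the extracted data folder.
def parse_mmlu_row(row):
    question, *choices, answer = row
    return {"question": question, "choices": choices, "answer": answer}

# A hypothetical row for illustration:
sample = "What is 2+2?,1,2,3,4,D\n"
row = next(csv.reader(io.StringIO(sample)))
print(parse_mmlu_row(row))
```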
Run the following script:
model_path=path/to/chinese-mixtral
output_path=path/to/your_output_dir
data_path=path/to/mmlu-data
cd scripts/mmlu
python eval.py \
--model_path ${model_path} \
--data_dir ${data_path} \
--save_dir ${output_path} \
--load_in_4bit \
--ntrain 5 \
--use_flash_attention_2
- `model_path`: Path to the model to be evaluated (the full Chinese-Mixtral or Chinese-Mixtral-Instruct model, not a LoRA)
- `ntrain`: Number of few-shot demonstrations when `few_shot=True` (5-shot: `ntrain=5`; 0-shot: `ntrain=0`)
- `save_dir`: Output path for the results
- `do_test`: Whether to evaluate on the valid set or the test set; evaluates on the valid set when `do_test=False` and on the test set when `do_test=True`
- `load_in_4bit`: Load the model in 4-bit quantized form
- `use_flash_attention_2`: Use flash-attn2 to accelerate inference; otherwise SDPA is used for acceleration
After model prediction completes, the last line of the output log displays the final score, e.g. `Average accuracy: 0.651`, and the generated directory `save_dir/results` stores the decoded results for each subject.
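The per-subject results can be aggregated into a single score. The sketch below micro-averages over all questions; this is an assumption about how the reported average is computed (a macro-average over subjects is another common choice), and the sample numbers are hypothetical:

```python
def average_accuracy(per_subject):
    # per_subject: {subject: (num_correct, num_total)}
    # Micro-average: total correct answers over total questions,
    # so larger subjects weigh more than smaller ones.
    correct = sum(c for c, _ in per_subject.values())
    total = sum(t for _, t in per_subject.values())
    return correct / total

# Hypothetical per-subject tallies for illustration:
scores = {"abstract_algebra": (65, 100), "anatomy": (90, 135)}
print(f"Average accuracy: {average_accuracy(scores):.3f}")
```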