mmlu_en

iMountTai edited this page Jan 30, 2024 · 1 revision

Inference Script for MMLU

This project evaluates the performance of the relevant models on the MMLU dataset. The validation set and the test set contain 1.5K and 14.1K multiple-choice questions respectively, covering 57 subjects.

The following describes how to run predictions on the MMLU dataset.

Data Preparation

Download the dataset from the link given in the official MMLU repository, and extract it into the data folder:

wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar xf data.tar
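Before running the evaluation, it can help to confirm the archive extracted as expected. The snippet below is a hypothetical sanity check (the helper name and the assumed dev/val/test layout are based on the official MMLU release; verify against your download):

```python
import os

def check_mmlu_layout(root="data"):
    """Return the list of expected split directories missing under root.

    An empty list means the layout looks right. The official MMLU
    data.tar is expected to contain dev/, val/, and test/ subfolders
    with per-subject CSV files.
    """
    expected = ["dev", "val", "test"]
    return [d for d in expected
            if not os.path.isdir(os.path.join(root, d))]
```

If the returned list is non-empty, re-check where data.tar was extracted relative to the directory you pass as data_path.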

Running the Evaluation Script

Run the following script:

model_path=path/to/chinese-mixtral
output_path=path/to/your_output_dir
data_path=path/to/mmlu-data

cd scripts/mmlu
python eval.py \
    --model_path ${model_path} \
    --data_dir ${data_path} \
    --save_dir ${output_path} \
    --load_in_4bit \
    --ntrain 5 \
    --use_flash_attention_2

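For context on what ntrain controls, a few-shot MMLU prompt typically prepends answered demo questions from the dev split before the test question. The sketch below is a hypothetical illustration of that standard format (the exact prompt template in scripts/mmlu/eval.py may differ):

```python
# Standard MMLU-style prompt construction (illustrative sketch).
CHOICES = ["A", "B", "C", "D"]

def format_example(question, options, answer=None):
    """Render one question; include the answer only for few-shot demos."""
    s = question
    for letter, opt in zip(CHOICES, options):
        s += f"\n{letter}. {opt}"
    s += "\nAnswer:"
    if answer is not None:
        s += f" {answer}\n\n"
    return s

def build_prompt(subject, dev_examples, test_example, ntrain=5):
    """Prepend up to `ntrain` answered demos before the test question."""
    prompt = (f"The following are multiple choice questions "
              f"(with answers) about {subject}.\n\n")
    for q, opts, ans in dev_examples[:ntrain]:
        prompt += format_example(q, opts, ans)
    q, opts, _ = test_example
    prompt += format_example(q, opts)
    return prompt
```

With ntrain=0 the loop body never runs, so the model sees only the instruction line and the unanswered test question.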
Arguments

  • model_path: Path to the model to be evaluated (the full Chinese-Mixtral or Chinese-Mixtral-Instruct model, not the LoRA weights)

  • ntrain: Specifies the number of few-shot demos when few_shot=True (5-shot: ntrain=5, 0-shot: ntrain=0)

  • save_dir: Output path of results

  • do_test: Selects the evaluation split: the validation set when do_test=False, the test set when do_test=True

  • load_in_4bit: Loads the model with 4-bit quantization

  • use_flash_attention_2: Uses FlashAttention-2 to accelerate inference; otherwise SDPA is used
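On each question, a closed-form MMLU evaluator typically scores the four option letters and predicts the highest-scoring one. The helper below is a hypothetical sketch of that decoding step (the function name and the exact scoring used in eval.py are assumptions):

```python
def pick_answer(option_scores):
    """Return the highest-scoring option letter.

    option_scores maps each letter ("A".."D") to a model score,
    e.g. the logit of that letter as the next token after "Answer:".
    """
    return max(option_scores, key=option_scores.get)
```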

Evaluation Output

After model prediction completes, the last line of the output log displays the final score (e.g. Average accuracy: 0.651), and the save_dir/results directory stores the decoded results for each subject.
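The headline score can be reproduced from the per-subject results. The snippet below is a hypothetical helper that micro-averages over all questions (total correct over total answered); note that eval.py might instead report an unweighted mean over subjects, so treat this as a sketch:

```python
def average_accuracy(per_subject_counts):
    """Micro-average accuracy across subjects.

    per_subject_counts maps subject name -> (num_correct, num_total).
    """
    correct = sum(c for c, _ in per_subject_counts.values())
    total = sum(t for _, t in per_subject_counts.values())
    return correct / total
```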