Add AGIEval #79

lewtun · 2024-03-02T14:13:22Z

AGIEval is a popular set of benchmarks, popularised by Teknium/Nous through models like OpenHermes. It would be nice to include it in lighteval so we can compare internally how our models stack up on this benchmark :)

Ref paper: https://arxiv.org/abs/2304.06364
Ref code: https://github.com/dmahan93/lm-evaluation-harness/tree/add-agieval

Ref command from AutoEval:

    benchmark="agieval"
    python main.py \
        --model hf-causal \
        --model_args pretrained=$MODEL_ID,trust_remote_code=$TRUST_REMOTE_CODE \
        --tasks agieval_aqua_rat,agieval_logiqa_en,agieval_lsat_ar,agieval_lsat_lr,agieval_lsat_rc,agieval_sat_en,agieval_sat_en_without_passage,agieval_sat_math \
        --device cuda:$cuda_devices \
        --batch_size auto \
        --output_path ./${benchmark}.json


clefourrier · 2024-03-02T15:30:34Z

Would you need AGIEval or BBH first?

lewtun · 2024-03-02T19:23:08Z

> Would you need AGIEval or BBH first?

Maybe we can do BBH first, since you've already made a big dent in it in #7?

clefourrier added the new task label Mar 2, 2024

clefourrier self-assigned this Mar 19, 2024

clefourrier mentioned this issue in "Add AGIEval #121" (Merged) Mar 20, 2024

clefourrier closed this as completed in #121 Mar 21, 2024
