Add AGIEval #79

Closed
lewtun opened this issue Mar 2, 2024 · 2 comments · Fixed by #121

Comments

@lewtun
Member

lewtun commented Mar 2, 2024

AGIEval is a set of benchmarks popularised by Teknium/Nous through models like OpenHermes. It would be nice to include it in lighteval so we can compare internally how our models stack up on this benchmark :)

Ref command from AutoEval:

    benchmark="agieval"
    python main.py \
        --model hf-causal \
        --model_args pretrained=$MODEL_ID,trust_remote_code=$TRUST_REMOTE_CODE \
        --tasks agieval_aqua_rat,agieval_logiqa_en,agieval_lsat_ar,agieval_lsat_lr,agieval_lsat_rc,agieval_sat_en,agieval_sat_en_without_passage,agieval_sat_math \
        --device cuda:$cuda_devices \
        --batch_size auto \
        --output_path ./${benchmark}.json
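
For comparing models on a single number, the per-task scores written to `./agieval.json` can be averaged. The snippet below is a minimal sketch, not part of AutoEval or lighteval: it assumes the output file has the layout `{"results": {task_name: {"acc_norm": value, ...}}}` and that `acc_norm` is the metric of interest. Both are assumptions to check against the actual output. The helper name `agieval_average` is hypothetical.

```python
import json
from statistics import mean

# Task names copied from the --tasks flag in the command above.
AGIEVAL_TASKS = [
    "agieval_aqua_rat",
    "agieval_logiqa_en",
    "agieval_lsat_ar",
    "agieval_lsat_lr",
    "agieval_lsat_rc",
    "agieval_sat_en",
    "agieval_sat_en_without_passage",
    "agieval_sat_math",
]


def agieval_average(results: dict, metric: str = "acc_norm") -> float:
    """Return the unweighted mean of `metric` over the AGIEval tasks, in percent.

    Assumed layout (hypothetical, verify against your output file):
    {"results": {task_name: {metric: value_between_0_and_1}}}
    """
    per_task = results["results"]
    missing = [task for task in AGIEVAL_TASKS if task not in per_task]
    if missing:
        # Fail loudly rather than silently averaging over fewer tasks.
        raise KeyError(f"tasks missing from results: {missing}")
    return 100 * mean(per_task[task][metric] for task in AGIEVAL_TASKS)


def load_and_average(path: str, metric: str = "acc_norm") -> float:
    """Read a results JSON file from `path` and average the AGIEval tasks."""
    with open(path) as f:
        return agieval_average(json.load(f), metric)
```

Calling `load_and_average("./agieval.json")` would then give one score per model. The mean is unweighted, so every task counts equally regardless of how many questions it contains.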

@clefourrier
Member

Would you need AGIEval or BBH first?

@lewtun
Member Author

lewtun commented Mar 2, 2024

> Would you need AGIEval or BBH first?

Maybe we can do BBH first, since you have already made a big dent in it in #7?
