EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

reproduce llama 3 evals

#2557 opened Dec 10, 2024 by baberabb

Open 6

374 Open 893 Closed

Issues list

Smooth landing errors during post processing

#2751 opened Feb 28, 2025 by ksurya

Embedding checkpoint size mismatch when using peft on DeepSeek-R1-Distill-Qwen-1.5B.

#2748 opened Feb 28, 2025 by Phoenix-Shen

Gemini Support and usage

#2747 opened Feb 27, 2025 by IsraelAbebe

HOW TO ADD NEW TASK?

#2745 opened Feb 27, 2025 by amdslgl

modelscope installed will lead some problems

#2744 opened Feb 27, 2025 by jijivski

Error loading MMLU 'prehistory' config: BuilderConfig not found (available: ['default'])

#2743 opened Feb 27, 2025 by ruio248

Creating a new task with data in chat format (openai)

#2741 opened Feb 26, 2025 by leandermaben

An error occurred: 'choices' (in openai chat completion)

#2740 opened Feb 26, 2025 by Raghadalr02

Issue with the Tokenizer of Pixtral-12B-2409

#2731 opened Feb 24, 2025 by aminfarajian

Batching and generate_until special tokens

#2723 opened Feb 21, 2025 by sjmielke

Get acc_norm for HF models in log_samples

Label: feature request (a feature that isn't implemented yet)

#2722 opened Feb 21, 2025 by Kartik21

How to preprocess a document with the assistance of a tokenizer from a specific Model

#2717 opened Feb 20, 2025 by p1nksnow

Different models on same tasks gives same results when cache is active

Label: bug (something isn't working)

#2715 opened Feb 19, 2025 by salvatore-cipolla

Importing a local module in a task included with include_path

#2713 opened Feb 19, 2025 by joaormfsilva

[Accuracy gap with official model card due to wrong parsing]

#2707 opened Feb 17, 2025 by Monstertail

Inconsistent Behavior with max_tokens, Post-Processing, and Cache Options

#2702 opened Feb 15, 2025 by ntlm1686

vLLM CUDA OOM for loglikelihood, but not for generate_until

Label: asking questions (for asking for clarification / support on library usage)

#2698 opened Feb 14, 2025 by lsjlsj5846

Feature request: allow peft revision separate from base model revision

#2696 opened Feb 13, 2025 by iuliaturc

Support Arabic Dataset

#2693 opened Feb 13, 2025 by ziadwaelai

Strip the input for the three tasks: FDA, SWDE, and SQuAD_completion.

Label: validation (for validation of task implementations)

#2690 opened Feb 12, 2025 by Doraemonzzz

Eval support for DeepSeek-R1 like reasoning models

#2682 opened Feb 9, 2025 by Nithanaroy

ValueError: Trying to set a tensor of shape torch.Size([896, 768]) in "weight" (which has shape torch.Size([896, 4864])), this looks incorrect

#2677 opened Feb 7, 2025 by aqe670

add_bos_token causes very unstable results for quantized llama3-70B

Label: asking questions (for asking for clarification / support on library usage)

#2676 opened Feb 7, 2025 by wenhuach21

Use AWS Bedrock Models

#2669 opened Feb 3, 2025 by nrcoleman

Support processor_kwargs for hf-multimodal

#2666 opened Jan 30, 2025 by nikg4
