
Add code for running the Eval Harness in t5x #10

Merged
merged 2 commits into main on Dec 6, 2021

Conversation

DanielHesslow (Collaborator)

Adding support for running the EleutherAI Evaluation Harness, directly addressing issue #4.

@thomasw21 (Member) left a comment

Haven't read the entire code, but it would be nice to check that the whole pipeline works. Typically, are you able to load one of the checkpoints and run evaluation? Otherwise, awesome work!

num_partitions = 4
model_parallel_submesh = (2,1,1,1)

TASK_FEATURE_LENGTHS = {"inputs": 512, "targets": 114}
Member

I'm confused by this at inference time: how do you make a sample fit inside these lengths?

Collaborator Author

Not quite sure what you mean, but fit as in fit in memory?

In that case I didn't play with it too much, but since we're not storing gradients and the batch size is small, everything seems to work out fine even for the XXL model. I just reduced it since we can't partition the small model four ways.

Collaborator Author

If you mean the length of the features: I should probably find a way to make sure that samples are never truncated. The tasks I've looked at in the Eval Harness are quite short, though, so it didn't seem to be an issue, but I should probably add an assert (a sketch of what that could look like is below). It will be more of an issue if we look at few-shot instead of zero-shot.
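A minimal sketch of such a guard (hypothetical helper, not code from this PR; assumes a tokenizer object exposing an encode method):

# Hypothetical guard, not part of this PR: fail loudly instead of
# silently truncating samples that exceed TASK_FEATURE_LENGTHS.
def encode_checked(tokenizer, text, max_len):
    ids = tokenizer.encode(text)
    assert len(ids) <= max_len, (
        f"Sample is {len(ids)} tokens but the feature length is {max_len}; "
        "it would be truncated."
    )
    return ids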

Member

Ah okay, I see, it automatically pads to those sequence lengths, right? Concerning the truncation problem: that's a fair concern. We tried tracking the length of each task in a Google sheet (shared internally), and it seems to be okay-ish to truncate (most samples will fit); RACE might be problematic though.
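For context, a minimal sketch of the pad-or-truncate behavior described above (plain NumPy, not the actual seqio/t5x implementation):

import numpy as np

TASK_FEATURE_LENGTHS = {"inputs": 512, "targets": 114}

def pad_or_truncate(token_ids, max_len, pad_id=0):
    # Drop tokens beyond the fixed feature length ...
    token_ids = np.asarray(token_ids[:max_len])
    # ... and right-pad shorter sequences up to it.
    padding = np.full(max_len - len(token_ids), pad_id)
    return np.concatenate([token_ids, padding]).astype(np.int32)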

On bigscience/gins/eval_harness.gin (outdated; resolved):
utils.RestoreCheckpointConfig:
path = %CHECKPOINT_PATH
mode = 'specific'
dtype = 'bfloat16'
Member

I'm saving them in float32; I don't know whether it matters if you load a float32 checkpoint in bfloat16. I have some earlier checkpoints; if you could run inference on them, that'd be awesome!

@DanielHesslow (Collaborator, Author) on Nov 25, 2021

Good point, but I don't think it should be an issue: since training is in bfloat16, inference should work as well. I'll check and see if it makes any difference though.

Sure, send me a path and I can test it.
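One rough way to check whether the checkpoint dtype makes a difference (a sketch under assumptions: apply_fn stands in for the model's forward pass and is not a t5x API):

import jax
import jax.numpy as jnp

def max_output_drift(apply_fn, params_f32, batch):
    # Cast the float32 parameter tree down to bfloat16 ...
    params_bf16 = jax.tree_util.tree_map(
        lambda p: p.astype(jnp.bfloat16), params_f32)
    # ... and compare model outputs against the float32 baseline.
    out_f32 = apply_fn(params_f32, batch)
    out_bf16 = apply_fn(params_bf16, batch).astype(jnp.float32)
    return jnp.max(jnp.abs(out_f32 - out_bf16))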

@DanielHesslow (Collaborator, Author)

Running on checkpoints is just a matter of doing the following.

python3 ${T5X_DIR}/t5x/eval_harness.py \
  --gin_file="t5x/examples/t5/t5_1_1/small.gin" \
  --gin_file="t5x/bigscience/gins/eval_harness.gin" \
  --gin.INFER_OUTPUT_DIR="'.'" \
  --gin.DROPOUT_RATE=0.0 \
  --gin.CHECKPOINT_PATH="'gs://t5-data/pretrained_models/t5.1.1.lm100k.small/model.ckpt-1100000'" \
  --results_path /home/Danie/base_test.json

Not tested on our current checkpoints, but it should just be a matter of changing the checkpoint path.
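Note the double quoting on string-valued gin overrides: gin needs the inner single quotes to parse the value as a string, and the shell needs the outer double quotes. So pointing at a different checkpoint (placeholder path below, not a real checkpoint) would look like:

--gin.CHECKPOINT_PATH="'gs://<your-bucket>/<your-run>/model.ckpt-XXXX'"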

@thomasw21 (Member) left a comment

Approving based on successful evaluation runs (which seemed coherent). Let's merge this to allow others to run evaluation more easily. I haven't had the time to review it thoroughly, though.

@DanielHesslow merged commit b438d49 into main on Dec 6, 2021