WIP: Task definitions as json strings #145
base: main
    task: LeaderboardTask,
) -> bool:
    # Ask the user to select GPUs
    view = GPUSelectionView([gpu.name for gpu in GitHubGPU] + [gpu.name for gpu in ModalGPU])
Not feedback specific to this PR, but one idea is to make the default view aggregate both the scheduler and the kernel on the same leaderboard. It'd be similar to F1 racing, where the winner is a combination of the best car and the best driver.
When announcing winners, we could then give prizes per scheduler.
@msaroufim Do we have a sense of what other schedulers there will be? Our current schedulers (Modal vs. GitHub runners) don't really overlap in devices in the first place.
        task = LeaderboardTask.from_str(res[3])
    except json.JSONDecodeError:
        logging.error("json decoding error in LB %s. Legacy task?", leaderboard_name)
        task = build_from_legacy_reference(res[3])
Happy with BC (backward-compatibility) breakages until we officially launch.
@@ -0,0 +1,122 @@
import copy
nice!
@@ -0,0 +1,138 @@
#include <chrono>
Should this file be in this folder? Or are you envisioning that people copy-paste some version of it per kernel?
For this "demo", yes. For the reference-kernels repo, we'd have one master copy and symlink it into the individual task directories. Minor fixes then propagate automatically to existing tasks; for breaking changes, we can make a copy and symlink that into any new task, leaving existing tasks working.
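A minimal sketch of the master-copy-plus-symlink scheme, in Python. The directory layout (eval/<version>/eval.cu) and the helper name link_eval are hypothetical; the idea is that each task links to a versioned master file, so fixes to that file reach every linked task, while a breaking change goes into a new version directory that only new tasks link to.

```python
import os
from pathlib import Path


def link_eval(repo_root: Path, task_dir: Path, version: str = "v1") -> Path:
    """Symlink the shared eval harness into a task directory.

    Hypothetical layout: <repo_root>/eval/<version>/eval.cu is the master copy.
    """
    master = repo_root / "eval" / version / "eval.cu"
    link = task_dir / "eval.cu"
    # Replace any existing link or file so re-running is idempotent.
    if link.is_symlink() or link.exists():
        link.unlink()
    # Relative target, so the link still works if the repo is cloned elsewhere.
    link.symlink_to(os.path.relpath(master, task_dir))
    return link
```

A breaking change would then live in eval/v2/eval.cu, and only new tasks would call link_eval(root, task_dir, version="v2").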
Branch updated from 34bb1d5 to e7da7c9, then from e7da7c9 to 7aaa97e.
Store generic task definitions as JSON strings in the database (abusing the existing reference_code field for now).
This makes it easy to define multi-file tasks and to have different eval scripts for different tasks (even though that's possible, I think we should generally avoid it).
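To make the JSON-storage idea concrete, here is a rough sketch of a task definition that round-trips through a JSON string. The class and field names (TaskDefinition, lang, files, config) are assumptions for illustration, not the PR's actual LeaderboardTask; storing a filename-to-source mapping is what makes multi-file tasks straightforward.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskDefinition:
    """Hypothetical shape of a task stored as a JSON string."""

    lang: str                 # e.g. "cu" or "py"
    files: dict[str, str]     # filename -> source text, so tasks can span many files
    config: dict[str, str] = field(default_factory=dict)

    def to_str(self) -> str:
        # sort_keys yields a stable string, which helps with diffs and tests.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_str(cls, data: str) -> "TaskDefinition":
        # Raises json.JSONDecodeError if `data` is not JSON (e.g. legacy raw code).
        return cls(**json.loads(data))
```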
For now, I've copied (and then adapted) the eval file, but once we have the tasks repo, we can just symlink the eval.cu master copy into the individual tasks, so we won't have to maintain multiple copies (while keeping the option to make breaking changes for newer tasks without having to worry about existing ones, yay).
The identity example shows roughly what I envision task definitions looking like. In particular, there is a task.h that only defines the interface, so we no longer have to include the submission in the main file and can instead compile it separately. We could also separate out the reference code, but I'm not sure we want to.
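A sketch of how a runner might stage such a task, assuming hypothetical file names task.h, eval.cu and submission.cu. The function writes the files and only builds the nvcc command without running it; because eval.cu includes task.h rather than the submission, the harness and the submission become separate translation units that meet at link time.

```python
from pathlib import Path


def stage_task(files: dict[str, str], submission: str, workdir: Path) -> list[str]:
    """Write task files plus the user's submission; return a compile command.

    Hypothetical convention: task.h declares the interface, eval.cu (the
    harness) includes task.h, and submission.cu implements it.
    """
    for name, text in files.items():
        (workdir / name).write_text(text)
    (workdir / "submission.cu").write_text(submission)
    # Each .cu file is its own translation unit; nvcc links them into one binary.
    sources = sorted(n for n in files if n.endswith(".cu")) + ["submission.cu"]
    return ["nvcc", "-O3", "-o", "eval", *sources]
```

The returned command could then be executed with subprocess.run(cmd, cwd=workdir, check=True).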
To facilitate development and testing, I've added a command that creates a leaderboard from a local directory. It can overwrite an existing leaderboard, so you can iterate quickly.
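The local-directory command could look roughly like this. It is not the PR's actual command: the JSON layout, the function names, and the plain dict standing in for the database are all hypothetical. Overwriting is opt-in, so a live leaderboard is not clobbered by accident, while iterating on a task is still a single call.

```python
import json
from pathlib import Path


def task_from_directory(path: Path) -> str:
    """Pack every regular file in `path` into a JSON task string (hypothetical format)."""
    files = {p.name: p.read_text() for p in sorted(path.iterdir()) if p.is_file()}
    return json.dumps({"files": files}, sort_keys=True)


def create_leaderboard(db: dict[str, str], name: str, path: Path, overwrite: bool = False) -> None:
    """Store the packed task under `name`; `db` stands in for the real database."""
    if name in db and not overwrite:
        raise ValueError(f"leaderboard {name!r} already exists")
    db[name] = task_from_directory(path)
```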
I've added some translation code that attempts to map leaderboards still in the current format to the new one. That code is pretty much untested.
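A sketch of the fallback path, mirroring the try/except in the diff: parse the stored string as JSON, and if that fails, treat it as a legacy leaderboard whose column held raw reference code. The translation in build_from_legacy is a hypothetical guess (in particular the #include heuristic for picking the language and the reference.<lang> file name), not the PR's build_from_legacy_reference.

```python
import json
import logging


def build_from_legacy(reference_code: str) -> dict:
    """Hypothetical translation of a legacy raw-source entry into the JSON task shape."""
    # Crude, illustrative heuristic: C/CUDA sources usually contain #include.
    lang = "cu" if "#include" in reference_code else "py"
    return {"lang": lang, "files": {f"reference.{lang}": reference_code}}


def load_task(raw: str, leaderboard_name: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON, so assume a legacy leaderboard and translate it.
        logging.error("json decoding error in LB %s. Legacy task?", leaderboard_name)
        return build_from_legacy(raw)
    return data
```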