Improve Comms Benchmark Timing #833

Merged: 3 commits into deepspeedai:master on Dec 20, 2023

Conversation

@Quentin-Anthony (Contributor) commented Dec 19, 2023

As reported to me by @stas00, the existing time-based calls incorrectly report the duration of comm ops when scaling to many nodes. This PR moves all benchmarks to CUDA events so that we can time directly on the stream.

@awan-10 @tjruwase
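For context, a minimal sketch of the CUDA event-based timing pattern the PR moves to (illustrative only, not the PR's actual code; the `time_collective` helper and its arguments are invented for this example):

```python
import torch
import torch.distributed as dist


def time_collective(tensor, iters=10):
    """Time an all_reduce with CUDA events (hypothetical helper, not from the PR)."""
    # Host-side timers such as time.time() can return before the collective has
    # finished, because kernels are launched asynchronously on the CUDA stream.
    # CUDA events are recorded on the stream itself, so the measured interval
    # reflects the actual on-device duration.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    dist.barrier()               # line up all ranks before timing
    torch.cuda.synchronize()

    start.record()
    for _ in range(iters):
        dist.all_reduce(tensor)
    end.record()
    torch.cuda.synchronize()     # wait for the stream before reading the events

    return start.elapsed_time(end) / iters  # milliseconds per op
```

Without the event record/synchronize pair, a wall-clock measurement around the same loop would mostly capture kernel launch overhead rather than the communication itself, which matches the mis-reporting described above.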

@tjruwase (Contributor)

@Quentin-Anthony, thanks for the PR. Out of curiosity, have you previously tried the DS timers, which leverage CUDA events to support multi-stream execution?

The above is not a requirement for this PR.
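(For reference, the DS timers mentioned here presumably refer to DeepSpeed's `SynchronizedWallClockTimer`. The snippet below is a hedged sketch of how those timers are typically used, assumed from `deepspeed.utils.timer` and not taken from this PR; the API may differ across DeepSpeed versions.)

```python
# Assumed usage of DeepSpeed's synchronized timers (not part of this PR).
import torch
import torch.distributed as dist
from deepspeed.utils.timer import SynchronizedWallClockTimer

timers = SynchronizedWallClockTimer()
tensor = torch.ones(1 << 20, device="cuda")  # example payload, one per rank

timers("all_reduce").start()
dist.all_reduce(tensor)
timers("all_reduce").stop()

timers.log(["all_reduce"])  # report elapsed time for the named timer
```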

@Quentin-Anthony (Contributor, Author)

> @Quentin-Anthony, thanks for the PR. Out of curiosity, have you previously tried the DS timers, which leverage CUDA events to support multi-stream execution?
>
> The above is not a requirement for this PR.

I have not. My rationale for not including them is that I want these benchmarks to be runnable for pure PyTorch runs, without DeepSpeed installed.

That said, it appears some `import deepspeed` calls have snuck in, such as in https://github.com/microsoft/DeepSpeedExamples/blob/master/benchmarks/communication/all_gather.py#L14, which I'll resolve in a follow-up PR. Keeping these benchmarks DeepSpeed-agnostic is important for pure-PyTorch comms benchmarking.
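One way to keep DeepSpeed optional is to import it lazily, sketched below as an illustration; the `init_comms` helper and its flags are hypothetical and not the benchmarks' actual interface:

```python
import torch.distributed as dist


def init_comms(use_deepspeed=False, dist_backend="nccl"):
    """Return a communication module; DeepSpeed is imported only if requested."""
    if use_deepspeed:
        # Lazy import keeps DeepSpeed an optional dependency for pure-PyTorch runs.
        import deepspeed
        import deepspeed.comm as ds_comm
        deepspeed.init_distributed(dist_backend=dist_backend)
        return ds_comm
    # Assumes the usual torchrun/env:// launch variables are set.
    dist.init_process_group(backend=dist_backend)
    return dist
```

Callers then go through the returned module (e.g. `comm.all_reduce(...)`), so the benchmark code itself never references DeepSpeed directly.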

@stas00 (Contributor) commented Dec 20, 2023

Yes, please, let's try to keep DeepSpeed optional ;) Ideally it should work out of the box with just PyTorch installed.

But then it might be a good idea to split this into its own repo? I think a lot more people might use it that way; having it buried inside this repo isn't the best location, IMHO.

@tjruwase (Contributor)

@Quentin-Anthony, thanks for the clarification. Sounds great to me.

@tjruwase (Contributor)

> But then it might be a good idea to split this into its own repo? I think a lot more people might use it that way; having it buried inside this repo isn't the best location, IMHO.

@Quentin-Anthony, I think this is a great idea. Currently this super useful utility is somewhat buried here, and it does not seem to be a true DS example, i.e., it is not demonstrating any DS functionality.

@tjruwase merged commit 8e4cdd8 into deepspeedai:master on Dec 20, 2023
2 checks passed
stceum pushed a commit to stceum/DeepSpeedExamples that referenced this pull request on Jan 27, 2024:
* Change to cuda event-based timing

* Add event args to called funcs

* Add missing comma to args