Planner for PipeDream-2BW #57

Open
nict-wisdom opened this issue Sep 23, 2020 · 5 comments

@nict-wisdom

In the pipedream_2bw branch, we found the runtime that implements PipeDream-2BW.
However, there is no explanation of the planner.

Can we use the planner?

@deepakn94
Collaborator

You can use this function for planning: https://github.com/msr-fiddle/pipedream/blob/pipedream_2bw/planner/planner.py#L33.

For the performance and memory cost functions, you might want to use direct measurements (from running 100 or so iterations for the respective configuration).
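
As a concrete illustration, the computation time per block can be measured directly along these lines; this is a minimal sketch assuming a PyTorch model, where `block` and `sample_input` are placeholders for your own repeated transformer block and a representative batch (not anything in the repository):

```python
import time

import torch

def measure_block_time(block, sample_input, num_iters=100, warmup=10):
    # `block` and `sample_input` are placeholders for your own repeated
    # transformer block and a representative input batch.
    block = block.cuda()
    sample_input = sample_input.cuda()

    # Warm up so one-time CUDA initialization does not skew the timing.
    for _ in range(warmup):
        block(sample_input).sum().backward()

    torch.cuda.synchronize()
    start = time.time()
    for _ in range(num_iters):
        block(sample_input).sum().backward()
    torch.cuda.synchronize()
    return (time.time() - start) / num_iters  # average seconds per iteration
```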

@nict-wisdom
Author

Thank you for your very quick answer!

I was wondering how I can obtain values for some of the arguments: computation_time_per_block, num_parameters_per_block, num_activations_per_block, and output_activation_size.

More specifically,

  • What are num_activations_per_block and output_activation_size?
  • From these arguments, you seem to assume that all blocks have the same values (same computation time, same number of parameters, etc.). Is my understanding correct?

I would appreciate it if you could answer these questions.

@deepakn94
Collaborator

num_activations_per_block is the size of the intermediate activations needed in a transformer block during training. output_activation_size is the size of the intermediate activations sent between workers. Note that you can get these by profiling your model.

And yes, we're assuming that these are transformer models where the transformer blocks are repeated some number of times.
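
For reference, a rough way to get these numbers by profiling in PyTorch; this is a sketch rather than the project's own profiler, the hook-based activation count is only an approximation, and the block is assumed to return a single tensor:

```python
import torch

def profile_block(block, sample_input):
    activation_elements = 0

    # Count the elements of every intermediate output produced inside the block.
    def hook(module, inputs, output):
        nonlocal activation_elements
        if isinstance(output, torch.Tensor):
            activation_elements += output.numel()

    handles = [m.register_forward_hook(hook) for m in block.modules()]
    output = block(sample_input)  # assumed to return a single tensor
    for handle in handles:
        handle.remove()

    num_parameters = sum(p.numel() for p in block.parameters())
    # Multiply the element counts by the element size (e.g. 2 bytes for fp16)
    # to convert them to sizes in bytes.
    return num_parameters, activation_elements, output.numel()
```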

@nict-wisdom
Author

Thank you again for your kind support!
I understand that PipeDream-2BW assumes uniform layers.

I have a related question about PipeDream (the former version).
From my understanding of the paper, PipeDream can allocate different numbers of GPUs to different stages (unlike PipeDream-2BW).
My question is whether the implementation supports such allocations.

When I try this, the optimizer (optimizer_graph_hierarchical.py) does produce such allocations.
However, the runtime often blocks with such an allocation.
(One of the reasons is the gradient synchronization among processes in the same stage, but there must be other reasons as well.)
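
For reference, the per-stage gradient synchronization I mean is roughly the following (my own sketch with a hypothetical stage_to_ranks mapping, not the repository's actual code); with an uneven allocation, replicas of a stage falling out of step at this collective seems to be one way the runtime can block:

```python
import torch.distributed as dist

def build_stage_groups(stage_to_ranks):
    # e.g. stage_to_ranks = {0: [0, 1, 2], 1: [3]} for an uneven allocation.
    # Every rank must create every group, in the same order, or ranks hang.
    return {stage: dist.new_group(ranks=ranks)
            for stage, ranks in stage_to_ranks.items()}

def sync_stage_gradients(model, group):
    # All-reduce gradients among the data-parallel replicas of one stage.
    # This blocks until every rank in `group` reaches the same collective.
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, group=group)
            p.grad /= dist.get_world_size(group=group)
```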
Moreover, I found the following comment:

TODO: don't current support uneven configurations.

Does "uneven configurations" here mean allocating different numbers of GPUs to stages?

When I use a certain number of GPUs (8/16/32) to train ResNet, most of the generated configurations block soon after training starts.
Could you tell me how to solve this, or whether it is possible to generate safe configurations?

@barrydoooit

> You can use this function for planning: https://github.com/msr-fiddle/pipedream/blob/pipedream_2bw/planner/planner.py#L33.
>
> For the performance and memory cost functions, you might want to use direct measurements (from running 100 or so iterations for the respective configuration).

May I ask whether this is still available for non-commercial use?
