Planner for PipeDream-2BW #57

Open
nict-wisdom opened this issue Sep 23, 2020 · 5 comments

@nict-wisdom

In the pipedream_2bw branch, we found the runtime that implements PipeDream-2BW.
However, there is no explanation of the planner.

Can we use the planner?

@deepakn94
Collaborator

You can use this function for planning: https://github.com/msr-fiddle/pipedream/blob/pipedream_2bw/planner/planner.py#L33.

For the performance and memory cost functions, you might want to use direct measurements (from running 100 or so iterations for the respective configuration).
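
As a concrete illustration, the computation time per block can be measured directly along these lines; this is a minimal sketch assuming a PyTorch model, where `block` and `sample_input` are placeholders for your own repeated transformer block and a representative batch (not anything in the repository):

```python
import time

import torch

def measure_block_time(block, sample_input, num_iters=100, warmup=10):
    # `block` and `sample_input` are placeholders for your own repeated
    # transformer block and a representative input batch.
    block = block.cuda()
    sample_input = sample_input.cuda()

    # Warm up so one-time CUDA initialization does not skew the timing.
    for _ in range(warmup):
        block(sample_input).sum().backward()

    torch.cuda.synchronize()
    start = time.time()
    for _ in range(num_iters):
        block(sample_input).sum().backward()
    torch.cuda.synchronize()
    return (time.time() - start) / num_iters  # average seconds per iteration
```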

@nict-wisdom
Author

Thank you for your very quick answer!

I was wondering how I can obtain values for some of the arguments: computation_time_per_block, num_parameters_per_block, num_activations_per_block, and output_activation_size.

More specifically,

  • What are num_activations_per_block and output_activation_size?
  • From these arguments, you seem to assume that all blocks have the same values (same computation time, same number of parameters, etc.). Is my understanding correct?

I would appreciate it if you could answer these questions.

@deepakn94
Collaborator

num_activations_per_block is the size of the intermediate activations needed in a transformer block during training. output_activation_size is the size of the intermediate activations sent between workers. Note that you can get these by profiling your model.

And yes, we're assuming that these are transformer models where the transformer blocks are repeated some number of times.
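
For reference, a rough way to get these numbers by profiling in PyTorch; this is a sketch rather than the project's own profiler, the hook-based activation count is only an approximation, and the block is assumed to return a single tensor:

```python
import torch

def profile_block(block, sample_input):
    activation_elements = 0

    # Count the elements of every intermediate output produced inside the block.
    def hook(module, inputs, output):
        nonlocal activation_elements
        if isinstance(output, torch.Tensor):
            activation_elements += output.numel()

    handles = [m.register_forward_hook(hook) for m in block.modules()]
    output = block(sample_input)  # assumed to return a single tensor
    for handle in handles:
        handle.remove()

    num_parameters = sum(p.numel() for p in block.parameters())
    # Multiply the element counts by the element size (e.g. 2 bytes for fp16)
    # to convert them to sizes in bytes.
    return num_parameters, activation_elements, output.numel()
```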

@nict-wisdom
Author

Thank you again for your kind support!
I understand that PipeDream-2BW assumes uniform layers.

I have a related question about PipeDream (the former version).
From my understanding of the paper, PipeDream can allocate different numbers of GPUs to different stages (unlike PipeDream-2BW).
My question is whether the implementation supports such allocations.

When I try this, the optimizer (optimizer_graph_hierarchical.py) does produce such allocations.
However, the runtime often blocks with such an allocation.
(One of the reasons is the gradient synchronization among processes in the same stage, but there must be other reasons as well.)
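
For reference, the per-stage gradient synchronization I mean is roughly the following (my own sketch with a hypothetical stage_to_ranks mapping, not the repository's actual code); with an uneven allocation, replicas of a stage falling out of step at this collective seems to be one way the runtime can block:

```python
import torch.distributed as dist

def build_stage_groups(stage_to_ranks):
    # e.g. stage_to_ranks = {0: [0, 1, 2], 1: [3]} for an uneven allocation.
    # Every rank must create every group, in the same order, or ranks hang.
    return {stage: dist.new_group(ranks=ranks)
            for stage, ranks in stage_to_ranks.items()}

def sync_stage_gradients(model, group):
    # All-reduce gradients among the data-parallel replicas of one stage.
    # This blocks until every rank in `group` reaches the same collective.
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, group=group)
            p.grad /= dist.get_world_size(group=group)
```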
Moreover, I found the following comment:

TODO: don't current support uneven configurations.

Does "uneven configurations" here mean allocating different numbers of GPUs to stages?

When I use a certain number of GPUs (8/16/32) to train ResNet, most of the generated configurations block soon after training starts.
Could you tell me how to solve this, or whether it is possible to generate safe configurations?

@barrydoooit

> You can use this function for planning: https://github.com/msr-fiddle/pipedream/blob/pipedream_2bw/planner/planner.py#L33.
>
> For the performance and memory cost functions, you might want to use direct measurements (from running 100 or so iterations for the respective configuration).

May I ask whether this is still available for non-commercial use?
