-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Butterfly Mixing Aggregator #72
Comments
To inherit the current Aggregator? |
No. The current Aggregator should inherit the Butterfly Mixing aggregator, because the current aggregator is only a special case of the Butterfly Mixing aggregator. |
OK. Could you make the implementation? |
I am busy... By the way, I have told @zzxx-husky that the implementation can be greatly simplified by using ObjList and CombinedPushChannel directly. I think it is a good opportunity to implement the Butterfly Mixing aggregator in this way. Then, we can throw away the current aggregator and make a new AllReduce aggregator by using the Butterfly Mixing aggregator. |
@Kelvin-Ng Actually based on my understanding, we can reimplemente aggregator using ObjList and specific P.S. Maybe I've done too much work on wrapping aggregator to make its APIs look nicer. |
@zzxx-husky Maybe I can only get the feeling about the complexity after really working on it... |
Just check the APIs of mailbox, I think it's possible to implement butterfly mixing on Github husky. But it's still synchronous. Is it the same as what you think? |
Yes. Butterfly Mixing is synchronous. |
No, I want to understand what you mean precisely by 'synchronous'. In Butterfly Mixing, the paired workers will synchronize synchronously, but the workers will not wait for the workers that are not paired. |
Maybe we share the same understanding. BM needs P.S. maybe better to think in terms of machine instead of worker? |
Yes. Actually I am also thinking about the problem about threads and machines. The paper of BM does not talk in terms of threads, but nodes. I think in Husky, we can do a full aggregation within a machine and then aggregate with one another machine, although I am not sure if this is precisely what being mentioned in the paper. But anyway, I am nearly sure that both can converge. |
It seems that Butterfly Mixing cannot be done with ObjList + PushCombinedChannel. Also, it would be good if the Butterfly Mixing aggregator can let the users specify how many machines to aggregate with in each iteration, instead of a fixed number, 2. |
@Kelvin-Ng What you want will be more feasible after issue #76 is resolved. You may check my PR. |
My understanding is the whole Butterfly Mixing (which contains log(W) iterations) can actually be done in one round of network communication in husky, based on the existing APIs of mailbox |
You mean making a global aggregation after log(W) iteration, and make no aggregations in between? This is not Butterfly Mixing. |
Nope. I mean I can do async message passing using the APIs of mailbox (once I flush the messages, they're really sent out), so each worker may need to wait for log(W) messages from others, but it does NOT mean I need to have log(W) rounds of network comm (each round of network comm is a global barrier). Instead, this can be done in only one round of network comm. |
Don't understand what you mean... |
If the message between one thread and another thread is just a binstream each time, then we don't need extra support by mailbox to accomplish this, just wait for log(w) binstream received. Is that what you mean? @zzxx-husky Then is this case, we can also use this counting method to do BSP without send_complete? The functionality of barrier is pushed up to the channel layer from the mailbox layer? |
Lucien implemented a simple version of BM. As I tested using |
The aggregator currently included in Husky is an AllReduce style aggregator. I think we can also include a Butterfly Mixing aggregator. It is a general framework that can be used to approximately implement distributed machine learning algorithms efficiently, but not limited to a particular type of algorithm.
I have two ideas on the API, the main difference is on
update()
function. One needs the user to put the iteration counter in the parameter ofupdate()
, then the system will calculate which worker to pair with. Another one is to let the aggregator maintain the counter. I personally like the previous one because it is cleaner and more transperent to users.The text was updated successfully, but these errors were encountered: