Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Butterfly Mixing Aggregator #72

Open
Kelvin-Ng opened this issue Oct 14, 2016 · 19 comments
Open

Butterfly Mixing Aggregator #72

Kelvin-Ng opened this issue Oct 14, 2016 · 19 comments
Assignees

Comments

@Kelvin-Ng
Copy link
Contributor

Kelvin-Ng commented Oct 14, 2016

The aggregator currently included in Husky is an AllReduce style aggregator. I think we can also include a Butterfly Mixing aggregator. It is a general framework that can be used to approximately implement distributed machine learning algorithms efficiently, but not limited to a particular type of algorithm.

I have two ideas on the API, the main difference is on update() function. One needs the user to put the iteration counter in the parameter of update(), then the system will calculate which worker to pair with. Another one is to let the aggregator maintain the counter. I personally like the previous one because it is cleaner and more transperent to users.

@kygx-legend
Copy link
Member

To inherit the current Aggregator?

@Kelvin-Ng
Copy link
Contributor Author

No. The current Aggregator should inherit the Butterfly Mixing aggregator, because the current aggregator is only a special case of the Butterfly Mixing aggregator.

@kygx-legend
Copy link
Member

OK. Could you make the implementation?

@Kelvin-Ng
Copy link
Contributor Author

I am busy...

By the way, I have told @zzxx-husky that the implementation can be greatly simplified by using ObjList and CombinedPushChannel directly. I think it is a good opportunity to implement the Butterfly Mixing aggregator in this way. Then, we can throw away the current aggregator and make a new AllReduce aggregator by using the Butterfly Mixing aggregator.

@zzxx-husky
Copy link
Collaborator

zzxx-husky commented Oct 14, 2016

@Kelvin-Ng Actually based on my understanding, we can reimplemente aggregator using ObjList and specific push (the one we can specify dest worker). But the implemetation may not be simplified easily, you still need to design a base class to store aggregators in a unified way, think about how to do the global aggregation in a decentralized way, etc. And because I cannot simply implement aggregator using existing logics in husky, I think it's better for me to make aggregator independent so that it's easier to write unittests. And also it's easier to adapt if the inner logic of husky changes in the future.

P.S. Maybe I've done too much work on wrapping aggregator to make its APIs look nicer.

@Kelvin-Ng
Copy link
Contributor Author

@zzxx-husky Maybe I can only get the feeling about the complexity after really working on it...

@zzxx-husky
Copy link
Collaborator

Just check the APIs of mailbox, I think it's possible to implement butterfly mixing on Github husky. But it's still synchronous. Is it the same as what you think?

@Kelvin-Ng
Copy link
Contributor Author

Yes. Butterfly Mixing is synchronous.

@Kelvin-Ng
Copy link
Contributor Author

No, I want to understand what you mean precisely by 'synchronous'. In Butterfly Mixing, the paired workers will synchronize synchronously, but the workers will not wait for the workers that are not paired.

@zzxx-husky
Copy link
Collaborator

Maybe we share the same understanding. BM needs log(W) iterations (W is the no. of workers) and for each iterations, each worker just needs to wait for one another worker. To perform a global aggregation, one worker needs to wait for log(M) workers totally. Lunch time ~

P.S. maybe better to think in terms of machine instead of worker?

@Kelvin-Ng
Copy link
Contributor Author

Yes.

Actually I am also thinking about the problem about threads and machines. The paper of BM does not talk in terms of threads, but nodes. I think in Husky, we can do a full aggregation within a machine and then aggregate with one another machine, although I am not sure if this is precisely what being mentioned in the paper. But anyway, I am nearly sure that both can converge.

@Kelvin-Ng
Copy link
Contributor Author

It seems that Butterfly Mixing cannot be done with ObjList + PushCombinedChannel.

Also, it would be good if the Butterfly Mixing aggregator can let the users specify how many machines to aggregate with in each iteration, instead of a fixed number, 2.

@ddmbr
Copy link
Member

ddmbr commented Oct 15, 2016

@Kelvin-Ng What you want will be more feasible after issue #76 is resolved. You may check my PR.

@zzxx-husky
Copy link
Collaborator

My understanding is the whole Butterfly Mixing (which contains log(W) iterations) can actually be done in one round of network communication in husky, based on the existing APIs of mailbox

@Kelvin-Ng
Copy link
Contributor Author

You mean making a global aggregation after log(W) iteration, and make no aggregations in between? This is not Butterfly Mixing.

@zzxx-husky
Copy link
Collaborator

Nope. I mean I can do async message passing using the APIs of mailbox (once I flush the messages, they're really sent out), so each worker may need to wait for log(W) messages from others, but it does NOT mean I need to have log(W) rounds of network comm (each round of network comm is a global barrier). Instead, this can be done in only one round of network comm.

@Kelvin-Ng
Copy link
Contributor Author

Don't understand what you mean...

@Yuzhen11
Copy link
Collaborator

If the message between one thread and another thread is just a binstream each time, then we don't need extra support by mailbox to accomplish this, just wait for log(w) binstream received. Is that what you mean? @zzxx-husky Then is this case, we can also use this counting method to do BSP without send_complete? The functionality of barrier is pushed up to the channel layer from the mailbox layer?

@zzxx-husky
Copy link
Collaborator

Lucien implemented a simple version of BM. As I tested using a9 on w30, the CPU is fully used. Wrong rate is about 14%.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants