Fix ChannelShuffle
#418
Conversation
Was this done for ShuffleNet?
I didn't test with the actual ShuffleNet model, but I tested with this model and it works fine so far. So I think it should also work in general.
What I meant is that ShuffleNet doesn't have this randomization requirement, so we probably need to do something different. I don't think we care about gradients in KPLs, since we already have many other ops without gradients in KPLs, so I don't think we are tracking this.
Like this?

    tf.random.set_seed(self.seed)
    tf.random.uniform(shape=[self.groups], seed=self.seed)
Yes. I found that earlier but couldn't adopt it, because initially this layer was proposed as an input channel shuffle and randomness was a must. However, now I think we can use it.
Let's see if we could unify the two use cases. But it isn't clear to me if we want to reuse the KPL namespace in differentiable layers...
I think we could separate the shuffle into ShuffleNet and KPL layers, as they have two very different behaviors. Can we maintain the original impl for the KPL?
I think it is fine. Layers are imported from […].
Thanks for the note: can we clarify this point? How does the behavior change? This may be going over my head here.
What are we trying to fix with this PR? The missing gradient? Avoiding the vmap fallback? XLA coverage? Something else?
I imagine the missing gradient, but that is a separate discussion IMO. What behavior mismatches here? Functionally the layer does the same thing, right? Or do I misunderstand?
The operation will have a subtle difference if we want to maintain two separate implementations, and perhaps the naming too. Also, the current implementation uses the transpose operation twice, while I think the new PR is lighter. What do you think?
@LukeWood
Can we clarify what the differences are? I am not convinced we need two implementations unless the differences are significant.
I think it's fine to consider it a special case. The static channel shuffle operation is offered officially in PyTorch, but not as a random layer for image augmentation; it is, however, available in the Albumentations library. Whether it is treated as a differentiable layer or a preprocessing layer, in both cases it has to subclass the […].
Isn't this point partially the root cause of this PR? If we go in this direction, we need to care every time about using (or not using) differentiable ops when we contribute a new component. Your detection emerged from a sort of "e2e integration test" (#218 (comment)).
From what I can tell, I think ShuffleNet is a super rare one-off.
Also, if we consider this a quite rare case (I am not so future-proof on this), for API consistency we need to consider that this approach will require removing the Random prefix from the name even if it has the (optional) random logic.
About PyTorch, there is a more general request for a random shuffle op with an axis param: […]
    @@ -52,6 +54,11 @@ def __init__(self, groups=3, seed=None, **kwargs):
        self.groups = groups
        self.seed = seed

        tf.random.set_seed(self.seed)
This line is only set to satisfy ShuffleNet's requirements. With this, it will produce a static (deterministic) shuffle operation:

    aug_fn = ChannelShuffle(groups=3, seed=101)
    aug_fn(a, training=True)

    aug_fn = ChannelShuffle(groups=3, seed=42)
    aug_fn(a, training=True)

But for the random operation, use seed=None. This may also be a bit in conflict with other KPLs from an implementation perspective. Thoughts?
This is the point in my last comment
I don't think we should be modifying the global seed inside of any layer.
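A minimal illustration of the concern (a hypothetical snippet, not from the PR): tf.random.set_seed mutates process-wide RNG state, so calling it in a layer constructor would silently reset the random sequence for unrelated code in the same program.

```python
import tensorflow as tf

# Hypothetical sketch of the side effect: tf.random.set_seed resets the
# global random sequence, affecting code far outside the layer.
tf.random.set_seed(1234)
unrelated = tf.random.uniform([2])   # some other randomness in the program

tf.random.set_seed(42)               # what a layer __init__ calling set_seed would do
reset_draw = tf.random.uniform([2])  # now drawn from a freshly reset global sequence
```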
    image = tf.random.shuffle(image, seed=self.seed)
    image = tf.transpose(image, perm=[1, 2, 3, 0])

    rand_indices = tf.argsort(self.rand_uniform(shape=[self.groups]))
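For context, here is a rough sketch of why the argsort-based permutation helps: reordering channel groups with tf.gather keeps the op chain differentiable with respect to the image. This is a standalone, hypothetical function (assuming an unbatched (height, width, channels) input), not the layer's actual call method.

```python
import tensorflow as tf

def channel_shuffle_sketch(image, groups=3, seed=None):
    """Hypothetical gradient-friendly channel shuffle (not the PR's exact code)."""
    height, width, channels = image.shape
    channels_per_group = channels // groups

    # One uniform value per group; argsort turns them into a random permutation.
    rand_indices = tf.argsort(tf.random.uniform(shape=[groups], seed=seed))

    # Group the channels, reorder the groups via gather, then flatten back.
    x = tf.reshape(image, [height, width, groups, channels_per_group])
    x = tf.gather(x, rand_indices, axis=2)  # tf.gather has a gradient w.r.t. x
    return tf.reshape(x, [height, width, channels])
```

Gradients flow through tf.gather back to the image, which is the property this PR is after.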
@LukeWood cc @bhack I've noticed that tf.argsort falls back to a while loop here:

    Tensor("loop_body/argsort/TopKV2:1", shape=(3,), dtype=int32)
    WARNING:tensorflow:Using a while_loop for converting TopKV2

Just another one to add to the list. But as you can see, with the master implementation we already have RandomShuffle: #291 (comment)
So one in, one out.
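A hypothetical repro of that fallback (assuming the shuffle runs per image under tf.vectorized_map, as the auto-vectorized augmentation path does): the TopKV2 op behind tf.argsort is converted with a while_loop, which triggers the warning quoted above.

```python
import tensorflow as tf

def per_image_shuffle(image):
    # tf.argsort lowers to TopKV2; vectorized_map converts it with a while_loop.
    rand_indices = tf.argsort(tf.random.uniform(shape=[3]))
    return tf.gather(image, rand_indices, axis=-1)

images = tf.random.uniform(shape=(4, 8, 8, 3))
shuffled = tf.vectorized_map(per_image_shuffle, images)
# Emits: WARNING:tensorflow:Using a while_loop for converting TopKV2
```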
IMO, I think it's not required to add a […]. I'm ok to have two separate layers if it's accepted to […]. I'm more interested in unifying the two cases, i.e. the random operation as a KPL and the static operation as a model layer.
I think this part of the topic was covered earlier at #122 (comment). IMHO we did not get any particular outcome from that discussion.
Ok, yeah, if there is no random behavior in one case and it is fully random in the other, it should really be a different layer. I misunderstood the original paper and the meaning of ShuffleNet.
That was the original point, #259 (comment), 13 days ago 😉
Thanks for the PR!
    @@ -52,6 +54,11 @@ def __init__(self, groups=3, seed=None, **kwargs):
        self.groups = groups
        self.seed = seed

        tf.random.set_seed(self.seed)
        self.rand_uniform = keras_cv.UniformFactorSampler(
            lower=0, upper=1, seed=self.seed
Can you add a unit test for this gradient fix?
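A rough sketch of what such a test could look like (the import path and exact layer name are assumptions based on this PR, not a confirmed API):

```python
import tensorflow as tf
from keras_cv.layers import ChannelShuffle  # assumed import path

def test_channel_shuffle_is_differentiable():
    """Hypothetical test: gradients should flow from the output back to the images."""
    layer = ChannelShuffle(groups=3, seed=101)
    images = tf.random.uniform(shape=(2, 8, 8, 3))

    with tf.GradientTape() as tape:
        tape.watch(images)
        outputs = layer(images, training=True)
        loss = tf.reduce_mean(outputs)

    grads = tape.gradient(loss, images)
    assert grads is not None
```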
Closing for now, as this does not seem to do the same thing as RandomChannelShuffle(). Instead, this appears to shuffle channels uniformly in order to support ShuffleNet. If we ever support ShuffleNet, we can introduce a one-off layer to support that model architecture in a gradient-friendly approach. Thank you for your enthusiasm as always!
Close #259