Fix ChannelShuffle
#418
Conversation
Was this done for ShuffleNet?
I didn't test with the actual ShuffleNet model, but I tested with this model and it works fine so far. So I think it should also work in general.
What I meant is that ShuffleNet doesn't have this randomization requirement, so we probably need to do something different. I don't think we care about gradients in KPLs, since we already have many other ops without gradients in KPLs, so I don't think we are tracking this.
Like this?

    tf.random.set_seed(self.seed)
    tf.random.uniform(shape=[self.groups], seed=self.seed)
Yes. I found that earlier but couldn't adopt it, because initially this layer was proposed as an input channel shuffle and randomness was a must. However, now I think we can use it.
Let's see if we could unify the two use cases. But it isn't clear to me if we want to reuse the KPL namespace in differentiable layers...
I think we could separate the shuffle into ShuffleNet and KPL layers, as they have two very different behaviors. Can we maintain the original impl for the KPL?
I think it is fine. Layers are imported from […].
Thanks for the note: can we clarify this point? How does the behavior change? This may be going over my head here.
What are we trying to fix with this PR? The missing gradient? Avoiding the vmap fallback? XLA coverage? Something else?
I imagine the missing gradient, but that is a separate discussion IMO. What behavior mismatches here? Functionally the layer does the same thing, right? Or do I misunderstand?
The operation will have a subtle difference if we want to maintain two separate implementations, and perhaps the naming too. Also, the current implementation uses the transpose operation twice, while I think the new PR is lighter. What do you think?
@LukeWood
Can we clarify what the differences are? I am not convinced we need two implementations unless the differences are significant.
I think it's fine to consider it a special case. The static channel shuffle operation is offered officially in PyTorch, but not as a random layer for image augmentation; it is, however, available in the Albumentations library. Whether it is treated as a differentiable layer or a preprocessing layer, in both cases it has to subclass the […].
Isn't this point partially the root cause of this PR? If we go in this direction, we need to care every time about using (or not using) differentiable ops when we contribute a new component. Your detection emerged from a sort of "e2e integration test" (#218 (comment)).
From what I can tell, I think ShuffleNet is a super rare one-off.
Also, if we consider this a quite rare case (I am not so future-proof on this), for API consistency we need to consider that this approach will require removing the Random prefix from the name even if it has the (optional) random logic.
About PyTorch, there is a more general request for a random shuffle op with an axis param: […]
    @@ -52,6 +54,11 @@ def __init__(self, groups=3, seed=None, **kwargs):
        self.groups = groups
        self.seed = seed

        tf.random.set_seed(self.seed)
This line is only set to satisfy ShuffleNet's requirements. With this, it will produce a static (deterministic) shuffle operation:

    aug_fn = ChannelShuffle(groups=3, seed=101)
    aug_fn(a, training=True)

    aug_fn = ChannelShuffle(groups=3, seed=42)
    aug_fn(a, training=True)

But for the random operation, use seed=None. This may also be a bit in conflict with other KPLs from an implementation perspective. Thoughts?
This is the point in my last comment
I don't think we should be modifying the global seed inside of any layer.
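A minimal illustration of the concern (a hypothetical snippet, not from the PR): tf.random.set_seed mutates process-wide RNG state, so calling it in a layer constructor would silently reset the random sequence for unrelated code in the same program.

```python
import tensorflow as tf

# Hypothetical sketch of the side effect: tf.random.set_seed resets the
# global random sequence, affecting code far outside the layer.
tf.random.set_seed(1234)
unrelated = tf.random.uniform([2])   # some other randomness in the program

tf.random.set_seed(42)               # what a layer __init__ calling set_seed would do
reset_draw = tf.random.uniform([2])  # now drawn from a freshly reset global sequence
```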
    image = tf.random.shuffle(image, seed=self.seed)
    image = tf.transpose(image, perm=[1, 2, 3, 0])

    rand_indices = tf.argsort(self.rand_uniform(shape=[self.groups]))
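For context, here is a rough sketch of why the argsort-based permutation helps: reordering channel groups with tf.gather keeps the op chain differentiable with respect to the image. This is a standalone, hypothetical function (assuming an unbatched (height, width, channels) input), not the layer's actual call method.

```python
import tensorflow as tf

def channel_shuffle_sketch(image, groups=3, seed=None):
    """Hypothetical gradient-friendly channel shuffle (not the PR's exact code)."""
    height, width, channels = image.shape
    channels_per_group = channels // groups

    # One uniform value per group; argsort turns them into a random permutation.
    rand_indices = tf.argsort(tf.random.uniform(shape=[groups], seed=seed))

    # Group the channels, reorder the groups via gather, then flatten back.
    x = tf.reshape(image, [height, width, groups, channels_per_group])
    x = tf.gather(x, rand_indices, axis=2)  # tf.gather has a gradient w.r.t. x
    return tf.reshape(x, [height, width, channels])
```

Gradients flow through tf.gather back to the image, which is the property this PR is after.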
@LukeWood cc @bhack I've noticed that tf.argsort falls back to a while loop here:

    Tensor("loop_body/argsort/TopKV2:1", shape=(3,), dtype=int32)
    WARNING:tensorflow:Using a while_loop for converting TopKV2

Just another one to add to the list. But as you can see, with the master implementation we already have RandomShuffle: #291 (comment)
So one in, one out.
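A hypothetical repro of that fallback (assuming the shuffle runs per image under tf.vectorized_map, as the auto-vectorized augmentation path does): the TopKV2 op behind tf.argsort is converted with a while_loop, which triggers the warning quoted above.

```python
import tensorflow as tf

def per_image_shuffle(image):
    # tf.argsort lowers to TopKV2; vectorized_map converts it with a while_loop.
    rand_indices = tf.argsort(tf.random.uniform(shape=[3]))
    return tf.gather(image, rand_indices, axis=-1)

images = tf.random.uniform(shape=(4, 8, 8, 3))
shuffled = tf.vectorized_map(per_image_shuffle, images)
# Emits: WARNING:tensorflow:Using a while_loop for converting TopKV2
```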
IMO, I think it's not required to add a […]. I'm ok to have two separate layers if it's accepted to […]. I'm more interested in unifying the two cases, i.e. the random operation as a KPL and the static operation as a model layer.
I think this part of the topic was covered earlier at #122 (comment). IMHO we did not get any particular outcome from that discussion.
Ok, yeah, if there is no random behavior in one case and it is fully random in the other, it should really be a different layer. I misunderstood the original paper and the meaning of ShuffleNet.
That was the original point, #259 (comment), 13 days ago 😉
Thanks for the PR!
    @@ -52,6 +54,11 @@ def __init__(self, groups=3, seed=None, **kwargs):
        self.groups = groups
        self.seed = seed

        tf.random.set_seed(self.seed)
        self.rand_uniform = keras_cv.UniformFactorSampler(
            lower=0, upper=1, seed=self.seed
Can you add a unit test for this gradient fix?
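A rough sketch of what such a test could look like (the import path and exact layer name are assumptions based on this PR, not a confirmed API):

```python
import tensorflow as tf
from keras_cv.layers import ChannelShuffle  # assumed import path

def test_channel_shuffle_is_differentiable():
    """Hypothetical test: gradients should flow from the output back to the images."""
    layer = ChannelShuffle(groups=3, seed=101)
    images = tf.random.uniform(shape=(2, 8, 8, 3))

    with tf.GradientTape() as tape:
        tape.watch(images)
        outputs = layer(images, training=True)
        loss = tf.reduce_mean(outputs)

    grads = tape.gradient(loss, images)
    assert grads is not None
```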
Closing for now, as this does not seem to do the same thing as RandomChannelShuffle(). Instead, this appears to shuffle channels uniformly in order to support ShuffleNet. If we ever support ShuffleNet, we can introduce a one-off layer to support that model architecture in a gradient-friendly approach. Thank you for your enthusiasm as always!
Close #259