Hello, I have a question about how dimensions are handled after Inter-Pooling. After the pooling operations, the original batch size B becomes Bf^2, so each image now contributes f^2 entries to the batch. The paper does not seem to explain in detail how this altered batch dimension is handled in subsequent operations, e.g., how the batch size is converted from Bf^2 back to B while the spatial dimension is downsampled.
Hi Wang,
This is Zander. Thanks for your insightful comment and your interest in our work.
To obtain the final Q, K, and V from the raw Q, K, and V, we adopt two different strategies: one for Q and another for K and V.
First, for Q, we use a rearrange operation to change its shape, e.g., [B, C, H, W] -> [B f^2, C, H/f, W/f], which preserves all of the information without loss.
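A minimal NumPy sketch of such a lossless rearrange, assuming the split pattern `b c (h f1) (w f2) -> (b f1 f2) c h w` (einops notation), where each of the f^2 sub-batches takes every f-th pixel. The reply does not say which pixel-partition pattern is used, and the function names `space_to_batch` and `batch_to_space` are hypothetical. Because the operation is a pure reshape/transpose, it can be inverted exactly, which is one possible way to return from Bf^2 to B; the reply does not state that this is what the authors do.

```python
import numpy as np


def space_to_batch(x: np.ndarray, f: int) -> np.ndarray:
    """Hypothetical helper: [B, C, H, W] -> [B*f*f, C, H//f, W//f].

    Assumes pattern 'b c (h f1) (w f2) -> (b f1 f2) c h w', i.e.
    out[b*f*f + f1*f + f2, c, i, j] == x[b, c, i*f + f1, j*f + f2].
    """
    b, c, h, w = x.shape
    assert h % f == 0 and w % f == 0, "H and W must be divisible by f"
    x = x.reshape(b, c, h // f, f, w // f, f)   # b c h f1 w f2
    x = x.transpose(0, 3, 5, 1, 2, 4)           # b f1 f2 c h w
    return x.reshape(b * f * f, c, h // f, w // f)


def batch_to_space(y: np.ndarray, f: int) -> np.ndarray:
    """Exact inverse of space_to_batch: [B*f*f, C, h, w] -> [B, C, h*f, w*f]."""
    bff, c, hs, ws = y.shape
    b = bff // (f * f)
    y = y.reshape(b, f, f, c, hs, ws)           # b f1 f2 c h w
    y = y.transpose(0, 3, 4, 1, 5, 2)           # b c h f1 w f2
    return y.reshape(b, c, hs * f, ws * f)


# Usage: B=2, C=3, H=W=4, f=2
x = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
q = space_to_batch(x, f=2)
print(q.shape)  # (8, 3, 2, 2)
assert np.array_equal(batch_to_space(q, f=2), x)  # lossless round trip
```

With `(h f1)` each sub-batch is a strided (dilated) sample of the image; swapping to `(f1 h)` would instead split the image into contiguous blocks. Both are lossless, so the choice only affects which pixels end up in the same sub-batch.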
Second, for K and V, we first downsample the spatial dimension with a convolution (e.g., [B, C, H, W] -> [B, C, H/f, W/f]) and then use a repeat operation to expand the batch dimension, e.g., [B, C, H/f, W/f] -> [B f^2, C, H/f, W/f]. This gives K and V the same batch size as Q, but through a different strategy from the one used for Q.
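A minimal NumPy sketch of the K/V path under stated assumptions: the downsampling convolution is taken to be a depthwise convolution with kernel size = stride = f and no padding (the reply does not give the kernel size, stride, or grouping), and `downsample_conv` and `expand_kv` are hypothetical names. The repeat uses `np.repeat` along the batch axis, which places the f^2 copies of each image next to each other. That ordering matches the `(b f1 f2)` batch layout assumed for Q in the previous sketch, so each of Q's sub-batches lines up with the K/V of its own image.

```python
import numpy as np


def downsample_conv(x: np.ndarray, weight: np.ndarray, f: int) -> np.ndarray:
    """Hypothetical depthwise conv, kernel size = stride = f, no padding.

    x: [B, C, H, W], weight: [C, f, f] -> [B, C, H//f, W//f].
    """
    b, c, h, w = x.shape
    assert weight.shape == (c, f, f)
    patches = x.reshape(b, c, h // f, f, w // f, f)  # b c i p j q
    # Sum each non-overlapping f x f patch weighted by its channel's kernel.
    return np.einsum("bcipjq,cpq->bcij", patches, weight)


def expand_kv(kv: np.ndarray, f: int) -> np.ndarray:
    """Repeat each image's K/V f*f times along the batch axis.

    [B, C, h, w] -> [B*f*f, C, h, w]; copies of image b occupy
    rows b*f*f .. b*f*f + f*f - 1 (einops: 'b c h w -> (b r) c h w').
    """
    return np.repeat(kv, f * f, axis=0)


# Usage: B=2, C=3, H=W=4, f=2
f = 2
x = np.arange(2 * 3 * 4 * 4, dtype=np.float64).reshape(2, 3, 4, 4)
w = np.full((3, f, f), 1.0 / (f * f))  # averaging kernel, for illustration only
kv = downsample_conv(x, w, f)          # shape (2, 3, 2, 2)
kv_rep = expand_kv(kv, f)              # shape (8, 3, 2, 2)
print(kv.shape, kv_rep.shape)
```

The averaging kernel is only there to make the example checkable; in a trained model the convolution weights would be learned.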
I hope my answer clears up your confusion. If you have any further questions, please let me know. Thanks!