Notebook for attention map over input image #306

legel · 2023-11-12T05:24:03Z

Attention heatmap visualization is a common utility that will likely serve several researchers.

In order to implement it, it requires some subtle code changes to fundamental classes that many researchers might wish to have already implemented for convenience.

Inspired by a working implementation from here, I also took further steps of figuring out how to load pre-trained models with registers ("Vision Transformers Need Registers"), which indeed resolves curious artifacts with some background attention tokens.

I've also cleaned up code substantially, provided a simple example on a cool NASA space shuttle launch from Wikimedia Commons, and introduced a nice subtle visualization of the attention mask directly on top of the original image.

I hope this helps several researchers and developers!

This pull request addresses or resolves the following:

P.S. I haven't made many pull requests, and didn't want to mix up with #305, so I forked two different repositories, but in the future will just create branches for pull requests. Thanks!

LichunZhang · 2023-11-25T11:02:45Z

Hi Iegel,

I find an error happens at attention.ipynb:
In Cell [8]:
attentions = attentions[0, :, 0, 1+n_register_tokens:].reshape(number_of_head, -1)
IndexError: too many indices for tensor of dimension 3

So I checked the attention shape. It turns out to be [1, 4629, 768] instead of [1, 12, 4629, 4629] in your notebook. I know the 768 is the embedding dimension of base model. Why my attention results have a different dimension from yours?
Thank you.

legel · 2023-11-26T03:29:37Z

Hi @LichunZhang my best guess is that one of your core files did not get changed properly, so the model is still only feed-forwarding the 768 dimensional features, instead of the full attention...

I would double check that you've cloned the repository directly from https://github.com/3cology/dinov2_with_attention_extraction/tree/main and then run the notebook in that repository. Feel free to share the output here and any further insights.

LichunZhang · 2023-11-26T23:02:28Z

Hi @LichunZhang my best guess is that one of your core files did not get changed properly, so the model is still only feed-forwarding the 768 dimensional features, instead of the full attention...

I would double check that you've cloned the repository directly from https://github.com/3cology/dinov2_with_attention_extraction/tree/main and then run the notebook in that repository. Feel free to share the output here and any further insights.

Thank you for the quick response.
I do clone the whole repository directly from https://github.com/3cology/dinov2_with_attention_extraction/tree/main and the attention.ipynb still returns [1, 4629, 768] attentions. Could you please check the issue? Maybe some files in the main branch changed after your fork?

riccardorenzulli · 2023-11-30T08:31:30Z

Hi @LichunZhang my best guess is that one of your core files did not get changed properly, so the model is still only feed-forwarding the 768 dimensional features, instead of the full attention...
I would double check that you've cloned the repository directly from https://github.com/3cology/dinov2_with_attention_extraction/tree/main and then run the notebook in that repository. Feel free to share the output here and any further insights.

Thank you for the quick response. I do clone the whole repository directly from https://github.com/3cology/dinov2_with_attention_extraction/tree/main and the attention.ipynb still returns [1, 4629, 768] attentions. Could you please check the issue? Maybe some files in the main branch changed after your fork?

I think it happens because you are using the xFormers library, which uses MemEffAttention by default (https://github.com/3cology/dinov2_with_attention_extraction/blob/main/dinov2/layers/attention.py#L77) instead of standard Attention (https://github.com/3cology/dinov2_with_attention_extraction/blob/main/dinov2/layers/attention.py#L36). Note that MemEffAttention does not return the attention matrix, as Attention module does, but only the new representation for x.

Since MemEffAttention module does not store the attention matrix at all (see facebookresearch/xformers#730 (comment)) , you need to use the Attention module to plot the saliency map.

It should work if at the beginning of the notebook you set something like os.environ["XFORMERS_DISABLED"] = '0'

alexaatm · 2023-12-04T17:21:38Z

Hi!
I stumbled on the same issue - the output of my attentions is a tensor with 3 dimensions: attentions.shape=torch.Size([1, 329, 384]).

Checked that this happens with both xformers and without (export XFORMERS_DISABLED=True).

alexaatm · 2023-12-04T18:09:59Z

Update: confirmed that it happens because of xformers enabled. Before I must haved overlooked it..
Solved now:)

LichunZhang · 2023-12-04T20:44:01Z

I solved the issue now. Refer to #90 and find ludles's answer. It turns out that we should modify the code of MemEffAttention .

XiphosF · 2024-10-23T09:09:56Z

Hi, I tested your notebook for visualizing attention maps with DinoV2, but I'm seeing strange results between the base and large models. The base model gives the expected results, but when I switch to the large model, the attention maps are noticeably worse. It also seems like the issue that should've been "fixed" with the register addition is still present? Do you have any idea where this might come from ?

legel · 2024-10-23T10:35:59Z

Hi @XiphosF I do not know the answer, but indeed it seems like something with the dimensionality or loading of weights from the larger model with registers isn't working properly for you.

Curiously, it looks like your large model attention map has all of the attention concentrated on one of the effective "spatial attention pixels". You might try looking at the paper for training with registers, because that does seem familiar.

Anyways, as a hack, you might try clipping to a max value well below that, and then renormalizing the spatial distributions.

XiphosF · 2024-10-24T15:42:40Z

Hi, thanks for your answer ! Even though the clip then normalize might be a quick patch, i thought that the attention outliers in the larger models of dinov2 were supposed to be completely fixed with the addition of registers as claimed in the paper ? As anyone experienced this still with the registers version ? I don't think that I'm doing something wrong with the weights loading since i just change the path of the weights (to dinov2_vitl14_reg4_pretrain.pth) and instantiate a vit_large instead of vit_base (i've no problem too with dinov2s with reg).

huynguyentran · 2024-10-28T04:12:11Z

Quick question regarding the register tokens. I understand that register tokens are made to negate the issue with artifacts per "Vision Transformers Need Registers", but I thought their use was to capture global information. However, when I tried the following code to see what registers hold, it seemed that they were still attending to local information. Did I set it up wrong or is this behavior expected?

for i in range(n_register_tokens):
register_token_index = i + 1
register_attention = attention[0, :, register_token_index, 1 + n_register_tokens:].reshape(number_of_heads, -1)

XiphosF · 2024-10-28T08:37:22Z

Hi, your code looks correct to me. Yes, their hypothesis was that the registers hold global information. However, based on section 3.4 of the paper, i would say that the information type in these registers can vary depending on the image?

huynguyentran · 2024-10-29T02:30:26Z

Unless I have been very unlucky with my images, or it seems the "base" model registers do not contain any global information at all. Anyone have an image example that can show artifact behavior in the registers' attention for base model?

userzhi · 2024-10-31T13:53:54Z

@legel ,hello, i am facing a question as following
"forward() got an unexpected keyword argument 'return_attention"
but i do not know why this happen, can you give me some idea, thank you

legel · 2024-10-31T16:12:19Z

@userzhi I think you'll need to make sure you have the latest changes of the files in this pull request, e.g. here

dinov2/dinov2/layers/attention.py

Line 56 in df7265c

def forward(self, x: Tensor, return_attn=False) -> Tensor:

Notebook for attention map over input image

df7265c

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 12, 2023

riccardorenzulli mentioned this pull request Feb 9, 2024

Cannot identify high-norm tokens #373

Open

SydCS mentioned this pull request Mar 8, 2024

registers attention map #390

Closed

K-bNd mentioned this pull request Jul 1, 2024

Support for last layer self-attention visualisation on ViT huggingface/pytorch-image-models#2220

Closed

XiphosF mentioned this pull request Oct 25, 2024

Attention maps exhibit outliers even with registers #475

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Notebook for attention map over input image #306

Notebook for attention map over input image #306

legel commented Nov 12, 2023

LichunZhang commented Nov 25, 2023

legel commented Nov 26, 2023

LichunZhang commented Nov 26, 2023

riccardorenzulli commented Nov 30, 2023

alexaatm commented Dec 4, 2023

alexaatm commented Dec 4, 2023

LichunZhang commented Dec 4, 2023

XiphosF commented Oct 23, 2024

legel commented Oct 23, 2024

XiphosF commented Oct 24, 2024 •

edited

Loading

huynguyentran commented Oct 28, 2024

XiphosF commented Oct 28, 2024

huynguyentran commented Oct 29, 2024

userzhi commented Oct 31, 2024

legel commented Oct 31, 2024

Notebook for attention map over input image #306

Are you sure you want to change the base?

Notebook for attention map over input image #306

Conversation

legel commented Nov 12, 2023

LichunZhang commented Nov 25, 2023

legel commented Nov 26, 2023

LichunZhang commented Nov 26, 2023

riccardorenzulli commented Nov 30, 2023

alexaatm commented Dec 4, 2023

alexaatm commented Dec 4, 2023

LichunZhang commented Dec 4, 2023

XiphosF commented Oct 23, 2024

legel commented Oct 23, 2024

XiphosF commented Oct 24, 2024 • edited Loading

huynguyentran commented Oct 28, 2024

XiphosF commented Oct 28, 2024

huynguyentran commented Oct 29, 2024

userzhi commented Oct 31, 2024

legel commented Oct 31, 2024

XiphosF commented Oct 24, 2024 •

edited

Loading