Notebook training classifier on custom dataset #305

Open
wants to merge 3 commits into base: main

Conversation


@legel legel commented Nov 11, 2023

It was difficult and time-consuming for me to get a "hello world" training of a DinoV2 classifier on a custom dataset.

Existing options involve diving deep into complex classes and APIs that appear to be designed especially for ImageNet.

In any case, a simple starter notebook is likely to prove useful for many others.

Inspired by successful working code and a tutorial from here, I developed and tested code for downloading a DinoV2 model and training classifier layers from scratch on a custom dataset. This is contained in a classification.ipynb notebook.

There are several GitHub issues that this notebook will address; readers and authors of those threads may wish to be aware of it.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 11, 2023
@legel legel force-pushed the main branch 2 times, most recently from abd14c2 to 332d37d Compare November 12, 2023 07:21
@tungts1101

When I used only the backbone `torch.hub.load("facebookresearch/dinov2", 'dinov2_vits14')` and my own classification head `torch.nn.Linear(in_features=1920, out_features=1000, bias=True)`, I got the error `RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x384 and 1920x1000)`, which I don't understand, since it shares the same architecture as the classification model in the docs, `torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14_lc')`. Do you have any idea what could go wrong here?

@legel
Author

legel commented Nov 15, 2023

@tungts1101 if you want to use your own head (versus what's defined at https://github.com/legel/dinov2/blob/main/notebooks/classification.ipynb), note that the mismatch is between the output dimensionality of the backbone (384) and the input dimensionality of your head (1920). Changing to `in_features=384` should resolve the error. That said, it wasn't until I used the sequence of layers defined in that notebook that the classifier worked properly for me...
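For instance, here is a minimal shape check (assuming the 384-dim embedding of `dinov2_vits14`; the backbone call is stood in by a random feature tensor, so nothing is downloaded):

```python
import torch

# dinov2_vits14 emits a 384-dim embedding per image, so a custom
# head must use in_features=384 rather than 1920.
EMBED_DIM = 384
NUM_CLASSES = 1000

head = torch.nn.Linear(in_features=EMBED_DIM, out_features=NUM_CLASSES, bias=True)

# Stand-in for backbone(images): a batch of 8 feature vectors.
features = torch.randn(8, EMBED_DIM)
logits = head(features)
print(logits.shape)  # torch.Size([8, 1000])
```

With `in_features=1920`, the same forward pass raises exactly the `mat1 and mat2 shapes cannot be multiplied (8x384 and 1920x1000)` error reported above.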

@legel
Author

legel commented Nov 15, 2023

PS: I've upgraded my classification.ipynb notebook to handle multi-class problems (rather than just binary), and I've also added better automation for downloading the pre-trained .pth files, which I'll upload shortly.

@tungts1101

@legel Doing what you said works. The part I don't understand is that the linear head of the classification model has 1920 input features and 1000 output features, yet it still works with the backbone.

@legel
Author

legel commented Nov 19, 2023

@tungts1101 I'm not sure what linear head you're using...

I have great news though. There was a pretty significant error in my approach to classification: basically, I was retraining all of the DinoV2 weights instead of just the head.

On the surface this may not seem like a big deal. But computing gradients for all of the original weights, versus just the head, is dramatically more expensive in both compute and memory. It also ends up preventing you from increasing the batch size -- which can greatly help performance -- among other major drawbacks.

One of the main goals of the Meta DinoV2 was to contribute a "foundational model" for vision, where you don't have to retrain the core (very much inspired by what OpenAI's ChatGPT did for NLP).

I'm now suddenly seeing fantastic evidence of this for the first time. I'm training a plant species classifier on a dataset close in size and complexity to iNaturalist 2021. And wow! Versus "fine-tuning" the weights of the DinoV2 vision transformer, I'm seeing convergence speed improve by at least a factor of 1,000!

I'll share several large classifier notebook updates ASAP, meanwhile I'm delighted to share this news, and key insight for making the most of the DinoV2 pre-trained weights (representing information from 142 million images).

@hoominchu

Hey @legel! I was looking for something to do exactly this. Did you update the notebook with multi-label classification? I don't see a commit later than your comment here. Would love to get hold of it though. Thanks!

@yhsmiley

yhsmiley commented Apr 9, 2024

@legel I'm also interested in the updated notebook! Will you be updating the code here too? Thanks!

@legel
Author

legel commented Apr 19, 2024

@yhsmiley @hoominchu I've pushed a notebook that demonstrates multi-class classification with DinoV2. I didn't have time to clean it up -- it is very raw -- but the core techniques work well.

I should've shared the two lines of code behind the key change, which ensures the DinoV2 pre-training is fully utilized, i.e. we don't train the transformer weights:

```python
for param in model.transformer.parameters():
    param.requires_grad = False
```

So, the key is to only train the classifier weights.
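A self-contained sketch of that pattern (using a stand-in `nn.Sequential` backbone, since the exact attribute to freeze -- `model.transformer` above -- depends on how you wrap the hub model):

```python
import torch
from torch import nn

# Stand-in for a pretrained backbone; in practice this would be e.g.
# torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 384))
head = nn.Linear(384, 10)

# Freeze the backbone so gradients (and optimizer state) exist only
# for the classifier head.
for param in backbone.parameters():
    param.requires_grad = False

trainable = [p for p in list(backbone.parameters()) + list(head.parameters())
             if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
print(len(trainable))  # 2  (head weight + bias)
```

Passing only the trainable parameters to the optimizer also avoids allocating Adam moment buffers for the frozen backbone.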

@BOX-LEO

BOX-LEO commented Apr 29, 2024

Does anyone know why the input feature dimension of the linear head, as used by DINOv2 with or without registers, is 1920 instead of 384, which corresponds to the output dimension of the backbone?
Are features from multiple layers concatenated together?
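One plausible explanation, sketched purely as an assumption (not verified against the repo here): the `_lc` linear heads appear to consume the class tokens of the last 4 blocks concatenated with the average-pooled patch tokens of the final block, giving 4 × 384 + 384 = 1920 for ViT-S/14:

```python
import torch

embed_dim = 384  # ViT-S/14 embedding size
# Hypothetical shapes: class token from each of the last 4 blocks,
# plus the patch tokens of the final block (256 patches assumed).
class_tokens = [torch.randn(1, embed_dim) for _ in range(4)]
patch_tokens = torch.randn(1, 256, embed_dim)

# Concatenate the 4 class tokens with the average-pooled patch tokens.
linear_input = torch.cat(class_tokens + [patch_tokens.mean(dim=1)], dim=1)
print(linear_input.shape)  # torch.Size([1, 1920])
```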

@legel
Author

legel commented Apr 29, 2024

@BOX-LEO I'm not sure what model specifically you're referring to or what the architecture is.

Could you print out `model.eval()` and share the output here?

@OrangeNo42

@legel Hi, why did you set img_size=526? I'm confused about the img_size, because the image is resized to 256 and, in the official code, center-cropped (CenterCrop) to 224.

@joker-bian

joker-bian commented Oct 26, 2024

@legel Hi, I would like to ask: if I want to train this model on image data with other than three channels, do I need to train it from the beginning? What should I do specifically?

@larosi

larosi commented Oct 27, 2024

> ...to ask if I want to train this model on image data other than three channels, do I need to train it from the beginning? What should I do specifically?

@joker-bian I recommend adapting your images to 3 channels. For example, medical images mainly use a single channel, and a typical approach is to repeat the image three times along the channel dimension. Another idea is to train a shallow autoencoder that maps N channels to 3 channels, and preprocess your dataset with it before training your classifier.
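The channel-repeat approach is one line in PyTorch (torchvision's `transforms.Grayscale(num_output_channels=3)` does the same thing inside a transform pipeline):

```python
import torch

# Single-channel (e.g. grayscale medical) image expanded to the 3
# channels a pretrained RGB backbone expects, by repeating the channel.
gray = torch.randn(1, 224, 224)     # (C=1, H, W)
rgb_like = gray.repeat(3, 1, 1)     # (C=3, H, W), all channels identical
print(rgb_like.shape)  # torch.Size([3, 224, 224])
```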

9 participants