Why stair-like loss curve? #262
Replies: 11 comments 2 replies
-
I can't speak to the training runs for the graph as I didn't do them; @mitchellnw would have a better idea... but it looks like it could be a shuffling issue (as in not properly shuffling).
-
My guess is also a shuffling issue with webdataset when these were run.
-
If the data is not preshuffled, you need both shard shuffling and local (buffer) shuffling.
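The two-level idea can be sketched in plain Python. This is an illustration of the principle, not the actual webdataset API, and `buffer_size` here is a toy value; real pipelines use buffers of thousands of samples:

```python
import random

def shuffled_stream(shards, buffer_size=4, seed=0):
    """Two-level shuffle: randomize shard order first, then use a
    small in-memory buffer to shuffle samples locally as they
    stream by. Without the shard shuffle, samples from the same
    shard stay clustered; without the buffer, order within a shard
    is fixed."""
    rng = random.Random(seed)
    shards = list(shards)
    rng.shuffle(shards)               # level 1: shard-order shuffling
    buf = []
    for shard in shards:
        for sample in shard:          # stream samples shard by shard
            buf.append(sample)
            if len(buf) >= buffer_size:
                yield buf.pop(rng.randrange(len(buf)))  # level 2: local shuffle
    while buf:                        # drain the remaining buffer
        yield buf.pop(rng.randrange(len(buf)))
```

With only one of the two levels, consecutive batches are still dominated by one shard, which is exactly the kind of within-epoch correlation that can produce structured loss curves.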
-
Perhaps it is not a shuffling issue with webdataset, since I trained the model on CC3M (a CSV dataset) and observed very similar curves: loss increases within each epoch, then decreases after each epoch...
-
@ChenDelong1999 did you preshuffle the dataset (randomly sort the dataset)?
-
CsvDataset should be shuffled every epoch, so pre-shuffling isn't really relevant. Might be worth checking open_clip/src/training/train.py Line 62 in d9ee4aa
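For a map-style dataset like CsvDataset, "shuffled every epoch" boils down to drawing a fresh permutation of indices each epoch. A minimal sketch of that idea (the helper name and seeding scheme are illustrative, not open_clip's actual code):

```python
import random

def epoch_indices(n, epoch, base_seed=42):
    """Return a fresh permutation of range(n) for the given epoch.
    Seeding by (base_seed + epoch) keeps the order deterministic
    and reproducible across workers, while still presenting samples
    in a different order every epoch."""
    indices = list(range(n))
    random.Random(base_seed + epoch).shuffle(indices)
    return indices
```

This is the same pattern PyTorch's DistributedSampler implements via `set_epoch`: if that call is missing, every epoch replays the identical order, which can produce repeating per-epoch structure in the loss.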
-
Looking at this again I wonder if it is caused by the learnable logit scale. I would expect that stair-like pattern could come from it, but I have no guesses for why. To test this hypothesis I would use a 10x smaller learning rate on the logit scale parameter.
-
@mitchellnw I've noticed that the scale param has an interesting relationship with the LR/loss; I wonder if it's almost behaving in a slightly oscillatory, control-systems fashion. The scale is strongly impacted by the LR as well: if the LR is high enough, the scale will not converge to 100 until the LR lowers.
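For context on why the scale and the LR interact: in CLIP-style training the scale is a learnable parameter stored in log space, and its exponentiated value is capped at 100. A rough sketch of that parameterization (the function name is ours, not open_clip's):

```python
import math

def effective_scale(log_scale, max_scale=100.0):
    """The optimizer updates log_scale directly; the loss uses
    exp(log_scale), capped at max_scale. A large LR moves log_scale
    in big steps, which exp() amplifies, so the effective scale can
    overshoot and oscillate instead of settling at the cap."""
    return min(math.exp(log_scale), max_scale)
```

CLIP initializes the parameter at log(1/0.07), i.e. an effective scale of about 14.3, which then has to climb toward the cap during training.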
-
Interesting. I wonder how accuracy/loss would be impacted if this learnable param was replaced by a scheduled param, something like `100 - k*cosine_decay(iteration)`.
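A hedged sketch of what such a schedule might look like (the value of `k` and the decay shape are whatever you pick; here `cosine_decay` runs from 1 at step 0 to 0 at the end of training, so the scale anneals from `100 - k` up to 100):

```python
import math

def scheduled_logit_scale(step, total_steps, max_scale=100.0, k=90.0):
    """Fixed schedule replacing the learnable scale: cosine_decay
    starts at 1 and falls to 0 over total_steps, so the returned
    scale rises smoothly from (max_scale - k) to max_scale."""
    cosine_decay = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return max_scale - k * cosine_decay
```

With `k = 90` this starts at 10 and ends at 100, roughly mimicking the trajectory the learnable parameter follows anyway, but without coupling it to the main LR.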
-
Hi, thanks for your answer, but the ...
-
Hi @mitchellnw @rom1504, I observed a similar stair-like loss curve when training a ViT-B-32 model on LAION-400M. Do you think this is normal? Or could you share a loss curve from any of your models on LAION-400M? Any information would be greatly appreciated.
-
Both here and in my own implementation, stair-like loss curves are observed. Any possible reason for this?