Validation / training scores mismatch #16
Comments
I also noticed that. In which line of the code did you make the changes?
Removed
Ah, okay, I think I meant something else. Your change would only affect what I would call the test scores, whereas I was concerned with the validation scores, which are determined in parallel with the training scores: https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7

Strangely enough, even these validation values are always higher than the training values in my case, although logically they should not be, see below. The test results after leave-one-out cross-validation are also significantly higher. Can you really simply do without the rounding? Most of these scores are only defined for binary decisions, aren't they?
I forgot that I had changed that in my copy: I am using a trainStep() with a maximum number of epochs, early stopping, and a predefined validation data set. Afterwards I use evaluateModel() only on the separate test set (a sketch of this setup is shown below). You are right, I am also interested in the effect of rounding versus not rounding.
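A minimal Keras sketch of this kind of setup, assuming a compiled segmentation model; the function name and all parameters below are illustrative and not the repository's actual trainStep() or evaluateModel():

```python
import tensorflow as tf

def train_and_test(model, X_train, Y_train, X_val, Y_val, X_test, Y_test,
                   max_epochs=150, patience=20, batch_size=8):
    # Stop early when the validation loss stops improving and keep the best weights.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, restore_best_weights=True)

    # Train with a fixed, predefined validation set.
    history = model.fit(X_train, Y_train,
                        validation_data=(X_val, Y_val),
                        epochs=max_epochs, batch_size=batch_size,
                        callbacks=[early_stop])

    # Final scores are computed only once, on the separate test set.
    test_scores = model.evaluate(X_test, Y_test, return_dict=True)
    return history, test_scores
```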
Hi, I think the issue can also be solved by removing the batch normalization in the output layer, i.e. changing the batch-normalized output layer from the paper to a more standard output layer (also applicable to the 3D net). This resolved similar issues for me.
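For illustration, a sketch of what such a "more standard" output layer could look like in Keras; the helper name and arguments are assumptions, not the exact code from the repository:

```python
from tensorflow.keras.layers import Conv2D

def standard_output_layer(x, n_classes=1):
    # A plain 1x1 convolution with a sigmoid activation and no
    # BatchNormalization afterwards, replacing the batch-normalized output.
    return Conv2D(n_classes, (1, 1), activation="sigmoid")(x)
```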
By the way: I tested rounding during training and the results now look much more like those shown in Figure 6 of the paper. I would be interested in a learning curve including training and validation scores, like in my example.
Thank you @arilmad for your interest in our project, and thanks to @saskra and @Jderuijter for keeping the conversation running. Apologies for my late response, as I was occupied with some other stuff for the last few days.

First things first: since the Dice coefficient and the Jaccard index are defined for binary values, we should round the values to compute them. In my notebook, honestly speaking, I didn't use the metrics computed during the training procedure, so the fact that the values were not rounded was overlooked by me. As has been pointed out in this thread, I used the evaluateModel() function for that purpose instead. If you wish to compute the Dice or Jaccard values during training, it would be proper to round the values.

Another point worth noting is why I didn't include the rounding in those metric functions in the first place. I actually used those functions to compute Dice- or Jaccard-based loss functions, i.e. jaccard loss = - jaccard index. When we compute them as metrics, we must round them to obtain the actual value, by definition. But when we treat them as loss functions, we should not round them; rather, we should keep them as floating-point numbers, as that helps to improve the model. For example, suppose in one epoch a certain value was 0.67 and in the next epoch it becomes 0.78. If we don't round, the improvement is reflected in the loss value, but if we round, the improvement gets lost, as round(0.67) = round(0.78) = 1. Since I actually used those functions to experiment with Dice- and Jaccard-based loss functions, I didn't do the rounding there.
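To make that distinction concrete, here is a minimal Keras sketch of a rounded Jaccard metric next to an unrounded (soft) Jaccard loss; these are illustrative implementations, not the exact functions from the repository, and the smoothing term is an assumption to avoid division by zero:

```python
from tensorflow.keras import backend as K

def jaccard_metric(y_true, y_pred, smooth=1e-7):
    # For reporting: round the predictions so the index is computed
    # on binary masks, as the definition requires.
    y_pred = K.round(y_pred)
    intersection = K.sum(y_true * y_pred)
    union = K.sum(y_true) + K.sum(y_pred) - intersection
    return (intersection + smooth) / (union + smooth)

def jaccard_loss(y_true, y_pred, smooth=1e-7):
    # For optimisation: keep the predictions as floating-point values,
    # so that an improvement from e.g. 0.67 to 0.78 still changes the loss.
    intersection = K.sum(y_true * y_pred)
    union = K.sum(y_true) + K.sum(y_pred) - intersection
    return -(intersection + smooth) / (union + smooth)
```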
To add one more aspect: I once observed how the Jaccard score on the validation data set behaves during training, recording both the relative (unrounded) values, as in the original source code, and the rounded values during the same run. Interestingly, on this dataset, the relative values continue to increase for a while after a few epochs, while the rounded values decrease again. I repeated this more than 100 times as part of a leave-one-out cross-validation and could observe the same pattern every time. (By the way: the y label should be "Jaccard" and not "Loss".)
Hi,
I have run your network, based on the notebook, in a project of mine. However, I pondered quite a bit over my validation Jaccard scores outperforming the training scores by a large margin. I suspect the answer lies in the rounding of yp that you perform in evaluateModel. From what I can tell, this rounding is not done in the function that is used during training. After removing this rounding, the scores matched as expected.
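To make the mismatch concrete, here is a tiny NumPy example (my own illustration, not code from the repository) of how rounding the predictions changes the Jaccard value:

```python
import numpy as np

def jaccard(y_true, y_pred):
    # Intersection over union on (possibly soft) masks.
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    return intersection / union

y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([0.9, 0.6, 0.4, 0.1])   # soft network outputs

print(jaccard(y_true, y_pred))            # 0.60, unrounded as during training
print(jaccard(y_true, np.round(y_pred)))  # 1.00, rounded as in evaluateModel
```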
Please let me know if I'm missing the point somewhere, or if you agree with the observation.
Thanks for a superb piece of work!
Arild