NOTE discussion w/ pablin #2
A jax implementation of
Regarding "comparable loss": I don't think it can be made properly, because we need to ensure that the frameworks all differentiate with respect to the same loss. We can always compute the same loss for each model afterwards, but we cannot force them to optimize this loss using a predict_proba function. Anyway, the most important metric is arguably accuracy.
One very interesting aspect of all the CIFAR10 scripts is that they use the CIFAR10 version of the ResNet18 (and bigger), which basically doesn't downsample the images 4-fold in the first convolution. In the ImageNet version, the first convolution downsamples the image 2-fold, and it is followed by a 2-fold max-pooling op. As explained here, this means that the network only sees an 8x8 image from the very start. As a side note, the momentumnet ResNet also features this difference.
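To make the stem difference concrete, here is a rough PyTorch sketch of the two variants (not copied from either repository mentioned above, just the standard layer shapes):

```python
import torch.nn as nn

# ImageNet-style stem: stride-2 7x7 conv + stride-2 max-pooling, so a
# 32x32 CIFAR10 image is already down to 8x8 before the first residual block.
imagenet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# CIFAR10-style stem: stride-1 3x3 conv and no max-pooling, so the residual
# blocks still see the full 32x32 resolution.
cifar_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```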
My point with the comparable loss is not for the training but for the computation of the objective function.
These are notes from our discussion w/ @pierreablin on the design of the benchmark for NN. Feel free to comment/add/edit stuff.
Critical steps for CIFAR10 training
There are a few critical steps to watch for good performance when training a neural net on CIFAR10.
A source of inspiration for these choices can be: kuangliu/pytorch-cifar.
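As a rough illustration, a recipe in the spirit of that repository could look like the sketch below; the exact values (batch size, learning rate, normalization statistics, schedule length) are assumptions, not a verified copy.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Pad-and-crop + horizontal-flip augmentation with CIFAR10 normalization.
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
trainset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(
    trainset, batch_size=128, shuffle=True, num_workers=2)

# SGD with momentum, weight decay and a cosine learning-rate schedule.
model = torchvision.models.resnet18(num_classes=10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```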
Sources of variation between frameworks
There are a few sources of variation between the frameworks that will be hard to control:
Implementation details and default hyperparameters (e.g. BatchNorm, different dropout, ...). It is probably fine not to control them completely, as this can highlight the differences in some design choices. But it is important to list them well in the paper.
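As one concrete example of such an uncontrolled difference, the BatchNorm defaults are not the same in PyTorch and Keras, and even the meaning of momentum differs (the values in the comments are the library defaults to the best of my knowledge; worth double-checking against the installed versions):

```python
import torch
import tensorflow as tf

# PyTorch defaults: eps=1e-5, momentum=0.1, with
#   running_mean = (1 - momentum) * running_mean + momentum * batch_mean
bn_torch = torch.nn.BatchNorm2d(num_features=64)

# Keras defaults: epsilon=1e-3, momentum=0.99, with
#   moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
# so Keras momentum=0.9 roughly matches PyTorch momentum=0.1.
bn_tf = tf.keras.layers.BatchNormalization()
```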
Implementations to do
A predict_proba(X: np.array) -> np.array function that returns the class probabilities for each sample. That way, we are sure we input the same thing and compute the loss the same way.
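A minimal sketch of how the objective could then be computed identically for every framework from plain numpy arrays (the predict_proba name and signature come from the item above; the metric functions themselves are illustrative):

```python
import numpy as np

def objective_cross_entropy(model, X: np.ndarray, y: np.ndarray) -> float:
    """Framework-agnostic negative log-likelihood, computed from probabilities."""
    proba = model.predict_proba(X)            # shape (n_samples, n_classes)
    proba = np.clip(proba, 1e-12, 1.0)        # guard against log(0)
    return float(-np.mean(np.log(proba[np.arange(len(y)), y])))

def objective_accuracy(model, X: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean(model.predict_proba(X).argmax(axis=1) == y))
```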
submitit. I think this should be my priority as this will impact the whole chain. I will start from ENH add Parallelized benchmark benchopt#265 and improve it.
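A rough sketch of what the submitit side could look like; the folder, resources and the run_one_training helper are placeholders, and the actual integration would live in benchopt's parallel runner:

```python
import submitit

def run_one_training(config):
    """Placeholder: train one network for the given configuration."""
    ...

configs = [{"lr": 0.1}, {"lr": 0.01}]          # illustrative configurations

executor = submitit.AutoExecutor(folder="submitit_logs")
executor.update_parameters(timeout_min=240, gpus_per_node=1)
jobs = [executor.submit(run_one_training, cfg) for cfg in configs]
results = [job.result() for job in jobs]       # blocks until the SLURM jobs finish
```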
Datasets in both pytorch and tf, so that there is no unfair advantage to one framework vs the other. This could for instance be controlled with a parameter framework, and datasets of improper frameworks can be skipped with benchopt.BaseSolver.skip. To make the plot possible, see next point.
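A rough sketch of the framework parameter and the skip mechanism; the exact benchopt hook signatures (get_data, skip) should be checked against the benchopt docs, and the classes below are illustrative only:

```python
from benchopt import BaseDataset, BaseSolver

class Dataset(BaseDataset):
    name = "CIFAR10"
    # Expose the same data once per framework.
    parameters = {"framework": ["pytorch", "tf"]}

    def get_data(self):
        # Return the images/labels in the format of `self.framework`,
        # together with the framework name so solvers can check compatibility.
        return dict(framework=self.framework, dataset=...)

class Solver(BaseSolver):
    name = "SGD[pytorch]"

    def skip(self, framework, dataset):
        # Skip (dataset, solver) pairs whose frameworks do not match.
        if framework != "pytorch":
            return True, f"{self.name} only handles pytorch datasets"
        return False, None
```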
For some parameters of the Objective/Dataset, one could want to plot all the curves in the same plot. For instance, this is the case with framework (see above) or whether we use data augmentation or not. Another possibility would be if we have several architectures. Technically, this is something to do at plot time, by simply changing the filtering of what to put on a plot. The big question is on the API side: how to tell which parameters are to be ignored when merging plots together.
Next step?
If we have more time for this benchmark, a few ideas that we could try: