Cannot reproduce the results? #12
Hi @zhiweihu1103, I notice the first two experiments are close to the reported values. The reported values are averages of 4 trials and can fluctuate slightly. We ran the experiments last year on a different machine, with a code repository based on mdistiller (DKD); this GitHub repository was rebuilt on MLKD for public release, so the code and config files may differ. I will check and respond to the issue within two weeks. For now, you could also try other choices of beta/kd_weight/base_temp. Results on an A100/V100 may also differ from those on other machines like a 1080.
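To be concrete about the averaging: each reported number is the mean of 4 independent runs, e.g. (placeholder accuracies for illustration, not actual results):

```python
# Four hypothetical trial accuracies for one teacher-student setting.
trial_accs = [71.8, 72.1, 71.6, 72.0]
mean_acc = sum(trial_accs) / len(trial_accs)
spread = max(trial_accs) - min(trial_accs)
print(f"reported value = {mean_acc:.2f} (per-trial spread {spread:.2f})")
```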
Thanks for your reply. I find the following normalize function in the code:

```python
def normalize(logit):
    # Standardize logits along the class dimension: zero mean, unit std.
    mean = logit.mean(dim=-1, keepdims=True)
    stdv = logit.std(dim=-1, keepdims=True)
    return (logit - mean) / (1e-7 + stdv)  # epsilon guards against zero std

logits_student = normalize(logits_student_in) if logit_stand else logits_student_in
logits_teacher = normalize(logits_teacher_in) if logit_stand else logits_teacher_in
```
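If it helps to see how this plugs in: below is a minimal sketch of applying the quoted normalize inside a vanilla temperature-scaled KD loss. The function name kd_loss_with_stand and the default T are my own illustration, not necessarily the repository's exact implementation:

```python
import torch.nn.functional as F

def kd_loss_with_stand(logits_student_in, logits_teacher_in, T=2.0, logit_stand=True):
    # Standardize both logit sets first, then soften with temperature T.
    s = normalize(logits_student_in) if logit_stand else logits_student_in
    t = normalize(logits_teacher_in) if logit_stand else logits_teacher_in
    log_p_s = F.log_softmax(s / T, dim=-1)
    p_t = F.softmax(t / T, dim=-1)
    # Temperature-scaled KL divergence, rescaled by T^2 as in standard KD.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T ** 2)
```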
Also, I find some new code in train.py:

```python
if args.logit_stand and cfg.DISTILLER.TYPE in ['KD', 'DKD', 'MLKD']:
    cfg.EXPERIMENT.LOGIT_STAND = True
    if cfg.DISTILLER.TYPE == 'KD':
        cfg.KD.LOSS.KD_WEIGHT = args.kd_weight
        cfg.KD.TEMPERATURE = args.base_temp
    elif cfg.DISTILLER.TYPE == 'DKD':
        # For DKD, the existing loss weights are scaled rather than replaced.
        cfg.DKD.ALPHA = cfg.DKD.ALPHA * args.kd_weight
        cfg.DKD.BETA = cfg.DKD.BETA * args.kd_weight
        cfg.DKD.T = args.base_temp
    elif cfg.DISTILLER.TYPE == 'MLKD':
        cfg.KD.LOSS.KD_WEIGHT = args.kd_weight
        cfg.KD.TEMPERATURE = args.base_temp
```
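I assume these args come from command-line flags, so the run command is presumably along the lines of the following. The flag names are inferred from the args attributes (args.logit_stand, args.base_temp, args.kd_weight), and the config path and values are placeholders; please check train.py's argument parser for the exact names:

```
python3 train.py --cfg configs/cifar100/dkd.yaml --logit-stand --base-temp 2 --kd-weight 9
```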
Many thanks, I will give it a try and reply to you with further updates.
Hi, let me update something. Even running the code on a V100, I still have trouble getting results close to the paper: my results still differ from yours by about 0.3 for the red-box teacher-student combinations, which is a relatively large gap for this task.
Hi @zhiweihu1103, have you reproduced the results of DKD alone? As we previously discussed, our method is based on DKD, but you were completely unable to reproduce DKD's results, so you were recommended to first reproduce theirs.
Hi @zhiweihu1103, I notice your reproduced DKD results at megvii-research/mdistiller#66. Regarding your difficulty in reproducing the specific numbers, I would recommend checking whether the teacher weights are correct. Besides, as mentioned in megvii-research/mdistiller#7, the experiments are unstable in several settings, so multiple trials may help. Setting aside the specific values, our method shows significant improvement over the DKD results in your reproduction, which essentially represents our contribution.
Yes, I also cannot reproduce the DKD results. I wanted to reproduce them based on your code, but I failed and I don't know what happened. I use the teacher weights from https://github.com/megvii-research/mdistiller/releases/tag/checkpoints.
Hi @zhiweihu1103, have you changed the BETA, BASE_TEMP, and KD_WEIGHT according to the values in the file names at https://github.com/sunshangquan/logit-standardization-KD/tree/master/logs/DKD ? I checked my logs and configs for the four experiments you cannot reproduce and show them below.
Because I find my log files for the four experiments: [the four log and config excerpts were attached here]

Hope this information can help you. I really cannot be sure whether the failure to reproduce comes from our repository or from DKD. Since you cannot reproduce DKD's results at all, the reproduced results of our method are probably under-measured.
Thanks for your reply. I noticed that your results also fluctuate quite a lot. Is this normal?
Yes, for some cases. DKD is more sensitive to initial states (maybe machine-related), parameters, etc., and is more unstable than other baselines like KD and CTKD.
Let me check further. Thanks for your reply; I will post more comments after running it several more times.
Hi, I think this is nice work. However, using the same hyperparameters and code you provided, I cannot reproduce the results; I didn't modify anything. Would you mind giving some advice? Thanks.