
cannot reproduce the results? #12

Closed
zhiweihu1103 opened this issue May 25, 2024 · 12 comments

Comments

@zhiweihu1103

Hi, I think this is nice work; however, using the same hyperparameters and code you provided, without modifying anything, I cannot reproduce the results. Would you mind giving some advice? Thanks.

[image: screenshot of the reproduced results]

@sunshangquan
Owner

Hi, @zhiweihu1103 , I notice the first two experiments are close to the reported values. The reported values are the average of 4 trials, so they can fluctuate slightly. We ran the experiments last year on a different machine and in a code repository based on mdistiller (DKD); this GitHub repository was rebuilt on MLKD for public release, so the code and config files may differ. I will check and respond to the issue within two weeks. For now, you could also try other choices of beta/kd_weight/base_temp. Results on A100/V100 may also differ from those on other machines like a 1080.

@zhiweihu1103
Author

zhiweihu1103 commented May 27, 2024

Thanks for your reply.

  1. This means the logs you provided are based on the DKD code, right? Do you remember what modifications were made to DKD? Is it just adding the following code?
def normalize(logit):
    mean = logit.mean(dim=-1, keepdims=True)
    stdv = logit.std(dim=-1, keepdims=True)
    return (logit - mean) / (1e-7 + stdv)

logits_student = normalize(logits_student_in) if logit_stand else logits_student_in
logits_teacher = normalize(logits_teacher_in) if logit_stand else logits_teacher_in
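As a quick sanity check on that snippet, here is a minimal NumPy sketch (assumed to be numerically equivalent to the PyTorch code above) showing that the standardization yields per-sample logits with zero mean and roughly unit standard deviation:

```python
import numpy as np

def normalize(logit, eps=1e-7):
    # Per-sample standardization, mirroring the PyTorch version above.
    mean = logit.mean(axis=-1, keepdims=True)
    stdv = logit.std(axis=-1, keepdims=True)
    return (logit - mean) / (eps + stdv)

rng = np.random.default_rng(0)
logits = rng.normal(loc=5.0, scale=3.0, size=(4, 100))  # fake batch of CIFAR-100 logits
z = normalize(logits)

print(np.allclose(z.mean(axis=-1), 0.0, atol=1e-6))  # → True
print(np.allclose(z.std(axis=-1), 1.0, atol=1e-3))   # → True (eps keeps it just below 1)
```

After this step, the KD loss no longer depends on the per-sample shift and scale of the raw logits, which is the point of the logit-standardization trick.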

Also, I find some new code in train.py:

if args.logit_stand and cfg.DISTILLER.TYPE in ['KD', 'DKD', 'MLKD']:
    cfg.EXPERIMENT.LOGIT_STAND = True
    if cfg.DISTILLER.TYPE == 'KD':
        cfg.KD.LOSS.KD_WEIGHT = args.kd_weight
        cfg.KD.TEMPERATURE = args.base_temp
    elif cfg.DISTILLER.TYPE == 'DKD':
        cfg.DKD.ALPHA = cfg.DKD.ALPHA * args.kd_weight
        cfg.DKD.BETA = cfg.DKD.BETA * args.kd_weight
        cfg.DKD.T = args.base_temp
    elif cfg.DISTILLER.TYPE == 'MLKD':
        cfg.KD.LOSS.KD_WEIGHT = args.kd_weight
        cfg.KD.TEMPERATURE = args.base_temp
  2. What kind of GPU did you use for your experiments? I am currently using a 4090.

  3. On my current machine, I am completely unable to reproduce the results of the DKD paper using the code and parameters DKD provides. Have you reproduced DKD's results before?

@sunshangquan
Owner

  1. Yes, for this repository. But we will check the original repository we experimented on.

  2 & 3. We ran on V100. We do not know what hardware DKD's original implementation used, but we can reproduce most of DKD's results, and our results improve incrementally on theirs. So I recommend first trying to reproduce their results and then ours.

@zhiweihu1103
Author

Many thanks, I will give it a try and reply with further updates.

@zhiweihu1103
Author

Hi, let me post an update. Even running the code on a V100, I still have trouble getting results close to the paper: my results still differ from yours by about 0.3 for the red-box teacher-student combinations, which is a relatively large gap for this task.

@sunshangquan
Owner

Hi, @zhiweihu1103 Have you tried reproducing the results of DKD on its own? As we previously discussed, our method is based on DKD, but you were completely unable to reproduce DKD's results, which is why I recommended first reproducing theirs.

@sunshangquan
Owner

sunshangquan commented Jun 4, 2024

Hi, @zhiweihu1103 I notice your reproduced DKD results at megvii-research/mdistiller#66 .

Regarding your difficulty in reproducing the specific results, I recommend checking whether the teacher weights are correct. Besides, as mentioned in megvii-research/mdistiller#7 , the experiments are unstable in several settings, so multiple trials may help.

Setting aside the specific values, our method shows a significant improvement over the DKD results in your reproduction, which is essentially our contribution.
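Since several settings are unstable, one quick way to report multiple trials is the mean and standard deviation, matching the paper's 4-trial averaging. A minimal sketch, using hypothetical best_acc values:

```python
import statistics

# Hypothetical best_acc values from four independent trials (illustrative only).
trials = [74.93, 74.45, 74.57, 74.68]
mean = statistics.mean(trials)
stdev = statistics.stdev(trials)
print(f"best_acc: {mean:.2f} +/- {stdev:.2f}")  # → best_acc: 74.66 +/- 0.20
```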

@zhiweihu1103
Author

Yes, I also cannot reproduce the DKD results. I wanted to reproduce the results based on your code, but I failed and I don't know what happened. I use the teacher weights from https://github.com/megvii-research/mdistiller/releases/tag/checkpoints.

@sunshangquan
Owner

sunshangquan commented Jun 4, 2024

Hi @zhiweihu1103 , have you changed BETA, BASE_TEMP and KD_WEIGHT according to the values in the file names at https://github.com/sunshangquan/logit-standardization-KD/tree/master/logs/DKD ?

I checked my logs and configs for the four experiments you cannot reproduce and show them below.

  1. For wrn_40_2/wrn_40_1, you may try:
python tools/train.py --cfg configs/cifar100/dkd/wrn_40_2_wrn_40_1.yaml --logit-stand --base-temp 2 --kd-weight 12 

Because I find in my log files:
dkd,wrn_40_2,wrn_40_1,2,12,2.0.txt: best_acc: 74.93
dkd,wrn_40_2,wrn_40_1,2,9,2.0.txt: best_acc: 74.45
dkd,wrn_40_2,wrn_40_1,2,9,8.0.txt: best_acc: 74.57

  2. For wrn_40_2/wrn_16_2, you may try:
python tools/train.py --cfg configs/cifar100/dkd/wrn_40_2_wrn_16_2.yaml --logit-stand --base-temp 2 --kd-weight 12 

Because I find in my log files:
dkd,wrn_40_2,wrn_16_2,2,12,2.0.txt: best_acc: 76.27
dkd,wrn_40_2,wrn_16_2,2,9,2.0.txt: best_acc: 76.16
dkd,wrn_40_2,wrn_16_2,2,9,8.0.txt: best_acc: 75.78

  3. For ResNet56/ResNet20, you may try:
python tools/train.py --cfg configs/cifar100/dkd/resnet56_resnet20.yaml --logit-stand --base-temp 2 --kd-weight 18 

Because I find in my log files:
dkd,resnet56,resnet20,3,18,2.0.txt: best_acc: 72.24
dkd,resnet56,resnet20,2,4,2.0.txt: best_acc: 71.87

  4. For ResNet110/ResNet32, you may try:
python tools/train.py --cfg configs/cifar100/dkd/resnet110_resnet32.yaml --logit-stand --base-temp 2 --kd-weight 15

Because I find in my log files:
logs/DKD/dkd,resnet110,resnet32,2,15,2.0.txt: best_acc: 74.27
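For reference, the log file names above appear to encode each run's hyperparameters as comma-separated fields. A small parsing sketch; the field meanings are my guess from the commands above, not confirmed by the repository:

```python
def parse_log_name(name):
    # e.g. "dkd,wrn_40_2,wrn_40_1,2,12,2.0.txt"
    # Assumed fields: method, teacher, student, base_temp, kd_weight,
    # plus a trailing float whose meaning is unconfirmed.
    fields = name.removesuffix(".txt").split(",")
    method, teacher, student, base_temp, kd_weight, tail = fields
    return {
        "method": method,
        "teacher": teacher,
        "student": student,
        "base_temp": int(base_temp),
        "kd_weight": int(kd_weight),
        "tail": float(tail),
    }

info = parse_log_name("dkd,wrn_40_2,wrn_40_1,2,12,2.0.txt")
print(info["teacher"], info["kd_weight"])  # → wrn_40_2 12
```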

Hope this information helps.

I really cannot be sure whether the reproduction failure comes from our repository or from DKD. Since you cannot reproduce DKD's results at all, the reproduced results of our method are probably being under-measured.

@zhiweihu1103
Author

Thanks for your reply. I noticed that your results also fluctuate considerably. Is this normal?

@sunshangquan
Owner

Yes, in some cases. DKD is more sensitive to initial states (possibly machine-related), parameters, etc., and is less stable than other baselines like KD and CTKD.

@zhiweihu1103
Author

Let me check further. Thanks for your reply; I will post more comments after running multiple trials.
