
cannot reproduce the results? #12

Closed
zhiweihu1103 opened this issue May 25, 2024 · 12 comments

Comments

@zhiweihu1103

Hi, I think this is nice work; however, using the same hyperparameters and code you provided, without modifying anything, I cannot reproduce the results. Would you mind giving some advice? Thanks.

[image: screenshot of the reproduced results]

@sunshangquan
Owner

Hi, @zhiweihu1103 , I notice the first two experiments are close to the reported values. The reported values are the average of 4 trials, so they can fluctuate slightly. We ran the experiments last year on a different machine and in a code repository based on mdistiller (DKD); this GitHub repository was rebuilt on MLKD for public release, so the code and config files may differ. I will check and respond to the issue within two weeks. For now, you could also try other choices of beta/kd_weight/base_temp. Results on A100/V100 may also differ from those on other machines like a 1080.

@zhiweihu1103
Author

zhiweihu1103 commented May 27, 2024

Thanks for your reply.

  1. This means the logs you provided are based on the DKD code, right? Do you remember what modifications were made to DKD? Is it just adding the following code?
def normalize(logit):
    mean = logit.mean(dim=-1, keepdims=True)
    stdv = logit.std(dim=-1, keepdims=True)
    return (logit - mean) / (1e-7 + stdv)

logits_student = normalize(logits_student_in) if logit_stand else logits_student_in
logits_teacher = normalize(logits_teacher_in) if logit_stand else logits_teacher_in
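As a quick sanity check on that snippet, here is a minimal NumPy sketch (assumed to be numerically equivalent to the PyTorch code above) showing that the standardization yields per-sample logits with zero mean and roughly unit standard deviation:

```python
import numpy as np

def normalize(logit, eps=1e-7):
    # Per-sample standardization, mirroring the PyTorch version above.
    mean = logit.mean(axis=-1, keepdims=True)
    stdv = logit.std(axis=-1, keepdims=True)
    return (logit - mean) / (eps + stdv)

rng = np.random.default_rng(0)
logits = rng.normal(loc=5.0, scale=3.0, size=(4, 100))  # fake batch of CIFAR-100 logits
z = normalize(logits)

print(np.allclose(z.mean(axis=-1), 0.0, atol=1e-6))  # → True
print(np.allclose(z.std(axis=-1), 1.0, atol=1e-3))   # → True (eps keeps it just below 1)
```

After this step, the KD loss no longer depends on the per-sample shift and scale of the raw logits, which is the point of the logit-standardization trick.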

Also, I find some new code in train.py:

if args.logit_stand and cfg.DISTILLER.TYPE in ['KD', 'DKD', 'MLKD']:
    cfg.EXPERIMENT.LOGIT_STAND = True
    if cfg.DISTILLER.TYPE == 'KD':
        cfg.KD.LOSS.KD_WEIGHT = args.kd_weight
        cfg.KD.TEMPERATURE = args.base_temp
    elif cfg.DISTILLER.TYPE == 'DKD':
        cfg.DKD.ALPHA = cfg.DKD.ALPHA * args.kd_weight
        cfg.DKD.BETA = cfg.DKD.BETA * args.kd_weight
        cfg.DKD.T = args.base_temp
    elif cfg.DISTILLER.TYPE == 'MLKD':
        cfg.KD.LOSS.KD_WEIGHT = args.kd_weight
        cfg.KD.TEMPERATURE = args.base_temp
  2. What kind of GPU did you use for your experiments? I am currently using a 4090.

  3. On my current machine, I am completely unable to reproduce the results of the DKD paper using the code and parameters DKD provides. Have you reproduced DKD's results before?

@sunshangquan
Owner

  1. Yes, for this repository. But we will check the original repository we experimented on.

  2 & 3. We ran on V100. We do not know what hardware DKD's original implementation used, but we can reproduce most of DKD's results, and our results improve incrementally on theirs. So I recommend first trying to reproduce their results and then ours.

@zhiweihu1103
Author

Many thanks, I will give it a try and reply with further updates.

@zhiweihu1103
Author

Hi, let me post an update. Even running the code on a V100, I still have trouble getting results close to the paper: my results still differ from yours by about 0.3 for the red-box teacher-student combinations, which is a relatively large gap for this task.

@sunshangquan
Owner

Hi, @zhiweihu1103 Have you tried reproducing the results of DKD on its own? As we previously discussed, our method is based on DKD, but you were completely unable to reproduce DKD's results, which is why I recommended first reproducing theirs.

@sunshangquan
Owner

sunshangquan commented Jun 4, 2024

Hi, @zhiweihu1103 I notice your reproduced DKD results at megvii-research/mdistiller#66 .

Regarding your difficulty in reproducing the specific results, I recommend checking whether the teacher weights are correct. Besides, as mentioned in megvii-research/mdistiller#7 , the experiments are unstable in several settings, so multiple trials may help.

Setting aside the specific values, our method shows a significant improvement over the DKD results in your reproduction, which is essentially our contribution.
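Since several settings are unstable, one quick way to report multiple trials is the mean and standard deviation, matching the paper's 4-trial averaging. A minimal sketch, using hypothetical best_acc values:

```python
import statistics

# Hypothetical best_acc values from four independent trials (illustrative only).
trials = [74.93, 74.45, 74.57, 74.68]
mean = statistics.mean(trials)
stdev = statistics.stdev(trials)
print(f"best_acc: {mean:.2f} +/- {stdev:.2f}")  # → best_acc: 74.66 +/- 0.20
```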

@zhiweihu1103
Author

Yes, I also cannot reproduce the DKD results. I wanted to reproduce the results based on your code, but I failed and I don't know what happened. I use the teacher weights from https://github.com/megvii-research/mdistiller/releases/tag/checkpoints.

@sunshangquan
Owner

sunshangquan commented Jun 4, 2024

Hi @zhiweihu1103 , have you changed BETA, BASE_TEMP and KD_WEIGHT according to the values in the file names at https://github.com/sunshangquan/logit-standardization-KD/tree/master/logs/DKD ?

I checked my logs and configs for the four experiments you cannot reproduce and show them below.

  1. For wrn_40_2/wrn_40_1, you may try:
python tools/train.py --cfg configs/cifar100/dkd/wrn_40_2_wrn_40_1.yaml --logit-stand --base-temp 2 --kd-weight 12 

Because I find in my log files:
dkd,wrn_40_2,wrn_40_1,2,12,2.0.txt: best_acc: 74.93
dkd,wrn_40_2,wrn_40_1,2,9,2.0.txt: best_acc: 74.45
dkd,wrn_40_2,wrn_40_1,2,9,8.0.txt: best_acc: 74.57

  2. For wrn_40_2/wrn_16_2, you may try:
python tools/train.py --cfg configs/cifar100/dkd/wrn_40_2_wrn_16_2.yaml --logit-stand --base-temp 2 --kd-weight 12 

Because I find in my log files:
dkd,wrn_40_2,wrn_16_2,2,12,2.0.txt: best_acc: 76.27
dkd,wrn_40_2,wrn_16_2,2,9,2.0.txt: best_acc: 76.16
dkd,wrn_40_2,wrn_16_2,2,9,8.0.txt: best_acc: 75.78

  3. For ResNet56/ResNet20, you may try:
python tools/train.py --cfg configs/cifar100/dkd/resnet56_resnet20.yaml --logit-stand --base-temp 2 --kd-weight 18 

Because I find in my log files:
dkd,resnet56,resnet20,3,18,2.0.txt: best_acc: 72.24
dkd,resnet56,resnet20,2,4,2.0.txt: best_acc: 71.87

  4. For ResNet110/ResNet32, you may try:
python tools/train.py --cfg configs/cifar100/dkd/resnet110_resnet32.yaml --logit-stand --base-temp 2 --kd-weight 15

Because I find in my log files:
logs/DKD/dkd,resnet110,resnet32,2,15,2.0.txt: best_acc: 74.27
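For reference, the log file names above appear to encode each run's hyperparameters as comma-separated fields. A small parsing sketch; the field meanings are my guess from the commands above, not confirmed by the repository:

```python
def parse_log_name(name):
    # e.g. "dkd,wrn_40_2,wrn_40_1,2,12,2.0.txt"
    # Assumed fields: method, teacher, student, base_temp, kd_weight,
    # plus a trailing float whose meaning is unconfirmed.
    fields = name.removesuffix(".txt").split(",")
    method, teacher, student, base_temp, kd_weight, tail = fields
    return {
        "method": method,
        "teacher": teacher,
        "student": student,
        "base_temp": int(base_temp),
        "kd_weight": int(kd_weight),
        "tail": float(tail),
    }

info = parse_log_name("dkd,wrn_40_2,wrn_40_1,2,12,2.0.txt")
print(info["teacher"], info["kd_weight"])  # → wrn_40_2 12
```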

Hope this information helps.

I really cannot be sure whether the reproduction failure comes from our repository or from DKD. Since you cannot reproduce DKD's results at all, the reproduced results of our method are probably being under-measured.

@zhiweihu1103
Author

Thanks for your reply. I noticed that your results also fluctuate considerably. Is this normal?

@sunshangquan
Owner

Yes, in some cases. DKD is more sensitive to initial states (possibly machine-related), parameters, etc., and is less stable than other baselines like KD and CTKD.

@zhiweihu1103
Author

Let me check further. Thanks for your reply; I will post more comments after running multiple trials.
