Thank you for the excellent work. I have two questions:

1. My understanding of distillation is that one inference step of the student should be comparable to n inference steps of the teacher. However, in both `compute_distribution_matching_loss` and `compute_loss_fake`, the `true_unet` and `fake_unet` perform only a single denoising step (for instance, if the generator runs one step at t = 399, the `true_unet` and `fake_unet` also run one step at a random timestep, say t = 540). Since the original `true_unet` also performs poorly with one-step denoising, why does this loss work? (See the sketch below for how I currently read the computation.)

2. I am trying to use DMD2 to distill a one-step SD cascade. Is it enough to use only `compute_distribution_matching_loss` and `compute_loss_fake`?
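To make question 1 concrete, here is a rough sketch of how I currently understand the distribution-matching loss. The `add_noise` helper and the `(noisy, t)` call signature are simplifications of mine, not the actual repo API, and I have left out conditioning and loss weighting:

```python
import torch
import torch.nn.functional as F

def distribution_matching_loss_sketch(generated_image, true_unet, fake_unet,
                                      add_noise, min_t=20, max_t=980):
    """Sketch of my reading of compute_distribution_matching_loss.

    `true_unet` / `fake_unet` are callables (noisy_image, t) -> predicted clean
    image, and `add_noise` stands in for the scheduler's forward-diffusion step.
    """
    with torch.no_grad():
        # One random intermediate timestep per sample (e.g. ~540 in my example).
        t = torch.randint(min_t, max_t, (generated_image.shape[0],),
                          device=generated_image.device)
        noise = torch.randn_like(generated_image)
        noisy = add_noise(generated_image, noise, t)

        # Both score networks perform only ONE denoising step at timestep t.
        pred_real = true_unet(noisy, t)   # frozen teacher score estimate
        pred_fake = fake_unet(noisy, t)   # trainable "fake" score estimate

        # DMD gradient direction: difference of the two score estimates.
        grad = torch.nan_to_num(pred_fake - pred_real)

    # Surrogate MSE whose gradient w.r.t. generated_image is `grad`
    # (up to normalization), so only the generator receives the update.
    return 0.5 * F.mse_loss(generated_image,
                            (generated_image - grad).detach())
```

My confusion is that each of these one-step predictions is individually inaccurate, yet their difference apparently still gives a useful gradient for the generator.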