After training, there are two pytorch_model files. Which one should I use? #31
Comments
You should use pytorch_model.bin, which stores the generator's parameters. pytorch_model_1.bin stores the guidance model's parameters, specifically the real and fake UNets. From the error message, it seems that torch.load itself fails, which is weird. Could you check whether your torch version is the same as the one we used? I am also wondering whether setting map_location="cpu" makes any difference. Thanks
Thank you, setting map_location="cpu" works, but the inference seems to be very slow.
torchrun --nnodes 1 --nproc_per_node=16 main/train_sd.py \
--generator_lr 1e-5 \
--guidance_lr 1e-5 \
--train_iters 30000 \
--output_path $CHECKPOINT_PATH \
--batch_size 44 \
--grid_size 2 \
--initialie_generator --log_iters 1000 \
--resolution 512 \
--latent_resolution 64 \
--seed 10 \
--real_guidance_scale 1.75 \
--fake_guidance_scale 1.0 \
--max_grad_norm 10.0 \
--model_id "/root/workspace/env_run/sd1.5" \
--train_prompt_path /root/workspace/env_run/dmd2/prompts/shuffled.txt \
--afs_data_path="/root/workspace/env/2kw_merge_result/" \
--afs_part_list="/root/paddlejob/workspace/env/2kw_part_count/part-00000" \
--log_path /root/env_run/dmd2/tensorboard_log_sd1.5 \
--wandb_iters 50 \
--use_fp16 \
--log_loss \
--dfake_gen_update_ratio 10 \
--gradient_checkpointing
I don't get it. map_location="cpu" only specifies the device used during loading. pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, unet=unet).to("cuda") should still run on CUDA.
Which step are you referring to?
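To illustrate the point about map_location, here is a minimal sketch rather than the repository's actual loading code. It uses a tiny nn.Linear as a stand-in for the UNet and a temporary, hypothetical checkpoint path so that it runs anywhere; both are assumptions made for the demonstration. It shows that map_location="cpu" only affects where the tensors land at load time, and that the module can still be moved to the GPU afterwards. It also prints the torch version, which is useful when comparing environments.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in for the generator UNet (assumption: a tiny layer keeps this runnable).
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical checkpoint path, used only for this demonstration.
    ckpt_path = os.path.join(tmp_dir, "pytorch_model.bin")
    torch.save(model.state_dict(), ckpt_path)

    # map_location="cpu" only decides where the tensors are placed at load time.
    state_dict = torch.load(ckpt_path, map_location="cpu")

load_devices = {t.device.type for t in state_dict.values()}

restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)

# The module can still be moved to the GPU afterwards, if one is available.
target = "cuda" if torch.cuda.is_available() else "cpu"
restored = restored.to(target)
run_devices = {p.device.type for p in restored.parameters()}

print("torch version:", torch.__version__)
print("devices right after loading:", load_devices)
print("devices after .to(target):", run_devices)
```

If inference is slow after loading on CPU, checking the parameter devices this way is a quick way to confirm whether the model actually ended up on the GPU.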
By the way, if you want good images, you need to use a larger guidance scale.
For example, in the README, when loading the "dmd2_sdxl_1step_unet_fp16.bin" model, num_inference_steps is set to 1 for pipeline inference, and when loading the "dmd2_sdxl_4step_unet_fp16.safetensors" model, it is set to 4. So what should num_inference_steps be set to when loading "pytorch_model.bin"?
We found two pytorch_model files in the training checkpoint, pytorch_model_1.bin and pytorch_model.bin. Which one should I use?

We observed that the sizes of these two files differ significantly from the official dmd2_sdxl_1step_unet.bin (10.3 GB), which might indicate an issue.

We tried inference with each of these pytorch_model files but encountered different errors.

Using pytorch_model_1.bin for inference: Error:

Using pytorch_model.bin for inference: Error: