StableDiffusion3 pipeline throws a RuntimeError when prompt_embeds is used in lieu of prompt and num_images_per_prompt > 1.
I am attempting to generate images using the StableDiffusion3 pipeline with precomputed prompt embeddings. The prompt embeddings are computed using the .encode_prompt(...) method of the pipeline and are then passed to the pipeline's __call__(...). Passing these encoded prompts to the pipeline leads to a RuntimeError when:
num_images_per_prompt > 1 for both .encode_prompt(...) and __call__(...).
num_images_per_prompt = 1 for .encode_prompt(...) and num_images_per_prompt > 1 for __call__(...).
The StableDiffusionXL pipeline does not have these errors.
Reproduction
StableDiffusion3 Failing Cases
encode_prompt num_images_per_prompt>1 and call num_images_per_prompt>1
The code for this failing case is below:
import torch
from diffusers import DiffusionPipeline

model_name = "stabilityai/stable-diffusion-3.5-medium"
pipe = DiffusionPipeline.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

# encode the prompts
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(
    prompt="A painting of a cat",
    prompt_2=None,
    prompt_3=None,
    device="cuda",
    do_classifier_free_guidance=True,
    num_images_per_prompt=2,  # NOTE
)

# sample (generate) from the diffusion model.
with torch.inference_mode():
    out = pipe(
        height=64,  # this is set small for speeding up testing
        width=64,  # this is set small for speeding up testing
        num_images_per_prompt=2,  # NOTE
        num_inference_steps=1,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        generator=torch.Generator(0)
    )
encode_prompt num_images_per_prompt=1 and call num_images_per_prompt>1
import torch
from diffusers import DiffusionPipeline

model_name = "stabilityai/stable-diffusion-3.5-medium"
pipe = DiffusionPipeline.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")

# encode the prompts
(
    prompt_embeds,
    negative_prompt_embeds,
    pooled_prompt_embeds,
    negative_pooled_prompt_embeds,
) = pipe.encode_prompt(
    prompt="A painting of a cat",
    prompt_2=None,
    prompt_3=None,
    device="cuda",
    do_classifier_free_guidance=True,
    num_images_per_prompt=1,  # NOTE
)

# sample (generate) from the diffusion model.
with torch.inference_mode():
    out = pipe(
        height=64,  # this is set small for speeding up testing
        width=64,  # this is set small for speeding up testing
        num_images_per_prompt=2,  # NOTE
        num_inference_steps=1,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        generator=torch.Generator(0)
    )
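A possible workaround, which I have not verified against the real pipeline, is to request all duplicates from .encode_prompt(...) and then call the pipeline with num_images_per_prompt=1, so that the embeddings are not expected to be repeated a second time. The sketch below only wires up the arguments; the helper name generate_from_embeds is hypothetical, and the assumption that this combination avoids the RuntimeError is mine.

```python
def generate_from_embeds(pipe, prompt, num_images, device="cuda", **call_kwargs):
    """Encode once with num_images duplicates, then sample with no further duplication.

    Hypothetical helper: `pipe` is assumed to expose encode_prompt(...) and
    __call__(...) with the keyword arguments used in the failing cases above.
    """
    (
        prompt_embeds,
        negative_prompt_embeds,
        pooled_prompt_embeds,
        negative_pooled_prompt_embeds,
    ) = pipe.encode_prompt(
        prompt=prompt,
        prompt_2=None,
        prompt_3=None,
        device=device,
        do_classifier_free_guidance=True,
        num_images_per_prompt=num_images,  # duplication happens here only
    )
    return pipe(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_prompt_embeds,
        pooled_prompt_embeds=pooled_prompt_embeds,
        negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
        num_images_per_prompt=1,  # assumption: avoids a second multiplication
        **call_kwargs,
    )
```

Usage would look like generate_from_embeds(pipe, "A painting of a cat", 2, height=64, width=64, num_inference_steps=1), with pipe loaded as in the failing cases.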
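Expected Behaviour (StableDiffusionXL pipeline)
The StableDiffusionXL pipeline runs without error in both cases:
encode_prompt num_images_per_prompt=1 and call num_images_per_prompt>1
encode_prompt num_images_per_prompt>1 and call num_images_per_prompt>1
The StableDiffusion3 pipeline instead fails with the following traceback: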
Traceback (most recent call last):
File "/home/jajal/research/diffusion-trajectory/sd3.py", line 26, in <module>
out = pipe(
^^^^^
File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/jajal/research/diffusers/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py", line 1060, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jajal/research/diffusers/src/diffusers/models/transformers/transformer_sd3.py", line 389, in forward
temb = self.time_text_embed(timestep, pooled_projections)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jajal/mambaforge/envs/diff-traf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jajal/research/diffusers/src/diffusers/models/embeddings.py", line 1606, in forward
conditioning = timesteps_emb + pooled_projections
~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (8) must match the size of tensor b (4) at non-singleton dimension 0
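The sizes 8 and 4 in the error are consistent with the batch size being multiplied by num_images_per_prompt for the latents but not for the supplied embeddings. The sketch below is only a model of that arithmetic under assumptions I have not confirmed in the pipeline source: the batch size is read from prompt_embeds.shape[0], latents are created for batch_size * num_images_per_prompt samples, supplied embeddings are used as-is, and classifier-free guidance doubles both. The function name sd3_batch_sizes is hypothetical.

```python
def sd3_batch_sizes(embeds_batch, num_images_per_prompt, guidance=True):
    """Return (latent_batch, pooled_batch) as they would reach the transformer.

    Assumed model of the pipeline, not taken from its source:
      * batch_size is read from prompt_embeds.shape[0];
      * latents are created for batch_size * num_images_per_prompt samples;
      * supplied embeddings are not repeated again;
      * classifier-free guidance doubles both by concatenation.
    """
    cfg_factor = 2 if guidance else 1
    latent_batch = embeds_batch * num_images_per_prompt * cfg_factor
    pooled_batch = embeds_batch * cfg_factor
    return latent_batch, pooled_batch


# Case 1: encode_prompt(num_images_per_prompt=2) yields embeddings with batch 2.
print(sd3_batch_sizes(embeds_batch=2, num_images_per_prompt=2))  # (8, 4)
# Case 2: encode_prompt(num_images_per_prompt=1) yields embeddings with batch 1.
print(sd3_batch_sizes(embeds_batch=1, num_images_per_prompt=2))  # (4, 2)
```

Under these assumptions the first case reproduces the 8 versus 4 mismatch reported in the traceback, and the two sizes only agree when the call uses num_images_per_prompt=1.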
Logs
System Info
Who can help?
@yiyixuxu @sayakpaul