You can use Hunyuan-DiT in 🤗 Diffusers library. Before using the pipelines, please install the latest version of 🤗 Diffusers with
pip install git+https://github.com/huggingface/diffusers.git
You can generate images with both Chinese and English prompts using the following Python script:
import torch
from diffusers import HunyuanDiTPipeline
pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers", torch_dtype=torch.float16)
pipe.to("cuda")
# You may also use English prompt as HunyuanDiT supports both English and Chinese
# prompt = "An astronaut riding a horse"
prompt = "一个宇航员在骑马"
image = pipe(prompt).images[0]
You can use our distilled model to generate images even faster:
import torch
from diffusers import HunyuanDiTPipeline
pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled", torch_dtype=torch.float16)
pipe.to("cuda")
# You may also use English prompt as HunyuanDiT supports both English and Chinese
# prompt = "An astronaut riding a horse"
prompt = "一个宇航员在骑马"
image = pipe(prompt, num_inference_steps=25).images[0]
More details can be found in HunyuanDiT-Diffusers-Distilled
LoRA can be integrated with Hunyuan-DiT inside the 🤗 Diffusers framework. The following example loads and uses the pre-trained LoRA. To try it, please start by downloading our pre-trained LoRA checkpoints,
huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
Then run the following code snippet to use the jade LoRA:
import torch
from diffusers import HunyuanDiTPipeline
### convert checkpoint to diffusers format
num_layers = 40
def load_hunyuan_dit_lora(transformer_state_dict, lora_state_dict, lora_scale):
for i in range(num_layers):
Wqkv = torch.matmul(lora_state_dict[f"blocks.{i}.attn1.Wqkv.lora_B.weight"], lora_state_dict[f"blocks.{i}.attn1.Wqkv.lora_A.weight"])
q, k, v = torch.chunk(Wqkv, 3, dim=0)
transformer_state_dict[f"blocks.{i}.attn1.to_q.weight"] += lora_scale * q
transformer_state_dict[f"blocks.{i}.attn1.to_k.weight"] += lora_scale * k
transformer_state_dict[f"blocks.{i}.attn1.to_v.weight"] += lora_scale * v
out_proj = torch.matmul(lora_state_dict[f"blocks.{i}.attn1.out_proj.lora_B.weight"], lora_state_dict[f"blocks.{i}.attn1.out_proj.lora_A.weight"])
transformer_state_dict[f"blocks.{i}.attn1.to_out.0.weight"] += lora_scale * out_proj
q_proj = torch.matmul(lora_state_dict[f"blocks.{i}.attn2.q_proj.lora_B.weight"], lora_state_dict[f"blocks.{i}.attn2.q_proj.lora_A.weight"])
transformer_state_dict[f"blocks.{i}.attn2.to_q.weight"] += lora_scale * q_proj
kv_proj = torch.matmul(lora_state_dict[f"blocks.{i}.attn2.kv_proj.lora_B.weight"], lora_state_dict[f"blocks.{i}.attn2.kv_proj.lora_A.weight"])
k, v = torch.chunk(kv_proj, 2, dim=0)
transformer_state_dict[f"blocks.{i}.attn2.to_k.weight"] += lora_scale * k
transformer_state_dict[f"blocks.{i}.attn2.to_v.weight"] += lora_scale * v
out_proj = torch.matmul(lora_state_dict[f"blocks.{i}.attn2.out_proj.lora_B.weight"], lora_state_dict[f"blocks.{i}.attn2.out_proj.lora_A.weight"])
transformer_state_dict[f"blocks.{i}.attn2.to_out.0.weight"] += lora_scale * out_proj
q_proj = torch.matmul(lora_state_dict["pooler.q_proj.lora_B.weight"], lora_state_dict["pooler.q_proj.lora_A.weight"])
transformer_state_dict["time_extra_emb.pooler.q_proj.weight"] += lora_scale * q_proj
return transformer_state_dict
### use the diffusers pipeline with lora
pipe = HunyuanDiTPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers", torch_dtype=torch.float16)
pipe.to("cuda")
from safetensors import safe_open
lora_state_dict = {}
with safe_open("./ckpts/t2i/lora/jade/adapter_model.safetensors", framework="pt", device=0) as f:
for k in f.keys():
lora_state_dict[k[17:]] = f.get_tensor(k) # remove 'basemodel.model'
transformer_state_dict = pipe.transformer.state_dict()
transformer_state_dict = load_hunyuan_dit_lora(transformer_state_dict, lora_state_dict, lora_scale=1.0)
pipe.transformer.load_state_dict(transformer_state_dict)
prompt = "玉石绘画风格,一只猫在追蝴蝶"
image = pipe(
prompt,
num_inference_steps=100,
guidance_scale=6.0,
).images[0]
image.save('img.png')
You can control the strength of LoRA by changing the lora_scale
parameter.
Hunyuan-DiT + ControlNet is supported in 🤗 Diffusers. The following example shows how to use Hunyuan-DiT + Canny ControlNet.
from diffusers import HunyuanDiT2DControlNetModel, HunyuanDiTControlNetPipeline
import torch
controlnet = HunyuanDiT2DControlNetModel.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-ControlNet-Diffusers-Canny", torch_dtype=torch.float16)
pipe = HunyuanDiTControlNetPipeline.from_pretrained("Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers", controlnet=controlnet, torch_dtype=torch.float16)
pipe.to("cuda")
from diffusers.utils import load_image
cond_image = load_image('https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.1-ControlNet-Diffusers-Canny/resolve/main/canny.jpg?download=true')
## You may also use English prompt as HunyuanDiT supports both English and Chinese
prompt="在夜晚的酒店门前,一座古老的中国风格的狮子雕像矗立着,它的眼睛闪烁着光芒,仿佛在守护着这座建筑。背景是夜晚的酒店前,构图方式是特写,平视,居中构图。这张照片呈现了真实摄影风格,蕴含了中国雕塑文化,同时展现了神秘氛围"
#prompt="At night, an ancient Chinese-style lion statue stands in front of the hotel, its eyes gleaming as if guarding the building. The background is the hotel entrance at night, with a close-up, eye-level, and centered composition. This photo presents a realistic photographic style, embodies Chinese sculpture culture, and reveals a mysterious atmosphere."
image = pipe(
prompt,
height=1024,
width=1024,
control_image=cond_image,
num_inference_steps=50,
).images[0]
There are other pre-trained ControlNets available. Please have a look at the official huggingface website of Tencent Hunyuan Team