Merge pull request #3221 from vladmandic/dev
merge dev to master
vladmandic authored Jun 13, 2024
2 parents 84f9caa + 2b17186 commit c8b5ed4
Showing 121 changed files with 328,919 additions and 885 deletions.
93 changes: 93 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,98 @@
# Change Log for SD.Next

## Update for 2024-06-13

### Highlights for 2024-06-13

First, yes, it's here and supported: [**StabilityAI Stable Diffusion 3 Medium**](https://stability.ai/news/stable-diffusion-3-medium)
For details on how to download and use it, see the [Wiki](https://github.com/vladmandic/automatic/wiki/SD3)

#### What else?

A lot of work on state-of-the-art multi-lingual models with both [Tencent HunyuanDiT](https://github.com/Tencent/HunyuanDiT) and [MuLan](https://github.com/mulanai/MuLan)
Plus tons of minor features such as an optimized initial install experience, **T-Gate** and **ResAdapter**, additional ModernUI themes (both light and dark), and fixes since the last release, which was only two weeks ago!

### Full Changelog for 2024-06-13

#### New Models

- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
yup, supported!
quote: *"Stable Diffusion 3 Medium is a multimodal diffusion transformer (MMDiT) model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency"*
SD.Next also supports switching the optional T5 text encoder on-the-fly, as well as loading the model from either a diffusers repo or a single safetensors file
for details, see the [Wiki](https://github.com/vladmandic/automatic/wiki/SD3); a minimal loading sketch follows this list
- [Tencent HunyuanDiT](https://github.com/Tencent/HunyuanDiT) bilingual English/Chinese diffusion transformer model
note: this is a very large model at ~17GB, but it can be used with less VRAM via model offloading
simply select it from networks -> models -> reference; the model will be auto-downloaded on first use
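
Under the hood, both new models load as diffusers pipelines. As a rough sketch of loading SD3 directly with diffusers, with the optional T5 text encoder skipped (the repo id, dtype, and prompt are assumptions; SD.Next manages all of this internally):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# sketch only: passing text_encoder_3=None / tokenizer_3=None skips the
# large optional T5 encoder, which significantly reduces VRAM usage
pipe = StableDiffusion3Pipeline.from_pretrained(
    'stabilityai/stable-diffusion-3-medium-diffusers',  # assumed repo id
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.float16,
).to('cuda')

image = pipe('photo of a cat wearing a hat', num_inference_steps=28, guidance_scale=7.0).images[0]
image.save('sd3.png')
```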

#### New Functionality

- [MuLan](https://github.com/mulanai/MuLan) multi-language prompts
write your prompts in ~110 auto-detected languages!
compatible with *SD15* and *SDXL*
enable in scripts -> MuLan and set the encoder to `InternVL-14B-224px`
*note*: right now this is more of a proof-of-concept before smaller and/or quantized models are released
the model will be auto-downloaded on first use: note its huge size of 27GB
even executing it in FP16 will require ~16GB of VRAM for the text encoder alone
examples:
- English: photo of a beautiful woman wearing a white bikini on a beach with a city skyline in the background
- Croatian: fotografija lijepe žene u bijelom bikiniju na plaži s gradskim obzorom u pozadini
- Italian: Foto di una bella donna che indossa un bikini bianco su una spiaggia con lo skyline di una città sullo sfondo
- Spanish: Foto de una hermosa mujer con un bikini blanco en una playa con un horizonte de la ciudad en el fondo
- German: Foto einer schönen Frau in einem weißen Bikini an einem Strand mit einer Skyline der Stadt im Hintergrund
- Arabic: صورة لامرأة جميلة ترتدي بيكيني أبيض على شاطئ مع أفق المدينة في الخلفية
- Japanese: 街のスカイラインを背景にビーチで白いビキニを着た美しい女性の写真
- Chinese: 一个美丽的女人在海滩上穿着白色比基尼的照片, 背景是城市天际线
- Korean: 도시의 스카이라인을 배경으로 해변에서 흰색 비키니를 입은 아름 다운 여성의 사진
- [T-Gate](https://github.com/HaozheLiu-ST/T-GATE) speed up generation by gating the step after which cross-attention is no longer computed
enable via scripts -> t-gate
compatible with *SD15*
- **PCM LoRAs** allow for fast denoising using fewer steps with standard *SD15* and *SDXL* models
download from <https://huggingface.co/Kijai/converted_pcm_loras_fp16/tree/main>; see the loading sketch after this list
- [ByteDance ResAdapter](https://github.com/bytedance/res-adapter) resolution-free model adapter
allows using resolutions from 0.5x to 2.0x of the original model resolution, compatible with *SD15* and *SDXL*
enable via scripts -> resadapter and select the desired model
- **Kohya HiRes Fix** allows for higher-resolution generation using standard *SD15* models
enable via scripts -> kohya-hires-fix
*note*: an alternative to the regular hidiffusion method, with a different approach to scaling
- 4 additional built-in custom-trained **ControlNet SDXL** models from Xinsir: OpenPose, Canny, Scribble, AnimePainter
thanks @lbeltrame
- add torch **full deterministic mode**
enable in settings -> compute -> use deterministic mode
typical output differences are not large; it's disabled by default as it does have some performance impact (see the sketch after this list)
- new sampler: **Euler FlowMatch**
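
The PCM LoRAs mentioned above load through the standard diffusers LoRA mechanism; a minimal sketch, where the base model choice and the `weight_name` file are placeholders to be picked from the linked repo:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',  # assumed base model
    torch_dtype=torch.float16,
).to('cuda')
# weight_name is a placeholder; pick an actual file from the repo above
pipe.load_lora_weights('Kijai/converted_pcm_loras_fp16', weight_name='<pcm-lora-file>.safetensors')
# PCM LoRAs target low step counts with reduced guidance
image = pipe('photo of a cat', num_inference_steps=8, guidance_scale=2.0).images[0]
```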
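
The full deterministic mode maps onto torch's standard determinism switches; exactly which flags SD.Next toggles is an assumption, but the usual set looks like this:

```python
import os
import torch

# cuBLAS must be configured before the first CUDA call for full determinism
os.environ.setdefault('CUBLAS_WORKSPACE_CONFIG', ':4096:8')

torch.use_deterministic_algorithms(True)   # raise on ops without a deterministic implementation
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # autotuning picks kernels non-deterministically
```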

#### Improvements

- additional modernui themes
- reintroduce prompt attention normalization, disabled by default, enable in settings -> execution
this can drastically help with unbalanced prompts
- further work on improving python 3.12 functionality and removal of its experimental flag
note: the recommended version remains python 3.11 for all users, except DirectML users, who should stay on python 3.10
- improved **installer** for initial installs
initial install does a single-pass install of all required packages with correct versions
subsequent runs check package versions only as necessary
- add env variable `SD_PIP_DEBUG` to write `pip.log` for all pip operations
also improved installer logging
- add python version check for `torch-directml`
- do not install `tensorflow` by default
- improve metadata/infotext parser
add `cli/image-exif.py` that can be used to view/extract metadata from images; see the sketch after this list
- lower overhead on generate calls
- auto-synchronize modernui and core branches
- add option to pad prompt with zeros, thanks @Disty
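
For PNG outputs, the infotext that the parser and `cli/image-exif.py` work with is stored in a text chunk named `parameters`, so it can also be inspected with plain Pillow (the file name below is a placeholder):

```python
from PIL import Image

img = Image.open('output.png')  # placeholder file name
# SD-style generation parameters live in a PNG text chunk named 'parameters'
print(img.text.get('parameters', '<no infotext>'))
```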

#### Fixes

- cumulative fixes since the last release
- fix apply/unapply hidiffusion for sd15
- fix controlnet reference enabled check
- fix face-hires with control batch count
- install pynvml on-demand
- apply rollback-vae option to latest torch versions, thanks @Iaotle
- skip face-hires if strength is 0
- restore all sampler configuration on sampler change

## Update for 2024-06-02

- fix textual inversion loading
Expand Down
36 changes: 20 additions & 16 deletions README.md
@@ -16,9 +16,9 @@
## Table of contents

- [SD.Next Features](#sdnext-features)
- [Backend support](#backend-support)
- [Model support](#model-support)
- [Platform support](#platform-support)
- [Backend support](#backend-support)
- [Examples](#examples)
- [Install](#install)
- [Notes](#notes)
@@ -31,7 +31,7 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG
- Multiple UIs!
**Standard | Modern**
- Multiple diffusion models!
**Stable Diffusion 1.5/2.1 | SD-XL | LCM | Segmind | Kandinsky | Pixart-α | Pixart-Σ | Stable Cascade | Würstchen | aMUSEd | DeepFloyd IF | UniDiffusion | SD-Distilled | BLiP Diffusion | KOALA | SDXS | Hyper-SD | etc.**
**Stable Diffusion 1.5/2.1/XL/3.0 | LCM | Lightning | Segmind | Kandinsky | Pixart-α | Pixart-Σ | Stable Cascade | Würstchen | aMUSEd | DeepFloyd IF | UniDiffusion | SD-Distilled | BLiP Diffusion | KOALA | SDXS | Hyper-SD | HunyuanDiT | etc.**
- Built-in Control for Text, Image, Batch and video processing!
**ControlNet | ControlNet XS | Control LLLite | T2I Adapters | IP Adapters**
- Multiplatform!
@@ -54,30 +54,19 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG

*Main interface using **ModernUI***:
![Screenshot-Dark](html/screenshot-modernui.jpg)
![Screenshot-Dark](html/screenshot-modernui-sd3.jpg)

For screenshots and information on other available themes, see the [Themes Wiki](https://github.com/vladmandic/automatic/wiki/Themes)

<br>

## Backend support

**SD.Next** supports two main backends: *Diffusers* and *Original*:

- **Diffusers**: Based on new [Huggingface Diffusers](https://huggingface.co/docs/diffusers/index) implementation
Supports *all* models listed below
This backend is set as default for new installations
See [wiki article](https://github.com/vladmandic/automatic/wiki/Diffusers) for more information
- **Original**: Based on [LDM](https://github.com/Stability-AI/stablediffusion) reference implementation and significantly expanded on by [A1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
This backend is fully compatible with most existing functionality and extensions written for *A1111 SDWebUI*
Supports **SD 1.x** and **SD 2.x** models
All other model types such as *SD-XL, LCM, Stable Cascade, PixArt, Playground, Segmind, Kandinsky, etc.* require backend **Diffusers**

## Model support

Additional models will be added as they become available and there is public interest in them

- [RunwayML Stable Diffusion](https://github.com/Stability-AI/stablediffusion/) 1.x and 2.x *(all variants)*
- [StabilityAI Stable Diffusion XL](https://github.com/Stability-AI/generative-models)
- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
- [StabilityAI Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) Base, XT 1.0, XT 1.1
- [LCM: Latent Consistency Models](https://github.com/openai/consistency_models)
- [Playground](https://huggingface.co/playgroundai/playground-v2-256px-base) *v1, v2 256, v2 512, v2 1024 and latest v2.5*
@@ -90,6 +79,7 @@ Additional models will be added as they become available and there is public int
- [PixArt-α XL 2](https://github.com/PixArt-alpha/PixArt-alpha) *Medium and Large*
- [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma)
- [Warp Wuerstchen](https://huggingface.co/blog/wuertschen)
- [Tencent HunyuanDiT](https://github.com/Tencent/HunyuanDiT)
- [Tsinghua UniDiffusion](https://github.com/thu-ml/unidiffuser)
- [DeepFloyd IF](https://github.com/deep-floyd/IF) *Medium and Large*
- [ModelScope T2V](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b)
@@ -102,11 +92,12 @@ Additional models will be added as they become available and there is public int


Also supported are modifiers such as:
- **LCM** and **Turbo** (*adversarial diffusion distillation*) networks
- **LCM**, **Turbo** and **Lightning** (*adversarial diffusion distillation*) networks
- All **LoRA** types such as LoCon, LyCORIS, HADA, IA3, Lokr, OFT
- **IP-Adapters** for SD 1.5 and SD-XL
- **InstantID**, **FaceSwap**, **FaceID**, **PhotoMerge**
- **AnimateDiff** for SD 1.5
- **MuLan** multi-language support

## Platform support

@@ -120,6 +111,19 @@ Also supported are modifiers such as:
- *Apple M1/M2* on *OSX* using built-in support in Torch with **MPS** optimizations
- *ONNX/Olive*

## Backend support

**SD.Next** supports two main backends: *Diffusers* and *Original*:

- **Diffusers**: Based on new [Huggingface Diffusers](https://huggingface.co/docs/diffusers/index) implementation
Supports *all* models listed below
This backend is set as default for new installations
See [wiki article](https://github.com/vladmandic/automatic/wiki/Diffusers) for more information
- **Original**: Based on [LDM](https://github.com/Stability-AI/stablediffusion) reference implementation and significantly expanded on by [A1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
This backend is fully compatible with most existing functionality and extensions written for *A1111 SDWebUI*
Supports **SD 1.x** and **SD 2.x** models
All other model types such as *SD-XL, LCM, Stable Cascade, PixArt, Playground, Segmind, Kandinsky, etc.* require backend **Diffusers**

## Examples

*IP Adapters*:
19 changes: 0 additions & 19 deletions TODO.md
@@ -2,37 +2,18 @@

Main ToDo list can be found at [GitHub projects](https://github.com/users/vladmandic/projects)

## Fix

- ultralytics package install

## Future Candidates

- stable diffusion 3.0: unreleased
- boxdiff <https://github.com/huggingface/diffusers/pull/7947>
- animatediff-sdxl <https://github.com/huggingface/diffusers/pull/6721>
- async lowvram: <https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14855>
- fp8: <https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14031>
- profiling: <https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/716>
- kohya-hires-fix: <https://github.com/huggingface/diffusers/pull/7633>
- hunyuan-dit: <https://github.com/huggingface/diffusers/pull/8290>
- init latents: variations, img2img
- diffusers public callbacks
- include reference styles
- lora: sc lora, dora, etc
- controlnet: additional models
- resadapter: <https://github.com/bytedance/res-adapter>
- t-gate: <https://huggingface.co/docs/diffusers/main/en/optimization/tgate>

## Experimental

- [MuLan](https://github.com/mulanai/MuLan) Multi-language prompts - write your prompts in ~110 auto-detected languages!
Compatible with SD15 and SDXL
Enable in scripts -> MuLan and set the encoder to `InternVL-14B-224px`
(that is currently the only supported encoder, but others will be added)
Note: the model will be auto-downloaded on first use: note its huge size of 27GB
Even executing it in FP16 will require ~16GB of VRAM for the text encoder alone
*Note*: uses a fixed prompt parser, so prompt attention will not be applied
- [SDXL Flash Mini](https://huggingface.co/sd-community/sdxl-flash-mini)
a lighter SDXL variant that consumes less video memory, with little drop in quality
to use, simply select from *networks -> models -> reference -> SDXL Flash Mini*
67 changes: 7 additions & 60 deletions cli/image-exif.py
@@ -4,70 +4,17 @@
import io
import re
import sys
import json
import importlib.util
from PIL import Image, ExifTags, TiffImagePlugin, PngImagePlugin
from rich import print # pylint: disable=redefined-builtin


def unquote(text):
    if len(text) == 0 or text[0] != '"' or text[-1] != '"':
        return text
    try:
        return json.loads(text)
    except Exception:
        return text


def parse_generation_parameters(infotext):
    if not isinstance(infotext, str):
        return {}
    re_param = re.compile(r'\s*([\w ]+):\s*("(?:\\"[^,]|\\"|\\|[^\"])+"|[^,]*)(?:,|$)') # multi-word: value
    re_size = re.compile(r"^(\d+)x(\d+)$") # int x int
    basic_params = ['steps', 'seed', 'width', 'height', 'sampler', 'size', 'cfg scale', 'hires'] # first param is one of those

    sanitized = infotext.replace('prompt:', 'Prompt:').replace('negative prompt:', 'Negative prompt:').replace('Negative Prompt', 'Negative prompt') # cleanup everything in brackets so re_params can work
    sanitized = re.sub(r'<[^>]*>', lambda match: ' ' * len(match.group()), sanitized)
    sanitized = re.sub(r'\([^)]*\)', lambda match: ' ' * len(match.group()), sanitized)
    sanitized = re.sub(r'\{[^}]*\}', lambda match: ' ' * len(match.group()), sanitized)

    params = dict(re_param.findall(sanitized))
    params = { k.strip():params[k].strip() for k in params if k.lower() not in ['hashes', 'lora', 'embeddings', 'prompt', 'negative prompt']} # remove some keys
    if len(list(params)) == 0:
        first_param = None
    else:
        try:
            first_param, first_param_idx = next((s, i) for i, s in enumerate(params) if any(x in s.lower() for x in basic_params))
        except Exception:
            first_param, first_param_idx = next(iter(params)), 0
        if first_param_idx > 0:
            for _i in range(first_param_idx):
                params.pop(next(iter(params)))
    params_idx = sanitized.find(f'{first_param}:') if first_param else -1
    negative_idx = infotext.find("Negative prompt:")

    prompt = infotext[:params_idx] if negative_idx == -1 else infotext[:negative_idx] # prompt can be with or without negative prompt
    negative = infotext[negative_idx:params_idx] if negative_idx >= 0 else ''
module_file = os.path.abspath(__file__)
module_dir = os.path.dirname(module_file)
module_spec = importlib.util.spec_from_file_location('infotext', os.path.join(module_dir, '..', 'modules', 'infotext.py'))
infotext = importlib.util.module_from_spec(module_spec)
module_spec.loader.exec_module(infotext)

    for k, v in params.copy().items(): # avoid dict-has-changed
        if len(v) > 0 and v[0] == '"' and v[-1] == '"':
            v = unquote(v)
        m = re_size.match(v)
        if v.replace('.', '', 1).isdigit():
            params[k] = float(v) if '.' in v else int(v)
        elif v == "True":
            params[k] = True
        elif v == "False":
            params[k] = False
        elif m is not None:
            params[f"{k}-1"] = int(m.group(1))
            params[f"{k}-2"] = int(m.group(2))
        elif k == 'VAE' and v == 'TAESD':
            params["Full quality"] = False
        else:
            params[k] = v
    params["Prompt"] = prompt.replace('Prompt:', '').strip()
    params["Negative prompt"] = negative.replace('Negative prompt:', '').strip()
    return params
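
For reference, this parser (whose role now moves to `modules/infotext.py`) turns an infotext string into a typed dict; a rough illustration of the expected shape, not verified output:

```python
sample = 'photo of a cat\nNegative prompt: blurry\nSteps: 20, Sampler: Euler a, CFG scale: 6.0, Seed: 42, Size: 512x512'
params = parse_generation_parameters(sample)
# roughly: {'Steps': 20, 'Sampler': 'Euler a', 'CFG scale': 6.0, 'Seed': 42,
#           'Size': '512x512', 'Size-1': 512, 'Size-2': 512,
#           'Prompt': 'photo of a cat', 'Negative prompt': 'blurry'}
```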


class Exif: # pylint: disable=single-string-used-for-slots
@@ -132,7 +79,7 @@ def decode(self, s: bytes):

    def parse(self):
        x = self.exif.pop('parameters', None) or self.exif.pop('UserComment', None)
        res = parse_generation_parameters(x)
        res = infotext.parse(x)
        return res

    def get_bytes(self):
2 changes: 2 additions & 0 deletions cli/simple-txt2img.py
@@ -48,6 +48,7 @@ def generate(args): # pylint: disable=redefined-outer-name
    options['sampler_name'] = args.sampler
    options['width'] = int(args.width)
    options['height'] = int(args.height)
    options['restore_faces'] = args.faces
    data = post('/sdapi/v1/txt2img', options)
    t1 = time.time()
    if 'images' in data:
@@ -71,6 +72,7 @@ def generate(args): # pylint: disable=redefined-outer-name
    parser.add_argument('--height', required=False, default=512, help='image height')
    parser.add_argument('--steps', required=False, default=20, help='number of steps')
    parser.add_argument('--seed', required=False, default=-1, help='initial seed')
    parser.add_argument('--faces', action='store_true', help='restore faces')
    parser.add_argument('--sampler', required=False, default='Euler a', help='sampler name')
    parser.add_argument('--output', required=False, default=None, help='output image file')
    parser.add_argument('--model', required=False, help='model name')
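
The new `--faces` flag maps directly onto the `restore_faces` field of the `/sdapi/v1/txt2img` payload; a minimal sketch of the equivalent raw API call (server URL and prompt are assumptions):

```python
import requests

payload = {
    'prompt': 'photo of a cat',  # assumed prompt
    'steps': 20,
    'sampler_name': 'Euler a',
    'width': 512,
    'height': 512,
    'restore_faces': True,       # what the new --faces flag enables
}
res = requests.post('http://127.0.0.1:7860/sdapi/v1/txt2img', json=payload, timeout=300)
images = res.json().get('images', [])
```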
41 changes: 41 additions & 0 deletions configs/sd3/model_index.json
@@ -0,0 +1,41 @@
{
  "_class_name": "StableDiffusion3Pipeline",
  "_diffusers_version": "0.29.0.dev0",
  "_name_or_path": "stabilityai/stable-diffusion-3-medium",
  "scheduler": [
    "diffusers",
    "FlowMatchEulerDiscreteScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModelWithProjection"
  ],
  "text_encoder_2": [
    "transformers",
    "CLIPTextModelWithProjection"
  ],
  "text_encoder_3": [
    "transformers",
    "T5EncoderModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "tokenizer_2": [
    "transformers",
    "CLIPTokenizer"
  ],
  "tokenizer_3": [
    "transformers",
    "T5TokenizerFast"
  ],
  "transformer": [
    "diffusers",
    "SD3Transformer2DModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
6 changes: 6 additions & 0 deletions configs/sd3/scheduler/scheduler_config.json
@@ -0,0 +1,6 @@
{
  "_class_name": "FlowMatchEulerDiscreteScheduler",
  "_diffusers_version": "0.29.0.dev0",
  "num_train_timesteps": 1000,
  "shift": 3.0
}
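
This config maps one-to-one onto the diffusers scheduler class that backs the new **Euler FlowMatch** sampler; as a sketch, the same scheduler can be constructed directly:

```python
from diffusers import FlowMatchEulerDiscreteScheduler

# mirrors configs/sd3/scheduler/scheduler_config.json
scheduler = FlowMatchEulerDiscreteScheduler(num_train_timesteps=1000, shift=3.0)
```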